Introduction to Matrix Theory

Arindama Singh
Department of Mathematics
Indian Institute of Technology Madras
Chennai, India
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publishers, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publishers nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publishers remain neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Practising scientists and engineers feel that calculus and matrix theory form the
minimum mathematical requirement for their future work. Though it is recommended
to spread matrix theory or linear algebra over two semesters in an early stage, the
typical engineering curriculum allocates only one semester for it. In addition, I found
that science and engineering students are at a loss in appreciating the abstract methods
of linear algebra in the first year of their undergraduate programme. This resulted
in a curriculum that includes a thorough study of systems of linear equations via
Gaussian and/or Gauss–Jordan elimination comprising roughly one month in the
first or second semester. It needs a follow-up of one-semester work in matrix theory
ending in canonical forms, factorizations of matrices, and matrix norms.
Initially, we followed books such as Leon [10], Lewis [11], and Strang [14]
as possible texts, referring occasionally to papers and other books. None of these
could be used as a textbook on its own for our purpose. The requirement was a
single text containing development of notions, one leading to the next, and without
any distraction towards applications. It resulted in the creation of our own material. The
students wished to see the material in a book form so that they might keep it on their
lap instead of reading it off the laptop screens. Of course, I had to put some extra
effort in bringing it to this form; the effort is not much compared to the enjoyment
in learning.
The approach is straightforward. Starting from the simple but intricate problems
that a system of linear equations presents, it introduces matrices and operations
on them. The elementary row operations comprise the basic tools in working with
most of the concepts. Though the vector space terminology is not required to study
matrices, an exposure to the notions is certainly helpful for an engineer’s future
research. Keeping this in view, the vector space terminology is introduced in a
restricted environment of subspaces of finite-dimensional real or complex spaces.
It is felt that this direct approach will meet the needs of scientists and engineers.
Also, it will form a basis for abstract function spaces, which one may study or use
later.
Starting from simple operations on matrices, this elementary treatment of matrix
theory characterizes equivalence and similarity of matrices. The other tool of Gram–
Schmidt orthogonalization has been discussed leading to best approximations and
least squares solution of linear systems. On the go, we discuss matrix factorizations
such as rank factorization, QR-factorization, Schur triangularization, diagonaliza-
tion, Jordan form, singular value decomposition, and polar decomposition. It includes
norms on matrices as a means to deal with iterative solutions of linear systems and
exponential of a matrix. Keeping the modest goal of an introductory textbook on
matrix theory, which may be covered in a semester, these topics are dealt with in a
lively manner.
Though the earlier drafts were intended for use by science and engineering
students, many mathematics students used those as supplementary text for learning
linear algebra. This book will certainly fulfil that need.
Each section of the book has exercises to reinforce the concepts; problems have
been added at the end of each chapter for the curious student. Most of these problems
are theoretical in nature, and they do not fit into the running text linearly. Exercises
and problems form an integral part of the book. Working them out may require some
help from the teacher. It is hoped that the teachers and the students of matrix theory
will enjoy the text the same way I and my students did.
Most engineering colleges in India allocate only one semester for linear algebra
or matrix theory. In such a case, the first two chapters of the book can be covered
at a rapid pace with proper attention to elementary row operations. If time does not
permit, the last chapter on matrix norms may be omitted or covered in numerical
analysis under the veil of iterative solutions of linear systems.
I acknowledge the pains taken by my students in pointing out typographical errors.
Their difficulties in grasping the notions have contributed a lot towards the contents
and this particular sequencing of topics. I cheerfully thank my colleagues A. V.
Jayanthan and R. Balaji for using the earlier drafts for teaching linear algebra to
undergraduate engineering and science students at IIT Madras. They pointed out
many improvements, which I cannot pinpoint now. Though the idea of completing
this work originated five years back, time did not permit it. IIT Madras granted me
sabbatical to write the second edition of my earlier book on Logics for Computer
Science. After sending a draft of that to the publisher, I could devote the stop-gap period to
completing this work. I hereby record my thanks to the administrative authorities of
IIT Madras.
It will be foolish on my part to claim perfection. If you are using the book, then
you should be able to point out improvements. I welcome you to write to me at
[email protected].
Contents

1 Matrix Operations
  1.1 Examples of Linear Equations
  1.2 Basic Matrix Operations
  1.3 Transpose and Adjoint
  1.4 Elementary Row Operations
  1.5 Row Reduced Echelon Form
  1.6 Determinant
  1.7 Computing Inverse of a Matrix
  1.8 Problems
2 Systems of Linear Equations
  2.1 Linear Independence
  2.2 Determining Linear Independence
  2.3 Rank of a Matrix
  2.4 Solvability of Linear Equations
  2.5 Gauss–Jordan Elimination
  2.6 Problems
3 Matrix as a Linear Map
  3.1 Subspace and Span
  3.2 Basis and Dimension
  3.3 Linear Transformations
  3.4 Coordinate Vectors
  3.5 Coordinate Matrices
  3.6 Change of Basis Matrix
  3.7 Equivalence and Similarity
  3.8 Problems
4 Orthogonality
  4.1 Inner Products
  4.2 Gram–Schmidt Orthogonalization
  4.3 QR-Factorization
  4.4 Orthogonal Projection
References
Index
About the Author
Chapter 1
Matrix Operations
1.1 Examples of Linear Equations

Consider the system of linear equations

x1 + x2 = 3
x1 − x2 = 1

Subtracting the first equation from the second, we get −2x2 = −2, which implies x2 = 1. That
is, the original system is replaced with the following:
x1 + x2 = 3
x2 = 1

The first equation now gives x1 = 3 − x2 = 2; thus the system has the unique solution x1 = 2,
x2 = 1. Next, we add one more equation to the original system and consider

x1 + x2 = 3
x1 − x2 = 1
2x1 − x2 = 3
The first two equations have a unique solution, and that satisfies the third. Hence,
this system also has a unique solution x1 = 2, x2 = 1. Geometrically, the third equa-
tion represents the straight line that passes through (0, −3) and has slope 2. The
intersection of all the three lines is the same point (2, 1). So, the extra equation does
not put any constraint on the solutions that we obtained earlier.
But what about our systematic solution method? We aim at eliminating the first
unknown from all but the first equation. We replace the second equation with the one
obtained by second minus the first. We also replace the third by third minus twice
the first. It results in
x1 + x2 = 3
−2x2 = −2
−3x2 = −3

Notice that the second and the third equations are equivalent, each saying x2 = 1; hence the
earlier conclusion stands. We
give another twist. Consider the system
x1 + x2 = 3
x1 − x2 = 1
2x1 + x2 = 3
The first two equations again have the solution x1 = 2, x2 = 1. But this time, the
third is not satisfied by these values of the unknowns. So, the system has no solution.
Geometrically, the first two lines have a point of intersection (2, 1); the second
and the third have the intersection point as (4/3, 1/3); and the third and the first have
the intersection point as (0, 3). They form a triangle. There is no point common to
all the three lines. Also, by using our elimination method, we obtain the equations
as:
x1 + x2 = 3
−2x2 = −2
−x2 = −3
The last two equations are not consistent. So, the original system has no solution.
Finally, instead of adding another equation, we drop one. Consider the linear
equation
x1 + x2 = 3
The old solution x1 = 2, x2 = 1 is still a solution of this system. But there are other
solutions. For instance, x1 = 1, x2 = 2 is a solution. Moreover, since x1 = 3 − x2 ,
by assigning x2 any real number, we get a corresponding value for x1 , which together
give a solution. Thus, it has infinitely many solutions.
Geometrically, any point on the straight line represented by the equation is a solu-
tion of the system. Notice that the same conclusion holds if we have more equations,
which are multiples of the only given equation. For example,
x1 + x2 = 3
2x1 + 2x2 = 6
3x1 + 3x2 = 9
We see that the number of equations really does not matter, but the number of
independent equations does matter. Of course, the notion of independent equations
is not yet precise; we have some working ideas only.
It is also not very clear when a system of equations has a solution, a unique
solution, infinitely many solutions, or no solution at all. And why can a system
of equations not have more than one but finitely many solutions? How do we use our
elimination method to obtain infinitely many solutions?
To answer these questions, we will introduce matrices. Matrices will help us in
representing the problem in a compact way and will lead to a definitive answer.
We will also study the eigenvalue problem for matrices, which comes up often in
applications. These concerns will allow us to represent matrices in elegant forms.
Exercises for Sect. 1.1
1. For each of the following system of linear equations, find the number of solutions
geometrically:
(a) x1 + 2x2 = 4, −2x1 − 4x2 = 4
(b) −x1 + 2x2 = 3, 2x1 − 4x2 = −6
(c) x1 + 2x2 = 1, x1 − 2x2 = 1, −x1 + 6x2 = 3
2. Show that the system of linear equations a1 x1 + x2 = b1 , a2 x1 + x2 = b2 has a
unique solution if a1 ≠ a2 . Is the converse true?
1.2 Basic Matrix Operations

In Sect. 1.1, we solved a system of two equations in two unknowns by transforming it step by step:

x1 + x2 = 3        x1 + x2 = 3        x1 = 2
x1 − x2 = 1   ⇒         x2 = 1   ⇒    x2 = 1

We can minimize writing by ignoring the unknowns and transforming only the num-
bers in the following way:
⎡ 1  1  3 ⎤    ⎡ 1 1 3 ⎤    ⎡ 1 0 2 ⎤
⎣ 1 −1  1 ⎦ ⇒  ⎣ 0 1 1 ⎦ ⇒  ⎣ 0 1 1 ⎦
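The same mechanical transformation of the number array can be sketched in code. The following is a small illustration (not from the book) using plain Python lists; the function name and layout are ours.

```python
# A sketch (not from the book): carrying out the row operations of the text on
# the augmented array [[1, 1, 3], [1, -1, 1]] using plain Python lists.

def eliminate_2x3(aug):
    """Reduce a 2x3 augmented array to the form [[1, 0, x], [0, 1, y]]."""
    a = [row[:] for row in aug]                      # work on a copy
    # R2 <- R2 - (a21/a11) * R1  (zero out the entry below the first pivot)
    f = a[1][0] / a[0][0]
    a[1] = [y - f * x for x, y in zip(a[0], a[1])]
    # R2 <- R2 / pivot           (make the second pivot equal to 1)
    p = a[1][1]
    a[1] = [x / p for x in a[1]]
    # R1 <- R1 - a12 * R2        (zero out the entry above the second pivot)
    g = a[0][1]
    a[0] = [x - g * y for x, y in zip(a[0], a[1])]
    return a

print(eliminate_2x3([[1, 1, 3], [1, -1, 1]]))    # [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
```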
To be able to operate with such arrays of numbers and talk about them, we require
some terminology. Throughout, F stands for either R or C. First, some notation:

A = [ai j ], ai j ∈ F for i = 1, . . . , m, j = 1, . . . , n.
Thus, the scalar ai j is the (i, j)th entry of the matrix [ai j ]. Here, i is called the row
index and j is called the column index of the entry ai j .
The set of all m × n matrices with entries from F will be denoted by Fm×n .
A row vector of size n is a matrix in F1×n . Similarly, a column vector of size
n is a matrix in Fn×1 . The vectors in F1×n (row vectors) will be written as (with or
without commas)
[a1 , . . . , an ]  or as  [a1 · · · an ]

for scalars a1 , . . . , an . Similarly, the vectors in Fn×1 (column vectors) will be written as

⎡ b1 ⎤
⎢  ⋮ ⎥   or as   [b1 · · · bn ]t
⎣ bn ⎦

for scalars b1 , . . . , bn . The second way of writing is the transpose notation; it saves
vertical space. Also, if a column vector v is equal to u t for a row vector u, then we
also write u as v t . We write Fn for either F1×n or Fn×1 , and an element of Fn is written
uniformly as the n-tuple

(a1 , . . . , an ).

When Fn is F1×n , you should read (a1 , . . . , an ) as [a1 , . . . , an ], a row vector, and
when Fn is Fn×1 , you should read (a1 , . . . , an ) as [a1 , . . . , an ]t , a column vector.
The ith row of a matrix A = [ai j ] ∈ Fm×n is the row vector

[ai1 , . . . , ain ].

We also say that the row index of this row is i. Similarly, the jth column of A is
the column vector

[a1 j , . . . , am j ]t .
Two matrices A = [ai j ] and B = [bi j ] in Fm×n are considered equal when their
corresponding entries coincide; that is, A = B iff ai j = bi j for 1 ≤ i ≤ m, 1 ≤ j ≤ n.
For a square matrix A = [ai j ] ∈ Fn×n , the entries a11 , a22 , . . . , ann constitute the
diagonal of A. For instance, in a 3 × 3 matrix with diagonal entries 1, 3 and 5, the entry
1 is the first diagonal entry, 3 is the second diagonal entry, and 5 is the third
and the last diagonal entry.
The super-diagonal of a matrix consists of entries above the diagonal. That is, the
entries ai,i+1 comprise the super-diagonal of an n × n matrix A = [ai j ]. Of course,
i varies from 1 to n − 1 here. In the following matrix, the super-diagonal consists of
the entries 2 and 4, at positions (1, 2) and (2, 3):

⎡ 1  2  3 ⎤
⎢ 2  3  4 ⎥
⎣ 3  4  0 ⎦ .
A square matrix whose non-diagonal entries are all 0 is called a diagonal matrix;
when its diagonal entries are d1 , . . . , dn , we also write it as

diag(d1 , . . . , dn ).

The following is a diagonal matrix. We follow the convention of not showing the
non-diagonal entries in a diagonal matrix, which are 0.

                ⎡ 1     ⎤   ⎡ 1 0 0 ⎤
diag(1, 3, 0) = ⎢   3   ⎥ = ⎢ 0 3 0 ⎥ .
                ⎣     0 ⎦   ⎣ 0 0 0 ⎦
The identity matrix is a diagonal matrix with each diagonal entry as 1. We write
an identity matrix of order m as Im . Sometimes, we omit the subscript m if it is
understood from the context.
I = Im = diag(1, . . . , 1).
We write ei for a column vector whose ith component is 1 and all other compo-
nents 0. The jth component of ei is δi j . Here,
δi j = 1 if i = j,   δi j = 0 if i ≠ j.
A scalar matrix is a diagonal matrix with equal diagonal entries. For instance,
the following is a scalar matrix:

⎡ 3          ⎤
⎢    3       ⎥
⎢       3    ⎥ .
⎣          3 ⎦
Let A = [ai j ] and B = [bi j ] be matrices in Fm×n . Their sum A + B is the matrix
C = [ci j ] ∈ Fm×n with

ci j = ai j + bi j for 1 ≤ i ≤ m, 1 ≤ j ≤ n.

Thus, we informally say that matrices are added entry-wise. Matrices of different
sizes can never be added. It is easy to see that

A + B = B + A,   A + 0 = 0 + A = A.
For A = [ai j ], the matrix −A ∈ Fm×n is taken as one whose (i, j)th entry is −ai j .
Thus,
−A = (−1)A, (−A) + A = A + (−A) = 0.
Mark the sizes of A and B. The matrix product AB is defined only when the number
of columns in A is equal to the number of rows in B. The result AB has number of
rows as that of A and the number of columns as that of B.
A particular case might be helpful. Suppose u is a row vector in F1×n and v is a
column vector in Fn×1 . Then, their product uv ∈ F1×1 . It is a 1 × 1 matrix. Often,
we identify such matrices with scalars. The product now looks like:
               ⎡ b1 ⎤
[a1 · · · an ] ⎢  ⋮ ⎥ = [a1 b1 + · · · + an bn ].
               ⎣ bn ⎦
The ith row of A multiplied with the jth column of B gives the (i, j)th entry in AB.
Thus to get AB, you have to multiply all m rows of A with all r columns of B, taking
one from each in turn. For example,
⎡  3  5 −1 ⎤ ⎡ 2 −2  3  1 ⎤   ⎡ 22  −2  43  42 ⎤
⎢  4  0  2 ⎥ ⎢ 5  0  7  8 ⎥ = ⎢ 26 −16  14   6 ⎥ .
⎣ −6 −3  2 ⎦ ⎣ 9 −4  1  1 ⎦   ⎣ −9   4 −37 −28 ⎦
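As an illustration of the rows-times-columns rule, here is a short sketch (not from the book) that recomputes the product above in Python; the helper name matmul is ours.

```python
# A sketch (not from the book) of the rows-times-columns rule, applied to the
# 3x3 and 3x4 matrices of the example above.

def matmul(A, B):
    """Multiply A (m x n) by B (n x r): the (i, j)th entry is sum_k a_ik * b_kj."""
    m, n, r = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "columns of A must match rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(r)]
            for i in range(m)]

A = [[3, 5, -1], [4, 0, 2], [-6, -3, 2]]
B = [[2, -2, 3, 1], [5, 0, 7, 8], [9, -4, 1, 1]]
print(matmul(A, B))
# [[22, -2, 43, 42], [26, -16, 14, 6], [-9, 4, -37, -28]]
```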
Matrix multiplication is, in general, not commutative. For example,

⎡ 1 2 ⎤ ⎡ 0 1 ⎤   ⎡ 4  7 ⎤        ⎡ 0 1 ⎤ ⎡ 1 2 ⎤   ⎡ 2  3 ⎤
⎣ 2 3 ⎦ ⎣ 2 3 ⎦ = ⎣ 6 11 ⎦   but  ⎣ 2 3 ⎦ ⎣ 2 3 ⎦ = ⎣ 8 13 ⎦ .
Here, e j is the standard jth basis vector, the jth column of the identity matrix of
order n; its jth component is 1, and all other components are 0. The above identity
can also be seen by directly multiplying A with e j , as in the following:
        ⎡ a11 · · · a1 j · · · a1n ⎤ ⎡ 0 ⎤   ⎡ a1 j ⎤
        ⎢  ⋮         ⋮          ⋮  ⎥ ⎢ ⋮ ⎥   ⎢  ⋮   ⎥
Ae j =  ⎢ ai1 · · · ai j · · · ain ⎥ ⎢ 1 ⎥ = ⎢ ai j ⎥ = jth column of A.
        ⎢  ⋮         ⋮          ⋮  ⎥ ⎢ ⋮ ⎥   ⎢  ⋮   ⎥
        ⎣ am1 · · · am j · · · amn ⎦ ⎣ 0 ⎦   ⎣ am j ⎦
Unlike numbers, the product of two nonzero matrices can be a zero matrix. For
instance,

⎡ 1 0 ⎤ ⎡ 0 0 ⎤   ⎡ 0 0 ⎤
⎣ 0 0 ⎦ ⎣ 0 1 ⎦ = ⎣ 0 0 ⎦ .
Let A ∈ Fm×n . We write its ith row as Ai and its kth column as Ak .
We can now write A as a row of columns and also as a column of rows in the
following manner:

                                ⎡ A1 ⎤
A = [aik ] = [ A1 · · · An ] =  ⎢  ⋮ ⎥ .
                                ⎣ Am ⎦
Then, their product AB can now be written in block form as (ignoring extra brackets):

                          ⎡ A1 B ⎤
AB = [ AB1 · · · ABr ] =  ⎢   ⋮  ⎥ .
                          ⎣ Am B ⎦
Let A ∈ Fn×n . A matrix B ∈ Fn×n is called an inverse of A iff AB = I = B A. If both
B and C are inverses of A, then

C = C I = C(AB) = (C A)B = I B = B.

Hence, the inverse of a square matrix, when it exists, is unique; we call such an A
invertible and denote its inverse by A−1 . Thus,

A A−1 = I = A−1 A.
We talk of invertibility of square matrices only; not all square matrices are invert-
ible. For example, I is invertible but 0 is not. If AB = 0 for nonzero square matrices
A and B, then neither A nor B is invertible. Why?
If both A, B ∈ Fn×n are invertible, then (AB)−1 = B −1 A−1 . Reason:

(AB)(B −1 A−1 ) = A(B B −1 )A−1 = A I A−1 = A A−1 = I,

and similarly (B −1 A−1 )(AB) = I.
Invertible matrices play a crucial role in solving linear systems uniquely. We will
come back to the issue later.
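A quick numerical check of these facts can be done with numpy; the following sketch (not from the book) verifies (AB)−1 = B −1 A−1 for two arbitrarily chosen invertible matrices (they also appear in the exercises of Sect. 1.7).

```python
# A numerical sketch (not from the book) checking (AB)^(-1) = B^(-1) A^(-1)
# for two invertible matrices, using numpy.
import numpy as np

A = np.array([[2.0, 1.0], [6.0, 4.0]])     # det = 2, invertible
B = np.array([[5.0, 2.0], [3.0, 1.0]])     # det = -1, invertible

lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))                          # True
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))   # A A^(-1) = I
```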
Exercises for Sect. 1.2
1. Compute AB, C A, DC, DC AB, A2 , D 2 and A3 B 2 , where

   A = ⎡ 2 3 ⎤ ,  B = ⎡ 4 −1 ⎤ ,  C = ⎡ −1  2 ⎤ ,  D = ⎡ 3  2  1 ⎤
       ⎣ 1 2 ⎦        ⎣ 4  0 ⎦        ⎢  2 −1 ⎥        ⎢ 4 −6  0 ⎥ .
                                      ⎣  1  3 ⎦        ⎣ 1 −2 −2 ⎦
2. Let A = [ai j ] ∈ F2×2 with a11 ≠ 0. Let b = a21 /a11 . Show that there exists c ∈ F
   such that

   A = ⎡ 1 0 ⎤ ⎡ a11 a12 ⎤
       ⎣ b 1 ⎦ ⎣  0   c  ⎦ .

   What could be c?
3. Let A ∈ Fm×n , and let B ∈ Fn×k . For 1 ≤ i ≤ m, 1 ≤ j ≤ k, show that
(a) (AB)i = Ai B (b) (AB) j = A B j
4. Construct two 3 × 3 matrices A and B such that AB = 0 but B A ≠ 0.
5. Can you construct invertible 2 × 2 matrices A and B such that AB = 0?
6. Let A, B ∈ Fn×n be such that AB = 0. Then which of the following is/are true,
and why?
(a) At least one of A or B is the zero matrix.
(b) At least one of A or B is invertible.
(c) At least one of A or B is non-invertible.
(d) If A = 0 and B = 0, then neither is invertible.
7. Prove all properties of multiplication of matrices mentioned in the text.
8. Let A be the 4 × 4 matrix whose super-diagonal entries are all 1, and all other
entries 0. Show that An = 0 for n ≥ 4.
9. Let A be the 4 × 4 matrix with each diagonal entry as 1/2 , and each non-diagonal
   entry as −1/2 . Compute An for n ∈ N.
On the other side, the ( j, i)th entry in B t At is obtained by multiplying the jth
row of B t with the ith column of At . This is the same as multiplying the entries in the
jth column of B with the corresponding entries in the ith row of A and then taking
the sum. Thus, it is
b j1 ai1 + · · · + b jn ain .
Recall that while solving linear equations in two or three variables, we try to eliminate
a variable from all but one equation by adding an equation to the other, or even adding
a constant times one equation to another. We do similar operations on the rows of a
matrix. We write

     E
A  −→  B

to mean that the matrix B has been obtained from A by an elementary row operation
E, that is, when B = E A.
Often, we will apply elementary row operations in a sequence. In this way, the
above operations could be shown in one step as E −3 [3, 1], E −2 [2, 1]. However,
remember that the result of application of this sequence of elementary row operations
on a matrix A is E −2 [2, 1] E −3 [3, 1] A; the products are in reverse order.
Observe that each elementary matrix is invertible. In fact, E[i, j] is its own inverse,
E 1/α [i] is the inverse of E α [i], and E −α [i, j] is the inverse of E α [i, j]. Therefore, any
product of elementary matrices is invertible. It follows that if B has been obtained
from A by applying a sequence of elementary row operations, then A can also be
obtained from B by a sequence of elementary row operations.
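Elementary matrices can be generated by applying the corresponding row operation to the identity matrix (see also Exercise 6 below). The following sketch (not from the book) builds them with numpy; the function names and the 0-based indexing are ours.

```python
# A sketch (not from the book): elementary matrices obtained by applying the
# corresponding row operation to the identity matrix, using numpy.
import numpy as np

def E_swap(n, i, j):          # E[i, j]: exchange rows i and j (0-based indices)
    E = np.eye(n)
    E[[i, j]] = E[[j, i]]
    return E

def E_scale(n, a, i):         # E_a[i]: multiply row i by a nonzero scalar a
    E = np.eye(n)
    E[i, i] = a
    return E

def E_add(n, a, i, j):        # E_a[i, j]: add a times row j to row i
    E = np.eye(n)
    E[i, j] = a
    return E

A = np.array([[1.0, 1.0, 2.0, 0.0],
              [3.0, 5.0, 7.0, 1.0],
              [1.0, 5.0, 4.0, 5.0]])
# The book's E_{-3}[2, 1] (1-based) is E_add(3, -3, 1, 0) in 0-based indexing:
# it subtracts 3 times row 1 from row 2.
print(E_add(3, -3.0, 1, 0) @ A)
```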
Exercises for Sect. 1.4
1. Show the following:
(a) E i j E jm = E im ; if j ≠ k, then E i j E km = 0.
(b) Each A = [ai j ] ∈ Fn×n can be written as A = Σ_{i=1}^{n} Σ_{j=1}^{n} ai j E i j .
2. Compute E[2, 3]A, E i [2]A, E −1/2 [1, 3]A and E i [1, 2]A, where

              ⎡ −1  2  3  1 ⎤             ⎡   1     −2 + i   3 − i ⎤
   (a)  A  =  ⎢  2 −1  0  3 ⎥     (b) A = ⎢   i     −1 − i     2i  ⎥
              ⎣  0 −1 −3  1 ⎦             ⎢ 1 + 3i    −i       −3  ⎥
                                          ⎣  −2        0       −i  ⎦
3. Take an invertible 2 × 2 matrix. Bring it to the identity matrix of order 2 by
applying elementary row operations.
4. Take a non-invertible 2 × 2 matrix. Try to bring it to the identity matrix of order
2 by applying elementary row operations.
5. Argue in general terms why the following in Observation 1.1 are true:
(a) E[i, j] A is obtained from A by exchanging its ith and jth rows.
(b) E α [i] A is obtained from A by multiplying its ith row with α.
(c) E α [i, j] A is obtained from A by adding to its ith row α times the jth row.
6. How can the elementary matrices be obtained from the identity matrix?
7. Let α be a nonzero scalar. Show the following:
(a) (E[i, j])t = E[i, j], (E α [i])t = E α [i], (E α [i, j])t = E α [ j, i].
(b) (E[i, j])−1 = E[i, j], (E α [i])−1 = E 1/α [i], (E α [i, j])−1 = E −α [i, j].
8. For each of the following pairs of matrices, find an elementary matrix E such that
B = E A.
         ⎡  2 1 3 ⎤         ⎡  2 1 3 ⎤
   (a) A = ⎢  3 1 4 ⎥ , B = ⎢ −2 4 5 ⎥
         ⎣ −2 4 5 ⎦         ⎣  1 5 9 ⎦

         ⎡ 4 −2 3 ⎤         ⎡ 3 −2 1 ⎤
   (b) A = ⎢ 1  0 2 ⎥ , B = ⎢ 1  0 2 ⎥
         ⎣ 0  3 5 ⎦         ⎣ 0  3 5 ⎦
9. For each of the following pairs of matrices, find an elementary matrix E such that
B = AE. [Hint: The requirement is B t = E t At .]
         ⎡ 3 1 4 ⎤         ⎡ 4 5 3 ⎤
   (a) A = ⎢ 4 1 2 ⎥ , B = ⎢ 2 3 4 ⎥
         ⎣ 2 3 1 ⎦         ⎣ 1 4 2 ⎦

         ⎡  2 −2  3 ⎤         ⎡  4 −2  1 ⎤
   (b) A = ⎢ −1  4  2 ⎥ , B = ⎢ −2  4 −4 ⎥
         ⎣  3  1 −2 ⎦         ⎣  6  1  8 ⎦
1.5 Row Reduced Echelon Form

Consider the system of linear equations whose augmented array of numbers is

⎡  1 1 3 ⎤
⎣ −1 1 1 ⎦

where the third column is the right-hand side of the equality sign in our equations.
Our method of solution suggests that we proceed as follows:

⎡  1 1 3 ⎤    ⎡ 1 1 3 ⎤    ⎡ 1 1 3 ⎤    ⎡ 1 0 1 ⎤
⎣ −1 1 1 ⎦ ⇒  ⎣ 0 2 4 ⎦ ⇒  ⎣ 0 1 2 ⎦ ⇒  ⎣ 0 1 2 ⎦
In the final result, the first 2 × 2 block is an identity matrix. Thus, we could obtain
a unique solution as x = 1, y = 2, which are the respective entries in the last column.
As you have seen, we may not be able to bring any arbitrary square matrix to the
identity matrix of the same order by elementary row operations. On the other hand,
if two rows of a matrix are same, then one of them can be made a zero row after
a suitable elementary row operation. We may thus look for a matrix with as many
zero entries as possible and somewhat closer to the identity matrix. Moreover, such
a matrix if invertible must be the identity matrix. We would like to define such a
matrix looking at our requirements on the end result.
Recall that in a matrix, the row index of the ith row is i, which is also called the
row index of the (i, j)th entry. Similarly, j is the column index of the jth column
and also of the (i, j)th entry.
In a nonzero row of a matrix, the nonzero entry with minimum column index (first
from left) is called a pivot. We mark a pivot by putting a box around it. A column
where a pivot occurs is called a pivotal column.
The row index of the pivot 1 is 1, and its column index is 2. The row index of the
pivot 2 is 3, and its column index is 4. The column indices of the pivotal columns
are 2 and 4.
A matrix A ∈ Fm×n is said to be in row reduced echelon form (RREF) iff the
following conditions are satisfied:
1. Each pivot is equal to 1.
2. In a pivotal column, all entries other than the pivot are zero.
3. The row index of each nonzero row is smaller than the row index of each zero
row.
4. Among any two pivots, the pivot with larger row index also has larger column
index.
Example 1.3 The following matrices are in row reduced echelon form:

⎡ 1 2 0 0 ⎤   ⎡ 0 ⎤   ⎡ 1 ⎤
⎢ 0 0 1 0 ⎥ , ⎢ 0 ⎥ , ⎢ 0 ⎥ ,  [ 0 0 0 0 ] ,  [ 0 1 0 2 ] .
⎣ 0 0 0 1 ⎦   ⎢ 0 ⎥   ⎣ 0 ⎦
              ⎣ 0 ⎦
In a row reduced echelon form matrix, all zero rows (if there are any) occur at the
bottom of the matrix. Further, the pivot in a latter row occurs below and to the right
of any former row.
A column vector (an n × 1 matrix) in row reduced echelon form is either the zero
vector or e1 .
If a matrix in RREF has k pivotal columns, then those columns occur in the matrix
as e1 , . . . , ek , read from left to right, though there can be other columns between these
pivotal columns.
Further, if a non-pivotal column occurs between two pivotal columns ek and ek+1 ,
then the entries of the non-pivotal column beyond the kth entry are all zero.
In a row reduced echelon form matrix, all entries below and to the left of any
pivot are zero. Ignoring such zero entries and drawing lines below and to the left of
pivots, a pattern of steps emerges, thus the name echelon form.
Any matrix can be brought to a row reduced echelon form by using elementary
row operations. We first search for a pivot and make it 1; then using elementary row
operations, we zero-out all entries except the pivot in a column and then use row
exchanges to take the zero rows to the bottom. Following these guidelines, we give
an algorithm to reduce a matrix to its RREF.
Reduction to Row Reduced Echelon Form
1. Set the work region R to the whole matrix A.
2. If all entries in R are 0, then stop.
3. If there are nonzero entries in R, then find the leftmost nonzero column. Mark it
as the pivotal column.
4. Find the topmost nonzero entry in the pivotal column. Box it; it is a pivot.
5. If the pivot is not on the top row of R, then exchange the row of A which contains
the top row of R with the row where the pivot is.
6. If the pivot, say, α is not equal to 1, then replace the top row of R in A by 1/α
times that row. Mark the top row of R in A as the pivotal row.
7. Zero-out all entries, except the pivot, in the pivotal column by replacing each row
above and below the pivotal row using an elementary row operation (of Type 3)
in A with that row and the pivotal row.
8. If the pivot is the rightmost and the bottommost entry in A, then stop. Else, find
the sub-matrix to the right and below the pivot. Reset the work region R to this
sub-matrix, and go to Step 2.
We will refer to the output of the above reduction algorithm as the row reduced
echelon form (the RREF) of a given matrix.
Example 1.4

        ⎡ 1 1 2 0 ⎤       ⎡ 1 1 2 0 ⎤            ⎡ 1 1  2    0  ⎤
    A = ⎢ 3 5 7 1 ⎥  R1   ⎢ 0 2 1 1 ⎥  E1/2 [2]  ⎢ 0 1 1/2  1/2 ⎥
        ⎢ 1 5 4 5 ⎥  −→   ⎢ 0 4 2 5 ⎥    −→      ⎢ 0 4  2    5  ⎥
        ⎣ 2 8 7 9 ⎦       ⎣ 0 6 3 9 ⎦            ⎣ 0 6  3    9  ⎦

        ⎡ 1 0 3/2 −1/2 ⎤            ⎡ 1 0 3/2 −1/2 ⎤       ⎡ 1 0 3/2 0 ⎤
    R2  ⎢ 0 1 1/2  1/2 ⎥  E1/3 [3]  ⎢ 0 1 1/2  1/2 ⎥  R3   ⎢ 0 1 1/2 0 ⎥
    −→  ⎢ 0 0  0    3  ⎥    −→      ⎢ 0 0  0    1  ⎥  −→   ⎢ 0 0  0  1 ⎥ = B.
        ⎣ 0 0  0    6  ⎦            ⎣ 0 0  0    6  ⎦       ⎣ 0 0  0  0 ⎦

Here, R1 = E−3 [2, 1], E−1 [3, 1], E−2 [4, 1]; R2 = E−1 [1, 2], E−4 [3, 2], E−6 [4, 2];
and R3 = E1/2 [1, 3], E−1/2 [2, 3], E−6 [4, 3]. Notice that the matrix B, which is the
RREF of A, is given by

B = E−6 [4, 3] E−1/2 [2, 3] E1/2 [1, 3] E1/3 [3] E−6 [4, 2] E−4 [3, 2] E−1 [1, 2]
    E1/2 [2] E−2 [4, 1] E−1 [3, 1] E−3 [2, 1] A.
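The reduction algorithm of Sect. 1.5 can be transcribed into code. The following sketch (not from the book) uses exact fractions and reproduces the RREF computed in Example 1.4; the function name rref is ours.

```python
# A sketch (not from the book) transcribing the reduction algorithm into Python.
# Exact arithmetic via Fraction keeps entries like 3/2 readable.
from fractions import Fraction

def rref(A):
    """Return the row reduced echelon form of A (a list of lists of numbers)."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols):                        # scan columns left to right
        pr = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pr is None:
            continue                               # no pivot in this column
        M[pivot_row], M[pr] = M[pr], M[pivot_row]  # bring the pivot row up
        p = M[pivot_row][col]
        M[pivot_row] = [x / p for x in M[pivot_row]]       # make the pivot 1
        for r in range(rows):                      # zero-out the pivotal column
            if r != pivot_row and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return M

# The matrix A of Example 1.4; the output agrees with the RREF B computed there.
B = rref([[1, 1, 2, 0], [3, 5, 7, 1], [1, 5, 4, 5], [2, 8, 7, 9]])
for row in B:
    print([str(x) for x in row])
```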
The number of pivots in the RREF of a matrix A is called the rank of the matrix;
we denote it by rank(A).
Let A ∈ Fm×n . Suppose A has the columns as u 1 , . . . , u n ; these are column vectors
from Fm×1 . Thus, we write A = u 1 · · · u n .
E A = E [ u 1 · · · u n ] = B.

vi = a1 Eu k1 + · · · + a j Eu k j .
Theorem 1.1 Let A ∈ Fm×n . There exists a unique matrix in Fm×n in row reduced
echelon form obtained from A by elementary row operations.
Proof Suppose B, C ∈ Fm×n are matrices in RREF such that each has been obtained
from A by elementary row operations. Since elementary matrices are invertible,
B = E 1 A and C = E 2 A for some invertible matrices E 1 , E 2 ∈ Fm×m . Now, B =
E 1 A = E 1 (E 2 )−1 C. Write E = E 1 (E 2 )−1 to have B = EC, where E is invertible.
Assume, on the contrary, that B = C. Then, there exists a column index, say
k ≥ 1, such that the first k − 1 columns of B coincide with the first k − 1 columns
of C, respectively; and the kth column of B is not equal to the kth column of C. Let
u be the kth column of B, and let v be the kth column of C. We have u = Ev and
u = v.
Suppose the pivotal columns that appear within the first k − 1 columns in C are
e1 , . . . , e j . Then, e1 , . . . , e j are also the pivotal columns in B that appear within the
first k − 1 columns. Since B = EC, we have C = E −1 B; consequently,
e1 = Ee1 = E −1 e1 , . . . , e j = Ee j = E −1 e j .
If v is a non-pivotal column of C, then v = β1 e1 + · · · + β j e j for some scalars
β1 , . . . , β j ; consequently,

u = Ev = β1 Ee1 + · · · + β j Ee j = β1 e1 + · · · + β j e j = v.

Similarly, if u is a non-pivotal column of B, then u = α1 e1 + · · · + α j e j for some
scalars α1 , . . . , α j ; consequently,

v = E −1 u = α1 E −1 e1 + · · · + α j E −1 e j = α1 e1 + · · · + α j e j = u.

And if both u and v are pivotal columns, then u = e j+1 = v. In each case, u = v,
which contradicts u ≠ v. Therefore, B = C.
Theorem 1.1 justifies our use of the term the RREF of a matrix. Thus, the rank of
a matrix does not depend on which algorithm we have followed in reducing it to its
RREF.
If A1 = [1, 1, 3, 4]t and A3 = [2, −1, 1, 3]t , then determine A6 .
6. Consider the row vectors v1 = [1, 2, 3, 4], v2 = [2, 0, 1, 1], v3 = [−3, 2, 1, 2],
and v4 = [1, −2, −2, −3]. Construct a row vector v ∈ R1×4 which is not express-
ible as av1 + bv2 + cv3 + dv4 for any a, b, c, d ∈ R.
[Hint: Compute the RREF of A = [v1t v2t v3t v4t ]. ]
1.6 Determinant
There are two important quantities associated with a square matrix. One is the trace,
and the other is the determinant.
The sum of all diagonal entries of a square matrix is called the trace of the matrix.
That is, if A = [ai j ] ∈ Fn×n , then

tr(A) = a11 + · · · + ann = Σ_{k=1}^{n} akk .
In addition to tr(Im ) = m and tr(0) = 0, the trace satisfies the following properties:
1. tr(α A) = α tr(A) for each α ∈ F.
2. tr(At ) = tr(A) and tr(A∗ ) = tr(A).
3. tr(A + B) = tr(A) + tr(B).
4. tr(AB) = tr(B A).
5. tr(A∗ A) = 0 iff tr(A A∗ ) = 0 iff A = 0.
Observe that (4) does not assert that the trace of a product is equal to the product
of their traces. Further, tr(A∗ A) = Σ_{i=1}^{n} Σ_{j=1}^{n} |ai j |^2 proves (5).
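A quick numerical sanity check of these properties (not from the book), using numpy with randomly chosen complex matrices:

```python
# A numerical sketch (not from the book) checking some trace properties,
# in particular tr(AB) = tr(BA) and tr(A*A) = sum of |aij|^2.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))                    # property (4)
print(np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)))          # property (3)
print(np.isclose(np.trace(A.conj().T @ A), np.sum(np.abs(A) ** 2)))    # used for (5)
```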
In general, the determinant of any triangular matrix (upper or lower) is the product
of its diagonal entries. In particular, the determinant of a diagonal matrix is also the
product of its diagonal entries. Thus, if I is the identity matrix of order n, then
det(I ) = 1 and det(−I ) = (−1)n .
Our definition of determinant expands the determinant in the first row. In fact,
the same result may be obtained by expanding it in any other row or even in any
column. To mention some similar properties of the determinant, we introduce some
terminology.
Let A ∈ Fn×n . The sub-matrix of A obtained by deleting the ith row and the jth
column is called the (i, j)th minor of A and is denoted by Ai j .
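The minors just defined give the familiar cofactor expansion of the determinant. The following recursive sketch (not from the book) illustrates the expansion in the first row on the 3 × 3 matrix used earlier for the super-diagonal; it is meant only as an illustration, not an efficient method.

```python
# A sketch (not from the book): determinant via cofactor expansion in the first
# row, using the (i, j)th minors. Recursive and slow; for illustration only.

def minor(A, i, j):
    """The sub-matrix of A obtained by deleting the ith row and jth column (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """det(A) by expanding in the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

print(det([[1, 2, 3], [2, 3, 4], [3, 4, 0]]))   # prints 5
```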
Here, R1 = E 1 [2, 1], E 1 [3, 1], E 1 [4, 1]; R2 = E 1 [3, 2], E 1 [4, 2]; R3 = E 1 [4, 3].
Example 1.6 See that the following is true, for verifying Property (6) as mentioned
above:

│  3  1  2  4 │   │  1  0  0  1 │   │  2  1  2  3 │
│ −1  1  0  1 │ = │ −1  1  0  1 │ + │ −1  1  0  1 │ .
│ −1 −1  1  1 │   │ −1 −1  1  1 │   │ −1 −1  1  1 │
│ −1 −1 −1  1 │   │ −1 −1 −1  1 │   │ −1 −1 −1  1 │
6. Let A, B ∈ F3×3 with det(A) = 2 and det(B) = 3. Determine
(a) det(4 A) (b) det(AB) (c) det(5AB) (d) det(2 A−1 B)
7. Let A, B ∈ F2×2 , and let E = [ei j ] with e11 = e22 = 0, e12 = e21 = 1. Show that
1.7 Computing Inverse of a Matrix

The adjugate property of the determinant provides a way to compute the inverse
of a matrix, provided it is invertible. However, it is very inefficient. We may use
elementary row operations to compute the inverse. Our computation of the inverse
is based on the following fact: a square matrix is invertible iff it is a product of
elementary matrices.
Proof Since elementary matrices are invertible, so is their product. Conversely, sup-
pose that A is an invertible matrix. Let E A−1 be the RREF of A−1 . If E A−1 has a
zero row, then E = E A−1 A also has a zero row. But E is a product of elementary
matrices, which is invertible; it does not have a zero row. Therefore, E A−1 does not
have a zero row. Then, each row in the square matrix E A−1 has a pivot. But the only
square matrix in RREF having a pivot at each row is the identity matrix. Therefore,
E A−1 = I. That is, A = E, a product of elementary matrices.
This suggests applying the same elementary row operations on both A and I
simultaneously so that A is reduced to its RREF. For this purpose, we introduce
the notion of an augmented matrix. If A ∈ Fm×n and B ∈ Fm×k , then the matrix
[A | B] ∈ Fm×(n+k) obtained from A and B by writing first all the columns of A and
then the columns of B, in that order, is called an augmented matrix. The vertical bar
shows the separation of columns of A and of B, though conceptually unnecessary.
For computing the inverse of a matrix, we start with the augmented matrix [A | I ].
We then apply elementary row operations for reducing A to its RREF, while simulta-
neously applying the same operations on the entries of I. This means we pre-multiply
the matrix [A | I ] with a product E of elementary matrices so that E A is in RREF.
In block form, our result is the augmented matrix [E A | E I ]. If A is invertible, then
E A = I and then E I = A−1 . Once the A portion has been reduced to I, the I portion
is A−1 . On the other hand, if A is not invertible, then its RREF will have at least one
zero row.
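The [A | I ] procedure can be sketched in code as follows (not from the book). For numerical reasons, the sketch picks as pivot the entry of largest absolute value in the column rather than the topmost nonzero entry, but the idea is the same; the function name is ours.

```python
# A sketch (not from the book) of the [A | I] procedure described above,
# done numerically with numpy.
import numpy as np

def inverse_by_row_reduction(A):
    """Reduce the augmented matrix [A | I]; if A reduces to I, return the right block."""
    n = len(A)
    M = np.hstack([np.array(A, dtype=float), np.eye(n)])
    for col in range(n):
        pr = col + int(np.argmax(np.abs(M[col:, col])))   # a row with a nonzero pivot
        if np.isclose(M[pr, col], 0.0):
            raise ValueError("matrix is not invertible")
        M[[col, pr]] = M[[pr, col]]                        # row exchange
        M[col] /= M[col, col]                              # make the pivot 1
        for r in range(n):                                 # zero-out the pivotal column
            if r != col:
                M[r] -= M[r, col] * M[col]
    return M[:, n:]                                        # the I portion is now A^(-1)

print(inverse_by_row_reduction([[2, 1], [6, 4]]))          # [[ 2.  -0.5] [-3.   1. ]]
```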
Next, we use elementary row operations in order to reduce A to its RREF. Since
a11 = 1, it is a pivot. To zero-out the other entries in the first column, we use the
sequence of elementary row operations E 1 [2, 1], E −2 [3, 1], E −1 [4, 1], and obtain
⎡ 1 −1  2  0 │  1 0 0 0 ⎤
⎢ 0 −1  2  2 │  1 1 0 0 ⎥
⎢ 0  3 −5 −2 │ −2 0 1 0 ⎥
⎣ 0 −1  2  2 │ −1 0 0 1 ⎦
The pivot is −1 in (2, 2) position. Use E −1 [2] to make the pivot 1. And then, use
E 1 [1, 2], E −3 [3, 2], E 1 [4, 2] to zero-out all non-pivot entries in the pivotal column:
⎡ 1 0  0 −2 │  0 −1 0 0 ⎤
⎢ 0 1 −2 −2 │ −1 −1 0 0 ⎥
⎢ 0 0  1  4 │  1  3 1 0 ⎥
⎣ 0 0  0  0 │ −2 −1 0 1 ⎦
Since a zero row has appeared in the A portion of the augmented matrix, we
conclude that A is not invertible. You see that the second portion of the augmented
matrix has no meaning now. However, it records the elementary row operations which
were carried out in the reduction process. Verify that this matrix is equal to
Next, the pivot is −1 in (2, 2) position. Use E −1 [2] to get the pivot as 1. And
then, E 1 [1, 2], E −3 [3, 2], E 2 [4, 2] gives
⎡ 1 0  0 −2 │  0 −1 0 0 ⎤
⎢ 0 1 −2 −2 │ −1 −1 0 0 ⎥
⎢ 0 0  1  4 │  1  3 1 0 ⎥
⎣ 0 0 −4 −2 │ −2 −2 0 1 ⎦
The next pivot is 1 in (3, 3) position. Use E 2 [2, 3], E 4 [4, 3] to zero-out the other
entries in the pivotal column; the (4, 4) entry then becomes 14. So the next pivot is
14 in (4, 4) position. Use E 1/14 [4] to get the pivot as 1. Then use
E 2 [1, 4], E −6 [2, 4], E −4 [3, 4] to zero-out the entries in the pivotal column:

⎡ 1 0 0 0 │ 2/7  3/7   4/7    1/7 ⎤
⎢ 0 1 0 0 │ 1/7  5/7   2/7   −3/7 ⎥
⎢ 0 0 1 0 │ 3/7  1/7  −1/7   −2/7 ⎥
⎣ 0 0 0 1 │ 1/7  5/7   2/7   1/14 ⎦

The A portion has been reduced to I ; therefore, the second portion of the augmented
matrix is the required inverse A−1 .
Observe that if a matrix is not invertible, then our algorithm for reduction to RREF
produces a pivot in the I portion of the augmented matrix.
Exercises for Sect. 1.7
1. Compute the inverses of the following matrices, if possible:
        ⎡  2 1 2 ⎤        ⎡  1  4 −6 ⎤        ⎡  3 1  1  2 ⎤
   (a)  ⎢  1 3 1 ⎥   (b)  ⎢ −1 −1  3 ⎥   (c)  ⎢  1 2  0  1 ⎥
        ⎣ −1 1 2 ⎦        ⎣  1 −2  3 ⎦        ⎢  1 1  2 −1 ⎥
                                              ⎣ −2 1 −1  3 ⎦
2. Let A = ⎡ 2 1 ⎤ . Express A and A−1 as products of elementary matrices.
           ⎣ 6 4 ⎦
3. Given matrices A = ⎡ 5 2 ⎤ and B = ⎡ 3 4 ⎤ , find matrices X and Y such that
                      ⎣ 3 1 ⎦         ⎣ 1 2 ⎦
   AX = B and Y A = B. [Hint: Both A and B are invertible.]
            ⎡ 0  1  0 ⎤
4. Let A =  ⎢ 0  0  1 ⎥ , where b, c ∈ C. Show that A−1 = A2 + c A + b I.
            ⎣ 1 −b −c ⎦
5. Show that if A is an upper triangular invertible matrix, then so is A−1 .
6. Show that if A is a lower triangular invertible matrix, then so is A−1 .
7. Can every square matrix be written as a sum of two invertible matrices?
8. Can every invertible matrix be written as a sum of two non-invertible matrices?
1.8 Problems
19. Let A, E ∈ Fm×m , B, F ∈ Fm×n , C, G ∈ Fn×m , and let D, H ∈ Fn×n . Show that
⎡ A B ⎤ ⎡ E F ⎤   ⎡ AE + BG   AF + B H ⎤
⎣ C D ⎦ ⎣ G H ⎦ = ⎣ C E + DG   C F + D H ⎦ .
Chapter 2
Systems of Linear Equations

In the reduction to RREF, why are some rows reduced to zero rows while the
others are not? Similarly, in the RREF of a matrix, why are some columns pivotal
and others not?
Recall that a row vector in F1×n and a column vector in Fn×1 are both written
uniformly as an n-tuple (a1 , . . . , an ) in Fn . Such an n-tuple of scalars from F is
interpreted as either a row vector with n components or a column vector with n
components, as the case demands. Thus, an n-tuple of scalars is called a vector in
Fn .
The sum of two vectors from Fn and the multiplication of a vector from Fn
by a scalar follow those of the row and/or column vectors. That is, for β ∈
F, (a1 , . . . , an ), (b1 , . . . , bn ) ∈ Fn , we define

(a1 , . . . , an ) + (b1 , . . . , bn ) = (a1 + b1 , . . . , an + bn ),
β (a1 , . . . , an ) = (βa1 , . . . , βan ).
Given vectors v1 , . . . , vm ∈ Fn and scalars α1 , . . . , αm ∈ F, the vector
α1 v1 + · · · + αm vm is called a linear combination of v1 , . . . , vm . For example, with
v1 = [1, 1] and v2 = [1, −1] in F1×2 , consider the linear combination 2 v1 + 1 v2 .
This linear combination evaluates to [3, 1]. We say that [3, 1] is a linear combination
of v1 and v2 . Is [4, −2] a linear combination of v1 and v2 ? Yes, since
[4, −2] = 1 [1, 1] + 3 [1, −1]. In fact, every vector [a, b] ∈ F1×2 is a linear combination
of v1 and v2 , since

[a, b] = (a + b)/2 [1, 1] + (a − b)/2 [1, −1].
However, not every vector in F1×2 is a linear combination of [1, 1] and [2, 2].
Reason? Any linear combination of these two vectors is a scalar multiple of [1, 1].
Then, [1, 0] is not a linear combination of these two vectors.
Let A ∈ Fm×n be a matrix of rank r. Let u i1 , . . . , u ir be the r columns of A that
correspond to the pivotal columns in its RREF. Then among these r columns, no u i j
is a linear combination of others, and each other column in A is a linear combination
of these r columns.
Similarly, let wk1 , . . . , wkr be the r rows in A that correspond to the r nonzero
rows in its RREF (monitoring the row exchanges). Then among these r rows, no wk j
is a linear combination of others, and each other row of A is a linear combination of
these r rows.
The vectors v1 , . . . , vm in Fn are called linearly dependent iff at least one of
them is a linear combination of others. The vectors are called linearly independent
iff none of them is a linear combination of others.
For example, [1, 1], [1, −1], [3, 1] are linearly dependent vectors since [3, 1] =
2[1, 1] + [1, −1], whereas [1, 1], [1, −1] are linearly independent vectors in F1×2 .
If α1 = · · · = αm = 0, then the linear combination α1 v1 + · · · + αm vm evaluates
to 0. That is, the zero vector can always be written as a trivial linear combination.
However, a non-trivial linear combination of some vectors can evaluate to 0. For
instance, 2[1, 1] + [1, −1] − [3, 1] = 0. We guess that this can happen for linearly
dependent vectors, but may not happen for linearly independent vectors.
Suppose the vectors v1 , . . . , vm are linearly dependent. Then, one of them, say, vi
is a linear combination of others. That is, for some scalars α j ,

vi = α1 v1 + · · · + αi−1 vi−1 + αi+1 vi+1 + · · · + αm vm .

Then,

α1 v1 + · · · + αi−1 vi−1 + (−1) vi + αi+1 vi+1 + · · · + αm vm = 0.

Here, we see that a linear combination becomes zero, where at least one of the
coefficients, that is, the ith one, is nonzero. That is, a non-trivial linear combination
of v1 , . . . , vm exists which evaluates to 0.
Conversely, suppose that we have scalars β1 , . . . , βm not all zero such that
β1 v1 + · · · + βm vm = 0.
Suppose βk ≠ 0. Then,

vk = − (1/βk) (β1 v1 + · · · + βk−1 vk−1 + βk+1 vk+1 + · · · + βm vm ).

That is, vk is a linear combination of the other vectors; so v1 , . . . , vm are linearly
dependent. We thus have the following characterization.

Theorem 2.1 The vectors v1 , . . . , vm ∈ Fn are linearly independent iff for all scalars
α1 , . . . , αm , the equation α1 v1 + · · · + αm vm = 0 implies α1 = · · · = αm = 0.
Theorem 2.1 provides a way to determine whether a finite number of vectors are
linearly independent or not. You start with a linear combination of the given vectors
and equate it to 0. Then, use the laws of addition and scalar multiplication to derive
that each coefficient in that linear combination is 0. Once you succeed, you conclude
that the given vectors are linearly independent. On the other hand, if it is not possible
to derive that each coefficient is 0, then from the proof of this impossibility you will
be able to express one of the vectors as a linear combination of the others. And this
would prove that the given vectors are linearly dependent.
Example 2.1 Are the vectors [1, 1, 1], [2, 1, 1], [3, 1, 0] linearly independent?
We start with an arbitrary linear combination and equate it to the zero vector. Then,
we solve the resulting linear equations to determine whether all the coefficients are
necessarily 0 or not. So, let

a [1, 1, 1] + b [2, 1, 1] + c [3, 1, 0] = [0, 0, 0].

Comparing the components, we obtain

a + 2b + 3c = 0,   a + b + c = 0,   a + b = 0.
The last two equations imply that c = 0. Substituting in the first, we see that
a + 2b = 0. This and the equation a + b = 0 give b = 0. Then, it follows that a = 0.
We conclude that the given vectors are linearly independent.
Example 2.2 Are the vectors [1, 1, 1], [2, 1, 1], [3, 2, 2] linearly independent?
Clearly, the third one is the sum of the first two. So, the given vectors are linearly
dependent.
To illustrate our method, we start with an arbitrary linear combination and equate
it to the zero vector. We then solve the resulting linear equations to determine whether
all the coefficients are necessarily 0 or not. As earlier, let

a [1, 1, 1] + b [2, 1, 1] + c [3, 2, 2] = [0, 0, 0].

Comparing the components, we obtain

a + 2b + 3c = 0,   a + b + 2c = 0,   a + b + 2c = 0.
The last equation is redundant. From the first and the second, we have
b + c = 0.
We may choose b = 1, c = −1 to satisfy this equation. Then from the second equa-
tion, we have a = 1. Our starting equation says that the third vector is the sum of the
first two.
Be careful with the direction of implication here. Your workout must be in the
following form:
Assume α1 v1 + · · · + αm vm = 0. Then · · · Then α1 = · · · = αm = 0.
And that would prove linear independence.
To see how linear independence is helpful, consider the following system of linear
equations:
x1 +2x2 −3x3 = 2
2x1 −x2 +2x3 = 3
4x1 +3x2 −4x3 = 7
Here, we find that the third equation is redundant, since 2 times the first plus the
second gives the third. That is, the third one linearly depends on the first two. You can
of course choose any other equation here as linearly depending on the other two, but
that is not important and that may not always be possible. Now, take the row vectors
of coefficients of the unknowns along with the right-hand side, as in the following:

v1 = [1, 2, −3, 2],   v2 = [2, −1, 2, 3],   v3 = [4, 3, −4, 7].
We see that v3 = 2v1 + v2 , as it should be. That is, the vectors v1 , v2 , v3 are linearly
dependent. But the vectors v1 , v2 are linearly independent. Thus, solving the given
system of linear equations is the same thing as solving the system with only first two
equations.
For solving linear systems, it is of primary importance to find out which equations
linearly depend on others. Once determined, such equations can be thrown away, and
the rest can be solved.
As to our opening questions, now we know that in the RREF of a matrix, a column
that corresponds to a non-pivotal column is a linear combination of the columns
that correspond to the pivotal columns. Similarly, a zero row corresponds to one
(monitoring row exchanges) which is a linear combination of the nonzero (pivotal)
rows in the RREF. We will see this formally in the next section.
Exercises for Sect. 2.1
1. Check whether the given vectors are linearly independent:
(a) (1, 2, 6), (−1, 3, 4), (−1, −4, 2) in R3
(b) (1, 0, 2, 1), (1, 3, 2, 1), (4, 1, 2, 2) in C4
2.2 Determining Linear Independence

Let A ∈ Fm×n have rows v1 , . . . , vm . Suppose that for some r < m and some k > r,
the kth row is a linear combination vk = α1 v1 + · · · + αr vr of the first r rows. We check
that this property is preserved when an elementary row operation of the type E β [i] or
E β [i, j] with 1 ≤ j ≤ r is applied to A.
Now, the rows of E β [i] A are v1 , . . . , vi−1 , βvi , vi+1 , . . . , vm for some β ≠ 0.
If 1 ≤ i ≤ r, then vk = α1 v1 + · · · + αi−1 vi−1 + (αi /β)(βvi ) + αi+1 vi+1 + · · · + αr vr .
If i = k, then βvk = α1 βv1 + · · · + αr βvr .
If i > r, i ≠ k, then vk = α1 v1 + · · · + αr vr .
In any case, the kth row of E β [i] A is a linear combination of the first r rows of
E β [i] A.
Similarly, the rows of E β [i, j] A are v1 , . . . , vi−1 , vi + βv j , vi+1 , . . . , vm . Sup-
pose that 1 ≤ j ≤ r.
If 1 ≤ i ≤ r, then vk = α1 v1 + · · · + αi (vi + βv j ) + · · · + αr vr − αi βv j .
If i = k, then vk + βv j = α1 v1 + · · · + (α j + β)v j + · · · + αr vr .
If i > r, i ≠ k, then vk = α1 v1 + · · · + αr vr .
In any case, if 1 ≤ j ≤ r, then the kth row of E β [i, j] A is a linear combination of
the first r of its rows. We summarize these facts as follows.
Observation 2.1 Let A ∈ Fm×n . Let v1 , . . . , vm be the rows of A in that order. Let
E ∈ Fm×m be a product of elementary matrices of the types E β [i] and/or E β [i, j],
where 1 ≤ i, j ≤ r. Let w1 , . . . , wm be the rows of E A in that order. Let k > r. If the
kth row vk of A is a linear combination of the vectors v1 , . . . , vr , then the kth row
wk in E A is also a linear combination of w1 , . . . , wr .
In the notation of Observation 2.1, you can also show that if any row vector v is
a linear combination of v1 , . . . , vr , then it is a linear combination of w1 , . . . , wr and
vice versa.
Theorem 2.2 Let A ∈ Fm×n be a matrix of rank r. Let vi denote the ith row of
A. Suppose that the rows vi1 , . . . , vir have become the pivotal rows w1 , . . . , wr ,
respectively, in the RREF of A. Let v be any row of A other than vi1 , . . . , vir . Then,
the following are true:
(1) The vectors vi1 , . . . , vir are linearly independent.
(2) The row v of A has become a zero row in the RREF of A.
(3) The row v is a linear combination of vi1 , . . . , vir .
(4) The row v is a linear combination of w1 , . . . , wr .
Proof Monitoring the row exchanges, it can be found out which rows have become
zero rows and which rows have become the pivotal rows. Assume, without loss of
generality, that no row exchanges have been performed during reduction of A to
its RREF B. Then, B = E A, where E is a product of elementary matrices of the
forms E β [i] or E β [i, j]. The first r rows in B are the pivotal rows, i.e. i 1 = 1, i 2 =
2, . . . , ir = r. So, suppose that v1 , . . . , vr have become the pivotal rows w1 , . . . , wr
in B, respectively.
(1) If one of v1 , . . . , vr , say, vk is a linear combination of the others, then by Obser-
vation 2.1, the pivotal row wk is a linear combination of other pivotal rows. But
this is not possible. Hence among the vectors v1 , . . . , vr , none of them is a linear
combination of the others. Therefore, v1 , . . . , vr are linearly independent.
(2) Since rank(A) = r and v1 , . . . , vr have become the pivotal rows, no other row is
a pivotal row. That is, all other rows of A, including v, have become zero rows.
(3) The vectors wr +1 , . . . , wm are the zero rows in B. Each of them is a linear
combination of the pivotal rows w1 , . . . , wr . Now, the vectors w1 , . . . , wr are rows
in B, and v1 , . . . , vr are the corresponding rows in A = E −1 B, where E −1 is a
product of elementary matrices of the forms E β [i] or E β [i, j] with 1 ≤ j ≤ r. By
Observation 2.1, each of vr +1 , . . . , vm is a linear combination of v1 , . . . , vr .
(4) During row reduction, elementary row operations use the pivotal rows. Therefore,
each of the vectors v1 , . . . , vr is a linear combination of w1 , . . . , wr ; and each of the
vectors w1 , . . . , wr is a linear combination of v1 , . . . , vr . Then, it follows from (3)
that vr +k is a linear combination of w1 , . . . , wr also.
Theorem 2.2 gives us a method for determining whether m given vectors from F1×n
are linearly independent or not. We form a matrix A whose rows are the given vectors
and reduce A to its RREF. If a zero row appears in the RREF, then the vectors are linearly
dependent. Else, the rank of A turns out to be m; consequently, the vectors are linearly
independent.
Example 2.3 To determine whether the vectors [1, 1, 0, 1], [0, 1, 1, −1], and
[1, 3, 2, −5] are linearly independent or not, we form a matrix with the given vectors
as its rows and then reduce it to its RREF. It is as follows.

⎡ 1 1 0  1 ⎤ E−1 [3,1] ⎡ 1 1 0  1 ⎤  R1  ⎡ 1 0 −1  2 ⎤  R2  ⎡ 1 0 −1 0 ⎤
⎢ 0 1 1 −1 ⎥    −→     ⎢ 0 1 1 −1 ⎥  −→  ⎢ 0 1  1 −1 ⎥  −→  ⎢ 0 1  1 0 ⎥ .
⎣ 1 3 2 −5 ⎦           ⎣ 0 2 2 −6 ⎦      ⎣ 0 0  0 −4 ⎦      ⎣ 0 0  0 1 ⎦

Here, R1 = E−1 [1, 2], E−2 [3, 2] and R2 = E−1/4 [3], E−2 [1, 3], E1 [2, 3].
The last matrix is in RREF in which there is no zero row; each row has a pivot.
So, the original vectors are linearly independent.
Example 2.4 Are the vectors [1, 1, 0, 1], [0, 1, 1, −1] and [2, −1, −3, 5] linearly
independent?
We construct a matrix with the vectors as rows and reduce it to RREF.
⎡ 1  1  0  1 ⎤ E−2 [3,1] ⎡ 1  1  0  1 ⎤  R1  ⎡ 1 0 −1  2 ⎤
⎢ 0  1  1 −1 ⎥    −→     ⎢ 0  1  1 −1 ⎥  −→  ⎢ 0 1  1 −1 ⎥ .
⎣ 2 −1 −3  5 ⎦           ⎣ 0 −3 −3  3 ⎦      ⎣ 0 0  0  0 ⎦
Here, R1 = E −1 [1, 2], E 3 [3, 2]. Since a zero row has appeared, the original vectors
are linearly dependent. Also, notice that no row exchanges were carried out in the
reduction process. So, the third vector is a linear combination of the first two, which
are linearly independent.
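The test used in Examples 2.3 and 2.4 amounts to checking whether the rank of the stacked matrix equals the number of vectors. A short numerical sketch (not from the book) with numpy:

```python
# A numerical sketch (not from the book): linear independence of row vectors
# via the rank of the matrix formed by stacking them.
import numpy as np

def linearly_independent(vectors):
    """True iff the given row vectors are linearly independent."""
    M = np.array(vectors, dtype=float)
    return np.linalg.matrix_rank(M) == len(vectors)

print(linearly_independent([[1, 1, 0, 1], [0, 1, 1, -1], [1, 3, 2, -5]]))   # True
print(linearly_independent([[1, 1, 0, 1], [0, 1, 1, -1], [2, -1, -3, 5]]))  # False
```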
Does reduction to RREF change the linear dependence or linear independence of
columns of a matrix?
Theorem 2.3 Let E ∈ Fm×m be invertible. Let v1 , . . . , vk , v ∈ Fm×1 and
α1 , . . . , αk ∈ F.
(1) v = α1 v1 + · · · αk vk iff Ev = α1 Ev1 + · · · + αk Evk .
(2) v1 , . . . , vk are linearly independent iff Ev1 , . . . , Evk are linearly independent.
Proof (1) If v = α1 v1 + · · · + αk vk , then multiplying E on the left, we have Ev =
α1 Ev1 + · · · + αk Evk . Conversely, if Ev = α1 Ev1 + · · · + αk Evk , then multiplying
E −1 on the left we have v = α1 v1 + · · · + αk vk .
(2) Vectors v1 , . . . , vk are linearly dependent iff there exist scalars β1 , . . . , βk not
all zero such that β1 v1 + · · · βk vk = 0. By (1), this happens iff there exist scalars
β1 , . . . , βk not all zero such that β1 Ev1 + · · · βk Evk = 0 iff the vectors Ev1 , . . . , Evk
are linearly dependent.
Here, R1 = E −1 [2, 1], E −1 [4, 1] and R2 = E −1 [3, 2], E 1 [4, 2]. The first and second
columns are pivotal, and the third is non-pivotal. The components of the third column
in the RREF show that u3 t = 2 u1 t − 3 u2 t . Thus, u 3 = 2 u 1 − 3 u 2 .
We also use the phrases such as linear dependence or independence for sets of vec-
tors. Given a set of vectors A = {v1 , . . . , vm }, we say that A is a linearly independent
set iff the vectors v1 , . . . , vm are linearly independent. Our method of determining
linear independence gives rise to the following useful result.
Theorem 2.5 Any set containing more than n vectors from Fn is linearly dependent.
Proof Without loss of generality, let S ⊆ Fn×1 have more than n vectors. So, let
v1 , . . . , vn+1 be distinct vectors from S. Consider the (augmented) matrix
A = [v1 · · · vn | vn+1 ].
Let B be the RREF of A. The reduction to RREF shows that there can be at most n
pivots in B and the last column in B is a non-pivotal column. By Observation 1.3,
vn+1 is a linear combination of the pivotal columns, which are among v1 , . . . , vn .
Therefore, v1 , . . . , vn+1 are linearly dependent; consequently, S is linearly dependent.
Recall that the rank of a matrix A, denoted by rank(A), is the number of pivots in the
RREF of A. If rank(A) = r, then there are r number of linearly independent columns
in A and other columns are linear combinations of these r columns. The linearly
independent columns correspond to the pivotal columns. Also, there exist r number
of linearly independent rows of A such that other rows are linear combinations of
these r rows. The linearly independent rows correspond to the pivotal rows, assuming
that we have monitored the row exchanges during the reduction of A to its RREF.
It raises a question. Suppose for a matrix A, we find r number of linearly inde-
pendent rows such that other rows are linear combinations of these r rows. Can it
happen that there are also k rows which are linearly independent and other rows are
linear combinations of these k rows, and that k ≠ r ?
Theorem 2.6 Let A ∈ Fm×n . There exists a unique number r with 0 ≤ r ≤ min{m, n}
such that A has r number of linearly independent rows, and other rows of A are
linear combinations of these r ones. Moreover, such a number r is equal to the rank
of A.
Proof Let r = rank(A). Theorem 2.2 implies that there exist r rows of A which are
linearly independent and the other rows are linear combinations of these r rows.
Conversely, suppose there exist r rows of A which are linearly independent and
the other rows are linear combinations of these r rows. So, let i 1 , . . . , ir be the indices
of these r numbers of linearly independent rows of A. Consider the matrix
In the matrix B, those r linearly independent rows of A have become the first r
rows, and other rows are now placed as (r + 1)th row onwards. In the RREF of B,
the first r rows are pivotal rows, and other rows are zero rows. The matrix B has
been obtained from A by elementary row operations. By the uniqueness of RREF
(Theorem 1.1), A also has the same RREF as does B. The number of pivots in this
RREF is r. Therefore, rank(A) = r.
Moreover, when A = 0, the zero matrix, the number of pivots in A is 0. And if A is
a nonzero matrix, then in the RREF of A, there exists at least one pivot. The number
of pivots cannot exceed the number of rows; also, it cannot exceed the number of
columns. Therefore, r = rank(A) is a number between 0 and min{m, n}.
Theorem 2.7 Let A ∈ Fm×n . Then, rank(At ) = rank(A). Consequently, both the
row rank of A and the column rank of A are equal to rank(A).
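A quick numerical illustration (not from the book) of Theorem 2.7, using the matrix of Example 1.4:

```python
# A numerical sketch (not from the book): the ranks of A and its transpose
# coincide, here for the matrix of Example 1.4.
import numpy as np

A = np.array([[1, 1, 2, 0], [3, 5, 7, 1], [1, 5, 4, 5], [2, 8, 7, 9]], dtype=float)
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # 3 3
```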
Here, R1 = E −1 [2, 1], E −3 [3, 1], E 1 [4, 1]; R2 = E −1 [1, 2], E −2 [3, 2],
E −1 [4, 2].
Thus, rank(A) = 2.
Also, we see that the first two rows of A are linearly independent, and
row(3) = row(1) + 2 × row(2), row(4) = row(2) − 2 × row(1).
Thus, the row rank of A is 2.
From the RREF of A, we observe that the first two columns of A are linearly
independent, and
col(3) = col(1), col(4) = 3 × col(1) − 2 × col(2), col(4) = col(1).
Therefore, the column rank of A is also 2.
Example 2.7 Determine the rank of the matrix A in Example 1.4, and point out
which rows of A are linear combinations of other rows and which columns are linear
combinations of other columns, by reducing A to its RREF.
From Example 1.4, we have seen that
        ⎡ 1 1 2 0 ⎤      ⎡ 1 0 3/2 0 ⎤
    A = ⎢ 3 5 7 1 ⎥  E   ⎢ 0 1 1/2 0 ⎥
        ⎢ 1 5 4 5 ⎥  −→  ⎢ 0 0  0  1 ⎥ .
        ⎣ 2 8 7 9 ⎦      ⎣ 0 0  0  0 ⎦
E = E −3 [2, 1], E −1 [3, 1], E −2 [4, 1], E 1/2 [2], E −1 [1, 2], E −4 [3, 2], E −6 [4, 2],
E 1/3 [3], E 1/2 [1, 3], E −1/2 [2, 3], E −6 [4, 3].
We see that rank(A) = 3, the number of pivots in the RREF of A. In this reduction,
no row exchanges have been used. Thus, the first three rows of A are the required
linearly independent rows. The fourth row is a linear combination of these three
rows. In fact,
row(4) = 3 row(1) + (−1) row(2) + 2 row(3).
The RREF also says that the third column is a linear combination of first and second.
Notice that the coefficients in such a linear combination are given by the entries of
the third column in the RREF. As we have seen earlier,
col(3) = (3/2) col(1) + (1/2) col(2).
Since each entry of A∗ is the complex conjugate of the corresponding entry of
At , Theorem 2.7 implies that rank(A∗ ) = rank(A).
Theorem 2.3 also implies that the column rank is well defined and is equal to the
rank. We generalize this theorem a bit.
Theorem 2.8 Let A ∈ Fm×n . Let P ∈ Fm×m and Q ∈ Fn×n be invertible matrices.
Then,
rank(P AQ) = rank(P A) = rank(AQ) = rank(A).
Proof Theorem 2.3 implies that the column rank of P AQ is same as the column
rank of AQ. Therefore, rank(P AQ) = rank(AQ). Also, since Q t is invertible, we
have rank(AQ) = rank(Q t At ) = rank(At ) = rank(A).
In general, when the matrix product P AQ is well defined, we have
2.4 Solvability of Linear Equations

We now use matrices to settle some issues regarding solvability of systems of lin-
ear equations, also called linear systems. A linear system with m equations in n
unknowns looks as follows:

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
                 ⋮
am1 x1 + am2 x2 + · · · + amn xn = bm

Writing A = [ai j ], x = [x1 , . . . , xn ]t and b = [b1 , . . . , bm ]t , such a system is
written compactly as

Ax = b.
Here, A ∈ Fm×n , b ∈ Fm×1 , and x is an unknown vector in Fn×1 . We also say that
the matrix A is the system matrix of the linear system Ax = b.
There is a slight deviation from our accepted symbolism. In case of linear systems,
we write b as a column vector and xi are unknown scalars.
Notice that if the system matrix A ∈ Fm×n , then the linear system Ax = b has m
number of equations and n number of unknowns.
A solution of the system Ax = b is any vector y ∈ Fn×1 such that Ay = b. In
such a case, if y = [c1 , . . . , cn ]t , then ci is called as the value of the unknown xi in
the solution y. A solution of the system is also written informally as
x1 = c1 , . . . , xn = cn .
c1 v1 + · · · + cn vn = b.
Conversely, if b can be written this way for some scalars c1 , . . . , cn , then the vector
y = [c1 , . . . , cn ]t is a solution of Ax = b. So, we conclude that
Ax = 0.
Theorem 2.9 Let A ∈ Fm×n , and let b ∈ Fm×1 . Then, the following statements are
true:
(1) If [A′ | b′ ] is obtained from [A | b] by applying a finite sequence of elementary row operations, then each solution of Ax = b is a solution of A′ x = b′ and vice versa.
(2) (Consistency) Ax = b has a solution iff rank([A | b]) = rank(A).
(3) If u is a (particular) solution of Ax = b, then each solution of Ax = b is given
by u + y, where y is a solution of the homogeneous system Ax = 0.
(4) If r = rank([A | b]) = rank(A) < n, then there are n − r unknowns which can
take arbitrary values and other r unknowns can be determined from the values
of these n − r unknowns.
(5) If m < n, then the homogeneous system has infinitely many solutions.
(6) Ax = b has a unique solution iff rank([A | b]) = rank(A) = n.
(7) If m = n, then Ax = b has a unique solution iff det(A) ≠ 0.
(8) (Cramer’s Rule) If m = n and det(A) ≠ 0, then the solution of Ax = b is given
by x j = det( A j (b) )/det(A) for each j ∈ {1, . . . , n}.
Proof (1) If [A′ | b′ ] has been obtained from [A | b] by a finite sequence of elementary
row operations, then A′ = E A and b′ = Eb, where E is the product of corresponding
elementary matrices. The matrix E is invertible. Now, A′ x = b′ iff E Ax = Eb iff
Ax = E −1 Eb = b.
(2) Due to (1), we assume that [A | b] is in RREF. Suppose Ax = b has a solution. If
there is a zero row in A, then the corresponding entry in b is also 0. Therefore, there
is no pivot in b. Hence, rank([A | b]) = rank(A).
Conversely, suppose that rank([A | b]) = rank(A) = r. Then, there is no pivot in
b. That is, b is a non-pivotal column in [A | b]. Thus, b is a linear combination of
pivotal columns, which are some columns of A. Therefore, Ax = b has a solution.
(3) Let u be a solution of Ax = b. Then, Au = b. Now, z is a solution of Ax = b
iff Az = b iff Az = Au iff A(z − u) = 0 iff z − u is a solution of Ax = 0. That is,
each solution z of Ax = b is expressed in the form z = u + y for a solution y of the
homogeneous system Ax = 0.
(4) Let rank([A | b]) = rank(A) = r < n. By (2), there exists a solution. Due to (3),
we consider solving the corresponding homogeneous system. Due to (1), assume
that A is in RREF. There are r number of pivots in A and m − r number of zero
rows. Omit all the zero rows; it does not affect the solutions. Write the system as
This gives
\[
y_1\begin{bmatrix} a_{11}\\ \vdots\\ a_{n1}\end{bmatrix} + \cdots + \begin{bmatrix} y_j a_{1j} - b_1\\ \vdots\\ y_j a_{nj} - b_n\end{bmatrix} + \cdots + y_n\begin{bmatrix} a_{1n}\\ \vdots\\ a_{nn}\end{bmatrix} = 0.
\]
In this sum, the jth vector is a linear combination of the other vectors, where the −y_i are the coefficients. Therefore,
\[
\begin{vmatrix} a_{11} & \cdots & (y_j a_{1j} - b_1) & \cdots & a_{1n}\\ \vdots & & \vdots & & \vdots\\ a_{n1} & \cdots & (y_j a_{nj} - b_n) & \cdots & a_{nn}\end{vmatrix} = 0.
\]
Example 2.8 Does the following system of linear equations have a solution?
We take the augmented matrix and reduce it to its row reduced echelon form by
elementary row operations.
\[
\left[\begin{array}{cccc|c} 5 & 2 & -3 & 1 & 7\\ 1 & -3 & 2 & -2 & 11 \end{array}\right]
\xrightarrow{R_1}
\left[\begin{array}{cccc|c} 1 & 2/5 & -3/5 & 1/5 & 7/5\\ 0 & -17/5 & 13/5 & -11/5 & 48/5 \end{array}\right]
\]
Here, R1 = E_{1/5}[1], E_{−1}[2, 1], E_{−3}[3, 1] and R2 = E_{−5/17}[2], E_{−2/5}[1, 2], E_{−34/5}[3, 2].
Since an entry in the b portion has become a pivot, the system is inconsistent. In
fact, you can verify that the third row in A is simply first row minus twice the second
row, whereas the third entry in b is not the first entry minus twice the second entry.
Therefore, the system is inconsistent.
We write the set of all solutions of the system Ax = b as Sol(A, b). That is,
Sol(A, b) = {y ∈ Fn×1 : Ay = b}.
Example 2.9 To illustrate the proof of Theorem 2.9, we change the last equation in
the previous example to make it consistent. We consider the new system
Computation of the row reduced echelon form of the augmented matrix goes as
follows:
\[
\left[\begin{array}{cccc|c} 5 & 2 & -3 & 1 & 7\\ 1 & -3 & 2 & -2 & 11\\ 3 & 8 & -7 & 5 & -15 \end{array}\right]
\xrightarrow{R_1}
\left[\begin{array}{cccc|c} 1 & 2/5 & -3/5 & 1/5 & 7/5\\ 0 & -17/5 & 13/5 & -11/5 & 48/5\\ 0 & 34/5 & -26/5 & 22/5 & -96/5 \end{array}\right]
\xrightarrow{R_2}
\left[\begin{array}{cccc|c} 1 & 0 & -5/17 & -1/17 & 43/17\\ 0 & 1 & -13/17 & 11/17 & -48/17\\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
\]
Here, R1 = E 1/5 [1], E −1 [2, 1], E −3 [3, 1] and R2 = E −5/17 [2], E −2/5 [1, 2], E −34/5 [3, 2], as
earlier. The third row in the RREF is a zero row. Thus, the third equation is redundant.
Now, solving the new system in row reduced echelon form is easier. Writing as linear
equations, we have
\[
x_1 - \tfrac{5}{17}\,x_3 - \tfrac{1}{17}\,x_4 = \tfrac{43}{17}, \qquad
x_2 - \tfrac{13}{17}\,x_3 + \tfrac{11}{17}\,x_4 = -\tfrac{48}{17}.
\]
The unknowns corresponding to the pivots, that is, x1 and x2 , are the basic variables,
and the other unknowns, x3 , x4 , are the free variables. The number of basic variables is
equal to the number of pivots, which is the rank of the system matrix. By assigning the
free variables xi to any arbitrary values, say, αi , the basic variables can be evaluated
in terms of αi .
We assign x3 to α3 and x4 to α4 . Then, we have
\[
x_1 = \tfrac{43}{17} + \tfrac{5}{17}\,\alpha_3 + \tfrac{1}{17}\,\alpha_4, \qquad
x_2 = -\tfrac{48}{17} + \tfrac{13}{17}\,\alpha_3 - \tfrac{11}{17}\,\alpha_4,
\]
so that, with x3 = α3 and x4 = α4, the vector [x1, x2, α3, α4]t is a solution of the linear system. Moreover, any solution of the linear system is in
the above form. That is, the set of all solutions is given by
\[
\mathrm{Sol}(A, b) = \left\{ \begin{bmatrix} 43/17\\ -48/17\\ 0\\ 0\end{bmatrix} + \alpha_3\begin{bmatrix} 5/17\\ 13/17\\ 1\\ 0\end{bmatrix} + \alpha_4\begin{bmatrix} 1/17\\ -11/17\\ 0\\ 1\end{bmatrix} : \alpha_3, \alpha_4 \in \mathbb{F} \right\}.
\]
Here, the vector [43/17, −48/17, 0, 0]t is a particular solution of the original system.
The two vectors [5/17, 13/17, 1, 0]t and [1/17, −11/17, 0, 1]t are linearly independent
solutions of the corresponding homogeneous system. Notice that the nullity of the
system matrix is 2.
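As a hedged numerical check of Example 2.9 (assuming Python with the sympy library, which is outside the text), the particular solution and the two null space vectors above can be verified as follows.

from sympy import Matrix, Rational, symbols, simplify

A = Matrix([[5, 2, -3, 1], [1, -3, 2, -2], [3, 8, -7, 5]])
b = Matrix([7, 11, -15])

u = Matrix([Rational(43, 17), Rational(-48, 17), 0, 0])   # particular solution
print(A * u == b)                    # True
print(A.nullspace())                 # two linearly independent vectors; nullity 2
a3, a4 = symbols('alpha3 alpha4')
y = u + a3 * Matrix([Rational(5, 17), Rational(13, 17), 1, 0]) \
      + a4 * Matrix([Rational(1, 17), Rational(-11, 17), 0, 1])
print(simplify(A * y - b))           # the zero vector, for every alpha3, alpha4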
Instead of writing the RREF as a linear system again, we can arrive at the set of
all solutions quite mechanically. See the following procedure.
Gauss–Jordan Elimination
1. Reduce the augmented matrix [A | b] to its RREF, say, [A′ | b′ ].
2. If a pivot has appeared in b′, then Ax = b has no solutions.
3. Else, delete all zero rows from [A′ | b′ ].
4. Insert zero rows in [A′ | b′ ], if required, so that for each pivot, its row index is equal to its column index.
5. Insert zero rows at the bottom, if required, to make A′ a square matrix. Call the updated matrix [ Ã | b̃].
6. Change the diagonal entries of the zero rows in à from 0 to −1.
7. If the non-pivotal columns in à are u 1 , . . . , u k , then the set of all solutions is given by Sol(A, b) = {b̃ + α1 u 1 + · · · + αk u k : α1 , . . . , αk ∈ F}.
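The procedure can be transcribed into a short program. The following sketch is only an illustration of the steps above and is not part of the text; it assumes Python with the sympy library, and it reproduces b̃ and u1, u2 of Example 2.10 below.

from sympy import Matrix, zeros

def gauss_jordan(A, b):
    m, n = A.shape
    R, pivots = Matrix.hstack(A, b).rref()   # step 1: RREF of the augmented matrix
    if n in pivots:                          # step 2: a pivot in the b' portion
        return None                          # no solutions
    R = R[:len(pivots), :]                   # step 3: delete the zero rows
    rows = []                                # step 4: put each pivot on the diagonal
    for i, p in enumerate(pivots):
        while len(rows) < p:
            rows.append(zeros(1, n + 1))
        rows.append(R.row(i))
    while len(rows) < n:                     # step 5: pad to a square A-part
        rows.append(zeros(1, n + 1))
    A_tilde = Matrix.vstack(*rows)
    for j in range(n):                       # step 6: diagonal of zero rows -> -1
        if j not in pivots:
            A_tilde[j, j] = -1
    b_tilde = A_tilde[:, n]                  # step 7: read off the solution set
    us = [A_tilde[:, j] for j in range(n) if j not in pivots]
    return b_tilde, us

A = Matrix([[5, 2, -3, 1], [1, -3, 2, -2], [3, 8, -7, 5]])
b = Matrix([7, 11, -15])
print(gauss_jordan(A, b))   # b~ = [43/17, -48/17, 0, 0]^t and the two columns u1, u2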
Example 2.10 We apply Gauss–Jordan elimination on the linear system of
Example 2.9. The RREF of the augmented matrix as computed there is
\[
\left[\begin{array}{cccc|c} 1 & 0 & -5/17 & -1/17 & 43/17\\ 0 & 1 & -13/17 & 11/17 & -48/17\\ 0 & 0 & 0 & 0 & 0 \end{array}\right].
\]
We delete the zero row at the bottom. For each pivot, the row index is equal to its
column index; so, no new zero row is to be inserted. Next, to make A a square matrix,
we adjoin two zero rows at the bottom. Next, we change the diagonal entries of all
zero rows to −1. It yields the following matrix:
\[
[\,\tilde A \mid \tilde b\,] = \left[\begin{array}{cccc|c} 1 & 0 & -5/17 & -1/17 & 43/17\\ 0 & 1 & -13/17 & 11/17 & -48/17\\ 0 & 0 & -1 & 0 & 0\\ 0 & 0 & 0 & -1 & 0 \end{array}\right].
\]
The non-pivotal columns are the third and the fourth columns. According to Gauss–
Jordan elimination, the set of solutions is given by
\[
\mathrm{Sol}(A, b) = \left\{ \begin{bmatrix} 43/17\\ -48/17\\ 0\\ 0\end{bmatrix} + \alpha_1\begin{bmatrix} -5/17\\ -13/17\\ -1\\ 0\end{bmatrix} + \alpha_2\begin{bmatrix} -1/17\\ 11/17\\ 0\\ -1\end{bmatrix} : \alpha_1, \alpha_2 \in \mathbb{F} \right\}.
\]
You may match this solution set with that in Example 2.9.
Here, R1 = E −1/5 [2, 1], E −3/5 [3, 1]. The augmented matrix is now in row echelon
form. It is a consistent system, since no entry in the b portion is a pivot. The pivots
say that x1 , x2 are basic variables and x3 , x4 are free variables. We assign x3 to α3
and x4 to α4 . Writing in equation form, we have
\[
x_1 = \tfrac{7}{5} - \tfrac{2}{5}\,x_2 + \tfrac{3}{5}\,\alpha_3 - \tfrac{1}{5}\,\alpha_4, \qquad
x_2 = -\tfrac{5}{17}\Bigl(\tfrac{48}{5} - \tfrac{13}{5}\,\alpha_3 + \tfrac{11}{5}\,\alpha_4\Bigr).
\]
Substituting the value of x2 in the first equation and simplifying, we obtain
\[
x_1 = \tfrac{43}{17} + \tfrac{5}{17}\,\alpha_3 + \tfrac{1}{17}\,\alpha_4, \qquad
x_2 = -\tfrac{48}{17} + \tfrac{13}{17}\,\alpha_3 - \tfrac{11}{17}\,\alpha_4, \qquad
x_3 = \alpha_3, \qquad x_4 = \alpha_4.
\]
As you see we end up with the same set of solutions as in Gauss–Jordan elimination.
Exercises for Sect. 2.5
1. Using Gauss–Jordan elimination, and also Gaussian elimination, solve the fol-
lowing linear systems:
(a) 3w + 2x + 2y − z = 2, 2x + 3y + 4z = −2, y − 6z = 6
(b) w + 4x + y + 3z = 1, 2x + y + 3z = 0, w + 3x + y + 2z = 1,
2x + y + 6z = 0
(c) w − x + y − z = 1, w + x − y − z = 1, w − x − y + z = 2,
4w − 2x − 2y = 1
2. For each of the following augmented matrices, determine the solution set of the
corresponding
linear system:
\[
\text{(a)}\;\left[\begin{array}{ccc|c} 1 & 0 & 0 & 2\\ 0 & 0 & 1 & 1\\ 0 & 0 & 1 & 3\end{array}\right]
\quad
\text{(b)}\;\left[\begin{array}{ccc|c} 1 & 4 & 0 & 2\\ 0 & 0 & 1 & 3\\ 0 & 0 & 0 & 1\end{array}\right]
\quad
\text{(c)}\;\left[\begin{array}{ccc|c} 1 & -3 & 0 & 2\\ 0 & 0 & 1 & -2\\ 0 & 0 & 0 & 0\end{array}\right]
\]
3. Let
\[
B = \begin{bmatrix} 1 & 0 & 2 & 0 & -1\\ 0 & 1 & 3 & 0 & -1\\ 0 & 0 & 0 & 1 & 5\\ 0 & 0 & 0 & 0 & 0\end{bmatrix}, \quad
w = \begin{bmatrix} 3\\ 2\\ 0\\ 2\\ 0\end{bmatrix}, \quad\text{and let}\quad
b = \begin{bmatrix} 0\\ 5\\ 3\\ 4\end{bmatrix}.
\]
Suppose A is a matrix such that A1 = [2, 1, −3, 2]t , A2 = [−1, 2, 3, 1]t , B is
the RREF of A, and Aw = b. Determine the following:
(a) Sol(A, 0) (b) Sol(A, b) (c) A
4. Let A be a matrix with Sol(A, 0) = {α[1 0 − 1]t + β[0 1 3]t : α, β ∈ R}, and let
b = A1 + 2 A2 + A3 . Determine Sol(A, b).
5. Show that the linear system x + y + kz = 1, x − y − z = 2, 2x + y − 2z = 3
has no solution for k = −1 and has a unique solution for each k ≠ −1.
6. Determine, if possible, the values of a, b so that the following systems have (i)
no solutions, (ii) unique solution, and (iii) infinitely many solutions:
(a) x1 + 2x2 + x3 = 1, −x1 + 4x2 + 3x3 = 2, 2x1 − 2x2 + ax3 = 3
(b) x1 + x2 + 3x3 = 2, x1 + 2x2 + 4x3 = 3, x1 + 3x2 + ax3 = b
2.6 Problems
Recall that F stands for either R or C; and Fn is either F1×n or Fn×1 . Also, recall
that a typical row vector in F1×n is written as [a1 , . . . , an ] and a column vector
in Fn×1 is written as [a1 , . . . , an ]t . Both the row and column vectors are written
uniformly as (a1 , . . . , an ); these constitute the vectors in Fn . In Fn , we have a special
vector, called the zero vector, which we denote by 0; that is, 0 = (0, . . . , 0). And if
x = (a1 , . . . , an ) ∈ Fn , then its additive inverse is −x = (−a1 , . . . , −an ).
The operations of addition and scalar multiplication in Fn enjoy the following
properties:
For all u, v, w ∈ Fn , and for all α, β ∈ F,
1. u + v = v + u.
2. (u + v) + w = u + (v + w).
3. u + 0 = 0 + u = u.
4. u + (−u) = −u + u = 0.
5. α(βu) = (αβ)u.
6. α(u + v) = αu + αv.
7. (α + β)u = αu + βu.
8. 1 u = u.
9. (−1)u = −u.
10. If u + v = u + w, then v = w.
11. If αu = 0, then α = 0 or u = 0.
It so happens that the last three properties follow from the earlier ones. Any
nonempty set where the two operations of addition and scalar multiplication are
defined, and which enjoy the first eight properties above, is called a vector space
over F. In this sense, Fn , that is, both F1×n and Fn×1 , are vector spaces over F. In
such a general setting if a nonempty subset of a vector space is closed under both
the operations, then it is called a subspace. We may not need these general notions.
However, we define a subspace of our specific vector spaces.
Let V be a nonempty subset of Fn . We say that V is a subspace of Fn iff for
each scalar α ∈ F and for each pair of vectors u, v ∈ V, we have both u + v ∈ V and
αu ∈ V.
This is the meaning of the informal phrase: V is closed under addition and scalar
multiplication. It is easy to see that a nonempty subset V of Fn is a subspace of Fn
iff the following single condition is satisfied:
for each α ∈ F and for all vectors u, v from V, αu + v ∈ V.
2 × 1/2 + 3 × 1/3 + 5 × 0 = 2 ≠ 1.
Verify that in a subspace V, all the properties (1–8) above hold true. This is
the reason we call such a nonempty subset as a subspace. Further, if U and V are
subspaces of Fn and U ⊆ V, then we say that U is a subspace of V.
A singleton with a nonzero vector is not a subspace. For example, {(1, 1)} is not
a subspace of F2 since 2(1, 1) is not an element of this set.
What about the set {α(1, 1) : α ∈ F}? Take any two vectors from this set, say,
α(1, 1) and β(1, 1). Let γ ∈ F. Now, γ(α(1, 1)) + β(1, 1) = (γ α + β)(1, 1)
is an element of the set. Therefore, the set is a subspace of F2 . Notice that this set is
the set of all linear combinations of the vector (1, 1).
α1 v1 + · · · + αm vm
\[
\begin{aligned}
\operatorname{span}(S) &= \bigcup_{m=1}^{\infty}\{\alpha_1 v_1 + \cdots + \alpha_m v_m : \alpha_1, \ldots, \alpha_m \in \mathbb{F},\; v_1, \ldots, v_m \in S\}\\
&= \bigcup_{m=1}^{\infty}\operatorname{span}\{v_1, \ldots, v_m\} \text{ for } v_1, \ldots, v_m \in S\\
&= \bigcup_{m=1}^{\infty}\operatorname{span}(A) \text{ for } A \subseteq S \text{ with } |A| = m.
\end{aligned}
\]
Here, |A| means the number of elements of the set A. Further, when we speak of a
set of vectors, it is implicitly assumed that the set is a subset of Fn for some n.
span{(1, 1), (2, 2)} = span{(1, 1)} ≠ F2 , span{(1, 1), (1, 2)} = F2 .
Further, the vectors (1, 1), (1, 2), (1, −1) also span F2 . In fact, since the first two
vectors span F2 , any list of vectors from F2 containing these two will also span F2 .
Similarly, the vectors e1 , . . . , en in Fn×1 span Fn×1 , where ei is the column vector
in Fn×1 whose ith component is 1 and all other components are 0.
In this terminology, vectors v1 , . . . , vn are linearly dependent iff one of the vectors
in this list is in the span of the rest. If no vector in the list is in the span of the rest,
then the vectors are linearly independent.
Exercises for Sect. 3.1
S = {[1, 2, −3], [1, 0, −1], [2, −4, 2], [0, −2, 2]}
On the other hand, the linearly independent subset {[1, 2, −3]} of S does not span
V. For instance,
That is, a spanning subset may be superfluous and a linearly independent set may
be deficient. A linearly independent set which also spans a subspace may be just
adequate in spanning the subspace.
Let V be a subspace of Fn . Let B be a set of vectors from V. We say that the set
B is a basis of V iff B is linearly independent and B spans V. Also, we define ∅ as
the basis for the zero subspace {0}.
In what follows, we consider ordered sets, and the ordering of vectors in a set is
shown by the way they are written. For instance, in the ordered set {v1 , v2 , v3 }, the
vector v1 is the first vector, v2 is the second, and v3 is the third, whereas in {v2 , v3 , v1 },
the vector v2 is the first, v3 is the second, and v1 is the third. We assume implicitly
that each basis is an ordered set.
It follows from Theorem 2.5 that each basis for a subspace of Fn has at most n
vectors.
V = {[a, b, c] : a + b + c = 0, a, b, c ∈ F}.
Of course, you can prove that a maximal linearly independent set is a basis, and a
minimal spanning set is a basis. This would guarantee that each subspace of Fn has
a basis. We take a more direct approach.
The zero subspace {0} has a single basis ∅. But other subspaces do not have a
unique basis. For instance, the subspace V in Example 3.3 has at least two bases.
However, something remains same in all these bases. In that example, both the bases
have exactly two vectors.
Theorem 3.3 Let V be a subspace of Fn . All bases of V have the same number of
vectors.
In view of Theorem 3.3, there exists a unique non-negative number associated with
each subspace of Fn , which is the number of vectors in any basis of the subspace.
Let V be a subspace of Fn . The number of vectors in some (or any) basis for V is
called the dimension of V. We write this number as dim(V ) and also as dim V.
Since {e1 , . . . , en } is a basis for Fn×1 , dim(Fn×1 ) = n. Similarly, dim(F1×n ) = n.
Remember that when we consider Cn×1 or C1×n , the scalars in any linear combination
are complex numbers, and for Rn×1 or R1×n , the scalars are real numbers. Notice
that dim({0}) = dim(span(∅)) = 0; the dimension of any subspace of Fn is at most
n.
The vectors [2, 1, 0, −2] and [−3, 0, 1, 3] are linearly independent. Therefore, U
has a basis {[2, 1, 0, −2], [−3, 0, 1, 3]}. So, dim(U ) = 2.
Recall that |B| stands for the number of elements in a set B. For any subspace V
of Fn , and any subset B of Fn , the following statements should be obvious:
1. If |B| < dim(V ), then span(B) is a proper subspace of V.
2. If |B| > dim(V ), then B is linearly dependent.
3. If |B| = dim(V ) and span(B) = V, then B is a basis of V.
4. If |B| = dim(V ) and B is linearly independent, then B is a basis of V.
5. If U is a subspace of V , then dim(U ) ≤ dim(V ) ≤ n.
6. If B is a superset of a spanning set of V, then B is linearly dependent.
7. If B is a proper subset of a linearly independent subset of V, then B is linearly
independent, and span(B) is a proper subspace of V.
8. Each spanning set of V contains a basis of V.
9. Each linearly independent subset of V can be extended to a basis of V.
For (8)–(9), we may employ the same construction procedure as in the proof of
Theorem 3.2. A statement equivalent to (9) is proved below.
We can use elementary row operations to extract a basis for a subspace which is
given in the form of span of some finite number of vectors. We write the vectors as
row vectors, form a matrix A, and convert it to its RREF. Then, the pivotal rows of
the RREF form the required basis. Also, those rows of A which have become the
pivotal rows (monitoring row exchanges) form a basis.
Example 3.5 Let U = span{(1, 1, 1, 1), (2, 1, 0, 3), (−1, 0, 1, −2), (0, 3, 2, 1)}.
Find a basis for the subspace U of F4 .
We start with the matrix with these vectors as its rows and convert it to its RREF
as follows:
\[
\begin{bmatrix} 1 & 1 & 1 & 1\\ 2 & 1 & 0 & 3\\ -1 & 0 & 1 & -2\\ 0 & 3 & 2 & 1\end{bmatrix}
\xrightarrow{R_1}
\begin{bmatrix} 1 & 1 & 1 & 1\\ 0 & -1 & -2 & 1\\ 0 & 1 & 2 & -1\\ 0 & 3 & 2 & 1\end{bmatrix}
\xrightarrow{R_2}
\begin{bmatrix} 1 & 0 & -1 & 2\\ 0 & 1 & 2 & -1\\ 0 & 0 & 0 & 0\\ 0 & 0 & -4 & 4\end{bmatrix}
\xrightarrow{R_3}
\begin{bmatrix} 1 & 0 & 0 & 1\\ 0 & 1 & 0 & 1\\ 0 & 0 & 1 & -1\\ 0 & 0 & 0 & 0\end{bmatrix}.
\]
Here, R1 = E_{−2}[2, 1], E_{1}[3, 1]; R2 = E_{−1}[2], E_{−1}[1, 2], E_{−1}[3, 2], E_{−3}[4, 2]; and R3 = E[3, 4], E_{−1/4}[3], E_{1}[1, 3], E_{−2}[2, 3].
The pivotal rows show that {(1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, −1)} is a basis for
the given subspace. Notice that only one row exchange has been done in this reduction
process, which means that the third row in the RREF corresponds to the fourth vector
and the fourth row corresponds to the third vector. Thus, the pivotal rows correspond
to the first, second, and the fourth vector, originally. This says that a basis for the
subspace is also given by {(1, 1, 1, 1), (2, 1, 0, 3), (0, 3, 2, 1)}.
The reduction process confirms that the third vector is a linear combination of the
first, second, and the fourth.
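A quick machine check of this example (assuming Python with sympy; not part of the text) lists the pivotal rows of the RREF:

from sympy import Matrix

A = Matrix([[1, 1, 1, 1], [2, 1, 0, 3], [-1, 0, 1, -2], [0, 3, 2, 1]])
R, pivots = A.rref()
print([list(R.row(i)) for i in range(len(pivots))])
# [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, -1]]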
The method illustrated in the above example can be used in another situation.
Suppose B is a basis for U, which is a subspace of V ⊆ Fn . Assume that a basis E
for V is also known. Then, we can extend B to a basis for V. Towards this, we may
form a matrix with the vectors in B as row vectors and then place the vectors from
E as rows below those from B. Then, reduction of this matrix to RREF produces a
basis for V, which is an extension of B.
Linear independence has something to do with invertibility of a square matrix.
Suppose the rows of a matrix A ∈ Fn×n are linearly independent. Then, the RREF
of A has n number of pivots. That is, rank(A) = n. Consequently, A is invertible.
On the other hand, if a row of A is a linear combination of other rows, then this row
appears as a zero row in the RREF of A. That is, A is not invertible.
Considering At instead of A, we conclude that At is invertible iff the rows of A
are linearly independent. However, A is invertible iff At is invertible. Therefore, A
is invertible iff its columns are linearly independent.
Theorem 3.5 A square matrix is invertible iff its rows are linearly independent iff
its columns are linearly independent.
From Theorem 3.5, it follows that an n × n matrix is invertible iff its rows form
a basis for F1×n iff its columns form a basis for Fn×1 .
Exercises for Sect. 3.2
Let A ∈ Fm×n . If x ∈ Fn×1 , then Ax ∈ Fm×1 . Thus, we may view the matrix A as a
function from Fn×1 to Fm×1 which maps x to Ax. That is, the map A : Fn×1 → Fm×1
is given by
A(x) = Ax for x ∈ Fn×1 .
Notice that instead of using another symbol, we write the map obtained this way
from a matrix A as A itself. Due to the properties of matrix product, the following
are true for the map A:
1. A(u + v) = A(u) + A(v) for all u, v ∈ Fn×1 .
2. A(αv) = α A(v) for all v ∈ Fn×1 and for all α ∈ F.
In this manner, a matrix is considered as a linear transformation. In fact, any
function A from a vector space to another (both over the same field) satisfying the
above two properties is called a linear transformation or a linear map.
To see the connection between the matrix as a rectangular array and as a function,
consider the values of the matrix A at the standard basis vectors e1 , . . . , en in Fn×1 .
The column vector e j ∈ Fn×1 has the jth entry as 1 and all other entries 0. Let
A = [ai j ] ∈ Fm×n . The product Ae j is a vector in Fm×1 , whose ith entry is ai j ; that is, Ae j is the jth column of A.
The range of the linear transformation A is the set R(A) = {Ax : x ∈ Fn×1 }. If
y ∈ R(A), then there exists an x = [α1 , . . . , αn ]t ∈ Fn×1 such that y = Ax. The
vector x can be written as x = α1 e1 + · · · + αn en . Thus, such a y ∈ R(A) is written
as
y = Ax = α1 Ae1 + · · · + αn Aen .
Similarly, the subspace of F1×n which is spanned by the rows of A is called the
row space of A. Notice that the nonzero rows in the RREF of A form a basis for the
row space of A. The dimension of the row space is the row rank of A, which we
know to be equal to rank(A) also.
For an m × n matrix A, viewed as a linear transformation, the set of all vectors
which map to the zero vector is denoted by N (A). That is, N (A) = {x ∈ Fn×1 : Ax = 0}.
We find that N (A) is the set of all solutions of the linear system Ax = 0. Also,
if u, v ∈ N (A) and α ∈ F, then A(αu + v) = α Au + Av = 0.
Theorem 2.9 (4) implies that dim(R(A)) + dim(N (A)) = n. Since this will be used
often, we mention it as a theorem. An alternate proof of this theorem is given in
Problem 12.
Theorem 3.6 (Rank Nullity) Let A ∈ Cm×n . Then, dim(R(A)) + dim(N (A)) =
rank(A) + null(A) = n.
Example 3.6 Consider the system matrix in Example 2.9. We had its RREF with
(boxed) pivots as shown below:
\[
A = \begin{bmatrix} 5 & 2 & -3 & 1\\ 1 & -3 & 2 & -2\\ 3 & 8 & -7 & 5\end{bmatrix}
\longrightarrow
\begin{bmatrix} \boxed{1} & 0 & -5/17 & -1/17\\ 0 & \boxed{1} & -13/17 & 11/17\\ 0 & 0 & 0 & 0\end{bmatrix}
\]
The first two columns in RREF(A) are the pivotal columns. So, the first two
columns in A form a basis for R(A). That is, {[5, 1, 3]t , [2, −3, 8]t } is a basis for R(A).
For a basis of N (A), notice that each pivot has the row index equal to the column
index; so, we do not require to insert zero rows between pivotal rows. To make it a
square matrix, we attach a zero row to the RREF at the bottom:
\[
D = \begin{bmatrix} 1 & 0 & -5/17 & -1/17\\ 0 & 1 & -13/17 & 11/17\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\end{bmatrix}.
\]
Then, we change the diagonal entries in the non-pivotal columns to −1. These
changed non-pivotal columns form a basis for N (A). That is,
a basis for N (A) is $\bigl\{\,[-\tfrac{5}{17}, -\tfrac{13}{17}, -1, 0]^t,\; [-\tfrac{1}{17}, \tfrac{11}{17}, 0, -1]^t\,\bigr\}$.
.
A basis for R(A) is provided by the columns of A that correspond to the pivotal
columns, that is, the first and the third columns of A. So,
The second and fourth columns are linear combinations of these basis vectors,
where the coefficients are provided by the entries in the non-pivotal columns of the
RREF. That is, the second column of A is 2/5 times the first column, and the fourth
column is 1 times the first column plus 1 times the second column. Indeed, we may
verify that
[2, 4, −2]t = (2/5) [5, 10, −5]t , [2, 5, −2]t = [5, 10, −5]t + [−3, −5, 3]t .
Towards computing a basis for N (A), notice that the only zero row in the RREF
is the third row. So, we delete it. Next, the pivot on the second row has the column
index as 3. So, we insert a zero row between first and second rows. Next, we adjoin
a zero row at the bottom to make it a square matrix. Finally, we change the diagonal
entry of these new rows to −1. We then obtain the matrix
\[
\begin{bmatrix} 1 & 2/5 & 0 & 1\\ 0 & -1 & 0 & 0\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & -1\end{bmatrix}.
\]
v = a1 v1 + · · · + am vm = b1 v1 + · · · + bm vm .
α1 v1 + · · · + αm vm = 0
0 v1 + · · · + 0 vm = 0.
Therefore, [v] B = [−1/2, −3/2]t .
That is, [αu + v] B = α[u] B + [v] B . In a way, this is the linear property of the
coordinate vector map. Also,
v j = 0 · v1 + · · · + 0 · v j−1 + 1 · v j + 0 · v j+1 + · · · + 0 · vm .
As the columns of P form a basis for Fn×1 , the matrix P is invertible. Therefore,
[u] B = P −1 u.
Example 3.9 Consider the basis B = {u 1 , u 2 , u 3 } for R3×1 , where u 1 = [1, 1, 1]t ,
u 2 = [1, 0, 1]t , and u 3 = [1, 0, 0]t . The matrix P in Theorem 3.8 and its inverse are
given by
\[
P = [u_1\; u_2\; u_3] = \begin{bmatrix} 1 & 1 & 1\\ 1 & 0 & 0\\ 1 & 1 & 0\end{bmatrix}, \qquad
P^{-1} = \begin{bmatrix} 0 & 1 & 0\\ 0 & -1 & 1\\ 1 & 0 & -1\end{bmatrix}.
\]
We see that
e1 = [1, 0, 0]t = 0 · u 1 + 0 · u 2 + u 3
e2 = [0, 1, 0]t = u 1 − u 2 + 0 · u 3
e3 = [0, 0, 1]t = 0 · u 1 + u 2 − u 3
It leads to
That is, P[e j ] B = e j for each j as stated in Theorem 3.8. We verify the other con-
clusion of the theorem for u = [1, 2, 3]t . Here, u = 2u 1 + u 2 − 2u 3 . Therefore,
\[
[u]_B = \begin{bmatrix} 2\\ 1\\ -2\end{bmatrix} = \begin{bmatrix} 0 & 1 & 0\\ 0 & -1 & 1\\ 1 & 0 & -1\end{bmatrix}\begin{bmatrix} 1\\ 2\\ 3\end{bmatrix} = P^{-1}u.
\]
It raises a more general problem: if a vector v has the coordinate vector [v] B with
respect to a basis B of Fn×1 , and also a coordinate vector [v]C with respect to another
basis C of Fn×1 , then how are [v] B and [v]C related? We will address this question
in Sect. 3.6.
Exercises for Sect. 3.4
In the following, a basis B for F3×1 and a vector u ∈ F3×1 are given. Compute the
coordinate vector [u] B .
1. B = {[0, 0, 1]t , [1, 0, 0]t , [0, 1, 0]t }, u = [1, 2, 3]t
2. B = {[0, 1, 1]t , [1, 0, 1]t , [1, 1, 0]t }, u = [2, 1, 3]t
3. B = {[1, 1, 1]t , [1, 2, 1]t , [1, 2, 3]t }, u = [3, 2, 1]t
[Au] E = A [u] D .
If we choose a different pair of bases, that is, a basis B for Fn×1 and a basis C for
Fm×1 , then u has the coordinate vector [u] B and Au has the coordinate vector [Au]C .
Looking at the above equation, we ask:
Does there exist a matrix M such that [Au]C = M[u] B ?
If so, is it unique?
Proof Let j ∈ {1, . . . , n}. Since Au j ∈ Fm×1 and C is a basis for Fm×1 , there exist
unique scalars α1 j , . . . , αm j such that
Au j = α1 j v1 + · · · + αm j vm .
Thus, we have unique mn scalars αi j so that the above equation is true for each
j. Construct the matrix M = [αi j ] ∈ Fm×n . By Observation 3.2,
[Au_j]_C = [\alpha_{1j}, \ldots, \alpha_{mj}]^t = M e_j = M [u_j]_B for each j ∈ {1, . . . , n}.
We must verify that such an equality holds for each u ∈ Fn×1 . So, let u ∈ Fn×1 .
As B is a basis of Fn×1 , there exist unique scalars β1 , . . . , βn such that u = β1 u 1 +
· · · + βn u n . Now,
In view of Theorem 3.9, we denote the matrix M as [A]C,B and give it a name.
Let A ∈ Fm×n . Let B be a basis for Fn×1 , and let C be a basis for Fm×1 . The unique
m × n matrix [A]C,B that satisfies [Au]C = [A]C,B [u] B for each u ∈ Fn×1 is called the coordinate matrix of A with respect to the bases B and C.
Also, notice that if the coordinate matrix is given by (3.2), then the equalities in
(3.1) are satisfied.
Example 3.10 Compute the coordinate matrix [A]C,B of
\[
A = \begin{bmatrix} 0 & 1\\ 1 & 1\\ 1 & -1\end{bmatrix}
\]
with respect to the bases B = {[1, 2]t , [3, 1]t } for F2×1 , and C = {[1, 0, 0]t , [1, 1, 0]t , [1, 1, 1]t } for F3×1 .
Write u 1 = [1, 2]t , u 2 = [3, 1]t and v1 = [1, 0, 0]t , v2 = [1, 1, 0]t , v3 =
[1, 1, 1]t . We obtain
P = u 1 · · · u n , Q = v1 · · · vm ,
by taking the column vectors u j and vi as columns of the respective matrices. Then,
[A]C,B = Q −1 A P.
To summarize, if A ∈ Fm×n , B is a basis for Fn×1 , and C is a basis for Fm×1 , then
the coordinate matrix [A]C,B can be obtained in two ways.
Due to Theorem 3.9, we express the A-image of the basis vectors in B as linear
combinations of basis vectors in C. The coefficients of each such image form the
corresponding columns of the matrix [A]C,B .
Alternatively, using Theorem 3.10, we construct P as the matrix whose jth column
is the jth vector in B and the matrix Q by taking its ith column as the ith vector in
C. Then, we have [A]C,B = Q −1 A P.
Since inverse of a matrix is computed using the reduction to RREF, the coordinate
matrix may be computed the same way. If B = {u 1 , . . . , u n } and C = {v1 , . . . , vm }
are bases for Fn×1 and Fm×1 , respectively, then
P = u1 · · · un , Q = v1 · · · vm .
\[
\left[\begin{array}{ccc|cc} 1 & 1 & 1 & 2 & 1\\ 0 & 1 & 1 & 3 & 4\\ 0 & 0 & 1 & -1 & 2\end{array}\right]
\xrightarrow{R_1}
\left[\begin{array}{ccc|cc} 1 & 0 & 0 & -1 & -3\\ 0 & 1 & 0 & 4 & 2\\ 0 & 0 & 1 & -1 & 2\end{array}\right]
\]
Here, R1 = E_{−1}[1, 2], E_{−1}[2, 3]. Therefore,
\[
[A]_{C,B} = \begin{bmatrix} -1 & -3\\ 4 & 2\\ -1 & 2\end{bmatrix}.
\]
It is easy to verify that Au 1 = −v1 + 4v2 − v3 and Au 2 = −3v1 + 2v2 + 2v3 as
required by (3.1)–(3.2).
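The formula [A]C,B = Q −1 A P of Theorem 3.10 can also be checked numerically for this example. The sketch below assumes Python with the sympy library and is only an illustration, not part of the text.

from sympy import Matrix

A = Matrix([[0, 1], [1, 1], [1, -1]])
P = Matrix([[1, 3], [2, 1]])                   # columns of P: the basis B
Q = Matrix([[1, 1, 1], [0, 1, 1], [0, 0, 1]])  # columns of Q: the basis C
print(Q.inv() * A * P)                         # Matrix([[-1, -3], [4, 2], [-1, 2]])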
The linear property of coordinate vectors can be extended to coordinate matrices;
see Exercise 2. Moreover, the product formula [Au]C = [A]C,B [u] B has a similar
looking expression for product of matrices.
Theorem 3.11 Let A ∈ Fm×k and B ∈ Fk×n be matrices. Let C, D and E be bases
for Fn×1 , Fk×1 and Fm×1 , respectively. Then, [AB] E,C = [A] E,D [B] D,C .
Proof For each v ∈ Fn×1 ,
[A] E,D [B] D,C [v]C = [A] E,D [Bv] D = [ABv] E = [AB] E,C [v]C .
Recall that [A]C,B is that matrix in Fm×n which satisfies the equation [Au]C = [A]C,B [u] B for each u ∈ Fn×1 .
Further, [A]C,B is given by Q −1 A P, where P is the matrix whose jth column is the
jth basis vector of B, and Q is the matrix whose ith column is the ith basis vector
of C. In particular, taking A ∈ Fn×n as the identity matrix, writing B as O, and C as
N , we see that
\[
[u]_N = I_{N,O}\,[u]_O \quad\text{with}\quad I_{N,O} = Q^{-1}P. \tag{3.3}
\]
In (3.3), P is the matrix whose jth column is the jth basis vector of O, and Q is the
matrix whose ith column is the ith basis vector of N for 1 ≤ i, j ≤ n. Both O and
N are bases for Fn×1 .
The formula in (3.3) shows how the coordinate vector changes when a basis
changes. The matrix I N ,O , which is equal to Q −1 P, is called the change of basis
matrix or the transition matrix when the basis changes from an old basis O to a
new basis N .
Observe that the change of basis matrix I N ,O can also be computed by expressing
the basis vectors of O as linear combinations of basis vectors of N as stipulated by
Theorem 3.9, or equivalently, in (3.1)–(3.2). Alternatively, if O = {u 1 , . . . , u n } and
N = {v1 , . . . , vn }, the change of basis matrix I N ,O , which is equal to Q −1 P, may
be computed by using reduction to RREF. Schematically, it is given as follows:
\[
O = \{u_1, \ldots, u_n\},\; N = \{v_1, \ldots, v_n\}, \qquad
\bigl[\,v_1 \cdots v_n \mid u_1 \cdots u_n\,\bigr] \xrightarrow{\text{RREF}} \bigl[\,I \mid I_{N,O}\,\bigr].
\]
O = {[1, 0, 1]t , [1, 1, 0]t , [0, 1, 1]t }, N = {[1, −1, 1]t , [1, 1, −1]t , [−1, 1, 1]t }.
The coefficients from the first equation give the first column of I N ,O and so on.
Thus, we obtain
\[
I_{N,O} = \begin{bmatrix} 1 & 1/2 & 1/2\\ 1/2 & 1 & 1/2\\ 1/2 & 1/2 & 1\end{bmatrix}.
\]
\[
\left[\begin{array}{ccc|ccc} 1 & 1 & -1 & 1 & 1 & 0\\ -1 & 1 & 1 & 0 & 1 & 1\\ 1 & -1 & 1 & 1 & 0 & 1\end{array}\right]
\xrightarrow{R_1}
\left[\begin{array}{ccc|ccc} 1 & 1 & -1 & 1 & 1 & 0\\ 0 & 2 & 0 & 1 & 2 & 1\\ 0 & -2 & 2 & 0 & -1 & 1\end{array}\right]
\xrightarrow{R_2}
\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1/2 & 0 & -1/2\\ 0 & 1 & 0 & 1/2 & 1 & 1/2\\ 0 & 0 & 2 & 1 & 1 & 2\end{array}\right]
\xrightarrow{R_3}
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 1/2 & 1/2\\ 0 & 1 & 0 & 1/2 & 1 & 1/2\\ 0 & 0 & 1 & 1/2 & 1/2 & 1\end{array}\right]
\]
Here, R1 = E 1 [2, 1], E −1 [3, 1]; R2 = E 1/2 [2], E −1 [1, 2], E 2 [3, 2]; R3 = E 1/2 [3], E 1 [1, 3].
The matrix in the RREF, to the right of the vertical bar, is the required change of
basis matrix I N ,O .
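As a hedged check (assuming Python with sympy, which is outside the text), I_{N,O} = Q −1 P and the coordinate change can be computed directly:

from sympy import Matrix

P = Matrix([[1, 1, 0], [0, 1, 1], [1, 0, 1]])     # columns: the old basis O
Q = Matrix([[1, 1, -1], [-1, 1, 1], [1, -1, 1]])  # columns: the new basis N
I_NO = Q.inv() * P
print(I_NO)                        # Matrix([[1, 1/2, 1/2], [1/2, 1, 1/2], [1/2, 1/2, 1]])
print(I_NO * Matrix([1, 0, 2]))    # Matrix([[2], [3/2], [5/2]])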
To verify our result for [1, 2, 3]t , notice that
Therefore, [[1, 2, 3]t ] O = [1, 0, 2]t and [[1, 2, 3]t ] N = [2, 3/2, 5/2]t . Then,
\[
I_{N,O}\,\bigl[[1,2,3]^t\bigr]_O = \begin{bmatrix} 1 & 1/2 & 1/2\\ 1/2 & 1 & 1/2\\ 1/2 & 1/2 & 1\end{bmatrix}\begin{bmatrix} 1\\ 0\\ 2\end{bmatrix} = \begin{bmatrix} 2\\ 3/2\\ 5/2\end{bmatrix} = \bigl[[1,2,3]^t\bigr]_N.
\]
As to the verification,
\[
A\begin{bmatrix} 1\\ 2\\ 3\end{bmatrix} = \begin{bmatrix} 6\\ 2\\ 6\end{bmatrix} = 4\begin{bmatrix} 1\\ -1\\ 1\end{bmatrix} + 4\begin{bmatrix} 1\\ 1\\ -1\end{bmatrix} + 2\begin{bmatrix} -1\\ 1\\ 1\end{bmatrix}.
\]
We find that
\[
[A]_{N,O}\,\bigl[[1,2,3]^t\bigr]_O = [A]_{N,O}\begin{bmatrix} 1\\ 0\\ 2\end{bmatrix} = \begin{bmatrix} 4\\ 4\\ 2\end{bmatrix} = \bigl[A\,[1,2,3]^t\bigr]_N.
\]
In view of Theorem 3.10, two matrices A, B ∈ Fm×n are called equivalent iff there
exist invertible matrices P ∈ Fn×n and Q ∈ Fm×m such that B = Q −1 A P.
Notice that ‘is equivalent to’ is an equivalence relation on Fm×n . Equivalent matri-
ces represent the same matrix (linear transformation) with respect to possibly differ-
ent pairs of bases.
From Theorem 2.8, it follows that ranks of two equivalent matrices are equal.
We can construct a matrix of rank r relatively easily. Let r ≤ min{m, n}. The
matrix Rr ∈ Fm×n whose first r columns are the standard basis vectors e1 , . . . , er of
Fm×1 and all other columns are zero columns has rank r. This matrix, in block form,
looks as follows:
\[
R_r = \begin{bmatrix} I_r & 0\\ 0 & 0\end{bmatrix}.
\]
where
\[
B = Q^{-1}\begin{bmatrix} I_r\\ 0\end{bmatrix} \in \mathbb{F}^{m\times r}, \qquad C = \begin{bmatrix} I_r & 0\end{bmatrix} P \in \mathbb{F}^{r\times n}.
\]
[Av]_C = P^{-1}AP\,[v]_C, \qquad P = [v_1 \cdots v_n].
That is, the coordinate matrix of A is P −1 A P. This leads to similarity of two matrices.
We say that two matrices A, B ∈ Fn×n are similar iff B = P −1 A P for some
invertible matrix P ∈ Fn×n . Observe that in this case, B is the coordinate matrix of
A with respect to the basis that comprises the columns of P.
Example 3.13 Consider the basis N = {[1, −1, 1]t , [1, 1, −1]t , [−1, 1, 1]t } for
R3×1 . To determine the matrix similar to
\[
A = \begin{bmatrix} 1 & 1 & 1\\ -1 & 0 & 1\\ 0 & 1 & 0\end{bmatrix}
\]
when the basis has changed from the standard basis to N , we construct the matrix P by taking the basis vectors of N as in the following:
\[
P = \begin{bmatrix} 1 & 1 & -1\\ -1 & 1 & 1\\ 1 & -1 & 1\end{bmatrix}.
\]
This verifies the condition [Au] N = B [u] N for the vector u = [1, 2, 3]t .
We emphasize that if B = P −1 A P is a matrix similar to A, then the matrix A
as a linear transformation on Fn×1 with standard basis and the matrix B as a linear
transformation on Fn×1 with the basis consisting of columns of P are the same linear
transformation. Moreover, if C is the basis whose jth element is the jth column of
P, then for each vector v ∈ Fn×1 , [Av]C = P −1 A P [v]C .
Though equivalence is easy to characterize by the rank, similarity is much more
difficult. And we postpone this to a later chapter.
Exercises for Sect. 3.7
1. Let A ∈ Cm×n . Define T : C1×m → C1×n by T (x) = x A for x ∈ C1×m . Show that
T is a linear transformation. Identify T (etj ).
2. In each of the following cases, show that T is a linear transformation. Find the
matrix A so that T (x) = Ax. Determine rank(A). And then, construct the full rank
factorization of A.
u 2 = [−1, 1]t , v1 = [2, 1]t , and v2 = [1, 0]t . Let
\[
A = \begin{bmatrix} -1 & 0\\ 0 & 1\end{bmatrix}.
\]
(a) Compute Q = [A] O,O and R = [A] N ,N .
(b) Find the change of basis matrix P = I N ,O .
(c) Compute S = P Q P −1 .
(d) Is it true that R = S? Why?
(e) If S = [si j ], verify that Av1 = s11 v1 + s21 v2 , Av2 = s12 v1 + s22 v2 .
3.8 Problems
The dot product in R3 is used to define the length of a vector and the angle between
two nonzero vectors. In particular, the dot product is used to determine when two
vectors become perpendicular to each other. This notion can be generalized to Fn .
For vectors u, v ∈ F1×n , we define their inner product as
⟨u, v⟩ = u v∗ .
⟨x, y⟩ = y∗ x.
\[
0 \le \langle x - \alpha y, x - \alpha y\rangle = \langle x, x\rangle - \overline{\alpha}\,\langle x, y\rangle - \alpha\,\langle y, x\rangle + \alpha\overline{\alpha}\,\langle y, y\rangle = \|x\|^2 - \frac{|\langle x, y\rangle|^2}{\|y\|^2}.
\]
Using these properties, the acute (non-obtuse) angle between any two nonzero
vectors can be defined.
Let x and y be nonzero vectors in Fn . The angle between x and y, denoted by
θ (x, y), is defined by
\[
\cos\theta(x, y) = \frac{|\langle x, y\rangle|}{\|x\|\,\|y\|}.
\]
It follows that if x ⊥ y, then ‖x‖² + ‖y‖² = ‖x + y‖² for all vectors x, y. This
is referred to as Pythagoras law. The converse of Pythagoras law holds in Rn . That
is, for all x, y ∈ Rn , if ‖x + y‖² = ‖x‖² + ‖y‖² then x ⊥ y. But it does not hold in
Cn , in general.
A set of nonzero vectors in Fn is called an orthogonal set in Fn iff each vector in
the set is orthogonal to every other vector in the set. A singleton set with a nonzero
vector in it is assumed to be orthogonal.
Proof Let {v1 , . . . , vm } be an orthogonal set of vectors in Fn . Then they are nonzero
vectors. Assume that α1 v1 + · · · + αm vm = 0. For 1 ≤ j ≤ m,
\[
\sum_{i=1}^{m} \alpha_i\,\langle v_i, v_j\rangle = \langle \alpha_1 v_1 + \cdots + \alpha_m v_m, v_j\rangle = \langle 0, v_j\rangle = 0.
\]
A vector v with v = 1 is called a unit vector. An orthogonal set of unit vectors
is called an orthonormal set. For instance, in F3 ,
is an orthogonal set, but not orthonormal; whereas the following set is an orthonormal
set:
\[
\Bigl\{\bigl(\tfrac{1}{\sqrt{14}}, \tfrac{2}{\sqrt{14}}, \tfrac{3}{\sqrt{14}}\bigr),\; \bigl(\tfrac{2}{\sqrt{5}}, \tfrac{-1}{\sqrt{5}}, 0\bigr)\Bigr\}.
\]
1. Compute ⟨(2, 1, 3), (1, 3, 2)⟩ and ⟨(1, 2i, 3), (1 + i, i, −3i)⟩.
2. Is {[1, 2, 3, −1]t , [2, −1, 0, 0]t , [0, 0, 1, 3]t } an orthogonal set in F4×1 ?
3. Show that if x, y ∈ Fn , then x ⊥ y implies ‖x + y‖² = ‖x‖² + ‖y‖².
4. Show that if x, y ∈ Rn , then ‖x + y‖² = ‖x‖² + ‖y‖² implies x ⊥ y.
5. In C, the inner product is given by ⟨x, y⟩ = ȳx. Let x = 1 and y = i be two
vectors in C. Show that ‖x‖² + ‖y‖² = ‖x + y‖² but ⟨x, y⟩ ≠ 0.
6. Show that the parallelogram law holds in Fn . That is, for all x, y ∈ Fn , we have
‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²).
7. If x and y are orthogonal vectors in Fn , then show that ‖x‖ ≤ ‖x + y‖.
8. Construct an orthonormal set from {[1, 2, 0]t , [2, −1, 0]t , [0, 0, 2]t }.
\[
\frac{OB}{OA} = \cos\theta = \frac{\langle u_2, u_1\rangle}{\|u_2\|\,\|u_1\|}.
\]
The required vector \(\overrightarrow{OB}\) is the length of \(\overrightarrow{OA}\) times cos θ times the unit vector in the direction of u 1 . That is,
\[
\overrightarrow{OB} = \|u_2\|\,\frac{\langle u_2, u_1\rangle}{\|u_2\|\,\|u_1\|}\,\frac{u_1}{\|u_1\|} = \frac{\langle u_2, u_1\rangle}{\langle u_1, u_1\rangle}\,u_1.
\]
Next, we define v2 as the vector \(u_2 - \overrightarrow{OB}\). Our construction says that
\[
v_1 = u_1, \qquad v_2 = u_2 - \frac{\langle u_2, v_1\rangle}{\langle v_1, v_1\rangle}\,v_1.
\]
If more than two linearly independent vectors in Fn are given, we may continue
this process of taking away feet of the perpendiculars drawn from the last vector on all
the previous ones, assuming that the previous ones have already been orthogonalized.
It results in the process given in the following theorem.
\[
\begin{aligned}
v_1 &= u_1\\
v_2 &= u_2 - \frac{\langle u_2, v_1\rangle}{\langle v_1, v_1\rangle}\,v_1\\
&\;\;\vdots\\
v_k &= u_k - \frac{\langle u_k, v_1\rangle}{\langle v_1, v_1\rangle}\,v_1 - \cdots - \frac{\langle u_k, v_{k-1}\rangle}{\langle v_{k-1}, v_{k-1}\rangle}\,v_{k-1}.
\end{aligned}
\]
\[
v_{j+1} = u_{j+1} - \frac{\langle u_{j+1}, v_1\rangle}{\langle v_1, v_1\rangle}\,v_1 - \cdots - \frac{\langle u_{j+1}, v_j\rangle}{\langle v_j, v_j\rangle}\,v_j.
\]
Clearly, v j+1 ∈ span{u j+1 , v1 , . . . , v j } and u j+1 ∈ span{v1 , . . . , v j+1 }. By the induc-
tion hypothesis, we have span{v1 , . . . , v j } = span{u 1 , . . . , u j }. Thus, any vector
which is a linear combination of u 1 , . . . , u j , u j+1 is a linear combination of
v1 , . . . , v j , u j+1 ; and then it is a linear combination of v1 , . . . , v j , v j+1 . Similarly, any
vector which is a linear combination of v1 , . . . , v j , v j+1 is also a linear combination
of u 1 , . . . , u j , u j+1 . Hence
\[
\langle v_{j+1}, v_i\rangle = \langle u_{j+1}, v_i\rangle - \frac{\langle u_{j+1}, v_i\rangle}{\langle v_i, v_i\rangle}\,\langle v_i, v_i\rangle = 0.
\]
v j+1 = α1 v1 + · · · + α j v j .
\[
\begin{aligned}
v_1 &= (1, 0, 0, 0).\\
v_2 &= u_2 - \frac{\langle u_2, v_1\rangle}{\langle v_1, v_1\rangle}\,v_1 = (1, 1, 0, 0) - 1\,(1, 0, 0, 0) = (0, 1, 0, 0).\\
v_3 &= u_3 - \frac{\langle u_3, v_1\rangle}{\langle v_1, v_1\rangle}\,v_1 - \frac{\langle u_3, v_2\rangle}{\langle v_2, v_2\rangle}\,v_2 = (1, 1, 1, 0) - (1, 0, 0, 0) - (0, 1, 0, 0) = (0, 0, 1, 0).
\end{aligned}
\]
The vectors (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0) are orthogonal; and they form
a basis for the subspace U = span{u 1 , u 2 , u 3 } of F4 . Also, {u 1 , u 2 , u 3 } is linearly
independent, and it is a basis of U.
\[
\begin{aligned}
v_1 &= [1, 1, 0, 1].\\
v_2 &= u_2 - \frac{\langle u_2, v_1\rangle}{\langle v_1, v_1\rangle}\,v_1 = [0, 1, 1, -1] - 0\,[1, 1, 0, 1] = [0, 1, 1, -1].\\
v_3 &= u_3 - \frac{\langle u_3, v_1\rangle}{\langle v_1, v_1\rangle}\,v_1 - \frac{\langle u_3, v_2\rangle}{\langle v_2, v_2\rangle}\,v_2
\end{aligned}
\]
Observe that in Gram–Schmidt process, at any stage if the computed vector turns
out to be a zero vector, it is to be discarded, and the process carries over to the next
stage.
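The process, including the discarding of zero vectors, can be transcribed as follows. This sketch is only an illustration and is not part of the text; it assumes Python with the sympy library and real vectors (for complex vectors the conjugate inner product should be used).

from sympy import Matrix

def gram_schmidt(us):
    vs = []
    for u in us:
        v = u
        for w in vs:
            v = v - (u.dot(w) / w.dot(w)) * w   # subtract the projection of u on w
        if any(v):                              # discard a zero vector and continue
            vs.append(v)
    return vs

u1, u2, u3 = Matrix([1, 0, 0, 0]), Matrix([1, 1, 0, 0]), Matrix([1, 1, 1, 0])
print(gram_schmidt([u1, u2, u3]))   # the vectors e1, e2, e3, as in Example 4.1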
Exercises for Sect. 4.2
1. Orthogonalize the vectors (1, 1, 1), (1, 0, 1), and (0, 1, 2) using Gram–Schmidt
process.
2. How do we use Gram–Schmidt to compute the rank of a matrix?
3. Construct orthonormal sets that are bases for span{u 1 , u 2 , u 3 }, where u 1 , u 2 , u 3
are the vectors in Examples 4.1–4.2.
4. Show that the cross product u × v of two linearly independent vectors u, v in
R1×3 is orthogonal to both u and v. How to obtain this third vector as u × v by
Gram–Schmidt process?
4.3 QR-Factorization
\[
\begin{aligned}
v_1 &= u_1; & w_1 &= v_1/\|v_1\|\\
v_2 &= u_2 - \langle u_2, w_1\rangle\,w_1; & w_2 &= v_2/\|v_2\|\\
&\;\;\vdots\\
v_m &= u_m - \langle u_m, w_1\rangle\,w_1 - \cdots - \langle u_m, w_{m-1}\rangle\,w_{m-1}; & w_m &= v_m/\|v_m\|.
\end{aligned}
\]
\[
\begin{aligned}
v_1 &= u_1 = (1, 1, 0)\\
w_1 &= v_1/\|v_1\| = \bigl(\tfrac{1}{\sqrt 2}, \tfrac{1}{\sqrt 2}, 0\bigr)\\
v_2 &= u_2 - \langle u_2, w_1\rangle\,w_1 = (0, 1, 1) - \tfrac{1}{\sqrt 2}\bigl(\tfrac{1}{\sqrt 2}, \tfrac{1}{\sqrt 2}, 0\bigr) = \bigl(-\tfrac12, \tfrac12, 1\bigr)\\
w_2 &= v_2/\|v_2\| = \sqrt{\tfrac23}\,\bigl(-\tfrac12, \tfrac12, 1\bigr) = \bigl(-\tfrac{1}{\sqrt 6}, \tfrac{1}{\sqrt 6}, \tfrac{\sqrt 2}{\sqrt 3}\bigr)\\
v_3 &= u_3 - \langle u_3, w_1\rangle\,w_1 - \langle u_3, w_2\rangle\,w_2\\
    &= (1, 0, 1) - \bigl(\tfrac12, \tfrac12, 0\bigr) - \bigl(-\tfrac16, \tfrac16, \tfrac13\bigr) = \bigl(\tfrac23, -\tfrac23, \tfrac23\bigr)\\
w_3 &= v_3/\|v_3\| = \tfrac{\sqrt 3}{2}\bigl(\tfrac23, -\tfrac23, \tfrac23\bigr) = \bigl(\tfrac{1}{\sqrt 3}, -\tfrac{1}{\sqrt 3}, \tfrac{1}{\sqrt 3}\bigr)\\
v_4 &= u_4 - \langle u_4, w_1\rangle\,w_1 - \langle u_4, w_2\rangle\,w_2 - \langle u_4, w_3\rangle\,w_3 = 0\\
w_4 &= 0.
\end{aligned}
\]
The set \(\bigl\{\bigl(\tfrac{1}{\sqrt 2}, \tfrac{1}{\sqrt 2}, 0\bigr), \bigl(-\tfrac{1}{\sqrt 6}, \tfrac{1}{\sqrt 6}, \tfrac{\sqrt 2}{\sqrt 3}\bigr), \bigl(\tfrac{1}{\sqrt 3}, -\tfrac{1}{\sqrt 3}, \tfrac{1}{\sqrt 3}\bigr)\bigr\}\) is the required
orthonormal basis of U. Notice that U = F3 .
We have discussed two ways of extracting a basis for the span of a finite number
of vectors from Fn . One is the method of elementary row operations and the other is
Gram–Schmidt orthogonalization or orthonormalization. The latter is a superior tool
though computationally difficult. We now see one of its applications in factorizing a
matrix.
\[
\begin{aligned}
u_1 &= a_{11}w_1\\
u_2 &= a_{12}w_1 + a_{22}w_2\\
&\;\;\vdots\\
u_k &= a_{1k}w_1 + \cdots + a_{kk}w_k\\
&\;\;\vdots\\
u_n &= a_{1n}w_1 + \cdots + a_{kn}w_k + \cdots + a_{nn}w_n
\end{aligned}
\]
u_k ∉ span{u_1, . . . , u_{k−1}} = span{w_1, . . . , w_{k−1}}.
Thus, akk is nonzero for each k. Then, R is an upper triangular invertible matrix.
Construct Q = [w1 · · · wn ]. Since {w1 , . . . , wn } is an orthonormal set, Q ∗ Q =
I. Of course, if all entries of A are real, then so are the entries of Q and R. In that
case, Q t Q = I. Moreover, for 1 ≤ k ≤ n,
\[
QRe_k = Q\,[a_{1k}, \cdots, a_{kk}, 0, \cdots, 0]^t = Q(a_{1k}e_1 + \cdots + a_{kk}e_k) = a_{1k}Qe_1 + \cdots + a_{kk}Qe_k = a_{1k}w_1 + \cdots + a_{kk}w_k = u_k.
\]
That is, the kth column of Q R is same as the kth column of A for each k ∈
{1, . . . , n}. Therefore, Q R = A.
For uniqueness of the factorization, suppose that
A = Q 1 R1 = Q 2 R2 ,
Q 1 , Q 2 ∈ Fm×n satisfy Q ∗1 Q 1 = Q ∗2 Q 2 = I,
R1 = [ai j ], R2 = [bi j ] ∈ Fn×n are upper triangular, and
akk > 0, bkk > 0 for each k ∈ {1, . . . , n}.
We will see later what happens if the diagonal entries of R1 and of R2 are all
negative. Then
\[
R_1^* R_1 = R_1^* Q_1^* Q_1 R_1 = A^*A = R_2^* Q_2^* Q_2 R_2 = R_2^* R_2.
\]
Notice that R1 , R2 , R1∗ and R2∗ are all invertible matrices. Multiplying (R2∗ )−1 on
the left, and (R1 )−1 on the right, we have
(R2∗ )−1 R1∗ R1 (R1 )−1 = (R2∗ )−1 R2∗ R2 (R1 )−1 .
It implies
(R2∗ )−1 R1∗ = R2 R1−1 .
Here, the matrix on the left is a lower triangular matrix and that on the right is an upper
triangular matrix. Therefore, both are diagonal. Comparing the diagonal entries in
the products we have
That is, |aii |2 = |bii |2 . Since aii > 0 and bii > 0, we see that aii = bii for 1 ≤ i ≤ n.
Hence (R2−1 )∗ R1∗ = R2 R1−1 = I. Therefore,
R2 = R1 , Q 2 = A R2−1 = A R1−1 = Q 1 .
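A numerical check of the factorization is easy (assuming Python with numpy, which is outside the text). numpy's QR routine need not return positive diagonal entries in R, so the signs are normalized below to match the convention a_kk > 0 used above; the columns of A are the vectors of Example 4.3 written as columns.

import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])        # columns are linearly independent
Q, R = np.linalg.qr(A)
S = np.diag(np.sign(np.diag(R)))       # flip signs so that diag(R) > 0
Q, R = Q @ S, S @ R
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))   # True True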
Any vector v = (a, b, c) ∈ F3 can be expressed as v = ae1 + be2 + ce3 . Taking inner
product of v with the basis vectors we find that ⟨v, e1⟩ = a, ⟨v, e2⟩ = b and ⟨v, e3⟩ = c.
Then v = ⟨v, e1⟩e1 + ⟨v, e2⟩e2 + ⟨v, e3⟩e3 .
Such an equality holds for any orthonormal basis for a subspace of Fn .
x = α1 v1 + · · · + αm vm .
Let i ∈ {1, . . . , m}. Taking inner product of x with vi , and using the fact that
⟨vi , v j⟩ = 0 for j ≠ i, and ⟨vi , vi⟩ = 1, we have
⟨x, vi⟩ = αi ⟨vi , vi⟩ = αi .
We abbreviate the phrase “for each v ∈ V, w ⊥ v” to w ⊥ V.
When working with column vectors, the projection vector projV (x) can be seen
as a matrix product.
For this, let {v1 , . . . , vm } be an orthonormal basis of a subspace V of Fn×1 .
Let x ∈ Fn×1 . Write z = [c1 , · · · , cm ]t ∈ Fm×1 with c j = ⟨x, v j⟩ = v∗j x, and P =
[v1 · · · vm ]. Then
\[
z = \begin{bmatrix} c_1\\ \vdots\\ c_m\end{bmatrix} = \begin{bmatrix} v_1^*x\\ \vdots\\ v_m^*x\end{bmatrix} = P^*x, \qquad
y = \sum_{j=1}^{m}\langle x, v_j\rangle\,v_j = \sum_{j=1}^{m} c_j v_j = Pz = PP^*x.
\]
Notice that P P ∗ x = projV (x) for each x ∈ Fn×1 . Due to this reason, the matrix
P P ∗ ∈ Fn×n is called the projection matrix that projects vectors of Fn×1 onto the
subspace V.
Example 4.5 Let V = span{[1, 0, −1]t , [1, 1, 1]t } and let x = [1, 2, 3]t . Compute
the projection matrix that projects vectors of F3×1 onto V, and projV (x).
Since the vectors [1, 0, −1]t and [1, 1, 1]t are orthogonal, an orthonormal basis
for V is given by {v1 , v2 }, where
v1 = [1/√2, 0, −1/√2]t , v2 = [1/√3, 1/√3, 1/√3]t .
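The remaining computation can be checked numerically. The sketch below is not part of the text; it assumes Python with numpy, builds P from the orthonormal basis, and applies PP∗ to x. For this particular x the output equals x itself, since [1, 2, 3]t already lies in V.

import numpy as np

v1 = np.array([1.0, 0.0, -1.0]) / np.sqrt(2)
v2 = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
P = np.column_stack([v1, v2])
print(P @ P.T)                   # the projection matrix onto V
x = np.array([1.0, 2.0, 3.0])
print(P @ P.T @ x)               # [1. 2. 3.]; this x happens to lie in V already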
From the orthogonality condition, we guess that the length of x − projV (x) is the
smallest among the lengths of all vectors x − v, when v varies over V. If this intuition
goes well, then projV (x) would be closest to x compared to any other vector from
V. Further, we may think of projV (x) as an approximation of x from the subspace
V. We show that our intuition is correct.
Let V be a subspace of Fn and let x ∈ Fn . A vector u ∈ V is called a best approx-
imation of x from V iff ‖x − u‖ ≤ ‖x − v‖ for each v ∈ V.
Theorem 4.7 Let V be a subspace of Fn and let x ∈ Fn . Then projV (x) is the unique
best approximation of x from V.
‖x − u‖ = ‖x − y‖.
Also, the best approximation may be computed by using the projection matrix.
In the second approach, we look for a vector y that satisfies the orthogonality
condition:
x − y ⊥ v for each vector v ∈ V.
x − y ⊥ v j for each j = 1, . . . , m.
Since y ∈ V, we may write \(y = \sum_{j=1}^{m}\alpha_j v_j\). Then we determine the scalars α_j
so that for i = 1, . . . , m, \(\bigl\langle x - \sum_{j=1}^{m}\alpha_j v_j,\; v_i\bigr\rangle = 0\). That is, we solve the following
linear system:
\[
\begin{aligned}
\langle v_1, v_1\rangle\,\alpha_1 + \cdots + \langle v_m, v_1\rangle\,\alpha_m &= \langle x, v_1\rangle\\
&\;\;\vdots\\
\langle v_1, v_m\rangle\,\alpha_1 + \cdots + \langle v_m, v_m\rangle\,\alpha_m &= \langle x, v_m\rangle
\end{aligned}
\]
Theorem 4.7 guarantees that this linear system has a unique solution. Further, the
system matrix of this linear system is A = [ai j ], where ai j = v j , vi . Such a matrix
which results by taking the inner products of basis vectors is called a Gram matrix.
Theorem 4.7 implies that a Gram matrix is invertible. Can you prove directly that a
Gram matrix is invertible?
\[
PP^* = \frac{1}{\sqrt 2}\begin{bmatrix} 1\\ 1\end{bmatrix}\,\frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1\end{bmatrix} = \frac12\begin{bmatrix} 1 & 1\\ 1 & 1\end{bmatrix}.
\]
We may use the technique of taking the best approximation for approximating a
solution of a linear system.
Let A ∈ Fm×n and let b ∈ Fm×1 . A vector u ∈ Fn×1 is called a least squares
solution of the linear system Ax = b iff ‖Au − b‖ ≤ ‖Az − b‖ for all z ∈ Fn×1 .
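A hedged numerical illustration (assuming Python with numpy; the 3 × 2 matrix below is hypothetical and not from the text): the least squares solution u satisfies the orthogonality condition Au − b ⊥ R(A).

import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # a hypothetical 3x2 system matrix
b = np.array([1.0, 0.0, 2.0])
u, *_ = np.linalg.lstsq(A, b, rcond=None)
print(u)                                   # approximately [0.5, 0.5]
print(np.allclose(A.T @ (A @ u - b), 0))   # Au - b is orthogonal to R(A): True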
Theorem 4.7 implies that u ∈ Fn×1 is a least squares solution of Ax = b iff Au is
the best approximation of b from R(A). This best approximation Au can be computed
uniquely from the orthogonality condition Au − b ⊥ R(A). However, the vector u
can be uniquely determined when A is one-one, that is, when the homogeneous
system Ax = 0 has a unique solution. We summarize these facts in the following
theorem.
Proof We prove only (3); others are obvious from the discussion we had. For this,
let u 1 , . . . , u n be the n columns of A. Since the range space R(A) is equal to
span{u 1 , . . . , u n }, by (2), we obtain
4.6 Problems
1. For a nonzero vector v and any vector u, show that \(\bigl\langle u - \frac{\langle u, v\rangle}{\|v\|^2}\,v,\; v\bigr\rangle = 0\). Then
use Pythagoras theorem to derive Cauchy–Schwartz inequality.
2. Let n > 1. Using a unit vector u ∈ Rn×1 construct infinitely many matrices
A ∈ Rn×n so that A2 = I.
3. Let x and y be linearly independent vectors in Fn×1 . Let A = x y ∗ + yx ∗ . Show
that rank(A) = 2.
4. Fundamental subspaces: Let A ∈ Fm×n . Prove the following:
(a) N (A) = {x ∈ Fn×1 : x ⊥ y for each y ∈ R(A∗ )}.
(b) N (A∗ ) = {y ∈ Fm×1 : y ⊥ z for each z ∈ R(A)}.
(c) R(A) = {y ∈ Fm×1 : y ⊥ z for each z ∈ N (A∗ )}.
(d) R(A∗ ) = {x ∈ Fn×1 : x ⊥ u for each u ∈ N (A)}.
5. Let A ∈ Fm×n . Let B and E be bases for the subspaces N (A) and R(A∗ ) of
Fn×1 , respectively. Show that B ∪ E is a basis of Fn×1 .
6. Find a basis for N (A), where A has rows [1, 1, 1, −1] and [1, 1, 3, 5]. Using
this basis, extend the orthonormal set {(1/2)[1, 1, 1, −1]t , (1/6)[1, 1, 3, 5]t } to an
orthonormal set for R4×1 .
7. Let A ∈ Fm×n . Show that N (A∗ A) = N (A).
8. Let x̂ be a least squares solution of the linear system Ax = b for an m × n matrix
A. Show that an n × 1 vector y is a solution of Ax = b iff y = x̂ + z for some
z ∈ N (A).
9. Let {v1 , . . . , vm } be a basis for a subspace V of Fn . Let A = [ai j ] be the Gram
matrix, where ai j = v j , vi . Show that the Gram matrix A satisfies A∗ = A,
and x ∗ Ax > 0 for each nonzero x ∈ Fm . Conclude that the Gram matrix is
invertible.
10. Let A ∈ Fm×n have orthogonal columns and let b ∈ Fm×1 . Suppose that
y = [y1 , · · · , yn ]t is a least squares solution of Ax = b. Show that Ai 2 yi =
b∗ Ai for i = 1, . . . , n.
11. Let A ∈ Fm×n have rank n. Let Q and R be the matrices obtained by Gram-
Schmidt process applied on the columns of A, as in the QR-factorization. If
v = a1 Q 1 + · · · + an Q n is the projection of a vector b ∈ Fn×1 onto R(A),
Thus, the line {[x, x]t : x ∈ R} never moves. So also the line {[x, −x]t : x ∈ R}.
Observe that
\[
A\begin{bmatrix} x\\ x\end{bmatrix} = 1\begin{bmatrix} x\\ x\end{bmatrix}, \qquad
A\begin{bmatrix} x\\ -x\end{bmatrix} = (-1)\begin{bmatrix} x\\ -x\end{bmatrix}.
\]
The last equation says that either λ = 1 or c = 0. If c = 0, then the second equation
implies that either λ = 1 or b = 0. Then the first equation yields either λ = 1 or
a = 0. Now, c = 0, b = 0, a = 0 is not possible, since it would lead to v = 0. In
any case, λ = 1. Then the equations are simplified to
a + b + c = a, b + c = b, c = c.
It implies that b = c = 0. Then v = [a, 0, 0]t for any nonzero a. That is, such a
vector v is an eigenvector for the only eigenvalue 1.
Example 5.2 Let
\[
A = \begin{bmatrix} 1 & 1 & 1\\ 2 & 2 & 2\\ 3 & 3 & 3\end{bmatrix}.
\quad\text{We see that}\quad
A\begin{bmatrix} a\\ b\\ c\end{bmatrix} = (a + b + c)\begin{bmatrix} 1\\ 2\\ 3\end{bmatrix}.
\]
Therefore, when a + b + c = 0 and [a, b, c]t ≠ 0, we have 0 as an eigenvalue with eigenvector
[a, b, c]t . Verify that the vectors [1, 0, −1]t , [0, 1, −1]t are eigenvectors for the eigen-
value 0.
For a = 1, b = 2, c = 3, we see that an eigenvalue is a + b + c = 6 with a
corresponding eigenvector as [1, 2, 3]t .
Does A have eigenvalues other than 0 and 6?
Corresponding to an eigenvalue, there are infinitely many eigenvectors.
Exercises for Sect. 5.1
1. Suppose A ∈ Fn×n , λ ∈ F, and b ∈ Fn×1 are such that (A − λI )x = b has a
unique solution. Show that λ is not an eigenvalue of A.
2. Formulate a converse of the statement in Exercise 1 and prove it.
3. Let A ∈ Fn×n . Show that a scalar λ ∈ F is an eigenvalue of A iff the map A − λI :
Fn×1 → Fn×1 is not one-one.
4. Let A ∈ Fn×n . Show that A is invertible iff 0 is not an eigenvalue of A.
5. Let A ∈ Fn×n . If the entries in each column of A add up to a scalar λ, then find
an eigenvector of A.
The polynomial (−1)n det(A − t I ), which is also equal to det(t I − A), is called
the characteristic polynomial of the matrix A; and it is denoted by χA (t). Each
eigenvalue of A is a zero of the characteristic polynomial of A. Conversely, each
zero of the characteristic polynomial is said to be a complex eigenvalue of A.
If A is a complex matrix of order n, then χA (t) is a polynomial of degree n
in t. The fundamental theorem of algebra states that any polynomial of degree n,
with complex coefficients, can be written as a product of n linear factors each with
complex coefficients. Thus, χA (t) has exactly n, not necessarily distinct, zeros in C.
And these are the eigenvalues (complex eigenvalues) of the matrix A.
If A is a matrix with real entries, some of the zeros of χA (t) may turn out to be
complex numbers with nonzero imaginary parts. Considering A as a linear transfor-
mation on Rn×1 , the scalars are now real numbers. Thus each zero of the characteristic
polynomial may not be an eigenvalue; only the real zeros are.
However, if we regard a real matrix A as a matrix with complex entries, then A
is a linear transformation on Cn×1 . Each complex eigenvalue, that is, a zero of the
characteristic polynomial of A, is an eigenvalue of A.
Due to this obvious advantage, we consider a matrix in Rn×n as one in Cn×n so that
each root of the characteristic polynomial of a matrix is considered an eigenvalue of
the matrix. In this sense, an eigenvalue is taken as a complex eigenvalue, by default.
Observe that when λ is an (a complex) eigenvalue of A ∈ Fn×n , a corresponding
eigenvector x is a vector in Cn×1 , in general.
Example 5.3 Find the eigenvalues and corresponding eigenvectors of the matrix
\[
A = \begin{bmatrix} 1 & 0 & 0\\ 1 & 1 & 0\\ 1 & 1 & 1\end{bmatrix}.
\]
Here,
\[
\chi_A(t) = (-1)^3\det(A - tI) = -\begin{vmatrix} 1-t & 0 & 0\\ 1 & 1-t & 0\\ 1 & 1 & 1-t\end{vmatrix} = (t-1)^3.
\]
The eigenvalues of A are its zeros, that is, 1, 1, 1. To get an eigenvector, we solve
A[a, b, c]t = [a, b, c]t or that
a = a, a + b = b, a + b + c = c.
For λ = i, we have b = ia, −a = ib. Thus, [a, ia]t is an eigenvector for a ≠ 0. For
the eigenvalue −i, the eigenvectors are [a, −ia]t for a ≠ 0.
We consider A as a matrix with complex entries. With this convention, the matrix
A has (complex) eigenvalues i and −i.
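These computations can be confirmed with a computer algebra system. The sketch below is not part of the text; it assumes Python with sympy, and the 2 × 2 matrix is an assumption, inferred from the relations b = ia and −a = ib above.

from sympy import Matrix

A = Matrix([[1, 0, 0], [1, 1, 0], [1, 1, 1]])
print(A.charpoly().as_expr())    # (lambda - 1)**3, matching chi_A(t) = (t - 1)^3
print(A.eigenvals())             # {1: 3}

B = Matrix([[0, 1], [-1, 0]])    # assumed matrix behind b = ia, -a = ib
print(B.eigenvals())             # {I: 1, -I: 1}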
A polynomial with the coefficient of the highest degree as 1 is called a monic
polynomial. In the characteristic polynomial, the factor (−1)n is multiplied with the
determinant to make the result a monic polynomial. We see that
The multiset of all complex eigenvalues of a matrix A ∈ Fn×n is called the spec-
trum of A; and we denote it by σ (A). If A ∈ Fn×n has the (complex) eigenvalues
λ1 , . . . , λn , counting multiplicities, then the spectrum of A is σ (A) = {λ1 , . . . , λn };
and vice versa. Notice that the spectrum of A ∈ Fn×n always has n elements; though
they may not be distinct. The following theorem lists some important facts about
eigenvalues, using the notion of spectrum.
Theorem 5.2 Let A ∈ Fn×n . Then the following are true.
1. σ (At ) = σ (A).
2. If B is similar to A, then σ (B) = σ (A).
3. If A = [ai j ] is a diagonal or an upper triangular or a lower triangular matrix,
then σ (A) = {a11 , . . . , ann }.
4. If σ (A) = {λ1 , . . . , λn }, then det(A) = λ1 · · · λn and tr(A) = λ1 + · · · + λn .
Proof (1) χAt (t) = det(At − t I ) = det((A − t I )t ) = det(A − t I ) = χA (t).
(2) χP −1 A P (t) = det(P −1 A P − t I ) = det(P −1 (A − t I )P) = det(P −1 ) det(A − t I )
det(P) = det(A − t I ) = χA (t).
(3) In all these cases, χA (t) = det(A − t I ) = (a11 − t) · · · (ann − t).
(4) Let σ (A) = {λ1 , . . . , λn }. Then
Multiply (5.2) with λm+1 and subtract from the last equation to get
Notice that all diagonal entries of a hermitian matrix are real since \(\overline{a_{ii}} = a_{ii}\).
Similarly, each diagonal entry of a skew symmetric matrix must be zero, since a_{ii} =
−a_{ii}. And each diagonal entry of a skew hermitian matrix must be 0 or purely
imaginary, as \(\overline{a_{ii}} = -a_{ii}\) implies 2 Re(a_{ii}) = 0.
\[ A = \tfrac12(A + A^t) + \tfrac12(A - A^t). \]
\[ A = \tfrac12(A + A^*) + \tfrac12(A - A^*). \]
Example 5.5 For any θ ∈ R, the following are orthogonal matrices of order 2:
\[
O_1 := \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}, \qquad
O_2 := \begin{bmatrix} \cos\theta & \sin\theta\\ \sin\theta & -\cos\theta\end{bmatrix}.
\]
Let u be the vector in the plane that starts at the origin and ends at the point (a, b).
Writing the point (a, b) as a column vector [a, b]t , we see that
\[
O_1\begin{bmatrix} a\\ b\end{bmatrix} = \begin{bmatrix} a\cos\theta - b\sin\theta\\ a\sin\theta + b\cos\theta\end{bmatrix}, \qquad
O_2\begin{bmatrix} a\\ b\end{bmatrix} = \begin{bmatrix} a\cos\theta + b\sin\theta\\ a\sin\theta - b\cos\theta\end{bmatrix}.
\]
Thus, O1 [a, b]t is the end-point of the vector obtained by rotating the vector u by
an angle θ. Similarly, O2 [a, b]t is a point obtained by reflecting u along a straight
line that makes an angle θ/2 with the x-axis. Accordingly, O1 is called a rotation by
an angle θ, and O2 is called a reflection along a line making an angle of θ/2 with
the x-axis.
If A = [ai j ] is an orthogonal matrix of order 2, then At A = I implies
\[
a_{11}^2 + a_{21}^2 = 1 = a_{12}^2 + a_{22}^2, \qquad a_{11}a_{12} + a_{21}a_{22} = 0.
\]
Then there exist α, β ∈ R such that a11 = cos α, a21 = sin α, a12 = cos β, a22 =
sin β and cos(α − β) = 0. Thus α − β = ±(π/2). It then follows that A is in either
Theorem 5.4(1)–(2) show that unitary and orthogonal matrices preserve the inner
product and the norm. Thus they are also called isometries. The
condition A∗ A = I is equivalent to the condition that the columns of A are orthonor-
mal. Similarly, the rows of A are orthonormal iff A A∗ = I.
Theorem 5.4 implies that the determinant of an orthogonal matrix is either 1
or −1. It follows that the product of all eigenvalues, counting multiplicities, of an
orthogonal matrix is ±1.
The determinant of a hermitian matrix is a real number since A = A∗ implies
We prove some interesting facts about the eigenvalues and eigenvectors of these
special types of matrices.
λ y∗x = y∗(λx) = y∗Ax = y∗A∗x = (Ay)∗x = (μ y)∗x = \(\overline{\mu}\) y∗x = μ y∗x.
Ax = λx, Ay = λy.
Since v∗v ≠ 0, \(\overline{\lambda}\) = −λ. That is, 2 Re(λ) = 0. This shows that λ is purely imaginary
or zero.
When A is real skew symmetric, we take transpose instead of adjoint
everywhere in the above proof.
(4) Suppose A is unitary, i.e., A∗ A = I. Let v be an eigenvector corresponding to
the eigenvalue λ. Now, Av = λv implies v∗A∗ = (λv)∗ = \(\overline{\lambda}\)v∗. Then
\[
v^*v = v^*Iv = v^*A^*Av = \overline{\lambda}\lambda\, v^*v = |\lambda|^2 v^*v.
\]
Since v∗v ≠ 0, |λ| = 1.
Replacing A∗ with At in the above proof yields the same conclusion when A is
an orthogonal matrix.
Similarly, a real skew symmetric matrix can have purely imaginary eigenvalues.
The only real eigenvalue of a real skew symmetric matrix is 0. Also, an orthogonal
matrix can have complex eigenvalues so that each eigenvalue need not be ±1. For
instance, the matrix
0 −1
1 0
has eigenvalues ±i. However, all real eigenvalues of an orthogonal matrix are ±1;
and any eigenvalue of an orthogonal matrix is of the form eiθ for θ ∈ R.
Exercises for Sect. 5.4
1. Let A ∈ Fn×n . Show that for all x, y ∈ Fn×1 , ⟨Ax, Ay⟩ = ⟨x, y⟩ iff for all x ∈
Fn×1 , ‖Ax‖ = ‖x‖.
2. Construct a 3 × 3 hermitian matrix with no zero entries whose eigenvalues are
1, 2 and 3.
3. Construct a 2 × 2 real skew symmetric matrix whose eigenvalues are purely imag-
inary.
4. Show that if an invertible matrix is real symmetric, then so is its inverse.
5. Show that if an invertible matrix is hermitian, then so is its inverse.
6. Construct an orthogonal 2 × 2 matrix whose determinant is 1.
7. Construct an orthogonal 2 × 2 matrix whose determinant is −1.
5.5 Problems
2. Let A = [ai j ] ∈ R2×2 , where a12 a21 > 0. Write B = diag(\(\sqrt{a_{21}/a_{12}}\), 1). Let
C = B −1 AB. What do you conclude about the eigenvalues and eigenvectors of
C; and those of A?
3. Show that if λ is a nonzero eigenvalue of an n × n matrix A, then N (A − λI ) ⊆
R(A).
4. An n × n matrix A is said to be idempotent if A2 = A. Show that the only
possible eigenvalues of an idempotent matrix are 0 or 1.
5. An n × n matrix A is said to be nilpotent if Ak = 0 for some natural number k.
Show that 0 is the only eigenvalue of a nilpotent matrix.
6. Let A ∈ Fn×n have an eigenvalue λ with an associated eigenvector x. Suppose
A∗ has an eigenvalue μ associated with y. If λ = μ, then show that x ⊥ y.
7. Let A, B ∈ Fn×n . Show that
\[
\begin{bmatrix} I & A \\ 0 & I \end{bmatrix}
\begin{bmatrix} 0 & 0 \\ B & BA \end{bmatrix}
\begin{bmatrix} I & -A \\ 0 & I \end{bmatrix}
= \begin{bmatrix} AB & 0 \\ B & 0 \end{bmatrix}.
\]
The matrix C is called the companion matrix of the polynomial p(t). Show the
following:
(a) χC (t) = p(t).
(b) If p(λ) = 0 for some λ ∈ F, then λ is an eigenvalue of the matrix C with an
associated eigenvector [λn−1 , λn−2 , · · · , λ, 1]t .
9. Let A ∈ Fn×n and let B = I − 2 A + A2 . Show the following:
(a) If 1 is an eigenvalue of A, then B is not invertible.
(b) If v is an eigenvector of A, then v is an eigenvector of B. Are the corresponding
eigenvalues equal?
10. For which scalars α, are the n × n matrices A and A + α I similar?
11. Show that there do not exist n × n matrices A and B with AB − B A = I.
12. Let A, B ∈ Fn×n . Show that if 1 is not an eigenvalue of A, then the matrix
equation AX + B = X has a unique solution in Fn×n .
13. Let A and B be hermitian matrices. Show that AB is hermitian iff AB = B A.
14. Let A and B be hermitian matrices. Determine whether the matrices A +
B, AB A, AB + B A, and AB − B A are hermitian or not.
Eigenvalues and eigenvectors can be used to bring a matrix to nice forms using
similarity transformations. A very general result in this direction is Schur’s unitary
triangularization. It says that using a suitable similarity transformation, we can rep-
resent a square matrix by an upper triangular matrix. Thus, the diagonal entries of
the upper triangular matrix must be the eigenvalues of the given matrix.
Theorem 6.1 (Schur Triangularization) Let A ∈ Cn×n . Then there exists a unitary
matrix P ∈ Cn×n such that P ∗ A P is upper triangular. Moreover, if A ∈ Rn×n and
all eigenvalues of A are real, then P can be chosen to be an orthogonal matrix.
where e1 ∈ C(m+1)×1 has first component as 1 and all other components 0. Then
R ∗ A R can be written in the following block form:
\[
R^* A R = \begin{bmatrix} \lambda & x \\ 0 & C \end{bmatrix},
\]
will be helpful later. In general, the eigenvalues can occur on the diagonal of the
Schur form in any prescribed order, depending on our choice of eigenvalue in each
step.
Example 6.1 Consider the matrix
\[
A = \begin{bmatrix} 2 & 1 & 0 \\ 2 & 3 & 0 \\ -1 & -1 & 1 \end{bmatrix}
\]
for Schur triangularization.
We find that χA (t) = (t − 1)2 (t − 4). All eigenvalues of A are real; thus, there
exists an orthogonal matrix P such that P t A P is upper triangular. To determine such
a matrix P, we take one of the eigenvalues, say 1. An associated eigenvector of norm
1 is v = [0, 0, 1]t . We extend {v} to an orthonormal basis for C3×1 . For convenience,
we take the orthonormal basis as
for C2×1 . Then we construct the matrix S by taking these basis vectors as its columns,
that is,
\[
S = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}.
\]
We find that
\[
S^t C S = \begin{bmatrix} 1 & -1 \\ 0 & 4 \end{bmatrix},
\]
which is an upper triangular matrix. Then
\[
P = R \begin{bmatrix} 1 & 0 \\ 0 & S \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} \\ 0 & -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}
= \begin{bmatrix} 0 & 1/\sqrt{2} & 1/\sqrt{2} \\ 0 & -1/\sqrt{2} & 1/\sqrt{2} \\ 1 & 0 & 0 \end{bmatrix}.
\]
Consequently,
\[
P^t A P = \begin{bmatrix} 1 & 0 & -\sqrt{2} \\ 0 & 1 & -1 \\ 0 & 0 & 4 \end{bmatrix}.
\]
Analogously, a real square matrix having only real eigenvalues is also orthogonally
similar to a lower triangular matrix. We remark that the lower triangular form of a
matrix need not be the transpose or the adjoint of its upper triangular form.
Moreover, neither the unitary matrix P nor the upper triangular matrix P ∗ A P in
Schur triangularization is unique. That is, there can be unitary matrices P and Q such
that both P ∗ A P and Q ∗ AQ are upper triangular, and P = Q, P ∗ A P = Q ∗ AQ. The
non-uniqueness stems from the choice of eigenvalues, their associated eigenvectors,
and in extending those to an orthonormal basis. For instance, in Example 6.1, if you
extend {[0, 0, 1]t } to the orthonormal basis {[0, 0, 1]t , [0, 1, 0]t , [1, 0, 0]t }, then you
end up with (Verify.)
\[
P = \begin{bmatrix} 0 & -1/\sqrt{2} & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} \\ 1 & 0 & 0 \end{bmatrix}, \qquad
P^t A P = \begin{bmatrix} 1 & 0 & -\sqrt{2} \\ 0 & 1 & 1 \\ 0 & 0 & 4 \end{bmatrix}.
\]
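The computation can also be checked numerically. A small Python/NumPy sketch (illustrative only; it verifies the second choice of P) confirms that P is orthogonal and that P^tAP is upper triangular with the eigenvalues 1, 1, 4 on its diagonal:

import numpy as np

A = np.array([[2.0, 1, 0], [2, 3, 0], [-1, -1, 1]])
r2 = 1 / np.sqrt(2)
P = np.array([[0, -r2, r2], [0, r2, r2], [1, 0, 0]])

T = P.T @ A @ P
print(np.allclose(P.T @ P, np.eye(3)))       # P is orthogonal
print(np.allclose(np.tril(T, -1), 0))        # T is upper triangular
print(np.round(np.diag(T), 6))               # diagonal entries: 1, 1, 4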
(A − d1 I )(A − d2 I ) · · · (A − dn I ).
For example, let
\[
A = \begin{bmatrix} d_1 & * & * \\ 0 & d_2 & * \\ 0 & 0 & d_3 \end{bmatrix},
\]
where $*$ stands for any entry, possibly nonzero. We see that
\[
A - d_1 I = \begin{bmatrix} 0 & * & * \\ 0 & * & * \\ 0 & 0 & * \end{bmatrix},
\qquad
(A - d_1 I)(A - d_2 I) = \begin{bmatrix} 0 & * & * \\ 0 & * & * \\ 0 & 0 & * \end{bmatrix}
\begin{bmatrix} * & * & * \\ 0 & 0 & * \\ 0 & 0 & * \end{bmatrix}
= \begin{bmatrix} 0 & 0 & * \\ 0 & 0 & * \\ 0 & 0 & * \end{bmatrix},
\]
\[
(A - d_1 I)(A - d_2 I)(A - d_3 I) = \begin{bmatrix} 0 & 0 & * \\ 0 & 0 & * \\ 0 & 0 & * \end{bmatrix}
\begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]
Theorem 6.2 Let A ∈ Fn×n be an upper triangular matrix with diagonal entries
d1 , . . . , dn , in that order. Let 1 ≤ k ≤ n. Then the first k columns of the product
(A − d1 I ) · · · (A − dk I ) are zero columns.
Proof For k = 1, we see that the first column of A − d1 I is a zero column. It means
(A − d1 I )e1 = 0, where e1 , . . . , en are the standard basis vectors for Fn×1 . Assume
that the result is true for k = m < n. That is,
(A − d1 I ) · · · (A − dm I )e1 = 0, . . . , (A − d1 I ) · · · (A − dm I )em = 0.
Notice that, for 1 ≤ j ≤ m, since the factors A − d_i I commute with each other,
\[
(A - d_1 I) \cdots (A - d_m I)(A - d_{m+1} I)\, e_j
= (A - d_{m+1} I)(A - d_1 I) \cdots (A - d_m I)\, e_j = 0.
\]
Next, (A − dm+1 I )em+1 is the (m + 1)th column of A − dm+1 I, which has all
zero entries beyond the first m entries. That is, there are scalars α1 , . . . , αm such that
(A − dm+1 I )em+1 = α1 e1 + · · · + αm em . Then
χ_A(A) = P χ_A(U) P^*.
\[
A^{-1} = -\frac{1}{a_0}\bigl(a_1 I + a_2 A + \cdots + a_{n-1} A^{n-2} + A^{n-1}\bigr).
\]
That is, r(t) is a polynomial of degree less than k that annihilates A. Hence, r(t) is the zero polynomial. Therefore, q(t) divides p(t).
Theorem 6.3 implies that
the minimal polynomial of a matrix divides its characteristic polynomial.
This fact is sometimes mentioned as the Cayley–Hamilton theorem.
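As a quick numerical illustration (a sketch only; the matrix is an arbitrary example, and numpy.poly is used to obtain the characteristic polynomial coefficients), one can check that a matrix annihilates its characteristic polynomial and recover its inverse from the coefficients:

import numpy as np

A = np.array([[2.0, 1, 0], [2, 3, 0], [-1, -1, 1]])
c = np.poly(A)            # coefficients of det(tI - A): [1, c1, ..., cn]
n = A.shape[0]

# chi_A(A) = A^n + c1 A^(n-1) + ... + cn I should be the zero matrix.
chi = sum(c[k] * np.linalg.matrix_power(A, n - k) for k in range(n + 1))
print(np.allclose(chi, 0))

# If the constant coefficient cn is nonzero, A^(-1) is a polynomial in A:
# A^(-1) = -(A^(n-1) + c1 A^(n-2) + ... + c(n-1) I) / cn
Ainv = -sum(c[k] * np.linalg.matrix_power(A, n - 1 - k) for k in range(n)) / c[n]
print(np.allclose(Ainv @ A, np.eye(n)))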
Exercises for Sect. 6.2
1. Compute A−1 and A10 where A is the matrix of Example 6.2.
2. Use the formula (A − t I ) adj(A − t I ) = det(A − t I )I to give another proof of
Cayley–Hamilton theorem. [Hint: For matrices B0 , . . . , Bn−1 , write adj(A −
t I ) = B0 + B1 t + · · · + Bn−1 t n−1 . ]
3. Let A ∈ Fn×n . Let p(t) be a monic polynomial of degree n. If p(A) = 0, then
does it follow that p(t) is the characteristic polynomial of A?
4. Based on Theorem 6.4, describe an algorithm for computing the minimal polynomial of a matrix.
6.3 Diagonalizability
Schur triangularization implies that each square matrix with complex entries is simi-
lar to an upper triangular matrix. Moreover, a square matrix with real entries is similar
to an upper triangular real matrix provided all zeros of its characteristic polynomial
are real. The upper triangular matrix similar to a given square matrix takes a better
form when the matrix is hermitian.
Theorem 6.5 (Spectral theorem for hermitian matrices) Each hermitian matrix is
unitarily similar to a real diagonal matrix. And, each real symmetric matrix is orthog-
onally similar to a real diagonal matrix.
D ∗ = P ∗ A∗ P = P ∗ A P = D.
To see how the eigenvalues and eigenvectors are involved in the diagonalization
process, let A, P, D ∈ Fn×n be matrices, where
P = [v1 · · · vn ], D = diag(λ1 , . . . , λn ).
A Pe j = Av j = λ j v j = λ j Pe j = P (λ j e j ) = P De j for 1 ≤ j ≤ n.
We see that P^{-1} = P^t and
\[
P^{-1} A P = P^t A P = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.
\]
In fact, the spectral theorem holds for a bigger class of matrices. A matrix
A ∈ Cn×n is called a normal matrix iff A∗ A = A A∗ . All unitary matrices and all
hermitian matrices are normal matrices. All diagonal matrices are normal matrices.
In addition, a weak converse to the last statement holds.
Theorem 6.6 Each upper triangular normal matrix is diagonal.
Proof Let U ∈ Cn×n be an upper triangular matrix. If n = 1, then clearly U is a
diagonal matrix. Suppose that each upper triangular normal matrix of order k is
diagonal. Let U be an upper triangular normal matrix of order k + 1. Write U in a
partitioned form as in the following:
\[
U = \begin{bmatrix} R & u \\ 0 & a \end{bmatrix},
\]
where R ∈ Ck×k , u ∈ Ck×1 , 0 is the zero row vector in C1×k , and a ∈ C. Since U
is normal,
\[
U^* U = \begin{bmatrix} R^*R & R^*u \\ u^*R & u^*u + |a|^2 \end{bmatrix}
= U U^*
= \begin{bmatrix} RR^* + uu^* & \bar{a}\,u \\ a\,u^* & |a|^2 \end{bmatrix}.
\]
Using this result on upper triangular normal matrices, we can generalize the
spectral theorem to normal matrices.
Theorem 6.7 (Spectral theorem for normal matrices) A square matrix is unitarily
diagonalizable iff it is a normal matrix.
Proof Let A ∈ Cn×n . Let A be unitarily diagonalizable. Then there exist a unitary
matrix P and a matrix D = diag(λ1 , . . . , λn ) such that A = P D P ∗ . Then A∗ A =
P D^*D P^* and AA^* = P DD^* P^*. However, D^*D = DD^* since D is diagonal; hence A^*A = AA^*, that is, A is normal.
There can be non-normal matrices which are diagonalizable. For example, with
\[
A = \begin{bmatrix} 1 & 0 & 0 \\ 4 & 3 & -2 \\ 2 & 1 & 0 \end{bmatrix}, \qquad
P = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 5 & 2 \\ 1 & 3 & 1 \end{bmatrix}
\]
Theorem 6.8 A matrix A ∈ Fn×n is diagonalizable iff there exists a basis of Fn×1
consisting of eigenvectors of A.
Observation 6.1 implies that A P = P D. Since the columns of P form a basis for
Fn×1 , P is invertible. Therefore, A is diagonalizable.
for some matrices C ∈ C^{ℓ×(n−ℓ)} and D ∈ C^{(n−ℓ)×(n−ℓ)}. Since A and M are similar, they have the same characteristic polynomial χ(t) = (λ − t)^ℓ p(t) for some polynomial p(t) of degree n − ℓ.
But the zero λ of χ(t) is repeated k times. That is, χ(t) = (λ − t)^k q(t) for some polynomial q(t) of which (λ − t) is not a factor.
Notice that λ − t may or may not be a factor of p(t). In any case, ℓ ≤ k.
In Example 6.4, we see that the geometric multiplicity of each (the only) eigen-
value of the matrix A is equal to its algebraic multiplicity; both are 2. So, A is
diagonalizable; in fact, it is already diagonal. But the geometric multiplicity of the
(only) eigenvalue 1 of B is 1 while the algebraic multiplicity is 2. Therefore, B is
not diagonalizable.
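A rough computational check of this criterion is easy to script. The sketch below (illustrative only; the grouping of numerically equal eigenvalues is an assumption that is safe only for well-separated spectra) compares algebraic and geometric multiplicities:

import numpy as np

def is_diagonalizable(A, tol=1e-8):
    """Check whether geometric multiplicity equals algebraic multiplicity for each eigenvalue."""
    eigvals = np.linalg.eigvals(A)
    distinct = []
    for ev in eigvals:
        if not any(abs(ev - d) < 1e-6 for d in distinct):
            distinct.append(ev)
    n = A.shape[0]
    for lam in distinct:
        alg = sum(abs(ev - lam) < 1e-6 for ev in eigvals)
        geo = n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)
        if geo < alg:
            return False
    return True

print(is_diagonalizable(np.array([[1.0, 1], [0, 1]])))   # False: a Jordan block
print(is_diagonalizable(np.array([[1.0, 0], [0, 1]])))   # True: already diagonal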
Exercises for Sect. 6.3
1. Diagonalize the given matrix, and then compute its fifth power:
(a) \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix} \quad (b) \begin{bmatrix} 7 & -2 & 0 \\ -2 & 6 & -2 \\ 0 & -2 & 5 \end{bmatrix} \quad (c) \begin{bmatrix} 7 & -5 & 15 \\ 6 & -4 & 15 \\ 0 & 0 & 1 \end{bmatrix}
2. Show that the following matrices are diagonalized by matrices in R3×3 .
(a) \begin{bmatrix} 3/2 & -1/2 & 0 \\ -1/2 & 3/2 & 0 \\ 1/2 & -1/2 & 1 \end{bmatrix} \quad (b) \begin{bmatrix} 3 & -1/2 & -3/2 \\ 1 & 3/2 & 3/2 \\ -1 & 1/2 & 5/2 \end{bmatrix} \quad (c) \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & 0 \\ 2 & 2 & 3 \end{bmatrix}
3. If possible, diagonalize the following matrices:
(a) \begin{bmatrix} 2 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 2 & -1 \end{bmatrix} \quad (b) \begin{bmatrix} 1 & -2 & 1 \\ 0 & 1 & 1 \\ 0 & 3 & -1 \end{bmatrix} \quad (c) \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ -1 & -2 & -3 \end{bmatrix}
4. Are the following matrices diagonalizable?
(a) \begin{bmatrix} 2 & 3 \\ 6 & -1 \end{bmatrix} \quad (b) \begin{bmatrix} 1 & -10 & 0 \\ -1 & 3 & 1 \\ -1 & 0 & 4 \end{bmatrix} \quad (c) \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 5 \end{bmatrix}
5. Check whether each of the following matrices is diagonalizable. If diagonalizable,
find a basis of eigenvectors for C3×1 :
(a) \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{bmatrix} \quad (b) \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \quad (c) \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}
6. Find orthogonal or unitary diagonalizing matrices for the following:
(a) \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \quad (b) \begin{bmatrix} 1 & 3+i \\ 3-i & 4 \end{bmatrix} \quad (c) \begin{bmatrix} -4 & -2 & 2 \\ -2 & -1 & 1 \\ 2 & 1 & -1 \end{bmatrix}
7. Determine A^5 and a matrix B such that B^2 = A, where
(a) A = \begin{bmatrix} 2 & 1 \\ -2 & -1 \end{bmatrix} \quad (b) A = \begin{bmatrix} 9 & -5 & 3 \\ 0 & 4 & 3 \\ 0 & 0 & 1 \end{bmatrix}
8. If A is a normal matrix, then show that A∗ , I + A and A2 are also normal.
Notice that the matrix similar to A with respect to such a basis would have λ j s on
the diagonal, and possibly nonzero entries on the super diagonal (entries above the
diagonal); all other entries being 0.
We will show that it is possible, by proving that there exists an invertible matrix
P such that
P −1 A P = diag(J1 , J2 , . . . , Jk ),
for some si . Each matrix J˜j (λi ) of order j here has the form
\[
\tilde{J}_j(\lambda_i) = \begin{bmatrix}
\lambda_i & 1 & & & \\
 & \lambda_i & 1 & & \\
 & & \ddots & \ddots & \\
 & & & \lambda_i & 1 \\
 & & & & \lambda_i
\end{bmatrix}.
\]
The missing entries are all 0. Such a matrix J˜j (λi ) is called a Jordan block with
diagonal entries λi . The order of the Jordan block is its order as a square matrix.
Any matrix which is in the block diagonal form diag(J1 , J2 , . . . , Jk ) is said to be in
Jordan form.
In writing Jordan blocks and Jordan forms, we do not show the zero entries for
improving legibility. For instance, the following are possible Jordan form matrices
of order 3 with all diagonal entries as 1:
\[
\begin{bmatrix} 1 & & \\ & 1 & \\ & & 1 \end{bmatrix} \quad
\begin{bmatrix} 1 & 1 & \\ & 1 & \\ & & 1 \end{bmatrix} \quad
\begin{bmatrix} 1 & & \\ & 1 & 1 \\ & & 1 \end{bmatrix} \quad
\begin{bmatrix} 1 & 1 & \\ & 1 & 1 \\ & & 1 \end{bmatrix}
\]
It has three Jordan blocks for the eigenvalue 1 of which two are of order 1 and
one of order 2; and it has one block of order 3 for the eigenvalue 2.
The eigenvalue 1 has geometric multiplicity 3, algebraic multiplicity 4, and the
eigenvalue 2 has geometric multiplicity 1 and algebraic multiplicity 3.
Theorem 6.12 (Jordan form) Each matrix A ∈ C^{n×n} is similar to a matrix in Jordan form J, where the diagonal entries are the eigenvalues of A. For 1 ≤ k ≤ n, if m_k(λ) is the number of Jordan blocks of order k with diagonal entry λ in J, then
\[
m_k(\lambda) = \operatorname{rank}\bigl((A-\lambda I)^{k-1}\bigr) - 2\operatorname{rank}\bigl((A-\lambda I)^{k}\bigr) + \operatorname{rank}\bigl((A-\lambda I)^{k+1}\bigr).
\]
Proof First, we will show the existence of a Jordan form, and then we will come
back to the formula m k , which will show the uniqueness of a Jordan form up to a
permutation of Jordan blocks.
Due to Schur triangularization, we assume that A is an upper triangular matrix,
where the eigenvalues of A occur on the diagonal, and equal eigenvalues occur
together. If λ1 , . . . , λk are the distinct eigenvalues of A, then our assumption means
that A is an upper triangular matrix with diagonal entries, read from top left to bottom
right, appear as
λ 1 , . . . , λ 1 ; λ2 , . . . , λ 2 ; . . . ; λk , . . . , λ k .
Let n i denote the number of times λi occurs in this list. First, we show that by
way of a similarity transformation, A can be brought to the form
diag(A1 , A2 , . . . , Ak ),
where each Ai is an upper triangular matrix of size n i × n i and each diagonal entry
of Ai is λi . Our requirement is shown schematically as follows, where each such
element marked x that is not inside the blocks Ai needs to be zeroed-out by a similarity
transformation.
\[
\begin{bmatrix} A_1 & & & x \\ & A_2 & & \\ & & \ddots & \\ & & & A_k \end{bmatrix}
\longrightarrow
\begin{bmatrix} A_1 & & & 0 \\ & A_2 & & \\ & & \ddots & \\ & & & A_k \end{bmatrix}
\]
If such an x occurs as the (r, s)th entry in A, then r < s. Moreover, the corre-
sponding diagonal entries arr and ass are eigenvalues of A occurring in different
blocks Ai and A j . Thus arr = ass . Further, all entries below the diagonals of Ai and
of A j are 0. We use a combination similarity to obtain
\[
E_{-\alpha}[r,s]\, A\, E_{\alpha}[r,s] \quad\text{with}\quad \alpha = \frac{-x}{a_{rr} - a_{ss}}.
\]
This similarity transformation subtracts α times the sth row from the r th row and
then adds α times the r th column to the sth column. Since r < s, it changes the
entries of A in the r th row to the right of the sth column, and the entries in the sth
column above the r th row. Thus, the upper triangular nature of the matrix does not
change. Further, it replaces the (r, s)th entry x with
\[
a_{rs} + \alpha\,(a_{rr} - a_{ss}) = x + \frac{-x}{a_{rr} - a_{ss}}\,(a_{rr} - a_{ss}) = 0.
\]
We use a sequence of such similarity transformations starting from the last row of
Ak−1 with smallest column index and ending in the first row with largest column
index. Observe that an entry beyond the blocks, which was 0 previously can become
nonzero after a single such similarity transformation. Such an entry will eventually be
zeroed-out. Finally, each position which is not inside any of the k blocks A1 , . . . , Ak
contains only 0. On completion of this stage, we end up with a matrix
diag(A1 , A2 , . . . , Ak ).
In the second stage, we focus on bringing each block Ai to the Jordan form. For
notational convenience, write λi as a. If n i = 1, then such an Ai is already in Jordan
form. We use induction on the order n i of Ai . Lay out the induction hypothesis that
each such matrix of order m − 1 has a Jordan form. Suppose Ai has order m. Look
at Ai in the following partitioned form:
\[
A_i = \begin{bmatrix} B & u \\ 0 & a \end{bmatrix},
\]
Then
\[
\begin{bmatrix} Q^{-1} & 0 \\ 0 & 1 \end{bmatrix} A_i \begin{bmatrix} Q & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} Q^{-1} B Q & Q^{-1} u \\ 0 & a \end{bmatrix}
= \begin{bmatrix}
a & * & & & & b_1 \\
 & a & * & & & b_2 \\
 & & \ddots & \ddots & & \vdots \\
 & & & a & * & b_{m-2} \\
 & & & & a & b_{m-1} \\
 & & & & & a
\end{bmatrix}.
\]
Call the above matrix as C. In the matrix C, the sequence of ∗’s on the super-
diagonal, read from top left to right bottom, comprise a block of 1s followed by a
0, and then a block of 1s followed by a 0, and so on. The number of 1s depends on
the sizes of B1 , B2 , etc. That is, when B1 is over, and B2 starts, we have a 0. Also,
we have shown Q −1 u as [b1 · · · bm−1 ]t . Our goal is to zero-out all b j s except bm−1
which may be made a 0 or 1.
In the next sub-stage, call it the third stage, we apply similarity transformations
to zero-out (all or except one of) the entries b1 , . . . , bm−2 . In any row of C, the entry
above the diagonal (the ∗ there) is either 0 or 1. The ∗ is a 0 at the last row of each
block B j . We leave all such b’s right now; they are to be tackled separately. So,
suppose in the r th row, br = 0 and the (r, r + 1)th entry (the ∗ above the diagonal
entry) is a 1. We wish to zero-out each such br which is in the (r, m) position. For
this purpose, we use a combination similarity to transform C to
Observe that this matrix is obtained from C by adding br times the last row to
the (r + 1)th row, and then subtracting br times the (r + 1)th column from the last
column. Its net result is replacing the (r, m)th entry by 0, and keeping all other
entries intact. Continuing this process of applying a suitable combination similarity
transformation, each nonzero bi with a corresponding 1 on the super-diagonal on
the same row is reduced to 0. We then obtain a matrix, where all entries in the last
column of C have been zeroed-out, without touching the entries at the last row of
any of the blocks B_j. Write such entries as c_1, . . . , c_ℓ. Thus, at the end of the third stage, A_i has been brought to the following form by similarity transformations:
\[
F := \begin{bmatrix}
B_1 & & & & c_1 \\
 & B_2 & & & c_2 \\
 & & \ddots & & \vdots \\
 & & & B_\ell & c_\ell \\
 & & & & a
\end{bmatrix}
\]
In G, the earlier cq at (s, m)th position is now 1. Let B p be any block other than
Bq with c p = 0 in the r th row. Our goal in this sub-stage, call it the fifth stage, is to
zero-out c p . We use two combination similarity transformations as shown below:
1 at (s, m)th position. If this happens to be at the last row, then we have obtained a
Jordan form. Otherwise, in this sub-stage (call it the seventh stage), we move this 1
to the (s, s + 1)th position by the following sequence of permutation similarities:
This transformation exchanges the rows and columns beyond the sth so that the 1 in
(s, m)th position moves to (s, s + 1)th position making up a block; and other entries
remain as they were earlier.
Here ends the proof by induction that each block Ai can be brought to a Jordan form
by similarity transformations. From a similarity transformation for Ai , a similarity
transformation can be constructed for the block diagonal matrix
à := diag(A1 , A2 , . . . , Ak )
by putting identity matrices of suitable order and the similarity transformation for Ai
in a block form. As these transformations do not affect any other rows and columns
of Ã, a sequence of such transformations brings à to its Jordan form, proving the
existence part in the theorem.
Toward the formula for m k , let λ be an eigenvalue of A, and let 1 ≤ k ≤ n.
Observe that A − λI is similar to J − λI. Thus,
where C is the Jordan block of order r with diagonal entries as 0, and D is the matrix
of order n − r in Jordan form consisting of other blocks of J − λI. Then, for any j,
j
C 0
(J − λI ) j = .
0 Dj
To obtain a Jordan form of a given matrix, we may use the construction of similarity
transformations as used in the proof of Theorem 6.12, or we may use the formula
for m k as given there. We illustrate these methods in the following examples.
Example 6.6 Let
\[
A = \begin{bmatrix}
2 & 1 & 0 & 0 & 0 & 1 & 0 & 2 & 0 \\
 & 2 & 0 & 0 & 0 & 3 & 0 & 0 & 1 \\
 & & 2 & 1 & 0 & 0 & 2 & 0 & 0 \\
 & & & 2 & 0 & 2 & 0 & 0 & 0 \\
 & & & & 2 & 0 & 0 & 0 & 0 \\
 & & & & & 2 & 0 & 0 & 0 \\
 & & & & & & 3 & 1 & 1 \\
 & & & & & & & 3 & 1 \\
 & & & & & & & & 3
\end{bmatrix}.
\]
This is an upper triangular matrix. Following the proof of Theorem 6.12, we first
zero-out the circled entries, starting from the entry on the third row. Here, the row
index is r = 3, the column index is s = 7, the eigenvalues are arr = 2, ass = 3,
and the entry to be zeroed-out is x = 2. Thus, α = −2/(2 − 3) = 2. We use an
appropriate combination similarity to obtain
That is, in A, we replace row(3) with row(3) − 2 × row(7) and then replace col(7) with col(7) + 2 × col(3). It leads to
\[
M_1 = \begin{bmatrix}
2 & 1 & 0 & 0 & 0 & 1 & 0 & 2 & 0 \\
 & 2 & 0 & 0 & 0 & 3 & 0 & 0 & 1 \\
 & & 2 & 1 & 0 & 0 & 0 & -2 & 0 \\
 & & & 2 & 0 & 2 & 0 & 0 & 0 \\
 & & & & 2 & 0 & 0 & 0 & 0 \\
 & & & & & 2 & 0 & 0 & 0 \\
 & & & & & & 3 & 1 & 1 \\
 & & & & & & & 3 & 1 \\
 & & & & & & & & 3
\end{bmatrix}.
\]
Notice that the similarity transformation brought in a new nonzero entry such
as −2 in (3, 8) position. But its column index has increased. Looking at the matrix
afresh, we must zero-out this entry first. The suitable combination similarity yields
M_2 = E_2[3, 8]\, M_1\, E_{-2}[3, 8],
which replaces row(3) with row(3) + 2 × row(8) and then replaces col(8) with col(8) − 2 × col(3). Verify that it zeroes-out the entry −2 but introduces 2 at the (3, 9) position. Once more, we use a combination similarity to obtain
M_3 = E_{-2}[3, 9]\, M_2\, E_2[3, 9],
replacing row(3) with row(3) − 2 × row(9) and then replacing col(9) with col(9) + 2 × col(3). Now,
\[
M_3 = \begin{bmatrix}
2 & 1 & 0 & 0 & 0 & 1 & 0 & 2 & 0 \\
 & 2 & 0 & 0 & 0 & 3 & 0 & 0 & 1 \\
 & & 2 & 1 & 0 & 0 & 0 & 0 & 0 \\
 & & & 2 & 0 & 2 & 0 & 0 & 0 \\
 & & & & 2 & 0 & 0 & 0 & 0 \\
 & & & & & 2 & 0 & 0 & 0 \\
 & & & & & & 3 & 1 & 1 \\
 & & & & & & & 3 & 1 \\
 & & & & & & & & 3
\end{bmatrix}.
\]
Now, the matrix M6 is in block diagonal form. We focus on each of the blocks,
though we will be working with the whole matrix. We consider the block corre-
sponding to the eigenvalue 2 first. Since this step is inductive we scan this block
from the top left corner. The 2 × 2 principal sub-matrix of this block is already in
Jordan form. The 3 × 3 principal sub-matrix is also in Jordan form. We see that the
principal sub-matrix of size 4 × 4 and 5 × 5 is also in Jordan form, but the 6 × 6
sub-matrix, which is the block itself is not in Jordan form.
We wish to bring the sixth column to its proper shape. Recall that our strategy
is to zero out all those entries on the sixth column which are opposite to a 1 on the
super-diagonal of this block. There is only one such entry, which is encircled in M6
above.
The row index of this entry is r = 1, its column index is m = 6, and the entry
itself is br = 1. We use a combination similarity to obtain
\[
M_7 = E_1[2, 6]\, M_6\, E_{-1}[2, 6] = \begin{bmatrix}
2 & 1 & 0 & 0 & 0 & 0 & & & \\
 & 2 & 0 & 0 & 0 & 5 & & & \\
 & & 2 & 1 & 0 & 0 & & & \\
 & & & 2 & 0 & 2 & & & \\
 & & & & 2 & 0 & & & \\
 & & & & & 2 & & & \\
 & & & & & & 3 & 1 & 1 \\
 & & & & & & & 3 & 1 \\
 & & & & & & & & 3
\end{bmatrix}.
\]
Next, among the nonzero entries 5 and 2 at the positions (2, 6) and (4, 6), we
wish to zero-out the 5 and keep 2 as the row index of 2 is higher. First, we use a
dilation similarity to make this entry 1 as in the following:
It replaces r ow(4) with 1/2 times itself and then replaces col(4) with 2 times
itself, thus making (4, 6)th entry 1 and keeping all other entries intact. Next, we
zero-out the 5 on (2, 4) position by using the two combination similarities. Here,
c p = 5, r = 2, s = 4; thus
\[
M_9 = E_{-5}[1, 3]\, E_{-5}[2, 4]\, M_8\, E_5[2, 4]\, E_5[1, 3] = \begin{bmatrix}
2 & 1 & 0 & 0 & 0 & 0 & & & \\
 & 2 & 0 & 0 & 0 & 0 & & & \\
 & & 2 & 1 & 0 & 0 & & & \\
 & & & 2 & 0 & 1 & & & \\
 & & & & 2 & 0 & & & \\
 & & & & & 2 & & & \\
 & & & & & & 3 & 1 & 1 \\
 & & & & & & & 3 & 1 \\
 & & & & & & & & 3
\end{bmatrix}.
\]
Example 6.7 Consider the matrix A of Example 6.6. Here, we compute the number
m k of Jordan blocks of size k corresponding to each eigenvalue. For this purpose, we
require the ranks of the matrices (A − λI )k for successive k, and for each eigenvalue
λ of A. We see that A has two eigenvalues 2 and 3.
For the eigenvalue 2,
m_1(2) = 9 − 2 × 6 + 4 = 1,  m_2(2) = 6 − 2 × 4 + 3 = 1,
m_3(2) = 4 − 2 × 3 + 3 = 1,  m_{3+k}(2) = 3 − 2 × 3 + 3 = 0.
For the eigenvalue 3,
m_1(3) = 9 − 2 × 8 + 7 = 0,  m_2(3) = 8 − 2 × 7 + 6 = 0,
m_3(3) = 7 − 2 × 6 + 6 = 1,  m_{3+k}(3) = 6 − 2 × 6 + 6 = 0.
Therefore, in the Jordan form of A, there is one Jordan block of size 1, one of
size 2 and one of size 3 with eigenvalue 2, and one block of size 3 with eigenvalue
3. From this information, we see that the Jordan form of A is uniquely determined
up to any rearrangement of the blocks. Check that M11 as obtained in Example 6.6
is one such Jordan form of A.
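The rank computations of this example are easy to automate. The following Python/NumPy sketch (illustrative only; it assumes the 9 × 9 matrix of Example 6.6 as written above) reproduces the numbers m_k(2) and m_k(3):

import numpy as np

def jordan_block_counts(A, lam, kmax):
    """m_k = rank((A-lam I)^(k-1)) - 2 rank((A-lam I)^k) + rank((A-lam I)^(k+1))."""
    n = A.shape[0]
    M = A - lam * np.eye(n)
    rank = lambda k: np.linalg.matrix_rank(np.linalg.matrix_power(M, k))
    return {k: rank(k - 1) - 2 * rank(k) + rank(k + 1) for k in range(1, kmax + 1)}

A = np.array([[2,1,0,0,0,1,0,2,0], [0,2,0,0,0,3,0,0,1], [0,0,2,1,0,0,2,0,0],
              [0,0,0,2,0,2,0,0,0], [0,0,0,0,2,0,0,0,0], [0,0,0,0,0,2,0,0,0],
              [0,0,0,0,0,0,3,1,1], [0,0,0,0,0,0,0,3,1], [0,0,0,0,0,0,0,0,3]], dtype=float)

print(jordan_block_counts(A, 2, 4))   # per Example 6.7: {1: 1, 2: 1, 3: 1, 4: 0}
print(jordan_block_counts(A, 3, 4))   # per Example 6.7: {1: 0, 2: 0, 3: 1, 4: 0}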
If the next Jordan block in J has diagonal entries as μ (which may or may not be
equal to λ), then we have Avk+1 = μvk+1 , Avk+2 = vk+1 + μvk+2 , . . . , and so on.
The list of vectors v1 , . . . , vk above is called a Jordan string that starts with v1
and ends with vk . The number k is called the length of the Jordan string. In such a
Jordan string, we see that
v1 ∈ N (A − λI ), v2 ∈ N (A − λI )2 , . . . , vk ∈ N (A − λI )k .
also implies that two dissimilar matrices will have different Jordan canonical forms.
Therefore, Jordan form characterizes similarity of matrices.
As an application of Jordan form, we will show that each matrix is similar to its
transpose. Suppose J = P −1 A P. Now, J t = P t At (P −1 )t = P t At (P t )−1 . That is,
At is similar to J t . Thus, it is enough to show that J t is similar to J. First, let us
see it for a single Jordan block. For a Jordan block Jλ , consider the matrix Q of the
same order as in the following:
\[
J_\lambda = \begin{bmatrix}
\lambda & 1 & & & \\
 & \lambda & 1 & & \\
 & & \ddots & \ddots & \\
 & & & \lambda & 1 \\
 & & & & \lambda
\end{bmatrix}, \qquad
Q = \begin{bmatrix}
 & & & 1 \\
 & & 1 & \\
 & \iddots & & \\
1 & & &
\end{bmatrix}.
\]
In the matrix Q, the entries on the anti-diagonal are all 1 and all other entries are
0. We see that Q 2 = I. Thus, Q −1 = Q. Further,
Q −1 Jλ Q = Q Jλ Q = (Jλ )t .
Therefore, each Jordan block is similar to its transpose. Now, construct a matrix
R by putting matrices such as Q as its blocks matching the orders of each Jordan
block in J. Then it follows that R −1 J R = J t .
Jordan form guarantees that one can always choose m linearly independent gen-
eralized eigenvectors corresponding to the eigenvalue λ, where m is the algebraic
multiplicity of λ. Moreover, the following is guaranteed:
If the linear system (A − λI)^k x = 0 has r < m linearly independent solutions, then (A − λI)^{k+1} x = 0 has at least r + 1 linearly independent solutions.
This result is more useful in computing the exponential of a square matrix rather
than using the Jordan form explicitly. See Sect. 7.6 for details.
Exercises for Sect. 6.4
1. Determine the Jordan forms of the following matrices:
(a) \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 2 & 1 & 0 \end{bmatrix} \quad (b) \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \quad (c) \begin{bmatrix} -2 & -1 & -3 \\ 4 & 3 & 3 \\ -2 & 1 & -1 \end{bmatrix}
2. Determine the matrix P ∈ C3×3 such that P −1 A P is in Jordan form, where A is
the matrix in Exercise 1(c).
3. Let A be a 7 × 7 matrix with characteristic polynomial (t − 2)4 (3 − t)3 . It is
known that in the Jordan form of A, the largest blocks for both the eigenvalues
are of order 2. Show that there are only two possible Jordan forms for A; and
determine those Jordan forms.
4. Let A be a 5 × 5 matrix whose first and second rows are, respectively,
[0, 1, 1, 0, 1] and [0, 0, 1, 1, 1]; and all other rows are zero rows. What is the
Jordan form of A?
Given an m × n matrix A with complex entries, there are two hermitian matrices
that can be constructed naturally from it, namely A∗ A and A A∗ . We wish to study
the eigenvalues and eigenvectors of these matrices and their relations to certain
parameters associated with A. We will see that these concerns yield a factorization
of A.
All eigenvalues of the hermitian matrix A∗ A ∈ Cn×n are real. If λ ∈ R is such an
eigenvalue with an associated eigenvector v ∈ C^{n×1}, then A^*Av = λv implies that ‖Av‖^2 = v^*A^*Av = λ v^*v = λ‖v‖^2. Since ‖v‖ > 0, we see that λ ≥ 0. The eigenvalues of A^*A can thus be arranged
in a decreasing list
λ1 ≥ λ2 ≥ · · · ≥ λr > 0 = λr +1 = · · · = λn
for some r with 0 ≤ r ≤ n. Notice that λ1 , . . . , λr are all positive and the rest are all
equal to 0. In the following we relate this r with rank(A). Of course, we could have
considered A A∗ instead of A∗ A.
Let A ∈ Cm×n . Let λ1 ≥ · · · ≥ λn ≥ 0 be the n eigenvalues of A∗ A. The non-
negative square roots of these real numbers are called the singular values of A.
Conventionally, we denote the singular values of A by si . The eigenvalues of A∗ A
are then denoted by s12 ≥ · · · ≥ sn2 ≥ 0.
Therefore, rank(A∗ A) is equal to the rank of the above diagonal matrix; and that is
equal to r. This completes the proof.
From Theorem 6.13, it follows that A and A∗ have the same r number of positive
singular values, where r = rank(A) = rank(A∗ ). Further, A has n − r number of
zero singular values, whereas A∗ has m − r number of zero singular values. In
addition, if A ∈ Cn×n is hermitian and has eigenvalues λ1 , . . . , λn , then its singular
values are |λ1 |, . . . , |λn |.
Analogous to the factorization of A∗ A, we have one for A itself.
Then D^* = D, and
\[
(AQD)^*(AQD) = D^*(Q^*A^*AQ)D =
\begin{bmatrix} S^{-1} & 0 \\ 0 & I_{n-r} \end{bmatrix}
\begin{bmatrix} S^2 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} S^{-1} & 0 \\ 0 & I_{n-r} \end{bmatrix}
= \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}.
\]
(2) (a) AA^*P = PΣQ^*QΣ^*P^*P = PΣΣ^*. The matrix ΣΣ^* is a diagonal matrix with diagonal entries s_1^2, . . . , s_r^2, 0, . . . , 0. Therefore, AA^*u_i = s_i^2 u_i for each i.
(4) (a) For 1 ≤ i ≤ r, u i = (si )−1 Avi implies that u i ∈ R(A). The vectors u 1 , . . . , u r
are orthonormal and dim(R(A)) = r. Therefore, {u 1 , . . . , u r } is an orthonormal basis
of R(A).
(b) As in (a), {v1 , . . . , vr } is an orthonormal basis of R(A∗ ).
(c) Let r < j ≤ n. Now, A∗ Av j = 0 implies v∗j A∗ Av j = 0. So, Av j 2 = 0; or that
Av j = 0. Then v j ∈ N (A). But dim(N (A)) = n − r. Therefore, the n − r orthonor-
mal vectors vr +1 , . . . , vn form an orthonormal basis for N (A).
(d) As in (c), {u r +1 , . . . , u m } is an orthonormal basis for N (A∗ ).
Theorem 6.14 (2) and (4) imply that the columns of P are eigenvectors of A A∗ ,
and the columns of Q are eigenvectors of A∗ A. Accordingly, the columns of P
are called the left singular vectors of A; and the columns of Q are called the right
singular vectors of A. Notice that computing both sets of left and right singular
vectors independently will not serve the purpose since they may not satisfy the
equations in Theorem 6.14 (3).
Example 6.8 To determine the SVD of the matrix
\[
A = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix},
\]
we compute
\[
AA^* = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \qquad
A^*A = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}.
\]
s12 = 2, u 1 = e1 ; s22 = 2, u 2 = e2 .
Here, u 1 and u 2 are the left singular vectors. The corresponding right singular vectors
are:
\[
v_1 = \frac{1}{s_1} A^* u_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \qquad
v_2 = \frac{1}{s_2} A^* u_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix}.
\]
a + c = 0 = b + d.
In the product PΣQ^*, there are possibly many zero rows and columns of Σ that do not contribute to the end result. Thus some simplifications can be done in the SVD. Let A ∈ C^{m×n}, where m ≤ n. Suppose A = PΣQ^* is an SVD of A. Let the ith column of Q be denoted by v_i ∈ C^{n×1}. Write
\[
P_1 = P, \quad Q_1 = \begin{bmatrix} v_1 & \cdots & v_m \end{bmatrix} \in \mathbb{C}^{n\times m}, \quad
\Sigma_1 = \operatorname{diag}(s_1, \ldots, s_r, 0, \ldots, 0) \in \mathbb{C}^{m\times m}.
\]
Notice that P_1 is unitary and the m columns of Q_1 are orthonormal. In block form, we have
\[
Q = \begin{bmatrix} Q_1 & Q_3 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \Sigma_1 & 0 \end{bmatrix}, \quad
Q_3 = \begin{bmatrix} v_{m+1} & \cdots & v_n \end{bmatrix}.
\]
Then
\[
A = P \Sigma Q^* = P_1 \begin{bmatrix} \Sigma_1 & 0 \end{bmatrix}
\begin{bmatrix} Q_1^* \\ Q_3^* \end{bmatrix}
= P_1 \Sigma_1 Q_1^* \quad\text{for } m \le n. \tag{6.1}
\]
Similarly, when m ≥ n, we may curtail P accordingly. That is, suppose the ith
column of P is denoted by u i ∈ Cm×1 . Write
\[
P_2 = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \in \mathbb{C}^{m\times n}, \quad
\Sigma_2 = \operatorname{diag}(s_1, \ldots, s_r, 0, \ldots, 0) \in \mathbb{C}^{n\times n}, \quad Q_2 = Q.
\]
Then
\[
A = P \Sigma Q^* = \begin{bmatrix} P_2 & P_3 \end{bmatrix}
\begin{bmatrix} \Sigma_2 \\ 0 \end{bmatrix} Q_2^*
= P_2 \Sigma_2 Q_2^* \quad\text{for } m \ge n. \tag{6.2}
\]
The two forms of SVD in (6.1)–(6.2), one for m ≤ n and the other for m ≥ n, are called the thin SVD of A. Of course, for m = n, both the thin SVDs coincide with the SVD. For a unified approach to the thin SVDs, take k = min{m, n}. Then a matrix A ∈ C^{m×n} of rank r can be written as the product
\[
A = P \Sigma Q^*,
\]
where P ∈ C^{m×k} and Q ∈ C^{n×k} have orthonormal columns, and Σ ∈ C^{k×k} is the diagonal matrix diag(s_1, . . . , s_r, 0, . . . , 0) with s_1, . . . , s_r being the positive singular values of A.
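In practice the thin SVD is what numerical libraries return by default. A short Python/NumPy sketch (illustrative only) computes it and reconstructs A from the factors:

import numpy as np

A = np.array([[2.0, -1], [-2, 1], [4, -2]])
# full_matrices=False gives the thin SVD: P is m x k, Q is n x k, k = min(m, n)
P, s, Qh = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)

print(s)                                   # singular values: sqrt(30) and 0
print(np.allclose(P @ Sigma @ Qh, A))      # A = P Sigma Q*
print(np.linalg.matrix_rank(A))            # r = number of positive singular values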
It is possible to simplify the thin SVDs further by deleting the zero rows. Observe
that in the product P ∗ AQ, the first r columns of P and the first r columns of Q
produce S; the other columns of P and of Q give the zero blocks. Thus taking
\[
\tilde{P} = \begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix}, \qquad \tilde{Q} = \begin{bmatrix} v_1 & \cdots & v_r \end{bmatrix},
\]
we obtain A = \tilde{P} S \tilde{Q}^*. With B = \tilde{P}S and C = S\tilde{Q}^*, this reads
\[
A = B \tilde{Q}^* = \tilde{P} C,
\]
where both B ∈ Cm×r and C ∈ Cr ×n are of rank r. It shows that each m × n matrix
of rank r can be written as a product of an m × r matrix of rank r and an r × n
matrix, also of rank r. We recognize it as the full rank factorization of A.
Example 6.9 Obtain SVD, tight SVD, and a full rank factorization of
\[
A = \begin{bmatrix} 2 & -1 \\ -2 & 1 \\ 4 & -2 \end{bmatrix}.
\]
Here,
\[
A^*A = \begin{bmatrix} 24 & -12 \\ -12 & 6 \end{bmatrix}.
\]
It has eigenvalues 30 and 0. Thus s_1 = \sqrt{30}. Notice
that A A∗ is a 3 × 3 matrix with eigenvalues 30, 0 and 0. We see that r = rank(A) =
the number of positive singular values of A = 1.
For the eigenvalue 30, we solve the equation A∗ A[a, b]t = 30[a, b]t , that is,
For the tight SVD, we construct P̃ with its r columns as the the first r columns
of P, Q̃ with its r columns as the first r columns of Q, and S as the r × r block
consisting of first r singular values of A as the diagonal entries. With r = 1, we thus
have the tight SVD as
\[
\begin{bmatrix} 2 & -1 \\ -2 & 1 \\ 4 & -2 \end{bmatrix}
= \tilde{P} S \tilde{Q}^*
= \begin{bmatrix} -1/\sqrt{6} \\ 1/\sqrt{6} \\ -2/\sqrt{6} \end{bmatrix}
\begin{bmatrix} \sqrt{30} \end{bmatrix}
\begin{bmatrix} -2/\sqrt{5} & 1/\sqrt{5} \end{bmatrix}.
\]
In the tight SVD, using associativity of matrix product, we get the full rank
factorizations as
\[
\begin{bmatrix} 2 & -1 \\ -2 & 1 \\ 4 & -2 \end{bmatrix}
= \begin{bmatrix} -\sqrt{5} \\ \sqrt{5} \\ -2\sqrt{5} \end{bmatrix}
\begin{bmatrix} -2/\sqrt{5} & 1/\sqrt{5} \end{bmatrix}
= \begin{bmatrix} -1/\sqrt{6} \\ 1/\sqrt{6} \\ -2/\sqrt{6} \end{bmatrix}
\begin{bmatrix} -2\sqrt{6} & \sqrt{6} \end{bmatrix}.
\]
From the tight SVD we also get the expansion A = s_1 u_1 v_1^* + · · · + s_r u_r v_r^*. Each matrix u_i v_i^* is of rank 1. This means that if we know the first r singular values of A and we know their corresponding left and right singular vectors, we know A completely. This is particularly useful when A is a very large matrix of low rank. No wonder, SVD is used in image processing, various compression algorithms, and in principal components analysis.
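The rank-one expansion suggests how the SVD is used for compression: keep only the first few terms. The sketch below (illustrative only; the test matrix is random with an artificially low rank) truncates the SVD and measures the resulting error:

import numpy as np

rng = np.random.default_rng(0)
# Build a 100 x 80 matrix that is essentially of rank 5, plus a little noise.
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 80))
A += 1e-6 * rng.standard_normal(A.shape)

U, s, Vh = np.linalg.svd(A, full_matrices=False)

def truncate(k):
    """Best rank-k approximation: keep the k largest singular triples."""
    return (U[:, :k] * s[:k]) @ Vh[:k, :]

for k in (1, 3, 5, 10):
    err = np.linalg.norm(A - truncate(k), 2)   # spectral norm of the error
    print(k, err)   # the error drops sharply once k reaches the underlying rank 5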
Let A = PΣQ^* be an SVD of A ∈ C^{m×n}, and let x ∈ C^{n×1} be a unit vector. Let s_1 ≥ · · · ≥ s_r be the positive singular values of A. Write y = Q^*x = [α_1, . . . , α_n]^t. Then
\[
\|y\|^2 = \sum_{i=1}^{n} |\alpha_i|^2 = \|Q^*x\|^2 = x^* Q Q^* x = x^* x = \|x\|^2 = 1.
\]
Further,
\[
\|Ax\|^2 = \|P\Sigma Q^*x\|^2 = x^* Q \Sigma^* P^* P \Sigma Q^* x = y^* (\Sigma^*\Sigma)\, y
= \sum_{j=1}^{r} s_j^2 |\alpha_j|^2 \le s_1^2 \sum_{j=1}^{r} |\alpha_j|^2 \le s_1^2.
\]
That is, the first singular value s1 gives the maximum magnification that a vector
experiences under the linear transformation A. Similarly, from above workout it
follows that
\[
\|Ax\|^2 = \sum_{j=1}^{r} s_j^2 |\alpha_j|^2 \ge s_r^2 \sum_{j=1}^{r} |\alpha_j|^2 \ge s_r^2.
\]
That is, the minimum positive magnification is given by the smallest positive
singular value sr .
Notice that if x is a unit vector, then sr ≤ Ax ≤ s1 .
Square matrices behave like complex numbers in many ways. One such example
is a powerful representation of square matrices using a stretch and a rotation. This
mimics the polar representation of a complex number as z = r eiθ , where r is a non-
negative real number representing the stretch; and eiθ is a rotation. Similarly, a square
matrix can be written as a product of a positive semi-definite matrix, representing
the stretch, and a unitary matrix representing the rotation. We slightly generalize it
to any m × n matrix.
A hermitian matrix P ∈ Fn×n is called positive semidefinite iff x ∗ P x ≥ 0 for
each x ∈ Fn×1 . We use such a matrix in the following matrix factorization.
Theorem 6.15 (Polar decomposition) Let A ∈ Cm×n . Then there exist positive semi-
definite matrices P ∈ Cm×m , Q ∈ Cn×n , and a matrix U ∈ Cm×n such that
A = PU = U Q,
\[
P^2 = B D B^* B D B^* = B D E^* E D B^* = A A^*, \qquad
Q^2 = E D E^* E D E^* = E D B^* B D E^* = A^* A.
\]
\[
x^* P x = x^* B D B^* x = x^* B D^{1/2} D^{1/2} B^* x = \|D^{1/2} B^* x\|^2 \ge 0.
\]
its first k × k block, where k = min{m, n}. Thus, the polar decomposition of A may
be constructed directly from the SVD. It is as follows:
If A ∈ Cm×n has SVD as A = B D E ∗ , then A = PU = U Q, where
m = n: U = B E ∗, P = B D B∗, Q = E D E ∗.
m < n: U = B E 1∗ , P = B D1 B ∗ , Q = E 1 D1 E 1∗ .
m > n: U = B1 E ∗ , P = B1 D2 B1∗ , Q = E D2 E ∗ .
Here, E 1 is constructed from E by taking its first m columns; D1 is constructed
from D by taking its first m columns; B1 is constructed from B by taking its first n
columns; and D2 is constructed from D by taking its first n rows.
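Following this recipe, a polar decomposition can be computed from the SVD in a few lines. The NumPy sketch below (illustrative only; it handles the square case m = n for simplicity) returns U, P, Q with A = PU = UQ:

import numpy as np

def polar_from_svd(A):
    """Polar decomposition of a square matrix A: A = P @ U = U @ Q (sketch, m = n case)."""
    B, d, Eh = np.linalg.svd(A)        # A = B D E*,  D = diag(d)
    U = B @ Eh                         # unitary factor
    P = (B * d) @ B.conj().T           # B D B*, positive semi-definite
    Q = (Eh.conj().T * d) @ Eh         # E D E*, positive semi-definite
    return U, P, Q

A = np.array([[2.0, 1], [-1, 3]])
U, P, Q = polar_from_svd(A)
print(np.allclose(P @ U, A), np.allclose(U @ Q, A))
print(np.allclose(U.conj().T @ U, np.eye(2)))      # U is unitary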
Example 6.10 Consider the matrix A = \begin{bmatrix} 2 & -1 \\ -2 & 1 \\ 4 & -2 \end{bmatrix} of Example 6.9. We had
obtained its SVD as A = B D E ∗ , where
\[
B = \begin{bmatrix} -1/\sqrt{6} & 1/\sqrt{2} & 1/\sqrt{3} \\ 1/\sqrt{6} & 1/\sqrt{2} & -1/\sqrt{3} \\ -2/\sqrt{6} & 0 & 1/\sqrt{3} \end{bmatrix}, \quad
D = \begin{bmatrix} \sqrt{30} & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad
E = \begin{bmatrix} -2/\sqrt{5} & 1/\sqrt{5} \\ 1/\sqrt{5} & 2/\sqrt{5} \end{bmatrix}.
\]
Here, A ∈ C3×2 . Thus, Theorem 6.15 (3) is applicable; see the discussion following
the proof of the theorem. We construct the matrices B1 by taking first two columns
of B, and D2 by taking first two rows of D, as in the following:
\[
B_1 = \begin{bmatrix} -1/\sqrt{6} & 1/\sqrt{2} \\ 1/\sqrt{6} & 1/\sqrt{2} \\ -2/\sqrt{6} & 0 \end{bmatrix}, \qquad
D_2 = \begin{bmatrix} \sqrt{30} & 0 \\ 0 & 0 \end{bmatrix}.
\]
Then
\[
U = B_1 E^* = \frac{1}{\sqrt{6}} \begin{bmatrix} -1 & \sqrt{3} \\ 1 & \sqrt{3} \\ -2 & 0 \end{bmatrix}
\frac{1}{\sqrt{5}} \begin{bmatrix} -2 & 1 \\ 1 & 2 \end{bmatrix}
= \frac{1}{\sqrt{30}} \begin{bmatrix} 2+\sqrt{3} & -1+2\sqrt{3} \\ -2+\sqrt{3} & 1+2\sqrt{3} \\ 4 & -2 \end{bmatrix},
\]
\[
P = B_1 D_2 B_1^* = \sqrt{5} \begin{bmatrix} -1 & 0 \\ 1 & 0 \\ -2 & 0 \end{bmatrix}
\frac{1}{\sqrt{6}} \begin{bmatrix} -1 & 1 & -2 \\ \sqrt{3} & \sqrt{3} & 0 \end{bmatrix}
= \frac{\sqrt{5}}{\sqrt{6}} \begin{bmatrix} 1 & -1 & 2 \\ -1 & 1 & -2 \\ 2 & -2 & 4 \end{bmatrix},
\]
\[
Q = E D_2 E^* = \sqrt{6} \begin{bmatrix} -2 & 0 \\ 1 & 0 \end{bmatrix}
\frac{1}{\sqrt{5}} \begin{bmatrix} -2 & 1 \\ 1 & 2 \end{bmatrix}
= \frac{\sqrt{6}}{\sqrt{5}} \begin{bmatrix} 4 & -2 \\ -2 & 1 \end{bmatrix}.
\]
As expected we find that
\[
PU = \frac{\sqrt{5}}{\sqrt{6}} \begin{bmatrix} 1 & -1 & 2 \\ -1 & 1 & -2 \\ 2 & -2 & 4 \end{bmatrix}
\frac{1}{\sqrt{30}} \begin{bmatrix} 2+\sqrt{3} & -1+2\sqrt{3} \\ -2+\sqrt{3} & 1+2\sqrt{3} \\ 4 & -2 \end{bmatrix}
= \begin{bmatrix} 2 & -1 \\ -2 & 1 \\ 4 & -2 \end{bmatrix} = A,
\]
\[
UQ = \frac{1}{\sqrt{30}} \begin{bmatrix} 2+\sqrt{3} & -1+2\sqrt{3} \\ -2+\sqrt{3} & 1+2\sqrt{3} \\ 4 & -2 \end{bmatrix}
\frac{\sqrt{6}}{\sqrt{5}} \begin{bmatrix} 4 & -2 \\ -2 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & -1 \\ -2 & 1 \\ 4 & -2 \end{bmatrix} = A.
\]
6.7 Problems
6. Prove that if a normal matrix has only real eigenvalues, then it is hermitian.
Conclude that if a real normal matrix has only real eigenvalues, then it is real
symmetric.
7. Let A ∈ F^{n×n} have distinct eigenvalues λ_1, . . . , λ_k. For 1 ≤ j ≤ k, let the linearly independent eigenvectors associated with λ_j be v_j^1, . . . , v_j^{i_j}. Prove that the set {v_1^1, . . . , v_1^{i_1}, . . . , v_k^1, . . . , v_k^{i_k}} is linearly independent. [Hint: See the proof of Theorem 5.3.]
8. Let A = [ai j ] ∈ F3×3 be such that a31 = 0 and a32 = 0. Show that each eigen-
value of A has geometric multiplicity 1.
9. Suppose A ∈ F4×4 has an eigenvalue λ with algebraic multiplicity 3, and
rank(A − λI ) = 1. Is A diagonalizable?
10. Show that there exists only one n × n diagonalizable matrix with an eigenvalue
λ of algebraic multiplicity n.
11. Show that a nonzero nilpotent matrix is never diagonalizable.
[Hint: A = 0 but Am = 0 for some m ≥ 2.]
12. Let A ∈ Fn×n and let P −1 A P be a diagonal matrix. Show that the columns of
P that are eigenvectors associated with nonzero eigenvalues of A form a basis
for R(A).
13. Let A be a diagonalizable matrix. Show that the number of nonzero eigenvalues
of A is equal to rank(A).
14. Construct non-diagonalizable matrices A and B satisfying
(a) rank(A) is equal to the number of nonzero eigenvalues of A;
(b) rank(B) is not equal to the number of nonzero eigenvalues of B.
15. Let x, y ∈ Fn×1 and let A = x y ∗ . Show the following:
(a) A has an eigenvalue y ∗ x with an associated eigenvector x.
(b) 0 is an eigenvalue of A with geometric multiplicity at least n − 1.
(c) If y ∗ x = 0, then A is diagonalizable.
16. Using the Jordan form of a matrix show that a matrix A is diagonalizable iff
for each eigenvalue of A, its geometric multiplicity is equal to its algebraic
multiplicity.
17. Let A ∈ Cn×n have an eigenvalue λ with algebraic multiplicity m. Prove that
null((A − λI )m ) = m.
18. Let λ be an eigenvalue of a matrix A ∈ C^{n×n} having algebraic multiplicity m. Prove that for each k ∈ N, if null((A − λI)^k) < m, then null((A − λI)^k) < null((A − λI)^{k+1}).
[Hint: Show that N((A − λI)^i) ⊆ N((A − λI)^{i+1}). Then use Exercise 17.]
19. Let λ be an eigenvalue of a matrix A and let J be the Jordan form of A. Prove
that the number of Jordan blocks with diagonal entry λ in J is the geometric
multiplicity of λ.
20. Let A be a hermitian n × n matrix with eigenvalues λ_1, . . . , λ_n. Show that there exists an orthonormal set {x_1, . . . , x_n} in F^{n×1} such that x^*Ax = \sum_{i=1}^{n} λ_i |x^*x_i|^2 for each x ∈ F^{n×1}.
(A A† )∗ = A A† , (A† A)∗ = A† A, A A† A = A, A† A A† = A† .
(b) There exists a unique matrix A† ∈ Fn×m satisfying the four equations men-
tioned in (a). The matrix A† is called the generalized inverse of A.
(c) For any b ∈ Fm×1 , A† b is the least squares solution of Ax = b.
25. Show that if s is a singular value of a matrix A, then there exists a nonzero
vector x such that Ax = sx.
26. Let A = P Q t be the SVD of a real n × n matrix. Let u i be the ith column of
P, and let vi be the ith column of Q. Define the matrix B and the vectors xi , yi
as follows:
\[
B = \begin{bmatrix} 0 & A^t \\ A & 0 \end{bmatrix}, \quad
x_i = \begin{bmatrix} v_i \\ u_i \end{bmatrix}, \quad
y_i = \begin{bmatrix} -v_i \\ u_i \end{bmatrix} \quad\text{for } 1 \le i \le n.
\]
Show that xi and yi are eigenvectors of B. How are the eigenvalues of B related
to the singular values of A?
27. Derive the polar decomposition from the SVD. Also, derive singular value
decomposition from the polar decomposition.
28. A positive definite matrix is a hermitian matrix A such that x^*Ax > 0 for each x ≠ 0. Show that a hermitian matrix is positive definite iff all its eigenvalues are positive.
29. Show that the square of a real symmetric invertible matrix is positive definite.
30. Show that if A is positive definite, then so is A−1 . Give an example of a 2 × 2
invertible matrix which is not positive definite.
31. Show that A∗ A is positive semi-definite for any A ∈ Fm×n . Give an example of
a matrix A where A∗ A is not positive definite.
32. Show that if Q is unitary and A is positive definite, then Q AQ ∗ is positive
definite.
33. For a matrix A ∈ Fn×n , the principal submatrices are obtained by deleting its
last r rows and last r columns for r = 0, 1, . . . , n − 1. Show that all principal
submatrices of a positive definite matrix are positive definite. Further, verify that all principal submatrices of
\[
A = \begin{bmatrix} 1 & 1 & -3 \\ 1 & 1 & -3 \\ -3 & -3 & 5 \end{bmatrix}
\]
have non-negative determinants but A is not positive semi-definite.
34. Let A be a real symmetric matrix. Show that the following are equivalent:
(a) A is positive definite.
(b) All principal submatrices of A have positive determinant.
(c) A can be reduced to an upper triangular form using only elementary row operations of Type 3, where all pivots are positive.
(d) A = U t U, where U is upper triangular with positive diagonal entries.
(e) A = B t B for some invertible matrix B.
35. Let A be a real symmetric positive definite n × n matrix. Show the following:
(a) All diagonal entries of A are positive.
(b) For any invertible n × n matrix P, P t A P is positive definite.
(c) There exists an n × n orthogonal matrix Q such that A = Q t Q.
(d) There exist unique n × n matrices U and D where U is upper triangular
with all diagonal entries 1, and D is a diagonal matrix with positive entries
on the diagonal such that A = U t DU.
(e) Cholesky factorization : There exists a unique upper triangular matrix with
positive diagonal entries such that A = U t U.
Chapter 7
Norms of Matrices
7.1 Norms
Recall that the norm of a vector is the non-negative square root of the inner product of
a vector with itself. Norms give an idea on the length of a vector. We wish to generalize
on this theme so that we may be able to measure the length of a vector without
resorting to an inner product. We keep the essential properties that are commonly
associated with the length.
Let V be a subspace of Fn . A norm on V is a function from V to R which we
denote by · , satisfying the following properties:
1. For each v ∈ V, v ≥ 0.
2. For each v ∈ V, v = 0 iff v = 0.
3. For each v ∈ V and for each α ∈ F, αv = |α| v.
4. For all u, v ∈ V, u + v ≤ u + v.
Once a norm is defined on V, we call it a normed linear space. Though norms can
be defined on any vector space, we require only subspaces of Fn . In what follows, a
finite dimensional normed linear space V will mean a subspace of some Fn in which
a norm · has been defined.
Recall that Property (4) of a norm is called the triangle inequality. As we had
seen earlier, in R2 ,
\[
\|(a, b)\| = \bigl(|a|^2 + |b|^2\bigr)^{1/2} \quad\text{for } a, b \in \mathbb{R}
\]
defines a norm. This norm comes from the usual inner product on R2 . Some of the
useful norms on Fn are discussed in the following example.
Consider the ∞-norm on F^2. With x = (1, 0), y = (0, 1), we see that
\[
\|(1,0)+(0,1)\|_\infty^2 + \|(1,0)-(0,1)\|_\infty^2 = (\max\{1,1\})^2 + (\max\{1,1\})^2 = 2,
\]
\[
2\bigl(\|(1,0)\|_\infty^2 + \|(0,1)\|_\infty^2\bigr) = 2\bigl((\max\{1,0\})^2 + (\max\{0,1\})^2\bigr) = 4.
\]
This violates the parallelogram law. Therefore, the ∞-norm does not come from
an inner product. Similarly, it is easy to check that the 1-norm does not come from
an inner product.
The normed linear space Fn with the Euclidean norm is different from Fn with
the ∞-norm.
Notice that a norm behaves like the absolute value function in R or C. With this
analogy, we see that the reverse triangle inequality holds for norms also. It is as
follows. Let x, y ∈ V, a normed linear space. Then,
This inequality becomes helpful in showing that the norm is a continuous func-
tional. Recall that a functional on a subspace V of Fn is a function that maps vectors
in V to the scalars in F.
Any functional f : V → R is continuous at v ∈ V iff for each ε > 0, there exists
a δ > 0 such that for each x ∈ V, if x − v < δ then | f (x) − f (v)| < ε.
Proof Let S_b = {x ∈ V : ‖x‖_b = 1} be the unit sphere in V with respect to the norm ‖·‖_b. Since ‖·‖_a is a continuous function, it attains a maximum on S_b. So, let α = max{‖x‖_a : x ∈ S_b}.
Whenever the conclusion of Theorem 7.1 holds for two norms · a and · b , we
say that these two norms are equivalent. We thus see that on any subspace of Fn , any
two norms are equivalent. We will use the equivalence of norms later for defining a
particular type of norms for matrices. We remark that on infinite dimensional normed
linear spaces, any two norms need not be equivalent.
Exercises for Sect. 7.1
1. Show that the 1-norm on Fn does not come from an inner product.
2. Let p ∈ N. Show that the p-norm is indeed a norm.
3. Let v ∈ F^n. Show the following inequalities:
(a) ‖v‖_∞ ≤ ‖v‖_2 ≤ ‖v‖_1   (b) ‖v‖_1 ≤ √n ‖v‖_2 ≤ n ‖v‖_∞
Norms provide a way to quantify the vectors. It is easy to verify that the addition of
m × n matrices and multiplying a matrix with a scalar satisfy the properties required
of a vector space. Thus, Fm×n is a vector space. We can view Fm×n as a normed linear
space by providing a norm on it.
Example 7.2 The following functions ‖·‖ : F^{m×n} → R define norms on F^{m×n}. Let A = [a_{ij}] ∈ F^{m×n}.
1. ‖A‖_c = max{|a_{ij}| : 1 ≤ i ≤ m, 1 ≤ j ≤ n}.
   It is called the Cartesian norm on matrices.
2. ‖A‖_t = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|.
   It is called the taxicab norm on matrices.
3. ‖A‖_F = \bigl(\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2\bigr)^{1/2} = (tr(A^*A))^{1/2}.
   It is called the Frobenius norm on matrices.
Notice that the names of the norms in Example 7.2 follow the pattern in
Example 7.1. However, the notation uses the subscripts c, t, F instead of ∞, 1, 2.
The reason is that we are reserving the latter notation for some other matrix norms,
which we will discuss soon.
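These three norms are easy to compute directly; the following sketch (illustrative only) evaluates them for a small matrix, using the trace formula as a cross-check for the Frobenius norm:

import numpy as np

A = np.array([[1.0, -2, 3], [0, 4, -1]])

cartesian = np.max(np.abs(A))                 # ||A||_c : largest |a_ij|
taxicab   = np.sum(np.abs(A))                 # ||A||_t : sum of all |a_ij|
frobenius = np.sqrt(np.sum(np.abs(A) ** 2))   # ||A||_F

print(cartesian, taxicab, frobenius)
# Cross-check: ||A||_F^2 = tr(A* A)
print(np.isclose(frobenius ** 2, np.trace(A.T @ A)))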
For matrices, it will be especially useful to have such a norm which satisfies ‖Av‖ ≤ ‖A‖ ‖v‖. It may quite well happen that an arbitrary vector norm and an arbitrary matrix norm may not satisfy this property. Thus, given a vector norm, we require to define a corresponding matrix norm so that such a property is satisfied. In order to do that, we will first prove a result.
Theorem 7.2 Let A ∈ F^{m×n}. Let ‖·‖_n and ‖·‖_m be norms on F^{n×1} and F^{m×1}, respectively. Then \{\|Av\|_m / \|v\|_n : v ∈ F^{n×1}, v ≠ 0\} is a bounded subset of R.
Proof Clearly, 0 is a lower bound for the given set. We need to show that the given
set has an upper bound.
Let {e_1, . . . , e_n} be the standard basis of F^{n×1}. For each i, ‖Ae_i‖_m is a real number. Write α = \sum_{i=1}^{n} ‖Ae_i‖_m.
Let v ∈ F^{n×1}, v ≠ 0. We have unique scalars β_1, . . . , β_n, not all zero, such that v = \sum_{i=1}^{n} β_i e_i. Then ‖v‖_∞ = max{|β_i| : 1 ≤ i ≤ n}. And,
\[
\|Av\|_m = \Bigl\| A \sum_{i=1}^{n} \beta_i e_i \Bigr\|_m = \Bigl\| \sum_{i=1}^{n} \beta_i A e_i \Bigr\|_m
\le \sum_{i=1}^{n} |\beta_i|\, \|A e_i\|_m \le \|v\|_\infty \sum_{i=1}^{n} \|A e_i\|_m = \alpha \|v\|_\infty.
\]
Consider the norms ‖·‖_∞ and ‖·‖_n on F^{n×1}. Due to Theorem 7.1, there exists a positive constant γ such that ‖v‖_∞ ≤ γ ‖v‖_n; the constant γ does not depend on the particular vector v. Then, it follows that ‖Av‖_m ≤ αγ ‖v‖_n. That is, for each nonzero vector v ∈ F^{n×1},
\[
\frac{\|Av\|_m}{\|v\|_n} \le \alpha\gamma.
\]
Therefore, the given set is bounded above by αγ. Also, the set is bounded below by 0.
\[
\|A\|_{m,n} = \operatorname{lub}\Bigl\{ \frac{\|Av\|_m}{\|v\|_n} : v \in F^{n\times 1},\ v \ne 0 \Bigr\} \quad\text{for } A \in F^{m\times n}
\]
\[
\|A\| = \operatorname{lub}\Bigl\{ \frac{\|Av\|}{\|v\|} : v \in F^{n\times 1},\ v \ne 0 \Bigr\}
= \operatorname{lub}\bigl\{ \|Ax\| : x \in F^{n\times 1},\ \|x\| = 1 \bigr\}.
\]
The induced norms on Fn×n satisfy the desired properties with respect to the
product of a vector with a matrix and also that of a matrix with another.
Proof To keep the notation simple, let us write all the norms involved as · ; the
subscripts may be supplied appropriately.
If v = 0, then ‖Av‖ = 0 = ‖A‖ ‖v‖. If v ≠ 0, then
\[
\frac{\|Av\|}{\|v\|} \le \operatorname{lub}\Bigl\{ \frac{\|Av\|}{\|v\|} : v \in F^{k\times 1},\ v \ne 0 \Bigr\} = \|A\|.
\]
Next,
\[
\|AB\| = \operatorname{lub}\Bigl\{ \frac{\|ABx\|}{\|x\|} : x \in F^{n\times 1},\ x \ne 0 \Bigr\}
= \operatorname{lub}\Bigl\{ \frac{\|ABx\|}{\|Bx\|}\,\frac{\|Bx\|}{\|x\|} : x \in F^{n\times 1},\ x \ne 0,\ Bx \ne 0 \Bigr\}
\]
\[
\le \operatorname{lub}\Bigl\{ \frac{\|ABx\|}{\|Bx\|} : x \in F^{n\times 1},\ Bx \ne 0 \Bigr\}\,
\operatorname{lub}\Bigl\{ \frac{\|Bx\|}{\|x\|} : x \in F^{n\times 1},\ x \ne 0 \Bigr\}
= \|A\|\,\|B\|.
\]
Next, if Bx = 0 for some x ≠ 0, then ABx = 0. Thus, in the first line of the above calculation, restricting to the set {x : Bx ≠ 0} will not change the least upper bound. Further, if Bx = 0 for each x, then ‖B‖ = 0 = ‖AB‖.
A matrix norm that satisfies the property ‖AB‖ ≤ ‖A‖ ‖B‖ for all matrices for which the product AB is well defined is called a sub-multiplicative norm. Thus, the induced norm on matrices is sub-multiplicative.
Then, ‖·‖_∞ is a norm on F^{m×n}. It is called the maximum absolute row sum norm, or the row sum norm, for short.
The row sum norm on matrices is induced by the ∞-norm on vectors. To see this, suppose that
‖·‖_∞ is the ∞-norm on the spaces F^{n×1} and F^{m×1},
‖·‖ is the norm on matrices induced by the vector norm ‖·‖_∞,
‖·‖_∞ is the row sum norm on matrices, as defined above,
A = [a_{ij}] ∈ F^{m×n}, and v = (b_1, . . . , b_n)^t ∈ F^{n×1}, v ≠ 0.
Then (Write down in full and multiply.)
\[
\|Av\|_\infty = \max\Bigl\{ \Bigl|\sum_{j=1}^{n} a_{1j} b_j\Bigr|, \ldots, \Bigl|\sum_{j=1}^{n} a_{mj} b_j\Bigr| \Bigr\}
\le \max\Bigl\{ \sum_{j=1}^{n} |a_{1j}|\,|b_j|, \ldots, \sum_{j=1}^{n} |a_{mj}|\,|b_j| \Bigr\}
\le \max\Bigl\{ \sum_{j=1}^{n} |a_{1j}|, \ldots, \sum_{j=1}^{n} |a_{mj}| \Bigr\}\, \|v\|_\infty
= \|A\|_\infty \|v\|_\infty.
\]
That is, for each nonzero vector v ∈ F^{n×1}, ‖Av‖_∞ / ‖v‖_∞ ≤ ‖A‖_∞. Then,
\[
\|A\| = \operatorname{lub}\Bigl\{ \frac{\|Av\|_\infty}{\|v\|_\infty} : v \in F^{n\times 1},\ v \ne 0 \Bigr\} \le \|A\|_\infty. \tag{7.1}
\]
For the reverse inequality, with a suitably chosen vector u,
\[
\|A\| \ge \frac{\|Au\|_\infty}{\|u\|_\infty} = \|Au\|_\infty = \|A\|_\infty.
\]
\[
\|Av\|_1 = \Bigl\| \sum_{j=1}^{n} b_j A_j \Bigr\|_1 \le \sum_{j=1}^{n} \|b_j A_j\|_1
= \sum_{j=1}^{n} |b_j|\, \|A_j\|_1 \le \max\{\|A_j\|_1 : 1 \le j \le n\} \sum_{j=1}^{n} |b_j|
= \max\{\|A_j\|_1 : 1 \le j \le n\}\, \|v\|_1 = \|A\|_1 \|v\|_1.
\]
That is,
\[
\frac{\|Av\|_1}{\|v\|_1} \le \max\{\|A_j\|_1 : 1 \le j \le n\} = \|A\|_1.
\]
On the other hand, if the kth column of A has the maximum absolute column sum, then
\[
\|A\| \ge \frac{\|Ae_k\|_1}{\|e_k\|_1} = \|A_k\|_1 = \|A\|_1.
\]
Example 7.5 Let A ∈ F^{m×n}. Let ‖·‖_2 denote the matrix norm on F^{m×n} induced by the Euclidean norm or the 2-norm ‖·‖_2 on vectors. Then,
\[
\|A\|_2^2 = \operatorname{lub}\Bigl\{ \frac{\|Av\|_2^2}{\|v\|_2^2} : v \in F^{n\times 1},\ v \ne 0 \Bigr\}
= \operatorname{lub}\Bigl\{ \frac{v^* A^* A v}{v^* v} : v \in F^{n\times 1},\ v \ne 0 \Bigr\}.
\]
Let v_1, . . . , v_n be orthonormal eigenvectors of A^*A associated with the eigenvalues s_1^2 ≥ · · · ≥ s_n^2, and write v = \sum_{i=1}^{n} α_i v_i. Then
\[
v^* v = \sum_{i=1}^{n} \sum_{j=1}^{n} \bar{\alpha}_i \alpha_j\, v_i^* v_j = \sum_{i=1}^{n} |\alpha_i|^2,
\qquad
v^* A^* A v = \sum_{i=1}^{n} \sum_{j=1}^{n} \bar{\alpha}_i\, v_i^*\, s_j^2 \alpha_j v_j = \sum_{i=1}^{n} |\alpha_i|^2 s_i^2.
\]
\[
s_1^2 - \frac{v^* A^* A v}{v^* v}
= s_1^2 - \frac{\sum_{i=1}^{n} |\alpha_i|^2 s_i^2}{\sum_{i=1}^{n} |\alpha_i|^2}
= \frac{\sum_{i=1}^{n} |\alpha_i|^2 (s_1^2 - s_i^2)}{\sum_{i=1}^{n} |\alpha_i|^2} \ge 0.
\]
Moreover,
\[
s_1^2 - \frac{v_1^* A^* A v_1}{v_1^* v_1} = s_1^2 - \frac{v_1^*\, s_1^2\, v_1}{v_1^* v_1} = 0.
\]
Therefore,
\[
s_1^2 = \operatorname{lub}\Bigl\{ \frac{v^* A^* A v}{v^* v} : v \in F^{n\times 1},\ v \ne 0 \Bigr\}.
\]
It shows that ‖A‖_2 = s_1 is the required induced norm, where s_1 is the largest singular value of A. This induced norm is called the 2-norm and also the spectral norm on matrices.
In general, the quotient ρ_A(v) = v^*Av / v^*v is called the Rayleigh quotient of the matrix A with the nonzero vector v. The Rayleigh quotients ρ_A(v_j) for eigenvectors v_j
give the corresponding eigenvalues of A. It comes of help in computing an eigenvalue
if the associated eigenvector is known. It can be shown that the Rayleigh quotient
of a hermitian matrix with any nonzero vector is a real number. Moreover, if a
hermitian matrix A has eigenvalues λ1 ≥ · · · ≥ λn , then λ1 ≥ ρA (v) ≥ λn for any
nonzero vector v.
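A quick numerical illustration (sketch only; the matrix is an arbitrary real symmetric example) shows the Rayleigh quotient recovering an eigenvalue from its eigenvector, and staying between the extreme eigenvalues for other vectors:

import numpy as np

A = np.array([[4.0, 1, 0], [1, 3, 1], [0, 1, 2]])     # hermitian (real symmetric)
eigvals, eigvecs = np.linalg.eigh(A)                   # eigenvalues in ascending order

def rayleigh(A, v):
    return (v.conj() @ A @ v) / (v.conj() @ v)

print(rayleigh(A, eigvecs[:, -1]), eigvals[-1])        # equals the largest eigenvalue
v = np.array([1.0, 1.0, 1.0])                          # an arbitrary nonzero vector
rho = rayleigh(A, v)
print(eigvals[0] <= rho <= eigvals[-1])                # lambda_n <= rho <= lambda_1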
We see that the induced norms on F^{m×n} with respect to the norms ‖·‖_∞, ‖·‖_1, and ‖·‖_2 on both F^{n×1} and F^{m×1} are the row sum norm ‖·‖_∞, the column sum norm ‖·‖_1, and the spectral norm ‖·‖_2, respectively.
If ‖·‖ is any induced norm on F^{n×n}, then ‖Iv‖/‖v‖ = 1 says that ‖I‖ = 1.
For n > 1, the taxicab norm gives ‖I‖_t = n; so, it is not an induced norm.
The Cartesian norm is not sub-multiplicative. For instance, take A and B as the n × n matrix with each of its entries equal to 1. Then, ‖A‖_c = ‖B‖_c = 1 and ‖AB‖_c = n. Thus, it is not an induced norm for n > 1.
If n > 1, then the Frobenius norm of the identity is ‖I‖_F = √n > 1. Thus, the Frobenius norm is not induced by any vector norm. However, it satisfies the sub-multiplicative property. For, the Cauchy–Schwarz inequality implies that
\[
\|AB\|_F^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} \Bigl| \sum_{k=1}^{n} a_{ik} b_{kj} \Bigr|^2
\le \sum_{i=1}^{n} \sum_{j=1}^{n} \Bigl( \sum_{k=1}^{n} |a_{ik}|^2 \Bigr) \Bigl( \sum_{k=1}^{n} |b_{kj}|^2 \Bigr)
= \Bigl( \sum_{i=1}^{n} \sum_{k=1}^{n} |a_{ik}|^2 \Bigr) \Bigl( \sum_{j=1}^{n} \sum_{k=1}^{n} |b_{kj}|^2 \Bigr)
= \|A\|_F^2 \|B\|_F^2.
\]
The spectral norm and the Frobenius norms are mostly used in applications since
both of them satisfy the sub-multiplicative property. The other norms are sometimes
used for deriving easy estimates due to their computational simplicity.
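NumPy exposes the induced norms directly, so the identifications above are easy to check. In the sketch below (illustrative only), ord = inf, 1, 2 give the row sum, column sum, and spectral norms, and 'fro' gives the Frobenius norm:

import numpy as np

A = np.array([[1.0, -2, 3], [0, 4, -1]])

row_sum   = np.linalg.norm(A, np.inf)     # maximum absolute row sum
col_sum   = np.linalg.norm(A, 1)          # maximum absolute column sum
spectral  = np.linalg.norm(A, 2)          # largest singular value
frobenius = np.linalg.norm(A, 'fro')

print(row_sum, np.max(np.sum(np.abs(A), axis=1)))        # agree
print(col_sum, np.max(np.sum(np.abs(A), axis=0)))        # agree
print(spectral, np.linalg.svd(A, compute_uv=False)[0])   # agree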
\[
\operatorname{lub}\Bigl\{ \frac{\|Av\|}{\|v\|} : v \in F^{n\times 1},\ v \ne 0 \Bigr\}
= \operatorname{lub}\bigl\{ \|Ax\| : x \in F^{n\times 1},\ \|x\| = 1 \bigr\}.
\]
\[
x_0 = 1, \quad x_1 = \sqrt{2}, \quad x_2 = \sqrt{1+\sqrt{2}}, \quad x_3 = \sqrt{1+\sqrt{1+\sqrt{2}}}, \ \ldots.
\]
The differences between successive approximants seem to decrease. So, we conjecture that this iteration can be used to approximate the root of the equation that lies between 1 and 2. Our intuition relies on the fact that if the sequence {x_n} converges to a real number x, then the limit x will satisfy the equation x = √(1 + x).
Moreover, we require the successive approximants to come closer to the root, or at least, the difference between the successive approximants decreases to 0. This requirement may put some restrictions on the function we use for iteration, such as f(x) = √(1 + x) above.
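The iteration for f(x) = √(1 + x) is a two-line program. In the sketch below (illustrative only), the iterates approach the positive root of x^2 − x − 1 = 0, the golden ratio:

import numpy as np

f = lambda x: np.sqrt(1 + x)

x = 1.0                          # initial guess x0
for n in range(10):
    x = f(x)                     # x_{n+1} = f(x_n)
    print(n + 1, x)

# The limit satisfies x = sqrt(1 + x), i.e., x^2 - x - 1 = 0.
print((1 + np.sqrt(5)) / 2)      # the root lying between 1 and 2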
Let S be a nonempty subset of a normed linear space V. Let f : S → S be any
function. We say that v ∈ S is a fixed point of f iff f (v) = v. The function f :
S → S is called a contraction mapping or a contraction iff there exists a positive number c < 1 such that ‖f(u) − f(v)‖ ≤ c ‖u − v‖ for all u, v ∈ S.
By the mean value theorem,
\[
|f(x) - f(y)| \le \max\{|f'(x)| : x \in S\}\, |x - y| \le \frac{1}{2\sqrt{3}}\, |x - y|.
\]
Therefore, f is a contraction.
2. Let V = R, and S = R also. Define f : S → R by f (x) = x 2 − 1. Its fixed point
a satisfies a = a 2 − 1 or a 2 − a − 1 = 0.
For x = 2, y = 3, |x − y| = 1, while |f(x) − f(y)| = |2^2 − 1 − (3^2 − 1)| = 5. Thus, f is not a contraction.
Hence, f is a contraction.
4. With S = V = Rn×1 and f (v) = v defined on S, we see that each vector v ∈ Rn×1
is a fixed point of f.
For any norm on V, we have ‖f(u) − f(v)‖ = ‖u − v‖. Therefore, f is not a contraction.
Here, x_0 ∈ S is chosen initially, and the iteration x_{n+1} = f(x_n) is repeated for n ≥ 0; we say that the iteration function is f. This defines a sequence {x_n}_{n=0}^∞ in S. We require this sequence to converge to a point in S. In this regard, we quote some relevant notions and facts from analysis.
Let S be a subset of a normed linear space V. We say that a sequence {y_n}_{n=0}^∞ of vectors y_n is in S iff y_n ∈ S for each n.
A sequence {y_n}_{n=0}^∞ in S is said to converge to a vector y ∈ V iff for each ε > 0, there exists a natural number N such that if m > N, then ‖y_m − y‖ < ε.
In such a case, the vector y ∈ V is called a limit of the (convergent) sequence. If for all convergent sequences the corresponding limit vector happens to be in S, then S is said to be a closed subset of V.
Further, a sequence {y_n}_{n=0}^∞ in S is called a Cauchy sequence iff for each ε > 0, there exists a natural number N such that if m > N and k > 0, then ‖y_{m+k} − y_m‖ < ε.
In case, V is a finite dimensional normed linear space, each Cauchy
sequence in S is convergent; however, the limit vector may not be in S. For our
purpose, we then require S to be a closed subset of a finite dimensional normed
linear space V so that each Cauchy sequence in S will have its limit vector in S.
With this little background from analysis, we essentially show that the fixed-
point iteration with a contraction map defines a Cauchy sequence. But we rephrase
it keeping its applications in mind.
Then,
\[
\|x_{m+k} - x_m\| \le c^m\, \frac{1 - c^k}{1 - c}\, \|x_1 - x_0\| \le \frac{c^m}{1 - c}\, \|x_1 - x_0\|. \tag{7.2}
\]
As 0 < c < 1 implies that 0 < ck < 1, we have 0 < 1 − ck < 1; thus, the last
inequality holds.
Suppose ε > 0. Since lim_{m→∞} c^m/(1 − c) = 0, we have a natural number N such that for each m > N, (c^m/(1 − c)) ‖x_1 − x_0‖ < ε. That is, ‖x_{m+k} − x_m‖ < ε for all m > N and k > 0.
Therefore, {xn }∞
n=0 is a Cauchy sequence in S. Since S is a closed subset of a finite
dimensional normed linear space, this sequence converges, and the limit vector, say
u, is in S.
Observe that since ‖f(x) − f(y)‖ ≤ c ‖x − y‖, the function f is continuous.
Thus, taking limit of both the sides in the fixed-point iteration xn+1 = f (xn ), we see
that u = f (u). Therefore, u is a fixed point of f.
For uniqueness of such a fixed point, suppose u and v are two fixed points of f. Then, u = f(u), v = f(v). This implies ‖u − v‖ = ‖f(u) − f(v)‖ ≤ c ‖u − v‖. Since 0 < c < 1, we conclude that ‖u − v‖ = 0; that is, u = v.
For the a posteriori error estimate, observe that
\[
(1 - c)\, \|x - x_{m-1}\| \le \|x_m - x_{m-1}\|.
\]
This error estimate is called a posteriori since it gives information about the error
in approximating x with xm only after xm has been computed.
We may consider f (x) as a vector function, say, f : Rn×1 → Rn×1 . That is,
\[
f'(x) = [a_{ij}] \in \mathbb{R}^{n\times n} \quad\text{with}\quad a_{ij} = \frac{\partial f_i}{\partial x_j} \quad\text{for } i, j = 1, \ldots, n.
\]
With this notation, the correction h_n in each iteration step satisfies
\[
f'(x_n)\, h_n = -f(x_n).
\]
Proof First, we show that both A and C are invertible. On the contrary, if at least
one of A or C is not invertible, then there exists a nonzero vector x ∈ Cn×1 such that
C Ax = 0. Then,
If x, y ∈ Cn×1 , then
Iteration (7.4) for solving a linear system is generically named as the fixed-point
iteration for linear systems. However, it is not just a method; it is a scheme of methods.
By fixing the matrix C and a sub-multiplicative norm, we obtain a corresponding
method to approximate the solution of Ax = b.
For a given linear system, Iteration (7.4) requires choosing a suitable matrix C such that ‖I − CA‖ < 1 in some induced norm or the Frobenius norm. It may quite well happen that in one induced norm, ‖I − CA‖ is greater than 1, but in another induced norm, the same is smaller than 1. For example, consider the matrix
$$B = \begin{bmatrix} 2/3 & 2/3 \\ 0 & 0 \end{bmatrix}.$$
Its row sum norm, column sum norm, the Frobenius norm, and the spectral norm tell different stories:
$$\|B\|_\infty = \tfrac{2}{3} + \tfrac{2}{3} = \tfrac{4}{3} > 1, \quad \|B\|_1 = \tfrac{2}{3} < 1, \quad \|B\|_F = \tfrac{\sqrt{8}}{3} < 1, \quad \|B\|_2 = \tfrac{\sqrt{8}}{3} < 1.$$
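These four values are easy to check numerically; here is a small numpy sketch, assuming B is the matrix displayed above.

```python
import numpy as np

B = np.array([[2/3, 2/3],
              [0.0, 0.0]])

print(np.linalg.norm(B, np.inf))    # row sum norm:    4/3 > 1
print(np.linalg.norm(B, 1))         # column sum norm: 2/3 < 1
print(np.linalg.norm(B, 'fro'))     # Frobenius norm:  sqrt(8)/3 ~ 0.943 < 1
print(np.linalg.norm(B, 2))         # spectral norm:   sqrt(8)/3 ~ 0.943 < 1
```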
Example 7.8 Consider the linear system $Ax = b$, where $A = [a_{ij}] \in \mathbb{C}^{n\times n}$ and $b \in \mathbb{C}^{n\times 1}$. Let $D = \mathrm{diag}(a_{11}, \ldots, a_{nn})$, whose diagonal is the diagonal of A and whose remaining entries are all 0. Suppose that no diagonal entry of A is 0, and that in some induced norm, or in the Frobenius norm, $\|I - D^{-1}A\| < 1$.
For instance, suppose A is a strict diagonally dominant matrix, that is, the entries $a_{ij}$ of A satisfy
$$|a_{ii}| > \sum_{j=1,\, j\ne i}^{n} |a_{ij}| \quad\text{for } i = 1, \ldots, n.$$
This says that in any row, the absolute value of the diagonal entry is greater than the sum of the absolute values of all other entries in that row. Then,
$$\|I - D^{-1}A\|_\infty = \max\Big\{ \sum_{j=1,\, j\ne i}^{n} |a_{ij}|/|a_{ii}| : 1 \le i \le n \Big\} < 1.$$
$$x_0, \qquad x_{m+1} = x_m + D^{-1}(b - Ax_m) \ \text{ for } m \ge 0$$
$$x_0, \qquad x_{m+1} = x_m + L^{-1}(b - Ax_m) \ \text{ for } m \ge 0$$
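The two displays specialize the general scheme to $C = D^{-1}$ and $C = L^{-1}$; they are commonly called the Jacobi and the Gauss–Seidel iterations, where L is taken here as the lower triangular part of A with the diagonal included (an assumption about the notation). A Python sketch of both, on an illustrative strictly diagonally dominant system, follows.

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-10, max_iter=10_000):
    """x_{m+1} = x_m + D^{-1}(b - A x_m), D = diagonal part of A."""
    D_inv = np.diag(1.0 / np.diag(A))
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = x + D_inv @ (b - A @ x)
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

def gauss_seidel(A, b, x0, tol=1e-10, max_iter=10_000):
    """x_{m+1} = x_m + L^{-1}(b - A x_m), L = lower triangular part of A (diagonal included)."""
    L = np.tril(A)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = x + np.linalg.solve(L, b - A @ x)   # a triangular solve would also do
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

# An illustrative strictly diagonally dominant system (assumed, not from the text).
A = np.array([[5.0, 1.0, 1.0], [1.0, 6.0, 2.0], [0.0, 1.0, 4.0]])
b = np.array([7.0, 9.0, 5.0])
print(jacobi(A, b, np.zeros(3)))
print(gauss_seidel(A, b, np.zeros(3)))
```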
is called the spectral radius of A. Show that if $\|\cdot\|$ is any induced norm, then $\rho(A) \le \|A\|$.
2. Solve the linear system 3x − 6y + 2z = 15, − 4x + y − z = −2, x − 3y +
7z = 22 by using
(a) Jacobi iteration with initial guess as (0, 0, 0).
(b) Gauss–Seidel iteration with initial guess as (2, 2, −1).
The system $Ax = b$ has no solution, whereas the system $\tilde{A}x = b$ has a unique solution $x_1 = -1/\delta$, $x_2 = 2 + 1/\delta$.
$$\|A\|_\infty = \tfrac{7}{4}, \qquad \|A^{-1}\|_\infty = \tfrac{2}{21} \times 24 = \tfrac{16}{7}.$$
Thus, $\kappa(A) = \tfrac{7}{4} \times \tfrac{16}{7} = 4$.
If we use the maximum column sum norm, then
$$\|A\|_1 = \tfrac{7}{4}, \qquad \|A^{-1}\|_1 = \tfrac{2}{21} \times 27 = \tfrac{18}{7}.$$
Consequently, $\kappa(A) = \tfrac{7}{4} \times \tfrac{18}{7} = \tfrac{9}{2}$.
Observe that the condition number of any matrix with respect to any induced norm is at least 1. For, $1 = \|I\| = \|AA^{-1}\| \le \|A\|\,\|A^{-1}\| = \kappa(A)$.
$$\frac{\|x - \tilde{x}\|}{\|x\|} \le \kappa(A)\,\frac{\|b - \tilde{b}\|}{\|b\|}.$$
Next, $Ax = b$ implies that $\|b\| \le \|A\|\,\|x\|$. Multiplying this with the previous inequality, we obtain
$$\frac{\|x - \tilde{x}\|}{\|x\|} \le \kappa(A)\,\frac{\|A - \tilde{A}\|}{\|A\|}.$$
$$x - \tilde{x} = -A^{-1}(A - \tilde{A})\,\tilde{x}.$$
$$\|x - \tilde{x}\| \le \|A^{-1}\|\,\|A - \tilde{A}\|\,\|\tilde{x}\| = \frac{\kappa(A)}{\|A\|}\,\|A - \tilde{A}\|\,\|\tilde{x}\|.$$
Theorem 7.8 Let $A \in \mathbb{F}^{n\times n}$ be invertible, $b \in \mathbb{F}^{n\times 1}$, and let $b \ne 0$. Let $x \in \mathbb{F}^{n\times 1}$ satisfy $x \ne 0$ and $Ax = b$. Let $\hat{x} \in \mathbb{F}^{n\times 1}$. Then,
$$\frac{\|b - A\hat{x}\|}{\|A\|} \le \|x - \hat{x}\| = \|A^{-1}(b - A\hat{x})\| \le \|A^{-1}\|\,\|b - A\hat{x}\|.$$
Since $Ax = b$, we get $\|b\| \le \|A\|\,\|x\|$; and $x = A^{-1}b$ implies $\|x\| \le \|A^{-1}\|\,\|b\|$. It follows that
Observe that (7.5) can be used to obtain a lower bound as well as an upper bound
on the relative residual.
The estimate in Theorem 7.8 says that the relative error in the approximate solution of $Ax = b$ can be as small as $\dfrac{\|b\|}{\|A\|\,\|x\|}$ times the relative residual, or it can be as large as $\dfrac{\|A^{-1}\|\,\|b\|}{\|x\|}$ times the relative residual. It also says that when the condition number of A is close to 1, the relative error and the relative residual are close to each other, while the larger the condition number of A is, the less information the relative residual provides about the relative error.
Example 7.12 The inverse of $A = \begin{bmatrix} 1.01 & 0.99 \\ 0.99 & 1.01 \end{bmatrix}$ is $A^{-1} = \begin{bmatrix} 25.25 & -24.75 \\ -24.75 & 25.25 \end{bmatrix}$.
Using the maximum row sum norm, we see that $\|A\|_\infty = 2$ and $\|A^{-1}\|_\infty = 50$, so that $\kappa(A) = 2 \times 50 = 100$.
The linear system $Ax = [2, 2]^t$ has the exact solution $x = [1, 1]^t$. We take an approximate solution as $\hat{x} = [1.01, 1.01]^t$. The relative error and the relative residual are given by
$$\frac{\|x - \hat{x}\|_\infty}{\|x\|_\infty} = \frac{0.01}{1} = 0.01, \qquad \frac{\|b - A\hat{x}\|_\infty}{\|b\|_\infty} = \frac{0.02}{2} = 0.01.$$
It shows that even if the condition number is large, the relative error and the
relative residual can be of the same order.
We change the right hand side to get the linear system $Ax = [2, -2]^t$. It has the exact solution $x = [100, -100]^t$. We consider an approximate solution $\hat{x} = [101, -99]^t$. The relative error and the relative residual are
$$\frac{\|x - \hat{x}\|_\infty}{\|x\|_\infty} = \frac{1}{100} = 0.01, \qquad \frac{\|b - A\hat{x}\|_\infty}{\|b\|_\infty} = \frac{2}{2} = 1.$$
That is, the relative residual is 100 times the relative error.
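The numbers in Example 7.12 can be replayed with a few lines of numpy; the helper below is only an illustration and uses the maximum row sum norm throughout.

```python
import numpy as np

A = np.array([[1.01, 0.99],
              [0.99, 1.01]])
kappa = np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)
print(kappa)                                    # 100.0

def rel_error_and_residual(A, b, x, x_hat):
    """Relative error ||x - x_hat|| / ||x|| and relative residual ||b - A x_hat|| / ||b||."""
    err = np.linalg.norm(x - x_hat, np.inf) / np.linalg.norm(x, np.inf)
    res = np.linalg.norm(b - A @ x_hat, np.inf) / np.linalg.norm(b, np.inf)
    return err, res

# b = [2, 2]^t: error and residual are of the same order.
print(rel_error_and_residual(A, np.array([2.0, 2.0]),
                             np.array([1.0, 1.0]), np.array([1.01, 1.01])))      # ~ (0.01, 0.01)

# b = [2, -2]^t: the residual is 100 times the error.
print(rel_error_and_residual(A, np.array([2.0, -2.0]),
                             np.array([100.0, -100.0]), np.array([101.0, -99.0])))  # ~ (0.01, 1.0)
```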
In general, when κ(A) is large, the system Ax = b is ill-conditioned for at least one
choice of b. There might still be choices for b so that the system is well-conditioned.
Example 7.13 Let $A = \begin{bmatrix} 1 & 10^5 \\ 0 & 1 \end{bmatrix}$. Its inverse is $A^{-1} = \begin{bmatrix} 1 & -10^5 \\ 0 & 1 \end{bmatrix}$. Using the maximum row sum norm, we have $\kappa(A) = \|A\|_\infty\,\|A^{-1}\|_\infty = (10^5 + 1)^2$.
The system $Ax = [1, 1]^t$ has the solution $x = [1 - 10^5,\ 1]^t$.
Changing the vector b to $\tilde{b} = [1.001, 1.001]^t$, the system $Ax = \tilde{b}$ has the solution $x = [1.001 - 10^5 - 10^2,\ 1.001]^t$. We find that the relative change in the solution and the relative residual are as follows:
It shows that even if the condition number is large, the system can be well-
conditioned.
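The point of Example 7.13, a huge condition number yet a harmless perturbation for this particular b, can be checked directly; a small numpy sketch follows.

```python
import numpy as np

A = np.array([[1.0, 1e5],
              [0.0, 1.0]])
print(np.linalg.cond(A, np.inf))               # (10^5 + 1)^2, about 10^10

b, b_tilde = np.array([1.0, 1.0]), np.array([1.001, 1.001])
x, x_tilde = np.linalg.solve(A, b), np.linalg.solve(A, b_tilde)

rel_change_x = np.linalg.norm(x - x_tilde, np.inf) / np.linalg.norm(x, np.inf)
rel_change_b = np.linalg.norm(b - b_tilde, np.inf) / np.linalg.norm(b, np.inf)
print(rel_change_x, rel_change_b)              # both about 10^-3
```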
1. Using the spectral norm, show that an orthogonal matrix has the condition number
1.
2. Using the norm $\|\cdot\|_2$ for matrices, compute the condition number of the matrix $\begin{bmatrix} 3 & -5 \\ 6 & 1 \end{bmatrix}$.
3. Let $A, B \in \mathbb{F}^{n\times n}$. Show that $\kappa(AB) \le \kappa(A)\,\kappa(B)$.
4. Using the norm $\|\cdot\|_2$ for $n \times n$ matrices, show that $\kappa(A) \le \kappa(A^*A)$.
$$e^A = \sum_{i=0}^{\infty} \frac{A^i}{i!} = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots$$
$$J = \mathrm{diag}(J_1, \ldots, J_k),$$
$$e^A = P\,e^J P^{-1} = P\,\mathrm{diag}(e^{J_1}, \ldots, e^{J_k})\,P^{-1}.$$
The exponential of a matrix comes up often in the context of solving ordinary differential equations. Consider the initial value problem
$$\frac{dx}{dt} = Ax, \qquad x(t_0) = x_0,$$
where $x(t) = [x_1(t), \ldots, x_n(t)]^t$, $A \in \mathbb{R}^{n\times n}$, and $x_0 = [a_1, \ldots, a_n]^t \in \mathbb{R}^{n\times 1}$. The solution of this initial value problem is given by
$$x(t) = e^{(t - t_0)A}\,x_0.$$
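The solution formula can be checked against the differential equation itself; the sketch below uses scipy.linalg.expm (an assumed dependency) together with an illustrative choice of A and $x_0$.

```python
import numpy as np
from scipy.linalg import expm    # matrix exponential; scipy is an assumed dependency

# Illustrative data (an assumption, not from the text).
A = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 3.0]])
x0 = np.array([1.0, -1.0, 2.0])
t0 = 0.0

def x(t):
    """x(t) = e^{(t - t0) A} x0."""
    return expm((t - t0) * A) @ x0

# Check dx/dt = A x at t = 0.5 by a central difference, and the initial condition.
t, h = 0.5, 1e-6
print(np.allclose((x(t + h) - x(t - h)) / (2 * h), A @ x(t), atol=1e-4))   # True
print(np.allclose(x(t0), x0))                                              # True
```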
The matrix $e^{tA}$ may be computed via the Jordan form of A as outlined above. There is an alternative. We wish to find n linearly independent vectors $v_1, \ldots, v_n$ such that the series $e^{tA}v_i$ can be summed exactly. If this can be done, then taking these as columns of a matrix B, we see that
$$B = e^{tA}\,[v_1 \ \cdots \ v_n].$$
The matrix $[v_1 \ \cdots \ v_n]$ is invertible since its columns are linearly independent. Then, using its inverse, we can compute $e^{tA}$. We now discuss how to execute this plan.
For any scalar λ, the matrices $t(A - \lambda I)$ and $t\lambda I$ commute: $t(A - \lambda I)(t\lambda I) = (t\lambda I)\,t(A - \lambda I)$. Thus, we have
$$e^{tA} = e^{t\lambda I}\,e^{t(A - \lambda I)} = e^{\lambda t}\,e^{t(A - \lambda I)}, \quad\text{so that}\quad e^{tA}v = e^{\lambda t} \sum_{j=0}^{\infty} \frac{t^j}{j!}\,(A - \lambda I)^j v.$$
That is, for a generalized eigenvector v, the infinite sum turns out to be a finite sum. The question is whether we can choose n such generalized eigenvectors for A. From the discussions about the Jordan form, we know that the answer is affirmative. In fact, the following is true:
Let λ be an eigenvalue of A with algebraic multiplicity m. If the linear system $(A - \lambda I)^k x = 0$ has $r < m$ linearly independent solutions, then the system $(A - \lambda I)^{k+1}x = 0$ has at least $r + 1$ linearly independent solutions.
1. Find all eigenvalues of A along with their algebraic multiplicity. For each eigen-
value λ, follow Steps 2–6:
2. Determine linearly independent vectors v satisfying (A − λI )v = 0.
3. If m such vectors are found, then write them as $v_1, \ldots, v_m$, and set $w_1 := e^{\lambda t}v_1, \ldots, w_m := e^{\lambda t}v_m$. Go to Step 7.
4. In Step 2, suppose that only $k < m$ vectors $v_i$ could be obtained. To find additional vectors, determine all vectors v such that $(A - \lambda I)^2 v = 0$ but $(A - \lambda I)v \ne 0$. For each such vector v, set the corresponding w as
$$w := e^{\lambda t}\big(v + t(A - \lambda I)v\big).$$
5. If n vectors $w_j$ could not be found in Steps 3 and 4, then determine all vectors v such that $(A - \lambda I)^3 v = 0$ but $(A - \lambda I)^2 v \ne 0$. Corresponding to each such v, set
$$w := e^{\lambda t}\Big(v + t(A - \lambda I)v + \frac{t^2}{2!}(A - \lambda I)^2 v\Big).$$
6. Continue to obtain more vectors w by considering vectors v that satisfy $(A - \lambda I)^{j+1}v = 0$ but $(A - \lambda I)^j v \ne 0$, and by setting
$$w := e^{\lambda t}\Big(v + t(A - \lambda I)v + \frac{t^2}{2!}(A - \lambda I)^2 v + \cdots + \frac{t^j}{j!}(A - \lambda I)^j v\Big).$$
A code sketch of Steps 2–6 appears just after this procedure.
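Here is a minimal Python sketch of Steps 2–6 for a concrete matrix. The matrix A and the generalized eigenvectors listed in the code are assumptions, reconstructed for illustration to match the worked example that follows, and scipy.linalg.expm is used only as an independent check.

```python
import math
import numpy as np
from scipy.linalg import expm    # used only as an independent check (assumed dependency)

# Illustrative matrix (an assumption): eigenvalue 1 has algebraic multiplicity 2
# but only one eigenvector; eigenvalue 3 is simple.
A = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 3.0]])
n = A.shape[0]

def w(t, lam, v, k):
    """w = e^{lam*t}( v + t(A - lam I)v + ... + t^{k-1}/(k-1)! (A - lam I)^{k-1} v ),
    where (A - lam I)^k v = 0, so the exponential series terminates."""
    N = A - lam * np.eye(n)
    s, Nv = np.zeros(n), np.asarray(v, dtype=float)
    for j in range(k):
        s += (t**j / math.factorial(j)) * Nv
        Nv = N @ Nv
    return math.exp(lam * t) * s

# Generalized eigenvectors as produced by Steps 2-4 (listed here as assumptions):
# lam = 1: v1 with (A - I)v1 = 0; v2 with (A - I)^2 v2 = 0 but (A - I)v2 != 0; lam = 3: v3.
data = [(1.0, np.array([0.0, 1.0, 0.0]), 1),
        (1.0, np.array([1.0, 0.0, 0.0]), 2),
        (3.0, np.array([0.0, 0.0, 1.0]), 1)]

def exp_tA(t):
    B = np.column_stack([w(t, lam, v, k) for lam, v, k in data])   # B = e^{tA}[v1 v2 v3]
    C = np.column_stack([v for _, v, _ in data])
    return B @ np.linalg.inv(C)

print(np.allclose(exp_tA(0.7), expm(0.7 * A)))   # True
```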
Notice that the eigenvalue 1 has algebraic multiplicity 2, but we got only one linearly independent eigenvector. To compute an additional generalized eigenvector for this eigenvalue, we find vectors v such that $(A - 1I)^2 v = 0$ but $(A - 1I)v \ne 0$. With $v = [a, b, c]^t$, it gives
$$(A - 1I)^2 v = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 2 \end{bmatrix}^2 \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ a \\ 2c \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 4c \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
Then,
$$e^{tA} = BC^{-1} = \begin{bmatrix} 0 & e^t & 0 \\ e^t & te^t & 0 \\ 0 & 0 & e^{3t} \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} e^t & 0 & 0 \\ te^t & e^t & 0 \\ 0 & 0 & e^{3t} \end{bmatrix}, \qquad e^A = \begin{bmatrix} e & 0 & 0 \\ e & e & 0 \\ 0 & 0 & e^3 \end{bmatrix}.$$
$$e^{tA} = e^{\lambda t}\Big( I + tB + \frac{t^2}{2!}B^2 + \cdots + \frac{t^{n-1}}{(n-1)!}B^{n-1} \Big).$$
As you have seen, manual computation of the eigenvalues of an arbitrary matrix from its characteristic polynomial is rarely feasible. Many numerical methods have been devised to compute a required number of eigenvalues approximately. This issue is addressed
in numerical linear algebra. Prior to these computations, it is often helpful to have
some information about the size or order of magnitude of the eigenvalues in terms
of the norms of some related vectors or matrices.
Thus, there are n Geršgorin discs, one for each row of A.
Theorem 7.10 (Geršgorin Discs) All eigenvalues of a matrix lie inside the union of
its Geršgorin discs.
Bringing the ith term to one side and taking absolute values, we have
$$|a_{ii} - \lambda|\,|b_i| \le \sum_{j=1,\, j\ne i}^{n} |a_{ij} b_j| = \sum_{j=1,\, j\ne i}^{n} |a_{ij}|\,|b_j| \le |b_i| \sum_{j=1,\, j\ne i}^{n} |a_{ij}| = |b_i|\,r_i(A).$$
Then, $|\lambda - a_{ii}| \le r_i(A)$. That is, $\lambda \in D_i(A)$. We see that corresponding to each eigenvalue λ there exists a row i of A such that $\lambda \in D_i(A)$. Therefore, each eigenvalue of A lies in $D_1(A) \cup \cdots \cup D_n(A)$.
Recall that a matrix $A = [a_{ij}] \in \mathbb{F}^{n\times n}$ is called strict diagonally dominant iff $|a_{ii}| > r_i(A)$ for each $i = 1, \ldots, n$. Look at the proof of Theorem 7.10. If A is not invertible, then 0 is an eigenvalue of A. With $\lambda = 0$, we obtain the inequality $|a_{ii}| \le r_i(A)$ for some row index i, which contradicts strict diagonal dominance. Thus, each strict diagonally dominant matrix is invertible.
Example 7.15 Consider the matrix $A = \begin{bmatrix} 0 & 3 & 2 & 3 & 3 \\ -1 & 7 & 2 & 1 & 1 \\ 2 & 1 & 0 & 1 & 1 \\ 0 & -1 & 1 & 0 & 1 \\ 1 & -1 & 2 & 1 & 0 \end{bmatrix}.$
The Geršgorin discs are specified by complex numbers z satisfying
$$|z| \le 11, \quad |z - 7| \le 5, \quad |z| \le 5, \quad |z| \le 3, \quad |z| \le 5.$$
The first disc contains all others except the second. Therefore, all eigenvalues lie inside the union of the discs $|z| \le 11$ and $|z - 7| \le 5$.
Notice that $A^t$ and A have the same eigenvalues. It amounts to taking the Geršgorin discs corresponding to the columns of A. Here, they are specified as follows:
$$|z| \le 4, \quad |z - 7| \le 6, \quad |z| \le 7, \quad |z| \le 6, \quad |z| \le 6.$$
As earlier, it follows that all eigenvalues of A lie inside the union of the discs
|z| ≤ 7 and |z − 7| ≤ 6.
Therefore, all eigenvalues of A lie inside the intersection of the two regions obtained earlier as unions of Geršgorin discs. This intersection is only slightly larger than the union of the discs $|z| \le 7$ and $|z - 7| \le 5$.
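The row and column discs of Example 7.15, and the fact that every eigenvalue lies in both unions, can be verified numerically; a small numpy sketch follows.

```python
import numpy as np

A = np.array([[ 0,  3, 2, 3, 3],
              [-1,  7, 2, 1, 1],
              [ 2,  1, 0, 1, 1],
              [ 0, -1, 1, 0, 1],
              [ 1, -1, 2, 1, 0]], dtype=float)

def gershgorin_discs(M):
    """One (centre, radius) pair per row of M."""
    centres = np.diag(M)
    radii = np.sum(np.abs(M), axis=1) - np.abs(centres)
    return list(zip(centres, radii))

row_discs = gershgorin_discs(A)     # centres 0, 7, 0, 0, 0 with radii 11, 5, 5, 3, 5
col_discs = gershgorin_discs(A.T)   # centres 0, 7, 0, 0, 0 with radii 4, 6, 7, 6, 6

# Every eigenvalue lies in some row disc and in some column disc.
for lam in np.linalg.eigvals(A):
    assert any(abs(lam - c) <= r + 1e-9 for c, r in row_discs)
    assert any(abs(lam - c) <= r + 1e-9 for c, r in col_discs)
print(np.round(np.linalg.eigvals(A), 3))
```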
7.8 Problems
1. Let c and s be real numbers with $c^2 + s^2 = 1$. Show that $\|A\|_F = \sqrt{n}$, where
$$A = \begin{bmatrix} -1 & c & c & \cdots & c & c \\ 0 & -s & cs & \cdots & cs & cs \\ 0 & 0 & -s^2 & \cdots & cs^2 & cs^2 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -s^{n-2} & cs^{n-2} \\ 0 & 0 & 0 & \cdots & 0 & -s^{n-1} \end{bmatrix} \in \mathbb{F}^{n\times n}.$$
$f_1(t), \ldots, f_n(t)$.
References
1. M. Braun, Differential Equations and Their Applications, 4th edn. (Springer, New York, 1993)
2. R.A. Brualdi, The Jordan canonical form: an old proof. Am. Math. Mon. 94(3), 257–267 (1987)
3. S.D. Conte, C. de Boor, Elementary Numerical Analysis: An Algorithmic Approach (McGraw-Hill Book Company, Int. Student Ed., 1981)
4. J.W. Demmel, Numerical Linear Algebra (SIAM Pub, Philadelphia, 1996)
5. F.R. Gantmacher, Matrix Theory, vol. 1–2 (American Math. Soc., 2000)
6. G.H. Golub, C.F. Van Loan, Matrix Computations (Hindustan Book Agency, Texts and Readings in Math. 43, New Delhi, 2007)
7. R. Horn, C. Johnson, Matrix Analysis (Cambridge University Press, New York, 1985)
8. P. Lancaster, M. Tismenetsky, The Theory of Matrices, 2nd edn. (Elsevier, 1985)
9. A.J. Laub, Matrix Analysis for Scientists and Engineers (SIAM, Philadelphia, 2004)
10. S.J. Leon, Linear Algebra with Applications, 9th edn. (Pearson, 2014)
11. D. Lewis, Matrix Theory (World Scientific, 1991)
12. C. Meyer, Matrix Analysis and Applied Linear Algebra (SIAM, Philadelphia, 2000)
13. R. Piziak, P.L. Odell, Matrix Theory: From Generalized Inverses to Jordan Form (Chapman
and Hall/CRC, 2007)
14. G. Strang, Linear Algebra and Its Applications, 4th edn. (Cengage Learning, 2006)
15. R.S. Varga, Geršgorin and His Circles, Springer Series in Computational Mathematics, vol.
36 (Springer, 2004)
Index
C
Cartesian norm, 157, 160
Cauchy–Schwarz inequality, 82
Cauchy sequence, 168
Change of basis matrix, 73
Characteristic polynomial, 103
χ_A(t), 103
Cholesky factorization, 156
Closed subset, 168
Closed unit sphere, 159
Co-factor, 24
Column, 4
  index, 4, 5
  rank, 40, 63
  space, 63

D
Determinant, 23
Diagonal
  entries, 5
  matrix, 6
  of a matrix, 5
Diagonalizable, 123
Diagonalized by, 123
Dilation similarity, 130
Dimension, 59

E
Eigenvalue, 101
Eigenvector, 101
Elementary matrix, 15
H
Hermitian matrix, 106
Homogeneous system, 44

I
Idempotent, 29, 111
Identity matrix, 6
Ill-conditioned matrix, 176
Induced norm, 161
∞-norm, 157
Inner product, 81
Invertible, 11
Isometry, 108
Iteration function, 168
ith row, 5

N
Newton's method, 170
Nilpotent, 111
Norm, 82, 157
Normal, 124
Normed linear space, 157
Nullity, 46, 63
Null space, 63

O
Off-diagonal entries, 5
1-norm, 158
Order, 5
Orthogonal
  basis, 88
  matrix, 107
  set, 83
  vectors, 82
Orthonormal basis, 88
Orthonormal set, 83

P
Parallelogram law, 83
Parseval identity, 91
Permutation matrix, 113
Permutation similarity, 130
Pivot, 17
Pivotal column, 17
Pivotal row, 19
p-norm, 158
Polar decomposition, 151
Positive definite, 155
Positive semidefinite, 150
Powers of matrices, 10
Principal submatrix, 156
Projection matrix, 93
Projection on a subspace, 93
Proper subspace, 55
Pythagoras law, 83

Q
QR-factorization, 89

R
Range space, 63
Rank, 19
  echelon matrix, 76
  theorem, 77
Rayleigh quotient, 165
Real skew symmetric, 106
Real symmetric, 106
Reduction to RREF, 19
Reflection, 107
Relative residual, 177
Residual, 177
Reverse triangle inequality, 158
Right singular vector, 145
Rotation, 107
Row, 4
  index, 4, 5
  rank, 40, 63
  reduced echelon form, 18, 19
  space, 63
  sum norm, 162
  vector, 4
RREF, 18, 19

S
Scalar matrix, 7
Scalars, 4
Schur triangularization, 115
Self-adjoint, 106
Similar matrices, 78
Singular value decomposition, 143
Singular values, 142
Size, 5
Skew hermitian, 106
Skew symmetric, 106
Sol(A, b), 48
Solution of a linear system, 43
Spanning subset, 56
Spans, 55, 56
Spectral
  mapping theorem, 118
  norm, 165
  radius, 174
Spectrum, 105
Square matrix, 5
Standard basis, 57
Standard basis vectors, 6
Strict diagonally dominant, 173
Sub-multiplicative norm, 162
Subspace, 54
Sum, 7
Super-diagonal, 5
SVD, 143
Symmetric, 106
System matrix, 43

T
Taxi-cab norm, 158, 160
Theorem
  Basis extension, 60
  Bessel inequality, 93
  Cayley–Hamilton, 120
  Contraction mapping, 168
  Fixed point iteration, 172
  Geršgorin discs, 185
  Gram-Schmidt orthogonalization, 85
  Jordan form, 131
  Polar decomposition, 151
  QR-factorization, 89
  Rank factorization, 76
  Rank nullity, 63
  Rank theorem, 77
  Schur triangularization, 115
  Spectral mapping, 118
  Spectral theorem, 123, 125
  SVD, 143
  Thin SVD, 147
U
Unit vector, 83
Unitary, 107
Upper triangular, 7

Z
Zero matrix, 5