
Arindama Singh

Introduction to
Matrix Theory
Department of Mathematics
Indian Institute of Technology Madras
Chennai, India

ISBN 978-3-030-80480-0 ISBN 978-3-030-80481-7 (eBook)


https://doi.org/10.1007/978-3-030-80481-7

Jointly published with Ane Books Pvt. Ltd.


In addition to this printed edition, there is a local printed edition of this work available via Ane Books in
South Asia (India, Pakistan, Sri Lanka, Bangladesh, Nepal and Bhutan) and Africa (all countries in the
African subcontinent).
ISBN of the Co-Publisher’s edition: 978-93-86761-20-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publishers, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publishers nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publishers remain neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

Practising scientists and engineers feel that calculus and matrix theory form the
minimum mathematical requirement for their future work. Though it is recommended
to spread matrix theory or linear algebra over two semesters in an early stage, the
typical engineering curriculum allocates only one semester for it. In addition, I found
that science and engineering students are at a loss in appreciating the abstract methods
of linear algebra in the first year of their undergraduate programme. This resulted
in a curriculum that includes a thorough study of systems of linear equations via
Gaussian and/or Gauss–Jordan elimination comprising roughly one month in the
first or second semester. It needs a follow-up of one-semester work in matrix theory
ending in canonical forms, factorizations of matrices, and matrix norms.
Initially, we followed books such as Leon [10], Lewis [11], and Strang [14]
as possible texts, referring occasionally to papers and other books. None of these
could be used as a textbook on its own for our purpose. The requirement was a
single text containing development of notions, one leading to the next, and without
any distraction towards applications. It resulted in the creation of our own material. The
students wished to see the material in a book form so that they might keep it on their
lap instead of reading it off the laptop screens. Of course, I had to put some extra
effort in bringing it to this form; the effort is not much compared to the enjoyment
in learning.
The approach is straightforward. Starting from the simple but intricate problems
that a system of linear equations presents, it introduces matrices and operations
on them. The elementary row operations comprise the basic tools in working with
most of the concepts. Though the vector space terminology is not required to study
matrices, an exposure to the notions is certainly helpful for an engineer’s future
research. Keeping this in view, the vector space terminology is introduced in a
restricted environment of subspaces of finite-dimensional real or complex spaces.
It is felt that this direct approach will meet the needs of scientists and engineers.
Also, it will form a basis for abstract function spaces, which one may study or use
later.
Starting from simple operations on matrices, this elementary treatment of matrix
theory characterizes equivalence and similarity of matrices. The other tool of Gram–
Schmidt orthogonalization has been discussed leading to best approximations and

least squares solution of linear systems. On the go, we discuss matrix factorizations
such as rank factorization, QR-factorization, Schur triangularization, diagonaliza-
tion, Jordan form, singular value decomposition, and polar decomposition. It includes
norms on matrices as a means to deal with iterative solutions of linear systems and
exponential of a matrix. Keeping the modest goal of an introductory textbook on
matrix theory, which may be covered in a semester, these topics are dealt with in a
lively manner.
Though the earlier drafts were intended for use by science and engineering
students, many mathematics students used those as supplementary text for learning
linear algebra. This book will certainly fulfil that need.
Each section of the book has exercises to reinforce the concepts; problems have
been added at the end of each chapter for the curious student. Most of these problems
are theoretical in nature, and they do not fit into the running text linearly. Exercises
and problems form an integral part of the book. Working them out may require some
help from the teacher. It is hoped that the teachers and the students of matrix theory
will enjoy the text the same way I and my students did.
Most engineering colleges in India allocate only one semester for linear algebra
or matrix theory. In such a case, the first two chapters of the book can be covered
at a rapid pace with proper attention to elementary row operations. If time does not
permit, the last chapter on matrix norms may be omitted or covered in numerical
analysis under the veil of iterative solutions of linear systems.
I acknowledge the pains taken by my students in pointing out typographical errors.
Their difficulties in grasping the notions have contributed a lot towards the contents
and this particular sequencing of topics. I cheerfully thank my colleagues A. V.
Jayanthan and R. Balaji for using the earlier drafts for teaching linear algebra to
undergraduate engineering and science students at IIT Madras. They pointed out
many improvements, which I cannot pinpoint now. Though the idea of completing
this work originated five years back, time did not permit it. IIT Madras granted me
sabbatical to write the second edition of my earlier book on Logics for Computer
Science. After sending a draft of that to the publisher, I could devote the stop-gap for
completing this work. I hereby record my thanks to the administrative authorities of
IIT Madras.
It will be foolish on my part to claim perfection. If you are using the book, then
you should be able to point out improvements. I welcome you to write to me at
[email protected].

Chennai, India Arindama Singh


Contents

1 Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Examples of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Basic Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Transpose and Adjoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4 Elementary Row Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Row Reduced Echelon Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.6 Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.7 Computing Inverse of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2 Systems of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.1 Linear Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2 Determining Linear Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3 Rank of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.4 Solvability of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.5 Gauss–Jordan Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3 Matrix as a Linear Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.1 Subspace and Span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.2 Basis and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3 Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.4 Coordinate Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.5 Coordinate Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.6 Change of Basis Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.7 Equivalence and Similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.1 Inner Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.2 Gram–Schmidt Orthogonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3 QR-Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.4 Orthogonal Projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91


4.5 Best Approximation and Least Squares Solution . . . . . . . . . . . . . . . . 94


4.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.1 Invariant Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.2 The Characteristic Polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.3 The Spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.4 Special Types of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6 Canonical Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.1 Schur Triangularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.2 Annihilating Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.3 Diagonalizability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.4 Jordan Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
6.5 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
6.6 Polar Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
7 Norms of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.1 Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.2 Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.3 Contraction Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
7.4 Iterative Solution of Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
7.5 Condition Number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
7.6 Matrix Exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
7.7 Estimating Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
7.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
About the Author

Dr. Arindama Singh is a professor in the Department of Mathematics, Indian Institute of Technology (IIT) Madras, India. He received his Ph.D. degree from the
IIT Kanpur, India, in 1990. His research interests include knowledge compilation,
singular perturbation, mathematical learning theory, image processing, and numer-
ical linear algebra. He has published six books, over 60 papers in journals and confer-
ences of international repute. He has guided five Ph.D. students and is a life member of
many academic bodies, including the Indian Society for Industrial and Applied Math-
ematics, Indian Society of Technical Education, Ramanujan Mathematical Society,
Indian Mathematical Society, and The Association of Mathematics Teachers of India.

Chapter 1
Matrix Operations

1.1 Examples of Linear Equations

Linear equations are everywhere, starting from mental arithmetic problems to
advanced defence applications. We start with an example. Consider the system of
linear equations

x1 + x2 = 3
x1 − x2 = 1

Subtracting the first from the second, we get −2x2 = −2. It implies x2 = 1. That
is, the original system is replaced with the following:

x1 + x2 = 3
x2 = 1

Substituting x2 = 1 in the first equation of the new system, we get x1 = 2. We
verify that x1 = 2, x2 = 1 satisfy the equations. Hence, the system of equations has
this unique solution.
To see it geometrically, let x1 represent points on x-axis, and let x2 represent
points on the y-axis. Then, the first equation represents a straight line that passes
through the point (3, 0) and has slope −1. Similarly, the second equation represents
a straight line passing through the point (1, 0) and having slope 1. They intersect at
the point (2, 1).
What about the following linear system?

x1 + x2 = 3
x1 − x2 = 1
2x1 − x2 = 3


The first two equations have a unique solution, and that satisfies the third. Hence,
this system also has a unique solution x1 = 2, x2 = 1. Geometrically, the third equa-
tion represents the straight line that passes through (0, −3) and has slope 2. The
intersection of all the three lines is the same point (2, 1). So, the extra equation does
not put any constraint on the solutions that we obtained earlier.
But what about our systematic solution method? We aim at eliminating the first
unknown from all but the first equation. We replace the second equation with the one
obtained by second minus the first. We also replace the third by third minus twice
the first. It results in

x1 + x2 = 3
−x2 = −1
−3x2 = 3

Notice that the second and the third equations coincide, hence the conclusion. We
give another twist. Consider the system

x1 + x2 = 3
x1 − x2 = 1
2x1 + x2 = 3

The first two equations again have the solution x1 = 2, x2 = 1. But this time, the
third is not satisfied by these values of the unknowns. So, the system has no solution.
Geometrically, the first two lines have a point of intersection (2, 1); the second
and the third have the intersection point as (4/3, 1/3); and the third and the first have
the intersection point as (0, 3). They form a triangle. There is no point common to
all the three lines. Also, by using our elimination method, we obtain the equations
as:

x1 + x2 = 3
−x2 = −1
−x2 = −3

The last two equations are not consistent. So, the original system has no solution.
Finally, instead of adding another equation, we drop one. Consider the linear
equation
x1 + x2 = 3

The old solution x1 = 2, x2 = 1 is still a solution of this system. But there are other
solutions. For instance, x1 = 1, x2 = 2 is a solution. Moreover, since x1 = 3 − x2 ,
by assigning x2 any real number, we get a corresponding value for x1 , which together
give a solution. Thus, it has infinitely many solutions.

Geometrically, any point on the straight line represented by the equation is a solu-
tion of the system. Notice that the same conclusion holds if we have more equations,
which are multiples of the only given equation. For example,

x1 + x2 = 3
2x1 + 2x2 = 6
3x1 + 3x2 = 9

We see that the number of equations really does not matter, but the number of
independent equations does matter. Of course, the notion of independent equations
is not yet precise; we have some working ideas only.
It is also not very clear when a system of equations has a solution, a unique
solution, infinitely many solutions, or no solution at all. And why can a system
of equations not have more than one but finitely many solutions? How do we use
our elimination method to obtain infinitely many solutions?
To answer these questions, we will introduce matrices. Matrices will help us in
representing the problem in a compact way and will lead to a definitive answer.
We will also study the eigenvalue problem for matrices which come up often in
applications. These concerns will allow us to represent matrices in elegant forms.
Exercises for Sect. 1.1
1. For each of the following system of linear equations, find the number of solutions
geometrically:
(a) x1 + 2x2 = 4, −2x1 − 4x2 = 4
(b) −x1 + 2x2 = 3, 2x1 − 4x2 = −6
(c) x1 + 2x2 = 1, x1 − 2x2 = 1, −x1 + 6x2 = 3
2. Show that the system of linear equations a1 x1 + x2 = b1 , a2 x1 + x2 = b2 has a
unique solution if a1 ≠ a2 . Is the converse true?

1.2 Basic Matrix Operations

In the last section, we have solved a linear system by transforming it to equivalent
systems. Our method of solution may be seen schematically as follows:

    x1 + x2 = 3        x1 + x2 = 3        x1      = 2
    x1 − x2 = 1   ⇒        x2 = 1   ⇒        x2 = 1

We can minimize writing by ignoring the unknowns and transform only the numbers
in the following way:

    ⎡ 1   1   3 ⎤      ⎡ 1  1  3 ⎤      ⎡ 1  0  2 ⎤
    ⎣ 1  −1   1 ⎦  ⇒   ⎣ 0  1  1 ⎦  ⇒   ⎣ 0  1  1 ⎦
To be able to operate with such array of numbers and talk about them, we require
some terminology. First, some notation:

Notation 1.1 N denotes the set of all natural numbers 1, 2, 3, . . .
R denotes the set of all real numbers.
C denotes the set of all complex numbers.

We will write F for either R or C. The numbers in F will also be referred to as
scalars. A rectangular array of scalars is called a matrix. We write a matrix with a
pair of surrounding square brackets as in the following:

    ⎡ a11 · · · a1n ⎤
    ⎢  ..       ..  ⎥
    ⎣ am1 · · · amn ⎦

Here, ai j are scalars.


We give names to matrices. If A is (equal to) the above matrix, then we say that
A has m number of rows and n number of columns; and we say that A is an m × n
matrix. The scalar that is common to the ith row and jth column of A is ai j . With
respect to the matrix A, we say that ai j is its (i, j)th entry. The entries of a matrix
are scalars. We also write the above matrix as

A = [ai j ], ai j ∈ F for i = 1, . . . , m, j = 1, . . . , n.

Thus, the scalar ai j is the (i, j)th entry of the matrix [ai j ]. Here, i is called the row
index and j is called the column index of the entry ai j .
The set of all m × n matrices with entries from F will be denoted by Fm×n .
A row vector of size n is a matrix in F1×n . Similarly, a column vector of size
n is a matrix in Fn×1 . The vectors in F1×n (row vectors) will be written as (with or
without commas)
[a1 , . . . , an ] or as [a1 · · · an ]

for scalars a1 , . . . , an . The vectors in Fn×1 are written as

    ⎡ b1 ⎤
    ⎢ ..  ⎥   or as [b1 , . . . , bn ]t   or as [b1 · · · bn ]t
    ⎣ bn ⎦

for scalars b1 , . . . , bn . The second way of writing is the transpose notation; it saves
vertical space. Also, if a column vector v is equal to u t for a row vector u, then we

write the row vector u as vt . Thus, we accept (u t )t = u and (vt )t = v as a way of
writing.
We will write both F1×n and Fn×1 as Fn . Especially when a result is applicable
to both row vectors and column vectors, this notation will become handy. Also, we
may write a typical vector in Fn as

(a1 , . . . , an ).

When Fn is F1×n , you should read (a1 , . . . , an ) as [a1 , . . . , an ], a row vector, and
when Fn is Fn×1 , you should read (a1 , . . . , an ) as [a1 , . . . , an ]t , a column vector.
The ith row of a matrix A = [ai j ] ∈ Fm×n is the row vector

[ai1 , . . . , ain ].

We also say that the row index of this row is i. Similarly, the jth column of A is
the column vector
[a1 j , . . . , am j ]t .

And, its column index is j.


Any matrix in Fm×n is said to have its size as m × n. If m = n, the rectangular
array becomes a square array with n rows and n columns; and the matrix is called a
square matrix of order n.
Two matrices of the same size are considered equal when their corresponding
entries coincide; i.e. if A = [ai j ] and B = [bi j ] are in Fm×n , then

A = B iff ai j = bi j for 1 ≤ i ≤ m, 1 ≤ j ≤ n.

Matrices of different sizes are unequal.


The zero matrix is a matrix, each entry of which is 0. We write 0 for all zero
matrices of all sizes. The size is to be understood from the context.
Let A = [ai j ] ∈ Fn×n be a square matrix of order n. The entries aii are called the
diagonal entries of A. The diagonal of A consists of all diagonal entries; the first
entry on the diagonal is a11 , and the last diagonal entry is ann . The entries of A,
which are not on the diagonal, are called the off-diagonal entries of A; they are ai j
for i ≠ j. In the following matrix, the diagonal is shown in bold:

    ⎡ 1 2 3 ⎤
    ⎢ 2 3 4 ⎥
    ⎣ 3 4 5 ⎦

Here, 1 is the first diagonal entry, 3 is the second diagonal entry, and 5 is the third
and the last diagonal entry.
The super-diagonal of a matrix consists of entries above the diagonal. That is, the
entries ai,i+1 comprise the super-diagonal of an n × n matrix A = [ai j ]. Of course,
i varies from 1 to n − 1 here. In the following matrix, the super-diagonal is shown
in bold:

    ⎡ 1 2 3 ⎤
    ⎢ 2 3 4 ⎥
    ⎣ 3 4 5 ⎦

If all off-diagonal entries of A are 0, then A is said to be a diagonal matrix. Only
a square matrix can be a diagonal matrix. There is a way to generalize this notion
to any matrix, but we do not require it. Notice that all diagonal entries in a diagonal
matrix need not be nonzero. For example, the zero matrix of order n is a diagonal
matrix. We also write a diagonal matrix with diagonal entries d1 , . . . , dn as

diag(d1 , . . . , dn ).

The following is a diagonal matrix. We follow the convention of not showing the
non-diagonal entries in a diagonal matrix, which are 0.
                    ⎡ 1       ⎤   ⎡ 1 0 0 ⎤
    diag(1, 3, 0) = ⎢    3    ⎥ = ⎢ 0 3 0 ⎥ .
                    ⎣       0 ⎦   ⎣ 0 0 0 ⎦

The identity matrix is a diagonal matrix with each diagonal entry as 1. We write
an identity matrix of order m as Im . Sometimes, we omit the subscript m if it is
understood from the context.

I = Im = diag(1, . . . , 1).

We write ei for a column vector whose ith component is 1 and all other compo-
nents 0. The jth component of ei is δi j . Here,

    δi j = 1 if i = j,   δi j = 0 if i ≠ j

is Kronecker’s delta. The size of a column vector ei is to be understood from the
context. Notice that the identity matrix I = [δi j ].
There are then n distinct column vectors e1 , . . . , en in Fn×1 . These are referred to
as the standard basis vectors for reasons you will see later. We also say that ei is
the ith standard basis vector. These are the columns of the identity matrix of order
n, in that order; that is, ei is the ith column of I. Then, eit is the ith row of I. Thus,
                                                    ⎡ e1t ⎤
    I = [δi j ] = diag(1, . . . , 1) = [e1 · · · en ] = ⎢  ..  ⎥ .
                                                    ⎣ ent ⎦

A scalar matrix is a diagonal matrix with equal diagonal entries. For instance,
the following is a scalar matrix:

    ⎡ 3       ⎤
    ⎢   3     ⎥
    ⎢     3   ⎥ .
    ⎣       3 ⎦

It is also written as diag(3, 3, 3, 3).


A matrix A ∈ Fm×n is said to be upper triangular iff all entries below the diagonal
are zero. That is, A = [ai j ] is upper triangular when ai j = 0 for i > j. In writing
such a matrix, we simply do not show the zero entries below the diagonal.
Similarly, a matrix is called lower triangular iff all its entries above the diagonal
are zero.
Both upper triangular and lower triangular matrices are referred to as triangular
matrices. In the following, L is a lower triangular matrix, and U is an upper triangular
matrix, each of order 3:

        ⎡ 1     ⎤        ⎡ 1 2 3 ⎤
    L = ⎢ 2 3   ⎥ ,  U = ⎢   3 4 ⎥ .
        ⎣ 3 4 5 ⎦        ⎣     5 ⎦

A diagonal matrix is both upper triangular and lower triangular.


Sum of two matrices of the same size is a matrix whose entries are obtained by
adding the corresponding entries in the given two matrices. If A = [ai j ] ∈ Fm×n and
B = [bi j ] ∈ Fm×n , then A + B = [ci j ] ∈ Fm×n with

ci j = ai j + bi j for 1 ≤ i ≤ m, 1 ≤ j ≤ n.

We write the same thing as [ai j ] + [bi j ] = [ai j + bi j ]. For example,

    ⎡ 1 2 3 ⎤   ⎡ 3 1 2 ⎤   ⎡ 4 3 5 ⎤
    ⎣ 2 3 1 ⎦ + ⎣ 2 1 3 ⎦ = ⎣ 4 4 4 ⎦ .

Thus, we informally say that matrices are added entry-wise. Matrices of different
sizes can never be added. It is easy to see that

    A + B = B + A,   A + 0 = 0 + A = A

for all matrices A, B ∈ Fm×n , with an implicit understanding that 0 ∈ Fm×n .


Similarly, matrices can be multiplied by a scalar entry-wise. If α ∈ F and
A = [ai j ] ∈ Fm×n , then
α A = [αai j ] ∈ Fm×n .

Therefore, a scalar matrix with α on the diagonal is written as α I.



For A = [ai j ], the matrix −A ∈ Fm×n is taken as one whose (i, j)th entry is −ai j .
Thus,
−A = (−1)A, (−A) + A = A + (−A) = 0.

We also abbreviate A + (−B) to A − B, as usual. For example,

      ⎡ 1 2 3 ⎤   ⎡ 3 1 2 ⎤   ⎡ 0 5 7 ⎤
    3 ⎣ 2 3 1 ⎦ − ⎣ 2 1 3 ⎦ = ⎣ 4 8 0 ⎦ .
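Entry-wise addition and scalar multiplication are straightforward to program. The sketch below is ours, not the book's; the helper names add and scale are our own. It reproduces the two numerical examples above:

```python
def add(A, B):
    # entry-wise sum of two matrices of the same size
    return [[A[i][j] + B[i][j] for j in range(len(A[0]))]
            for i in range(len(A))]

def scale(alpha, A):
    # multiply every entry of A by the scalar alpha
    return [[alpha * entry for entry in row] for row in A]

A = [[1, 2, 3], [2, 3, 1]]
B = [[3, 1, 2], [2, 1, 3]]
print(add(A, B))                       # [[4, 3, 5], [4, 4, 4]]
print(add(scale(3, A), scale(-1, B)))  # 3A - B = [[0, 5, 7], [4, 8, 0]]
```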

Addition and scalar multiplication of matrices satisfy the following properties:
Let A, B, C ∈ Fm×n , and let α, β ∈ F. Then,
1. A + B = B + A.
2. (A + B) + C = A + (B + C).
3. A + 0 = 0 + A = A.
4. A + (−A) = (−A) + A = 0.
5. α(β A) = (αβ)A.
6. α(A + B) = α A + α B.
7. (α + β)A = α A + β A.
8. 1 A = A.
9. (−1) A = −A.
Notice that whatever we discuss here for matrices apply to row vectors and column
vectors, in particular. But remember that a row vector cannot be added to a column
vector unless both are of size 1 × 1.
We also define multiplication or product of matrices. Let A = [aik ] ∈ Fm×n , and
let B = [bk j ] ∈ Fn×r . Then, their product AB is a matrix [ci j ] ∈ Fm×r , where the
(i, j)th entry is given by

    ci j = ai1 b1 j + · · · + ain bn j = ∑ aik bk j  (the sum running over k = 1, . . . , n).

Mark the sizes of A and B. The matrix product AB is defined only when the number
of columns in A is equal to the number of rows in B. The result AB has number of
rows as that of A and the number of columns as that of B.
A particular case might be helpful. Suppose u is a row vector in F1×n and v is a
column vector in Fn×1 . Then, their product uv ∈ F1×1 . It is a 1 × 1 matrix. Often,
we identify such matrices with scalars. The product now looks like:
                   ⎡ b1 ⎤
    [a1 · · · an ] ⎢ ..  ⎥ = [a1 b1 + · · · + an bn ].
                   ⎣ bn ⎦

This is helpful in visualizing the general case, which looks like

    ⎡ a11 · · · a1n ⎤ ⎡ b11 · · · b1 j · · · b1r ⎤   ⎡ c11 · · · c1 j · · · c1r ⎤
    ⎢ ai1 · · · ain ⎥ ⎢  ..        ..        ..  ⎥ = ⎢ ci1 · · · ci j · · · cir ⎥ .
    ⎣ am1 · · · amn ⎦ ⎣ bn1 · · · bn j · · · bnr ⎦   ⎣ cm1 · · · cm j · · · cmr ⎦

The ith row of A multiplied with the jth column of B gives the (i, j)th entry in AB.
Thus to get AB, you have to multiply all m rows of A with all r columns of B, taking
one from each in turn. For example,
    ⎡  3  5 −1 ⎤ ⎡ 2 −2 3 1 ⎤   ⎡ 22  −2  43  42 ⎤
    ⎢  4  0  2 ⎥ ⎢ 5  0 7 8 ⎥ = ⎢ 26 −16  14   6 ⎥ .
    ⎣ −6 −3  2 ⎦ ⎣ 9 −4 1 1 ⎦   ⎣ −9   4 −37 −28 ⎦
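The row-times-column rule translates directly into a triple loop. The following minimal sketch (ours; the helper name matmul is not from the text) reproduces the example above:

```python
def matmul(A, B):
    # (i, j)th entry of AB is the sum over k of A[i][k] * B[k][j];
    # the number of columns of A must equal the number of rows of B.
    n = len(B)
    assert all(len(row) == n for row in A)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[3, 5, -1], [4, 0, 2], [-6, -3, 2]]
B = [[2, -2, 3, 1], [5, 0, 7, 8], [9, -4, 1, 1]]
print(matmul(A, B))
# [[22, -2, 43, 42], [26, -16, 14, 6], [-9, 4, -37, -28]]
```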

If u ∈ F1×n and v ∈ Fn×1 , then uv ∈ F1×1 ; but vu ∈ Fn×n . For instance,

              ⎡ 1 ⎤            ⎡ 1 ⎤             ⎡  3  6 1 ⎤
    [3 6 1]   ⎢ 2 ⎥ = [19],    ⎢ 2 ⎥ [3 6 1]  =  ⎢  6 12 2 ⎥ .
              ⎣ 4 ⎦            ⎣ 4 ⎦             ⎣ 12 24 4 ⎦

It shows clearly that matrix multiplication is not commutative. Commutativity
can break down due to various reasons. First of all, when AB is defined, B A may not
be defined. Secondly, even when both AB and B A are defined, they may not be of
the same size; thirdly, even when they are of the same size, they need not be equal.
For example,

    ⎡ 1 2 ⎤ ⎡ 0 1 ⎤   ⎡ 4  7 ⎤         ⎡ 0 1 ⎤ ⎡ 1 2 ⎤   ⎡ 2  3 ⎤
    ⎣ 2 3 ⎦ ⎣ 2 3 ⎦ = ⎣ 6 11 ⎦   but   ⎣ 2 3 ⎦ ⎣ 2 3 ⎦ = ⎣ 8 13 ⎦ .

It does not mean that AB is never equal to B A. If A, B ∈ Fm×m and A is a scalar
matrix, then AB = B A. Conversely, if A ∈ Fm×m is such that AB = B A for all
B ∈ Fm×m , then A must be a scalar matrix. This fact is not obvious, and its proof is
involved. Moreover, there can be some particular non-scalar matrices A and B both
in Fn×n such that AB = B A.
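These observations can be checked directly. The sketch below (ours; matmul is our own helper, not from the text) reproduces the non-commuting pair above and verifies that a scalar matrix commutes with any matrix of the same order:

```python
def matmul(A, B):
    # row-times-column product of two matrices
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [2, 3]]
B = [[0, 1], [2, 3]]
print(matmul(A, B))   # [[4, 7], [6, 11]]
print(matmul(B, A))   # [[2, 3], [8, 13]]  -- so AB != BA here

S = [[5, 0], [0, 5]]  # the scalar matrix 5I
assert matmul(S, A) == matmul(A, S)   # a scalar matrix commutes with A
```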
Observe that if A ∈ Fm×n , then AIn = A and Im A = A. Look at the columns of
In in this product. They say that

Ae j = the jth column of A for j = 1, . . . , n.

Here, e j is the standard jth basis vector, the jth column of the identity matrix of
order n; its jth component is 1, and all other components are 0. The above identity
can also be seen by directly multiplying A with e j , as in the following:
           ⎡ a11 · · · a1 j · · · a1n ⎤ ⎡ 0 ⎤   ⎡ a1 j ⎤
           ⎢  ..        ..        ..  ⎥ ⎢ .. ⎥   ⎢  ..  ⎥
    Ae j = ⎢ ai1 · · · ai j · · · ain ⎥ ⎢ 1 ⎥ = ⎢ ai j ⎥ = jth column of A.
           ⎢  ..        ..        ..  ⎥ ⎢ .. ⎥   ⎢  ..  ⎥
           ⎣ am1 · · · am j · · · amn ⎦ ⎣ 0 ⎦   ⎣ am j ⎦

Thus, A can be written in block form as

    A = [ Ae1 · · · Ae j · · · Aen ].
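The identity Ae j = jth column of A is easy to verify numerically. In the sketch below (our helpers, not the book's; note that Python indexes from 0, so e(1, 3) is the second standard basis vector):

```python
def matmul(A, B):
    # row-times-column product of two matrices
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

def e(j, n):
    # jth standard basis column vector in F^{n x 1} (0-indexed)
    return [[1] if i == j else [0] for i in range(n)]

A = [[1, 2, 3],
     [4, 5, 6]]
print(matmul(A, e(1, 3)))   # [[2], [5]] -- the second column of A
```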

Unlike numbers, product of two nonzero matrices can be a zero matrix. For
instance,
    ⎡ 1 0 ⎤ ⎡ 0 0 ⎤   ⎡ 0 0 ⎤
    ⎣ 0 0 ⎦ ⎣ 0 1 ⎦ = ⎣ 0 0 ⎦ .

Let A ∈ Fm×n . We write its ith row as Ai and its kth column as Ak .
We can now write A as a row of columns and also as a column of rows in the
following manner:
                                    ⎡ A1 ⎤
    A = [aik ] = [ A1 · · · An ] =  ⎢ ..  ⎥ .
                                    ⎣ Am ⎦

Write B ∈ Fn×r similarly as


                                    ⎡ B1 ⎤
    B = [bk j ] = [ B1 · · · Br ] = ⎢ ..  ⎥ .
                                    ⎣ Bn ⎦

Then, their product AB can now be written in block form as (ignoring extra brackets):

                                ⎡ A1 B ⎤
    AB = [ AB1 · · · ABr ]  =   ⎢  ..  ⎥ .
                                ⎣ Am B ⎦

It is easy to verify the following properties of matrix multiplication:


1. If A ∈ Fm×n , B ∈ Fn×r and C ∈ Fr × p , then (AB)C = A(BC).
2. If A, B ∈ Fm×n and C ∈ Fn×r , then (A + B)C = AB + AC.
3. If A ∈ Fm×n and B, C ∈ Fn×r , then A(B + C) = AB + AC.
4. If α ∈ F, A ∈ Fm×n and B ∈ Fn×r , then α(AB) = (α A)B = A(α B).
Powers of square matrices can be defined inductively by taking

A0 = I and An = A An−1 for n ∈ N.


Example 1.1 Let $A = \begin{bmatrix} 1 & 1 & 0\\ 0 & 1 & 2\\ 0 & 0 & 1 \end{bmatrix}$. We show that $A^n = \begin{bmatrix} 1 & n & n(n-1)\\ 0 & 1 & 2n\\ 0 & 0 & 1 \end{bmatrix}$ for n ∈ N.

We use induction on n. The basis case n = 1 is obvious. Suppose A^n is as given.
Now,

$$A^{n+1} = A\,A^n = \begin{bmatrix} 1 & 1 & 0\\ 0 & 1 & 2\\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & n & n(n-1)\\ 0 & 1 & 2n\\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & n+1 & (n+1)n\\ 0 & 1 & 2(n+1)\\ 0 & 0 & 1 \end{bmatrix}.$$

By convention, A^0 = I; incidentally, taking n = 0 in the formula for A^n also yields I. ∎
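The closed form in Example 1.1 can be spot-checked numerically for small n; a minimal sketch (not part of the text):

```python
import numpy as np

# Checking the closed form A^n = [[1, n, n(n-1)], [0, 1, 2n], [0, 0, 1]].
A = np.array([[1, 1, 0],
              [0, 1, 2],
              [0, 0, 1]])

def power_formula(n):
    return np.array([[1, n, n * (n - 1)],
                     [0, 1, 2 * n],
                     [0, 0, 1]])

for n in range(8):                     # n = 0 gives the identity matrix
    assert np.array_equal(np.linalg.matrix_power(A, n), power_formula(n))
```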

A square matrix A of order m is called invertible iff there exists a matrix B of


order m such that
AB = I = B A.

If B and C are matrices that satisfy the above equations, then

C = C I = C(AB) = (C A)B = I B = B.

Therefore, there exists a unique matrix B corresponding to any given invertible


matrix A such that AB = I = B A. We write such a matrix B as A−1 and call it the
inverse of the matrix A. That is, A−1 is that matrix which satisfies

A A−1 = I = A−1 A.

We speak of invertibility for square matrices only; and not all square matrices are
invertible. For example, I is invertible but 0 is not. If AB = 0 for nonzero square
matrices A and B, then neither A nor B is invertible. Why?
If both A, B ∈ Fn×n are invertible, then (AB)−1 = B −1 A−1 . Reason:

B −1 A−1 AB = B −1 I B = I = AI A−1 = AB B −1 A−1 .

Invertible matrices play a crucial role in solving linear systems uniquely. We will
come back to the issue later.
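The identity (AB)⁻¹ = B⁻¹A⁻¹ can be confirmed numerically; a minimal sketch using NumPy (the particular matrices are illustrative choices, not from the text):

```python
import numpy as np

# For invertible A and B, the inverse of the product reverses the order.
A = np.array([[2., 1.],
              [1., 1.]])
B = np.array([[1., 3.],
              [0., 1.]])
lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
assert np.allclose(lhs, rhs)
```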
Exercises for Sect. 1.2
1. Compute AB, CA, DC, DCAB, A², D² and A³B², where

$$A = \begin{bmatrix} 2 & 3\\ 1 & 2 \end{bmatrix},\quad B = \begin{bmatrix} 4 & -1\\ 4 & 0 \end{bmatrix},\quad C = \begin{bmatrix} -1 & 2\\ 2 & -1\\ 1 & 3 \end{bmatrix},\quad D = \begin{bmatrix} 3 & 2 & 1\\ 4 & -6 & 0\\ 1 & -2 & -2 \end{bmatrix}.$$

2. Let A = [a_{ij}] ∈ F^{2×2} with a₁₁ ≠ 0. Let b = a₂₁/a₁₁. Show that there exists c ∈ F such that

$$A = \begin{bmatrix} 1 & 0\\ b & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12}\\ 0 & c \end{bmatrix}.$$

What could be c?
3. Let A ∈ F^{m×n}, and let B ∈ F^{n×k}. For 1 ≤ i ≤ m, 1 ≤ j ≤ k, show that
(a) (AB)_i = A_i B  (b) (AB)^j = A B^j
4. Construct two 3 × 3 matrices A and B such that AB = 0 but BA ≠ 0.
5. Can you construct invertible 2 × 2 matrices A and B such that AB = 0?
6. Let A, B ∈ Fn×n be such that AB = 0. Then which of the following is/are true,
and why?
(a) At least one of A or B is the zero matrix.
(b) At least one of A or B is invertible.
(c) At least one of A or B is non-invertible.
(d) If A = 0 and B = 0, then neither is invertible.
7. Prove all properties of multiplication of matrices mentioned in the text.
8. Let A be the 4 × 4 matrix whose super-diagonal entries are all 1, and all other
entries 0. Show that An = 0 for n ≥ 4.
9. Let A be the 4 × 4 matrix with each diagonal entry as 1/2, and each non-diagonal
entry as −1/2. Compute A^n for n ∈ N.

1.3 Transpose and Adjoint

Given a matrix A ∈ F^{m×n}, its transpose, denoted by A^t, is the matrix in F^{n×m} defined by

the (i, j)th entry of A^t = the (j, i)th entry of A.


That is, the ith column of At is the column vector [ai1 , . . . , ain ]t . In this sense,
the rows of A become the columns of At and the columns of A become the rows of
A^t. For example,

$$\text{if } A = \begin{bmatrix} 1 & 2 & 3\\ 2 & 3 & 1 \end{bmatrix} \text{ then } A^t = \begin{bmatrix} 1 & 2\\ 2 & 3\\ 3 & 1 \end{bmatrix} \text{ and } (A^t)^t = \begin{bmatrix} 1 & 2 & 3\\ 2 & 3 & 1 \end{bmatrix}.$$

In particular, if u = [a₁, …, a_m] is a row vector, then its transpose is

$$u^t = \begin{bmatrix} a_1\\ \vdots\\ a_m \end{bmatrix},$$

which is a column vector, as mentioned earlier. Similarly, the transpose of a column


vector is a row vector. If you write A as a row of column vectors, then you can express
At as a column of row vectors.
$$\text{For } A = \begin{bmatrix} A^1 & \cdots & A^n \end{bmatrix} = \begin{bmatrix} A_1\\ \vdots\\ A_m \end{bmatrix}, \qquad A^t = \begin{bmatrix} A_1^t & \cdots & A_m^t \end{bmatrix} = \begin{bmatrix} (A^1)^t\\ \vdots\\ (A^n)^t \end{bmatrix}.$$

The following are some of the properties of the operation of transpose.


1. (At )t = A.
2. (A + B)t = At + B t .
3. (α A)t = α At .
4. (AB)t = B t At .
5. If A is invertible, then At is invertible, and (At )−1 = (A−1 )t .
In the above properties, we assume that the operations are allowed; that is, in (2),
A and B must be matrices of the same size. In (3), α is a scalar. Similarly, in (4), the
number of columns in A must be equal to the number of rows in B; and in (5), A
must be a square matrix.
It is easy to see all the above properties, except perhaps the fourth one. For this,
let A ∈ F^{m×n} and let B ∈ F^{n×r}. Now, the (j, i)th entry in (AB)^t is the (i, j)th entry
in AB; it is given by

$$a_{i1}b_{1j} + \cdots + a_{in}b_{nj}.$$

On the other side, the (j, i)th entry in B^t A^t is obtained by multiplying the jth
row of B^t with the ith column of A^t. This is the same as multiplying the entries in the
jth column of B with the corresponding entries in the ith row of A and then taking
the sum. Thus, it is

$$b_{1j}a_{i1} + \cdots + b_{nj}a_{in}.$$

This is the same as computed earlier.


The fifth one follows from the fourth one and the fact that (AB)−1 = B −1 A−1 .
Observe that transpose of a lower triangular matrix is an upper triangular matrix
and vice versa.
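Properties (4) and (5) of the transpose can be spot-checked numerically; a small sketch (the matrices are arbitrary illustrative choices):

```python
import numpy as np

# (AB)^t = B^t A^t and, for invertible C, (C^t)^{-1} = (C^{-1})^t.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
assert np.allclose((A @ B).T, B.T @ A.T)

C = np.array([[2., 1.],
              [0., 1.]])                                     # invertible
assert np.allclose(np.linalg.inv(C.T), np.linalg.inv(C).T)
```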
Close to the operation of transpose of a matrix is the adjoint. Recall that the
complex conjugate of a scalar α is written as $\overline{α}$, where $\overline{a + ib} = a - ib$ for a, b ∈ R.
The matrix obtained from A by taking the complex conjugate of each entry is written as
$\overline{A}$. That is, the (i, j)th entry of $\overline{A}$ is the complex conjugate of the (i, j)th entry of
A. The adjoint of a matrix A = [a_{ij}] ∈ F^{m×n}, written as A*, is the transpose of $\overline{A}$.
That is, $A^* = (\overline{A})^t = \overline{A^t}$. The adjoint of A is also called the conjugate transpose
of A. It may be defined directly by

the (i, j)th entry of A* = the complex conjugate of the (j, i)th entry of A.


For instance, if $A = \begin{bmatrix} 1+i & 2 & 3\\ 2 & 3 & 1-i \end{bmatrix}$, then $A^* = \begin{bmatrix} 1-i & 2\\ 2 & 3\\ 3 & 1+i \end{bmatrix}$.
If A = [a_{ij}] ∈ R^{m×n}, then $\overline{a_{ij}} = a_{ij}$. Consequently, A* = A^t. For example,

$$\text{if } A = \begin{bmatrix} 1 & 2 & 3\\ 2 & 3 & 1 \end{bmatrix} \text{ then } A^* = \begin{bmatrix} 1 & 2\\ 2 & 3\\ 3 & 1 \end{bmatrix} = A^t.$$

The ith column of A* is the column vector $[\overline{a_{i1}}, \ldots, \overline{a_{in}}]^t$, which is the adjoint of
the ith row of A. We may write the operation of adjoint in terms of rows and columns
of a matrix as in the following.

$$\text{For } A = \begin{bmatrix} A^1 & \cdots & A^n \end{bmatrix} = \begin{bmatrix} A_1\\ \vdots\\ A_m \end{bmatrix}, \qquad A^* = \begin{bmatrix} A_1^* & \cdots & A_m^* \end{bmatrix} = \begin{bmatrix} (A^1)^*\\ \vdots\\ (A^n)^* \end{bmatrix}.$$

The operation of adjoint satisfies the following properties:

1. (A*)* = A.
2. (A + B)* = A* + B*.
3. (αA)* = $\overline{α}$A*.
4. (AB)* = B*A*.
5. If A is invertible, then A* is invertible, and (A*)⁻¹ = (A⁻¹)*.
In (2), the matrices A and B must be of the same size, and in (4), the number of
columns in A must be equal to the number of rows in B.
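Properties (3) and (4) of the adjoint can be verified numerically; a minimal sketch (the matrices and the scalar are illustrative choices):

```python
import numpy as np

# The adjoint is the conjugate transpose: (alpha A)* = conj(alpha) A*,
# and (AB)* = B* A*.
A = np.array([[1 + 1j, 2, 3],
              [2, 3, 1 - 1j]])
B = np.array([[1j, 0],
              [2, 1],
              [0, -1j]])
alpha = 2 - 3j

adj = lambda M: M.conj().T
assert np.allclose(adj(alpha * A), np.conj(alpha) * adj(A))
assert np.allclose(adj(A @ B), adj(B) @ adj(A))
```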
Exercises for Sect. 1.3
1. Determine A^t, $\overline{A}$, A*, A*A and AA*, where

(a) $A = \begin{bmatrix} -1 & 2 & 3 & 1\\ 2 & -1 & 0 & 3\\ 0 & -1 & -3 & 1 \end{bmatrix}$  (b) $A = \begin{bmatrix} 1 & -2+i & 3-i\\ i & -1-i & 2i\\ 1+3i & -i & -3\\ -2 & 0 & -i \end{bmatrix}$
2. Prove all properties of the transpose and the adjoint mentioned in the text.
3. Let A ∈ Cm×n . Suppose A A∗ = Im . Does it follow that A∗ A = In ?

1.4 Elementary Row Operations

Recall that while solving linear equations in two or three variables, we try to eliminate
a variable from all but one equation by adding an equation to the other, or even adding
a constant times one equation to another. We do similar operations on the rows of a
matrix. Theoretically, it will be advantageous to see these operations as multiplication
by some special kinds of matrices.
Let e₁, …, e_m be the standard basis vectors of F^{m×1}. Let 1 ≤ i, j ≤ m. The
product e_i e_j^t is an m × m matrix whose (i, j)th entry is 1, and all other entries are
0. We write such a matrix as E_{ij}. For instance, when m = 3, we have

$$e_2 e_3^t = \begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{bmatrix} = E_{23}.$$

An elementary matrix of order m is one of the following three types of matrices:

1. E[i, j] = I − E_{ii} − E_{jj} + E_{ij} + E_{ji} with i ≠ j.
2. E_α[i] = I − E_{ii} + αE_{ii}, where α is a nonzero scalar.
3. E_α[i, j] = I + αE_{ij}, where α is a nonzero scalar and i ≠ j.

Here, I is the identity matrix of order m. As with the identity matrix, the order of an
elementary matrix will be understood from the context; we will not show it in our symbolism.
The following are instances of elementary matrices of order 3:

$$E[1,2] = \begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix}, \quad E_{-1}[2] = \begin{bmatrix} 1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 1 \end{bmatrix}, \quad E_2[3,1] = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 2 & 0 & 1 \end{bmatrix}.$$
We see that $E_{ij}A = e_i e_j^t A = e_i (A^t e_j)^t = e_i ((A^t)^j)^t = e_i A_j$. The last matrix
has all rows as zero rows except the ith one, which is equal to A_j, the jth row of
A. That is,

E_{ij}A is an m × n matrix whose ith row is the jth row of A, and the rest of its rows are zero
rows. In particular, E_{ii}A is the matrix obtained from A by replacing all its rows by zero rows
except the ith one.

Using these, we observe the following.


Observation 1.1 1. E[i, j] A is obtained from A by exchanging its ith and jth
rows.
2. E α [i] A is obtained from A by multiplying its ith row with α.
3. E α [i, j] A is obtained from A by replacing its ith row with the ith row plus α
times the jth row.
We call each of these operations of pre-multiplying a matrix with an elementary
matrix as an elementary row operation. Thus, there are three kinds of elementary
row operations corresponding to those listed in Observation 1.1. Sometimes, we will
refer to them as of Type 1, 2, or 3, respectively. Also, in computations, we will write

E
A −→ B

to mean that the matrix B has been obtained from A by an elementary row operation
E, that is, when B = E A.
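The three kinds of elementary matrices, and the row operations they effect under pre-multiplication, can be sketched in NumPy as follows (a minimal illustration with 0-based indices, not the book's code):

```python
import numpy as np

def E_swap(m, i, j):                    # E[i, j]: exchange rows i and j of I
    E = np.eye(m)
    E[[i, j]] = E[[j, i]]
    return E

def E_scale(m, i, alpha):               # E_alpha[i]: scale row i of I by alpha
    E = np.eye(m)
    E[i, i] = alpha
    return E

def E_add(m, i, j, alpha):              # E_alpha[i, j]: I plus alpha at (i, j)
    E = np.eye(m)
    E[i, j] = alpha
    return E

A = np.array([[1., 1., 1.],
              [2., 2., 2.],
              [3., 3., 3.]])
assert np.array_equal(E_swap(3, 0, 1) @ A, A[[1, 0, 2]])             # rows exchanged
assert np.array_equal((E_scale(3, 1, 5.) @ A)[1], 5. * A[1])         # row scaled
assert np.array_equal((E_add(3, 2, 0, -3.) @ A)[2], A[2] - 3. * A[0])  # row replaced
```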

Example 1.2 Verify the following applications of elementary row operations:

$$\begin{bmatrix} 1 & 1 & 1\\ 2 & 2 & 2\\ 3 & 3 & 3 \end{bmatrix} \xrightarrow{E_{-3}[3,1]} \begin{bmatrix} 1 & 1 & 1\\ 2 & 2 & 2\\ 0 & 0 & 0 \end{bmatrix} \xrightarrow{E_{-2}[2,1]} \begin{bmatrix} 1 & 1 & 1\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}. \qquad ∎$$

Often, we will apply elementary row operations in a sequence. In this way, the
above operations could be shown in one step as E −3 [3, 1], E −2 [2, 1]. However,
remember that the result of application of this sequence of elementary row operations
on a matrix A is E −2 [2, 1] E −3 [3, 1] A; the products are in reverse order.
Observe that each elementary matrix is invertible. In fact, E[i, j] is its own inverse,
E 1/α [i] is the inverse of E α [i], and E −α [i, j] is the inverse of E α [i, j]. Therefore, any
product of elementary matrices is invertible. It follows that if B has been obtained
from A by applying a sequence of elementary row operations, then A can also be
obtained from B by a sequence of elementary row operations.
Exercises for Sect. 1.4
1. Show the following:
(a) E_{ij}E_{jm} = E_{im}; if j ≠ k, then E_{ij}E_{km} = 0.
(b) Each A = [a_{ij}] ∈ F^{n×n} can be written as $A = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}E_{ij}$.

2. Compute E[2,3]A, E_i[2]A, E_{−1/2}[1,3]A and E_i[1,2]A, where

(a) $A = \begin{bmatrix} -1 & 2 & 3 & 1\\ 2 & -1 & 0 & 3\\ 0 & -1 & -3 & 1 \end{bmatrix}$  (b) $A = \begin{bmatrix} 1 & -2+i & 3-i\\ i & -1-i & 2i\\ 1+3i & -i & -3\\ -2 & 0 & -i \end{bmatrix}$
3. Take an invertible 2 × 2 matrix. Bring it to the identity matrix of order 2 by
applying elementary row operations.
4. Take a non-invertible 2 × 2 matrix. Try to bring it to the identity matrix of order
2 by applying elementary row operations.
5. Argue in general terms why the following in Observation 1.1 are true:
(a) E[i, j] A is obtained from A by exchanging its ith and jth rows.
(b) E α [i] A is obtained from A by multiplying its ith row with α.
(c) E α [i, j] A is obtained from A by adding to its ith row α times the jth row.
6. How can the elementary matrices be obtained from the identity matrix?
7. Let α be a nonzero scalar. Show the following:
(a) (E[i,j])^t = E[i,j], (E_α[i])^t = E_α[i], (E_α[i,j])^t = E_α[j,i].
(b) (E[i,j])⁻¹ = E[i,j], (E_α[i])⁻¹ = E_{1/α}[i], (E_α[i,j])⁻¹ = E_{−α}[i,j].
8. For each of the following pairs of matrices, find an elementary matrix E such that
B = E A.
(a) $A = \begin{bmatrix} 2 & 1 & 3\\ 3 & 1 & 4\\ -2 & 4 & 5 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 1 & 3\\ -2 & 4 & 5\\ 1 & 5 & 9 \end{bmatrix}$
(b) $A = \begin{bmatrix} 4 & -2 & 3\\ 1 & 0 & 2\\ 0 & 3 & 5 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & -2 & 1\\ 1 & 0 & 2\\ 0 & 3 & 5 \end{bmatrix}$
9. For each of the following pairs of matrices, find an elementary matrix E such that
B = AE. [Hint: The requirement is B t = E t At .]
(a) $A = \begin{bmatrix} 3 & 1 & 4\\ 4 & 1 & 2\\ 2 & 3 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 5 & 3\\ 2 & 3 & 4\\ 1 & 4 & 2 \end{bmatrix}$

(b) $A = \begin{bmatrix} 2 & -2 & 3\\ -1 & 4 & 2\\ 3 & 1 & -2 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & -2 & 1\\ -2 & 4 & -4\\ 6 & 1 & 8 \end{bmatrix}$

1.5 Row Reduced Echelon Form

Consider solving the linear equations x + y = 3 and −x + y = 1. If we add the


equations, we obtain 2y = 4 or y = 2. Substituting this value in one of the equations,
we get x = 1. In terms of matrices, we write the equations as

$$\begin{bmatrix} 1 & 1 & 3\\ -1 & 1 & 1 \end{bmatrix}$$

where the third column is the right-hand side of the equality sign in our equations.
Our method of solution suggests that we proceed as follows:

$$\begin{bmatrix} 1 & 1 & 3\\ -1 & 1 & 1 \end{bmatrix} \xrightarrow{E_1[2,1]} \begin{bmatrix} 1 & 1 & 3\\ 0 & 2 & 4 \end{bmatrix} \xrightarrow{E_{1/2}[2]} \begin{bmatrix} 1 & 1 & 3\\ 0 & 1 & 2 \end{bmatrix} \xrightarrow{E_{-1}[1,2]} \begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & 2 \end{bmatrix}$$

In the final result, the first 2 × 2 block is an identity matrix. Thus, we could obtain
a unique solution as x = 1, y = 2, which are the respective entries in the last column.
As you have seen, we may not be able to bring an arbitrary square matrix to the
identity matrix of the same order by elementary row operations. On the other hand,
if two rows of a matrix are the same, then one of them can be made a zero row by
a suitable elementary row operation. We may thus look for a matrix with as many
zero entries as possible, somewhat closer to the identity matrix. Moreover, if such
a matrix is invertible, it must be the identity matrix. We define such a matrix by
looking at our requirements on the end result.
Recall that in a matrix, the row index of the ith row is i, which is also called the
row index of the (i, j)th entry. Similarly, j is the column index of the jth column
and also of the (i, j)th entry.
In a nonzero row of a matrix, the nonzero entry with minimum column index (first
from left) is called a pivot. We mark a pivot by putting a box around it. A column
where a pivot occurs is called a pivotal column.

In the following matrix, the pivots are shown in boxes:

$$\begin{bmatrix} 0 & \boxed{1} & 2 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & \boxed{2} \end{bmatrix}$$

The row index of the pivot 1 is 1, and its column index is 2. The row index of the
pivot 2 is 3, and its column index is 4. The column indices of the pivotal columns
are 2 and 4.
A matrix A ∈ Fm×n is said to be in row reduced echelon form (RREF) iff the
following conditions are satisfied:
1. Each pivot is equal to 1.
2. In a pivotal column, all entries other than the pivot are zero.
3. The row index of each nonzero row is smaller than the row index of each zero
row.
4. Among any two pivots, the pivot with larger row index also has larger column
index.

Example 1.3 The following matrices are in row reduced echelon form:

$$\begin{bmatrix} 1 & 2 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 1\\ 0\\ 0\\ 0 \end{bmatrix},\quad \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix},\quad \begin{bmatrix} 0 & 0 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & 1 & 0 & 2 \end{bmatrix}.$$

The following are not in row reduced echelon form:

$$\begin{bmatrix} 0 & 1 & 3 & 0\\ 0 & 0 & 0 & 2\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & 1 & 3 & 1\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 0 & 1 & 3 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

In a matrix in row reduced echelon form, all zero rows (if there are any) occur at the
bottom of the matrix. Further, the pivot in a later row occurs below and to the right
of the pivot in any earlier row.
A column vector (an n × 1 matrix) in row reduced echelon form is either the zero
vector or e1 .
If a matrix in RREF has k pivotal columns, then those columns occur in the matrix
as e1 , . . . , ek , read from left to right, though there can be other columns between these
pivotal columns.
Further, if a non-pivotal column occurs between two pivotal columns ek and ek+1 ,
then the entries of the non-pivotal column beyond the kth entry are all zero.
In a row reduced echelon form matrix, all entries below and to the left of any
pivot are zero. Ignoring such zero entries and drawing lines below and to the left of
pivots, a pattern of steps emerges, thus the name echelon form.

Any matrix can be brought to a row reduced echelon form by using elementary
row operations. We first search for a pivot and make it 1; then using elementary row
operations, we zero-out all entries except the pivot in a column and then use row
exchanges to take the zero rows to the bottom. Following these guidelines, we give
an algorithm to reduce a matrix to its RREF.
Reduction to Row Reduced Echelon Form
1. Set the work region R to the whole matrix A.
2. If all entries in R are 0, then stop.
3. If there are nonzero entries in R, then find the leftmost nonzero column. Mark it
as the pivotal column.
4. Find the topmost nonzero entry in the pivotal column. Box it; it is a pivot.
5. If the pivot is not on the top row of R, then exchange the row of A which contains
the top row of R with the row where the pivot is.
6. If the pivot, say, α is not equal to 1, then replace the top row of R in A by 1/α
times that row. Mark the top row of R in A as the pivotal row.
7. Zero-out all entries, except the pivot, in the pivotal column by replacing each row
above and below the pivotal row using an elementary row operation (of Type 3)
in A with that row and the pivotal row.
8. If the pivot is the rightmost and the bottommost entry in A, then stop. Else, find
the sub-matrix to the right and below the pivot. Reset the work region R to this
sub-matrix, and go to Step 2.
We will refer to the output of the above reduction algorithm as the row reduced
echelon form (the RREF) of a given matrix.
Example 1.4

$$A = \begin{bmatrix} 1 & 1 & 2 & 0\\ 3 & 5 & 7 & 1\\ 1 & 5 & 4 & 5\\ 2 & 8 & 7 & 9 \end{bmatrix} \xrightarrow{R1} \begin{bmatrix} 1 & 1 & 2 & 0\\ 0 & 2 & 1 & 1\\ 0 & 4 & 2 & 5\\ 0 & 6 & 3 & 9 \end{bmatrix} \xrightarrow{E_{1/2}[2]} \begin{bmatrix} 1 & 1 & 2 & 0\\ 0 & 1 & 1/2 & 1/2\\ 0 & 4 & 2 & 5\\ 0 & 6 & 3 & 9 \end{bmatrix}$$

$$\xrightarrow{R2} \begin{bmatrix} 1 & 0 & 3/2 & -1/2\\ 0 & 1 & 1/2 & 1/2\\ 0 & 0 & 0 & 3\\ 0 & 0 & 0 & 6 \end{bmatrix} \xrightarrow{E_{1/3}[3]} \begin{bmatrix} 1 & 0 & 3/2 & -1/2\\ 0 & 1 & 1/2 & 1/2\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 6 \end{bmatrix} \xrightarrow{R3} \begin{bmatrix} 1 & 0 & 3/2 & 0\\ 0 & 1 & 1/2 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 \end{bmatrix} = B.$$

Here, R1 = E_{−3}[2,1], E_{−1}[3,1], E_{−2}[4,1]; R2 = E_{−1}[1,2], E_{−4}[3,2], E_{−6}[4,2];
and R3 = E_{1/2}[1,3], E_{−1/2}[2,3], E_{−6}[4,3]. Notice that the matrix B, which is the
RREF of A, is given by

B = E_{−6}[4,3] E_{−1/2}[2,3] E_{1/2}[1,3] E_{1/3}[3] E_{−6}[4,2] E_{−4}[3,2] E_{−1}[1,2] E_{1/2}[2] E_{−2}[4,1] E_{−1}[3,1] E_{−3}[2,1] A. ∎
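A hand reduction like that of Example 1.4 can be cross-checked symbolically; a minimal sketch assuming SymPy is available, whose `Matrix.rref()` returns the RREF together with the (0-based) indices of the pivotal columns:

```python
from sympy import Matrix, Rational

# The matrix A of Example 1.4; exact rational arithmetic avoids rounding.
A = Matrix([[1, 1, 2, 0],
            [3, 5, 7, 1],
            [1, 5, 4, 5],
            [2, 8, 7, 9]])
R, pivots = A.rref()
assert pivots == (0, 1, 3)             # columns 1, 2 and 4 are pivotal
assert R == Matrix([[1, 0, Rational(3, 2), 0],
                    [0, 1, Rational(1, 2), 0],
                    [0, 0, 0, 1],
                    [0, 0, 0, 0]])
```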

The number of pivots in the RREF of a matrix A is called the rank of the matrix;
we denote it by rank(A).
Let A ∈ Fm×n . Suppose A has the columns as u 1 , . . . , u n ; these are column vectors
from Fm×1 . Thus, we write A = u 1 · · · u n .

Let B be the RREF of A obtained by applying a sequence of elementary row


operations. Let E be the m × m invertible matrix, which is the product of the corre-
sponding elementary matrices, so that

$$EA = \begin{bmatrix} Eu_1 & \cdots & Eu_n \end{bmatrix} = B.$$

Suppose rank(A) = r. Then, the standard basis vectors e1 , . . . , er of Fm×1 occur


as the pivotal columns in B, in that order. Denote the n − r non-pivotal columns in
B as v1 , . . . , vn−r , occurring in B in that order.
Observation 1.2 In B, if vi occurs between the pivotal columns e j and e j+1 , then
vi = [a1 , . . . a j , 0, 0, . . . , 0]t = a1 e1 + · · · + a j e j for some a1 , . . . , a j ∈ F.
Notice that the scalars a1 , . . . , a j in the above observation need not all be nonzero.
If e₁ occurs as the k₁th column in B, e₂ occurs as the k₂th column in B, and so on,
then Eu_{k₁} = e₁, Eu_{k₂} = e₂, …. In the notation of Observation 1.2,

vi = a1 Eu k1 + · · · + a j Eu k j .

Then, E −1 vi = a1 u k1 + · · · + a j u k j . As E −1 vi is the corresponding column of A,


we observe the following.
Observation 1.3 In B, if a vector vi = [a1 , . . . , a j , 0, 0, . . . , 0]t occurs as the kth
column, and prior to it occur the standard basis vectors e1 , . . . , e j (and no other) in
the columns k1 , . . . , k j , respectively, then u k = a1 u k1 + · · · + a j u k j .
In Example 1.4, the first two columns appear as pivotal columns, and the third one
is a non-pivotal column in the RREF. Observation 1.3 says that the third column of
A is some scalar times the first column plus some scalar times the second column; the
scalars are precisely the entries in the third column of B. That is,

$$\begin{bmatrix} 2\\ 7\\ 4\\ 7 \end{bmatrix} = \frac{3}{2}\begin{bmatrix} 1\\ 3\\ 1\\ 2 \end{bmatrix} + \frac{1}{2}\begin{bmatrix} 1\\ 5\\ 5\\ 8 \end{bmatrix}.$$

Notice that β1 u k1 + · · · + βr u kr = β1 E −1 e1 + · · · + βr E −1 er . Since er +1 is not


expressible in the form α1 e1 + · · · + αr er , we see that E −1 er +1 is not expressible in
the form α1 E −1 e1 + · · · + αr E −1 er . Therefore, we observe the following:
Observation 1.4 If rank(A) = r < m, then E −1 er +1 , . . . , E −1 em are not express-
ible in the form α1 u 1 + · · · + αn u n for any α1 , . . . , αn ∈ F.
Given a matrix A, our algorithm produces a unique matrix B in RREF. It raises
the question whether by following another algorithm that employs elementary row
operations on A, we would obtain a matrix in RREF different from B? The following
result shows that this is not possible; the row reduced echelon form of a matrix is
canonical.

Theorem 1.1 Let A ∈ Fm×n . There exists a unique matrix in Fm×n in row reduced
echelon form obtained from A by elementary row operations.

Proof Suppose B, C ∈ Fm×n are matrices in RREF such that each has been obtained
from A by elementary row operations. Since elementary matrices are invertible,
B = E 1 A and C = E 2 A for some invertible matrices E 1 , E 2 ∈ Fm×m . Now, B =
E 1 A = E 1 (E 2 )−1 C. Write E = E 1 (E 2 )−1 to have B = EC, where E is invertible.
Assume, on the contrary, that B = C. Then, there exists a column index, say
k ≥ 1, such that the first k − 1 columns of B coincide with the first k − 1 columns
of C, respectively; and the kth column of B is not equal to the kth column of C. Let
u be the kth column of B, and let v be the kth column of C. We have u = Ev and
u = v.
Suppose the pivotal columns that appear within the first k − 1 columns in C are
e1 , . . . , e j . Then, e1 , . . . , e j are also the pivotal columns in B that appear within the
first k − 1 columns. Since B = EC, we have C = E −1 B; consequently,

e1 = Ee1 = E −1 e1 , . . . , e j = Ee j = E −1 e j .

Since C is in RREF, either u = e j+1 or there exist scalars α1 , . . . , α j such that


u = α1 e1 + · · · + α j e j . The latter case includes the possibility that u = 0. Similarly,
either v = e j+1 or v = β1 e1 + · · · + β j e j for some scalars β1 , . . . , β j . We consider
the following exhaustive cases.
If u = e j+1 and v = e j+1 , then u = v.
If u = e j+1 and v = β1 e1 + · · · + β j e j , then

u = Ev = β1 Ee1 + · · · + β j Ee j = β1 e1 + · · · + β j e j = v.

If u = α₁e₁ + ⋯ + α_je_j (and whether v = e_{j+1} or v = β₁e₁ + ⋯ + β_je_j), then

v = E −1 u = α1 E −1 e1 + · · · + α j E −1 e j = α1 e1 + · · · + α j e j = u.

In either case, u = v; this is a contradiction. Therefore, B = C. 

Theorem 1.1 justifies our use of the term the RREF of a matrix. Thus, the rank of
a matrix does not depend on which algorithm we have followed in reducing it to its
RREF.

Exercises for Sect. 1.5


1. Which of the following matrices are in RREF, and which are not?
(a) $\begin{bmatrix} 1 & 2 & 1 & 0\\ 0 & 0 & 1 & 3\\ 0 & 0 & 0 & 0 \end{bmatrix}$ (b) $\begin{bmatrix} 1 & 2 & 0 & 0\\ 0 & 0 & 1 & 3\\ 0 & 0 & 0 & 0 \end{bmatrix}$ (c) $\begin{bmatrix} 1 & 0 & 0 & 1\\ 0 & 1 & 0 & 1\\ 0 & 0 & 1 & 1 \end{bmatrix}$

2. Compute the RREF of the following matrices:

(a) $\begin{bmatrix} 0 & 0 & 1\\ 0 & 1 & 0\\ 1 & 0 & 0 \end{bmatrix}$ (b) $\begin{bmatrix} 2 & -1 & -1 & 0\\ 1 & 0 & 1 & -4\\ 0 & 1 & -1 & -4 \end{bmatrix}$ (c) $\begin{bmatrix} 1 & 2 & 1 & -1\\ 0 & 2 & 3 & 3\\ 1 & -1 & -3 & -4\\ 1 & 1 & 5 & -2 \end{bmatrix}$
3. In Example 1.4, let u i be the ith column of A, and let w j be the jth row of A.
Let B = E A be the RREF of A.
(a) Verify that Eu 2 = e2 .
(b) Find a, b ∈ R such that u 3 = au 1 + bu 2 using the RREF of A.
(c) Determine a, b, c ∈ R such that w4 = aw1 + bw2 + cw3 .
4. Show that if an n × n matrix in RREF has rank n, then it is the identity matrix.

5. Suppose the RREF of a matrix A is equal to $\begin{bmatrix} 1 & 2 & 0 & 3 & 1 & -2\\ 0 & 0 & 1 & 2 & 4 & 5\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$.

If A¹ = [1, 1, 3, 4]^t and A³ = [2, −1, 1, 3]^t, then determine A⁶.
6. Consider the row vectors v₁ = [1, 2, 3, 4], v₂ = [2, 0, 1, 1], v₃ = [−3, 2, 1, 2],
and v₄ = [1, −2, −2, −3]. Construct a row vector v ∈ R^{1×4} which is not expressible
as av₁ + bv₂ + cv₃ + dv₄ for any a, b, c, d ∈ R.
[Hint: Compute the RREF of A = [v₁^t v₂^t v₃^t v₄^t].]

1.6 Determinant

There are two important quantities associated with a square matrix. One is the trace,
and the other is the determinant.
The sum of all diagonal entries of a square matrix is called the trace of the matrix.
That is, if A = [a_{ij}] ∈ F^{n×n}, then

$$\operatorname{tr}(A) = a_{11} + \cdots + a_{nn} = \sum_{k=1}^{n} a_{kk}.$$

In addition to tr(I_m) = m and tr(0) = 0, the trace satisfies the following properties:

1. tr(αA) = α tr(A) for each α ∈ F.
2. tr(A^t) = tr(A) and tr(A*) = $\overline{\operatorname{tr}(A)}$.
3. tr(A + B) = tr(A) + tr(B).
4. tr(AB) = tr(BA).
5. tr(A*A) = 0 iff tr(AA*) = 0 iff A = 0.

Observe that (4) does not assert that the trace of a product is equal to the product
of the traces. Further, $\operatorname{tr}(A^*A) = \sum_{i}\sum_{j} |a_{ij}|^2$ proves (5).
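These trace properties are easy to spot-check numerically; a minimal sketch (the matrices are illustrative choices):

```python
import numpy as np

# Checking tr(A*) = conj(tr A), tr(AB) = tr(BA), and tr(A*A) = sum |a_ij|^2.
A = np.array([[1 + 2j, 3],
              [0, 4 - 1j]])
B = np.array([[2, 1j],
              [1, 5]])
assert np.isclose(np.trace(A.conj().T), np.conj(np.trace(A)))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
assert np.isclose(np.trace(A.conj().T @ A), np.sum(np.abs(A) ** 2))
```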

The determinant of a square matrix A = [a_{ij}] ∈ F^{n×n}, written as det(A), is
defined inductively as follows:

1. If n = 1, then det(A) = a₁₁.
2. If n > 1, then $\det(A) = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det(A_{1j})$,
where A_{1j} ∈ F^{(n−1)×(n−1)} is obtained from A by deleting the first row and the jth
column of A.
When A = [a_{ij}] is written showing all its entries, we also write det(A) by replacing
the two big closing brackets [ and ] by two vertical bars | and |. For a 2 × 2 matrix,
its determinant is seen as follows:

$$\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix} = (-1)^{1+1} a_{11}\det[a_{22}] + (-1)^{1+2} a_{12}\det[a_{21}] = a_{11}a_{22} - a_{12}a_{21}.$$

Similarly, for a 3 × 3 matrix, we need to compute three 2 × 2 determinants. For
instance,

$$\begin{vmatrix} 1 & 2 & 3\\ 2 & 3 & 1\\ 3 & 1 & 2 \end{vmatrix} = (-1)^{1+1}\cdot 1\cdot\begin{vmatrix} 3 & 1\\ 1 & 2 \end{vmatrix} + (-1)^{1+2}\cdot 2\cdot\begin{vmatrix} 2 & 1\\ 3 & 2 \end{vmatrix} + (-1)^{1+3}\cdot 3\cdot\begin{vmatrix} 2 & 3\\ 3 & 1 \end{vmatrix}$$
$$= (3\times 2 - 1\times 1) - 2\times(2\times 2 - 1\times 3) + 3\times(2\times 1 - 3\times 3) = 5 - 2\times 1 + 3\times(-7) = -18.$$

For a lower triangular matrix, we see that

$$\begin{vmatrix} a_{11} & & & \\ a_{21} & a_{22} & & \\ \vdots & & \ddots & \\ a_{n1} & \cdots & & a_{nn} \end{vmatrix} = a_{11}\begin{vmatrix} a_{22} & & \\ \vdots & \ddots & \\ a_{n2} & \cdots & a_{nn} \end{vmatrix} = \cdots = a_{11}a_{22}\cdots a_{nn}.$$

In general, the determinant of any triangular matrix (upper or lower) is the product
of its diagonal entries. In particular, the determinant of a diagonal matrix is also the
product of its diagonal entries. Thus, if I is the identity matrix of order n, then
det(I ) = 1 and det(−I ) = (−1)n .
Our definition of determinant expands the determinant in the first row. In fact,
the same result may be obtained by expanding it in any other row or even in any
column. To mention some similar properties of the determinant, we introduce some
terminology.
Let A ∈ Fn×n . The sub-matrix of A obtained by deleting the ith row and the jth
column is called the (i, j)th minor of A and is denoted by Ai j .

The (i, j)th co-factor of A is (−1)i+ j det(Ai j ); it is denoted by Ci j (A). Some-


times, when the matrix A is fixed in a context, we write Ci j (A) as Ci j .
The adjugate of A is the n × n matrix obtained by taking transpose of the matrix
whose (i, j)th entry is Ci j (A); it is denoted by adj(A). That is, adj(A) ∈ Fn×n is the
matrix whose (i, j)th entry is the ( j, i)th co-factor C ji (A).
Also, we write Ai (x) for the matrix obtained from A by replacing its ith row by
a row vector x of appropriate size.
Let A ∈ F^{n×n}. Let i, j, k ∈ {1, …, n}. Let E[i, j], E_α[i] and E_α[i, j] be the elementary
matrices of order n, for 1 ≤ i ≠ j ≤ n and a nonzero scalar α. Then, the
following statements are true.
1. det(E[i, j] A) = −det(A).
2. det(E α [i] A) = α det(A).
3. det(E α [i, j] A) = det(A).
4. If some row of A is the zero vector, then det(A) = 0.
5. If one row of A is a scalar multiple of another row, then det(A) = 0.
6. For any i ∈ {1, . . . , n}, det( Ai (x + y) ) = det( Ai (x) ) + det( Ai (y) ).
7. det(A^t) = det(A) and det(A*) = $\overline{\det(A)}$.
8. If A is a triangular matrix, then det(A) is equal to the product of the diagonal
entries of A.
9. det(AB) = det(A) det(B) for any matrix B ∈ Fn×n .
10. A adj(A) = adj(A)A = det(A) I.
11. A is invertible iff det(A) = 0.
Elementary column operations are operations similar to row operations, but with
columns instead of rows. Notice that since det(At ) = det(A), the facts concerning
elementary row operations also hold true if elementary column operations are used.
Using elementary (row and column) operations, the computational complexity for
evaluating a determinant can be reduced drastically. The trick is to bring a matrix to
a triangular form by using elementary row operations, so that the determinant of the
ensuing triangular matrix can be computed easily.
Example 1.5

$$\begin{vmatrix} 1 & 0 & 0 & 1\\ -1 & 1 & 0 & 1\\ -1 & -1 & 1 & 1\\ -1 & -1 & -1 & 1 \end{vmatrix} \stackrel{R1}{=} \begin{vmatrix} 1 & 0 & 0 & 1\\ 0 & 1 & 0 & 2\\ 0 & -1 & 1 & 2\\ 0 & -1 & -1 & 2 \end{vmatrix} \stackrel{R2}{=} \begin{vmatrix} 1 & 0 & 0 & 1\\ 0 & 1 & 0 & 2\\ 0 & 0 & 1 & 4\\ 0 & 0 & -1 & 4 \end{vmatrix} \stackrel{R3}{=} \begin{vmatrix} 1 & 0 & 0 & 1\\ 0 & 1 & 0 & 2\\ 0 & 0 & 1 & 4\\ 0 & 0 & 0 & 8 \end{vmatrix} = 8.$$

Here, R1 = E₁[2,1], E₁[3,1], E₁[4,1]; R2 = E₁[3,2], E₁[4,2]; R3 = E₁[4,3]. ∎
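The value obtained by this reduction can be confirmed numerically; a one-line check with NumPy (not part of the text):

```python
import numpy as np

# The matrix of Example 1.5; its determinant should be 8.
A = np.array([[ 1.,  0.,  0., 1.],
              [-1.,  1.,  0., 1.],
              [-1., -1.,  1., 1.],
              [-1., -1., -1., 1.]])
assert np.isclose(np.linalg.det(A), 8.0)
```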
Example 1.6 See that the following is true, verifying Property (6) as mentioned above:

$$\begin{vmatrix} 3 & 1 & 2 & 4\\ -1 & 1 & 0 & 1\\ -1 & -1 & 1 & 1\\ -1 & -1 & -1 & 1 \end{vmatrix} = \begin{vmatrix} 1 & 0 & 0 & 1\\ -1 & 1 & 0 & 1\\ -1 & -1 & 1 & 1\\ -1 & -1 & -1 & 1 \end{vmatrix} + \begin{vmatrix} 2 & 1 & 2 & 3\\ -1 & 1 & 0 & 1\\ -1 & -1 & 1 & 1\\ -1 & -1 & -1 & 1 \end{vmatrix}. \qquad ∎$$

Exercises for Sect. 1.6


1. Prove the properties of the trace of a matrix as mentioned in the text.
2. For each n > 2, construct an n × n nonzero matrix where no row is a scalar
multiple of another row but its determinant is 0.
3. Is it true that if det(A) = 0, then among the rows R1 , . . . , Rn of A, some row,
say, Ri can be written as Ri = α1 R1 + · · · + αi−1 Ri−1 + αi+1 Ri+1 + · · · + αn Rn
for some scalars α j ? What about columns instead of rows?
4. Let a₁, …, a_n ∈ C. Let A be the n × n matrix whose first row has all entries
as 1 and whose kth row has entries a₁^{k−1}, …, a_n^{k−1} in that order. Show that
$\det(A) = (-1)^{n(n-1)/2}\prod_{1 \le i < j \le n}(a_i - a_j)$.

5. Compute A⁻¹ using adj(A), where $A = \begin{bmatrix} 1 & 0 & 0 & 1\\ -1 & 1 & 0 & 1\\ -1 & -1 & 1 & 1\\ -1 & -1 & -1 & 1 \end{bmatrix}$.
6. Let A, B ∈ F3×3 with det(A) = 2 and det(B) = 3. Determine
(a) det(4 A) (b) det(AB) (c) det(5AB) (d) det(2 A−1 B)
7. Let A, B ∈ F^{2×2}, and let E = [e_{ij}] with e₁₁ = e₂₂ = 0, e₁₂ = e₂₁ = 1. Show that
if B = EA, then det(A + B) = det(A) + det(B).


8. Give examples of A and B in R^{2×2} so that det(A + B) ≠ det(A) + det(B).

1.7 Computing Inverse of a Matrix

The adjugate property of the determinant provides a way to compute the inverse
of a matrix, provided it is invertible. However, it is very inefficient. We may instead
use elementary row operations to compute the inverse. Our computation of the inverse
is based on the following fact.

Theorem 1.2 A square matrix is invertible iff it is a product of elementary matrices.

Proof Since elementary matrices are invertible, so is their product. Conversely, sup-
pose that A is an invertible matrix. Let E A−1 be the RREF of A−1 . If E A−1 has a
zero row, then E = E A−1 A also has a zero row. But E is a product of elementary
matrices, which is invertible; it does not have a zero row. Therefore, E A−1 does not
have a zero row. Then, each row in the square matrix E A−1 has a pivot. But the only
square matrix in RREF having a pivot at each row is the identity matrix. Therefore,
E A−1 = I. That is, A = E, a product of elementary matrices. 

Now, suppose a matrix A is invertible. Then, there exist elementary matrices


E 1 , . . . , E m such that A = E 1 · · · E m . It follows that A−1 = E m−1 · · · E 1−1 I and
E m−1 · · · E 1−1 A = I. Since I is in RREF, it follows from the uniqueness of RREF
that the row reduced form of A is I. Therefore, if a sequence of elementary row
operations applied on A results in I, then the same sequence applied on I will result
in A−1 .

This suggests applying the same elementary row operations on both A and I
simultaneously so that A is reduced to its RREF. For this purpose, we introduce
the notion of an augmented matrix. If A ∈ Fm×n and B ∈ Fm×k , then the matrix
[A | B] ∈ Fm×(n+k) obtained from A and B by writing first all the columns of A and
then the columns of B, in that order, is called an augmented matrix. The vertical bar
shows the separation of columns of A and of B, though conceptually unnecessary.
For computing the inverse of a matrix, we start with the augmented matrix [A | I ].
We then apply elementary row operations for reducing A to its RREF, while simulta-
neously applying the same operations on the entries of I. This means we pre-multiply
the matrix [A | I ] with a product E of elementary matrices so that E A is in RREF.
In block form, our result is the augmented matrix [E A | E I ]. If A is invertible, then
E A = I and then E I = A−1 . Once the A portion has been reduced to I, the I portion
is A−1 . On the other hand, if A is not invertible, then its RREF will have at least one
zero row.
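The augmented-matrix method described above can be sketched in code. The sketch below follows the same idea but, unlike the book's algorithm, picks the largest available entry as pivot (partial pivoting), which is a common numerical safeguard; it is a minimal illustration, not a production routine:

```python
import numpy as np

def invert(A):
    """Gauss-Jordan inversion via the augmented matrix [A | I]."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])   # the augmented matrix [A | I]
    for col in range(n):
        p = col + np.argmax(np.abs(M[col:, col])) # choose a pivot row
        if np.isclose(M[p, col], 0.0):
            return None                           # no pivot: A is not invertible
        M[[col, p]] = M[[p, col]]                 # row exchange (Type 1)
        M[col] /= M[col, col]                     # scale the pivot to 1 (Type 2)
        for r in range(n):
            if r != col:
                M[r] -= M[r, col] * M[col]        # zero out the column (Type 3)
    return M[:, n:]                               # the I portion now holds A^{-1}

A = np.array([[2., 1.],
              [1., 1.]])
assert np.allclose(invert(A) @ A, np.eye(2))
assert invert(np.array([[1., 2.], [2., 4.]])) is None   # a singular matrix
```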

Example 1.7 Determine inverses of the following matrices, if possible:


        ⎡  1 −1  2  0 ⎤         ⎡  1 −1  2  0 ⎤
    A = ⎢ −1  0  0  2 ⎥ ,   B = ⎢ −1  0  0  2 ⎥ .
        ⎢  2  1 −1 −2 ⎥         ⎢  2  1 −1 −2 ⎥
        ⎣  1 −2  4  2 ⎦         ⎣  0 −2  0  2 ⎦

We augment A with an identity matrix to get


    ⎡  1 −1  2  0 │ 1 0 0 0 ⎤
    ⎢ −1  0  0  2 │ 0 1 0 0 ⎥
    ⎢  2  1 −1 −2 │ 0 0 1 0 ⎥
    ⎣  1 −2  4  2 │ 0 0 0 1 ⎦

Next, we use elementary row operations in order to reduce A to its RREF. Since
a11 = 1, it is a pivot. To zero-out the other entries in the first column, we use the
sequence of elementary row operations E 1 [2, 1], E −2 [3, 1], E −1 [4, 1], and obtain
    ⎡ 1 −1  2  0 │  1 0 0 0 ⎤
    ⎢ 0 −1  2  2 │  1 1 0 0 ⎥
    ⎢ 0  3 −5 −2 │ −2 0 1 0 ⎥
    ⎣ 0 −1  2  2 │ −1 0 0 1 ⎦

The pivot is −1 in (2, 2) position. Use E −1 [2] to make the pivot 1. And then, use
E 1 [1, 2], E −3 [3, 2], E 1 [4, 2] to zero-out all non-pivot entries in the pivotal column:
    ⎡ 1 0  0 −2 │  0 −1 0 0 ⎤
    ⎢ 0 1 −2 −2 │ −1 −1 0 0 ⎥
    ⎢ 0 0  1  4 │  1  3 1 0 ⎥
    ⎣ 0 0  0  0 │ −2 −1 0 1 ⎦
1.7 Computing Inverse of a Matrix 27

Since a zero row has appeared in the A portion of the augmented matrix, we
conclude that A is not invertible. You see that the second portion of the augmented
matrix has no meaning now. However, it records the elementary row operations which
were carried out in the reduction process. Verify that this matrix is equal to

E 1 [4, 2] E −3 [3, 2] E 1 [1, 2] E −1 [2] E −1 [4, 1] E −2 [3, 1] E 1 [2, 1],

and that the first portion is equal to this matrix times A.


For B, we proceed similarly. The augmented matrix [B | I ] with the first pivot
looks like:
    ⎡  1 −1  2  0 │ 1 0 0 0 ⎤
    ⎢ −1  0  0  2 │ 0 1 0 0 ⎥
    ⎢  2  1 −1 −2 │ 0 0 1 0 ⎥
    ⎣  0 −2  0  2 │ 0 0 0 1 ⎦

The sequence of elementary row operations E 1 [2, 1], E −2 [3, 1] yields


    ⎡ 1 −1  2  0 │  1 0 0 0 ⎤
    ⎢ 0 −1  2  2 │  1 1 0 0 ⎥
    ⎢ 0  3 −5 −2 │ −2 0 1 0 ⎥
    ⎣ 0 −2  0  2 │  0 0 0 1 ⎦

Next, the pivot is −1 in (2, 2) position. Use E −1 [2] to get the pivot as 1. And
then, E 1 [1, 2], E −3 [3, 2], E 2 [4, 2] gives
    ⎡ 1 0  0 −2 │  0 −1 0 0 ⎤
    ⎢ 0 1 −2 −2 │ −1 −1 0 0 ⎥
    ⎢ 0 0  1  4 │  1  3 1 0 ⎥
    ⎣ 0 0 −4 −2 │ −2 −2 0 1 ⎦

Next pivot is 1 in (3, 3) position. Now, E 2 [2, 3], E 4 [4, 3] produces


    ⎡ 1 0 0 −2 │ 0 −1 0 0 ⎤
    ⎢ 0 1 0  6 │ 1  5 2 0 ⎥
    ⎢ 0 0 1  4 │ 1  3 1 0 ⎥
    ⎣ 0 0 0 14 │ 2 10 4 1 ⎦

Next pivot is 14 in (4, 4) position. Use E 1/14 [4] to get the pivot as 1. Use
E 2 [1, 4], E −6 [2, 4], E −4 [3, 4] to zero-out the entries in the pivotal column:
    ⎡ 1 0 0 0 │ 2/7 3/7  4/7  1/7 ⎤
    ⎢ 0 1 0 0 │ 1/7 5/7  2/7 −3/7 ⎥
    ⎢ 0 0 1 0 │ 3/7 1/7 −1/7 −2/7 ⎥
    ⎣ 0 0 0 1 │ 1/7 5/7  2/7 1/14 ⎦


Thus,
                   ⎡ 2 3  4    1 ⎤
    B −1 = (1/7) × ⎢ 1 5  2   −3 ⎥ .
                   ⎢ 3 1 −1   −2 ⎥
                   ⎣ 1 5  2  1/2 ⎦

Verify that B −1 B = B B −1 = I. 

Observe that if a matrix is not invertible, then our algorithm for reduction to RREF
produces a pivot in the I portion of the augmented matrix.
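As a quick numerical cross-check of Example 1.7 (assuming NumPy is available; this snippet is ours, not the text's), the computed inverse of B can be verified directly:

```python
import numpy as np

# B from Example 1.7 and the inverse obtained by reducing [B | I].
B = np.array([[1, -1, 2, 0],
              [-1, 0, 0, 2],
              [2, 1, -1, -2],
              [0, -2, 0, 2]], dtype=float)
B_inv = np.array([[2, 3, 4, 1],
                  [1, 5, 2, -3],
                  [3, 1, -1, -2],
                  [1, 5, 2, 0.5]]) / 7

# Both products equal the 4x4 identity, up to floating-point rounding.
assert np.allclose(B @ B_inv, np.eye(4))
assert np.allclose(B_inv @ B, np.eye(4))
# The library inverse agrees with the hand computation.
assert np.allclose(np.linalg.inv(B), B_inv)
```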
Exercises for Sect. 1.7
1. Compute the inverses of the following matrices, if possible:
        ⎡  2 1 2 ⎤        ⎡  1  4 −6 ⎤        ⎡  3 1  1  2 ⎤
    (a) ⎢  1 3 1 ⎥    (b) ⎢ −1 −1  3 ⎥    (c) ⎢  1 2  0  1 ⎥
        ⎣ −1 1 2 ⎦        ⎣  1 −2  3 ⎦        ⎢  1 1  2 −1 ⎥
                                              ⎣ −2 1 −1  3 ⎦
2. Let A = ⎡ 2 1 ⎤ . Express A and A−1 as products of elementary matrices.
           ⎣ 6 4 ⎦
3. Given matrices A = ⎡ 5 2 ⎤ and B = ⎡ 3 4 ⎤ , find matrices X and Y such that
                      ⎣ 3 1 ⎦         ⎣ 1 2 ⎦
   AX = B and Y A = B. [Hint: Both A and B are invertible.]
           ⎡ 0  1  0 ⎤
4. Let A = ⎢ 0  0  1 ⎥ , where b, c ∈ C. Show that A−1 = A2 + c A + b I.
           ⎣ 1 −b −c ⎦
5. Show that if A is an upper triangular invertible matrix, then so is A−1 .
6. Show that if A is a lower triangular invertible matrix, then so is A−1 .
7. Can every square matrix be written as a sum of two invertible matrices?
8. Can every invertible matrix be written as a sum of two non-invertible matrices?

1.8 Problems

1. Let A and D be matrices of order n. Suppose that D is a diagonal matrix. Describe


the products AD and D A.
2. Give examples of matrices A and B so that
(a) A2 − B 2 ≠ (A + B)(A − B) (b) (A + B)2 ≠ A2 + 2 AB + B 2
3. Find nonzero matrices A, B and C such that A ≠ B and AC = BC.
4. Construct a 2 × 2 matrix A with each entry nonzero so that A2 = 0. Can such a
matrix A satisfy A∗ = A?
5. Construct a matrix A and column vectors x ≠ y such that Ax = Ay. Can such
a matrix A be invertible?
6. Construct matrices A and B so that AB = A and B ≠ I. Is such a matrix A
invertible?
7. Give infinitely many 2 × 2 invertible matrices A such that A−1 = At .
8. Give infinitely many 2 × 2 matrices A satisfying A2 = I.

9. Let A ∈ Fn×n satisfy A2 = 0. Show that I − A is invertible. What is the inverse


of I − A?
10. Let A ∈ Fn×n satisfy Ak+1 = 0 for some k ∈ N. Show that I − A is invertible.
[Hint: What could be the inverse?]
11. For n ∈ N, compute An , where A = [ai j ] ∈ F2×2 is given by
(a) a11 = a12 = a21 = a22 = 1 (b) a11 = −a12 = −a21 = a22 = 1/2
12. A square matrix A is called idempotent iff A2 = A. Show the following:
(a) Let A be a 3 × 3 matrix whose third row has each entry as 1/2 and all other
entries as 1/4. Then, A is idempotent.
(b) A diagonal matrix with diagonal entries in {0, 1} is idempotent.
(c) If A is idempotent and B is invertible, then B −1 AB is idempotent.
(d) If A is idempotent, then so is I − A.
(e) If A is idempotent, then I + A is invertible.
(f) If A is a square matrix with A2 = I, then B = (1/2)(I + A) and C = (1/2)(I − A)
are idempotent, and BC = 0.
13. Let B be the RREF of an m × n matrix A of rank r. Show the following:
(a) Let u 1 , . . . , u n be the columns of B. Let k1 , . . . , kr be the column indices
of all pivotal columns in B. If v = α1 u 1 + · · · + αn u n for some scalars αs ,
then v = β1 u k1 + · · · + βr u kr for some scalars βs . Moreover, among these r
columns u k1 , . . . , u kr , no vector u ki can be expressed as u ki = γ1 u k1 + · · · +
γi−1 u ki−1 + γi+1 u ki+1 + · · · + γr u kr for scalars γs .
(b) Let w j1 , . . . , w jr be the rows of A which have become the pivotal rows in
B. Then, any row w of A can be written as w = α1 w j1 + · · · + αr w jr for
some scalars αs . Further, no vector w ji can be written as w ji = β1 w j1 + · · · +
βi−1 w ji−1 + βi+1 w ji+1 + · · · + βr w jr for scalars βs .
14. Let v1 , . . . , vn ∈ Fn×1 be such that the matrix P = [v1 · · · vn ] is invert-
ible. Let v ∈ Fn×1 . For α1 , . . . , αn ∈ F, show that v = α1 v1 + · · · + αn vn iff
P −1 v = [α1 , . . . , αn ]t . Connect this result to the RREF of the augmented matrix
    [v1 · · · vn | v].
            ⎡ 2 1 1 ⎤
15. Let A = ⎢ 4 1 3 ⎥ . Determine a product E of elementary matrices so that U =
            ⎣ 6 4 5 ⎦
    E A is upper triangular. With L = E −1 , we have A = LU. What type of matrix
    is L?
16. Let A ∈ Cn×n . Show the following:
(a) If tr(A A∗ ) = 0, then A = 0. (b) If A∗ A = A2 , then A∗ = A.
17. Suppose each entry of an n × n matrix A is an integer. If det(A) = ±1, then
prove that each entry of A−1 is also an integer.
18. The Vandermonde matrix with numbers a1 , . . . , an+1 is a matrix of order n + 1
    whose ith row is [a1^(i−1) , . . . , an+1^(i−1) ] for 1 ≤ i ≤ n + 1. Show that the determinant
    of such a matrix is given by the product ∏i< j (ai − a j ).

19. Let A, E ∈ Fm×m , B, F ∈ Fm×n , C, G ∈ Fn×m , and let D, H ∈ Fn×n . Show that

        ⎡ A B ⎤ ⎡ E F ⎤   ⎡ AE + BG   AF + B H ⎤
        ⎣ C D ⎦ ⎣ G H ⎦ = ⎣ C E + DG  C F + D H ⎦ .

20. Let A, B, C, D, E, F, G, H be as in Problem 19. Show that

        ⎡ A B ⎤−1   ⎡ (A − B D−1 C)−1         A−1 B(C A−1 B − D)−1 ⎤
        ⎣ C D ⎦   = ⎣ (C A−1 B − D)−1 C A−1   (D − C A−1 B)−1      ⎦

    provided all the inverses exist.


21. Let A, B, C, D, E, F, G, H be as in Problem 19. Let M = ⎡ A B ⎤ . Show the
                                                            ⎣ C D ⎦
    following:
    (a) If B = 0 or C = 0, then det(M) = det(A) · det(D).
    (b) If det(A) ≠ 0, then det(M) = det(A) · det(D − C A−1 B).
        [Hint: Consider the matrix M ⎡ Im −A−1 B ⎤ .]
                                     ⎣ 0     In  ⎦
22. Elementary column operations work with the columns of a matrix in a similar
fashion as elementary row operations work with the rows. Let A be an m × n
matrix. Show the following:
(a) A E[i, j] is obtained from A by exchanging the ith and jth columns.
(b) A E α [i] is obtained from A by multiplying α with the ith column.
(c) A E α [i, j] is obtained from A by adding α times ith column to the jth column.
23. What simpler form can you obtain by using a sequence of elementary column
operations on a matrix which is already in RREF?
24. Let E i j denote the n × n matrix with its (i, j)th entry as 1, and all other entries
as 0. Let A be any n × n matrix. Show the following:
(a) If i ≠ j, then (E i j )2 = 0 and (I + E i j )−1 = I − E i j .
(b) (E ii )2 = E ii and (I + E ii )−1 = I − (1/2) E ii .
(c) If AE i j = E i j A for all i, j ∈ {1, . . . , n}, then a11 = a22 = · · · = ann , and
ai j = 0 for i ≠ j.
(d) If AB = B A for all invertible n × n matrices B, then A = α I for some scalar
α.
25. Let A = [ai j ] ∈ Fn×n , B = diag(1, 2, . . . , n), C = diag(a11 , a22 , . . . , ann ), and
let D ∈ Fn×n have its first row as [1, 2, . . . , n], all diagonal entries as 1, and all
other entries as 0. Show the following:
(a) If AB = B A, then ai j = 0 for all i ≠ j.
(b) If C D = DC, then a11 = a22 = · · · = ann .
(c) If AM = M A for all invertible matrices M ∈ Fn×n , then A = α I for some
scalar α.
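Several of these identities can be spot-checked numerically. A small sketch for Problems 9 and 12(f), assuming NumPy is available (the particular matrices are our own examples):

```python
import numpy as np

A = np.array([[0, 1], [0, 0]])          # A^2 = 0 (Problem 9)
assert np.array_equal(A @ A, np.zeros((2, 2)))
# (I - A)^{-1} = I + A, since (I - A)(I + A) = I - A^2 = I.
assert np.allclose(np.linalg.inv(np.eye(2) - A), np.eye(2) + A)

S = np.array([[0, 1], [1, 0]])          # S^2 = I (Problem 12(f))
B, C = (np.eye(2) + S) / 2, (np.eye(2) - S) / 2
assert np.allclose(B @ B, B) and np.allclose(C @ C, C)   # both idempotent
assert np.allclose(B @ C, np.zeros((2, 2)))              # and BC = 0
```

Such checks do not replace a proof, but they catch miscopied formulas quickly.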
Chapter 2
Systems of Linear Equations

2.1 Linear Independence

In the reduction to RREF, why are some rows reduced to zero rows while the
others are not? Similarly, in the RREF of a matrix, why are some columns pivotal
and others not?
Recall that a row vector in F1×n and a column vector in Fn×1 are both written
uniformly as an n-tuple (a1 , . . . , an ) in Fn . Such an n-tuple of scalars from F is
interpreted as either a row vector with n components or a column vector with n
components, as the case demands. Thus, an n-tuple of scalars is called a vector in
Fn .
The sum of two vectors from Fn and the multiplication of a vector from Fn
by a scalar follow those of the row and/or column vectors. That is, for β ∈
F, (a1 , . . . , an ), (b1 , . . . , bn ) ∈ Fn , we define

(a1 , . . . , an ) + (b1 , . . . , bn ) = (a1 + b1 , . . . , an + bn ),


β(a1 , . . . , an ) = (βa1 , . . . , βan ).

Let v1 , . . . , vm ∈ Fn . The vector α1 v1 + · · · + αm vm ∈ Fn is called a linear com-


bination of v1 , . . . , vm , where α1 , . . . , αm are some scalars from F.
By taking some particular scalars αi , if the sum α1 v1 + · · · + αm vm evaluates to
a vector v, then we also say that v is a linear combination of the vectors v1 , . . . , vm .
For example, one linear combination of v1 = [1, 1] and v2 = [1, −1] is

2[1, 1] + 1[1, −1].

This linear combination evaluates to [3, 1]. We say that [3, 1] is a linear combination
of v1 and v2 . Is [4, −2] a linear combination of v1 and v2 ? Yes, since

[4, −2] = 1[1, 1] + 3[1, −1].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 31


A. Singh, Introduction to Matrix Theory,
https://doi.org/10.1007/978-3-030-80481-7_2

In fact, every vector in F1×2 is a linear combination of v1 and v2 . Reason:

[a, b] = ((a + b)/2) [1, 1] + ((a − b)/2) [1, −1].

However, not every vector in F1×2 is a linear combination of [1, 1] and [2, 2].
Reason? Any linear combination of these two vectors is a scalar multiple of [1, 1].
Then, [1, 0] is not a linear combination of these two vectors.
Let A ∈ Fm×n be a matrix of rank r. Let u i1 , . . . , u ir be the r columns of A that
correspond to the pivotal columns in its RREF. Then among these r columns, no u i j
is a linear combination of others, and each other column in A is a linear combination
of these r columns.
Similarly, let wk1 , . . . , wkr be the r rows in A that correspond to the r nonzero
rows in its RREF (monitoring the row exchanges). Then among these r rows, no wk j
is a linear combination of others, and each other row of A is a linear combination of
these r rows.
The vectors v1 , . . . , vm in Fn are called linearly dependent iff at least one of
them is a linear combination of others. The vectors are called linearly independent
iff none of them is a linear combination of others.
For example, [1, 1], [1, −1], [3, 1] are linearly dependent vectors since [3, 1] =
2[1, 1] + [1, −1], whereas [1, 1], [1, −1] are linearly independent vectors in F1×2 .
If α1 = · · · = αm = 0, then the linear combination α1 v1 + · · · + αm vm evaluates
to 0. That is, the zero vector can always be written as a trivial linear combination.
However, a non-trivial linear combination of some vectors can evaluate to 0. For
instance, 2[1, 1] + [1, −1] − [3, 1] = 0. We guess that this can happen for linearly
dependent vectors, but may not happen for linearly independent vectors.
Suppose the vectors v1 , . . . , vm are linearly dependent. Then, one of them, say, vi
is a linear combination of others. That is, for some scalars α j ,

vi = α1 v1 + · · · + αi−1 vi−1 + αi+1 vi+1 + · · · + αm vm .

Then,

α1 v1 + · · · + αi−1 vi−1 + (−1)vi + αi+1 vi+1 + · · · + αm vm = 0.

Here, we see that a linear combination becomes zero, where at least one of the
coefficients, that is, the ith one, is nonzero. That is, a non-trivial linear combination
of v1 , . . . , vm exists which evaluates to 0.
Conversely, suppose that we have scalars β1 , . . . , βm not all zero such that

β1 v1 + · · · + βm vm = 0.

Say, the kth scalar βk is nonzero. Then,

vk = −(1/βk ) (β1 v1 + · · · + βk−1 vk−1 + βk+1 vk+1 + · · · + βm vm ).

That is, the vectors v1 , . . . , vm are linearly dependent.


Thus, we have proved the following:
v1 , . . . , vm are linearly dependent
iff α1 v1 + · · · + αm vm = 0 for scalars α1 , . . . , αm not all zero
iff the zero vector can be written as a non-trivial linear combination of v1 , . . . , vm .
The same may be expressed in terms of linear independence.

Theorem 2.1 Vectors v1 , . . . , vm ∈ Fn are linearly independent iff for all


α1 , . . . , αm ∈ F,
α1 v1 + · · · + αm vm = 0 implies that α1 = · · · = αm = 0.

Theorem 2.1 provides a way to determine whether a finite number of vectors are
linearly independent or not. You start with a linear combination of the given vectors
and equate it to 0. Then, use the laws of addition and scalar multiplication to derive
that each coefficient in that linear combination is 0. Once you succeed, you conclude
that the given vectors are linearly independent. On the other hand, if it is not possible
to derive that each coefficient is 0, then from the proof of this impossibility you will
be able to express one of the vectors as a linear combination of the others. And this
would prove that the given vectors are linearly dependent.
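In matrix terms, finding such a non-trivial combination amounts to finding a nonzero vector in the null space of the matrix whose columns are the given vectors. A sketch using SymPy (assumed available; this is our illustration, not part of the text):

```python
from sympy import Matrix

# Columns are [1, 1], [1, -1], [3, 1] from the discussion above; a null
# space vector [a, b, c] gives a combination a*v1 + b*v2 + c*v3 = 0.
M = Matrix([[1, 1, 3],
            [1, -1, 1]])
ns = M.nullspace()
print(ns[0].T)   # a nonzero solution, e.g. a multiple of [-2, -1, 1]
```

The vector [−2, −1, 1] encodes exactly the relation 2[1, 1] + [1, −1] − [3, 1] = 0, up to a change of sign.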

Example 2.1 Are the vectors [1, 1, 1], [2, 1, 1], [3, 1, 0] linearly independent?
We start with an arbitrary linear combination and equate it to the zero vector. Then,
we solve the resulting linear equations to determine whether all the coefficients are
necessarily 0 or not. So, let

a [1, 1, 1] + b [2, 1, 1] + c [3, 1, 0] = [0, 0, 0].

Comparing the components, we have

a + 2b + 3c = 0, a + b + c = 0, a + b = 0.

The last two equations imply that c = 0. Substituting in the first, we see that
a + 2b = 0. This and the equation a + b = 0 give b = 0. Then, it follows that a = 0.
We conclude that the given vectors are linearly independent. 

Example 2.2 Are the vectors [1, 1, 1], [2, 1, 1], [3, 2, 2] linearly independent?
Clearly, the third one is the sum of the first two. So, the given vectors are linearly
dependent.
To illustrate our method, we start with an arbitrary linear combination and equate
it to the zero vector. We then solve the resulting linear equations to determine whether
all the coefficients are necessarily 0 or not. As earlier, let

a [1, 1, 1] + b [2, 1, 1] + c [3, 2, 2] = [0, 0, 0].

Comparing the components, we have



a + 2b + 3c = 0, a + b + 2c = 0, a + b + 2c = 0.

The last equation is redundant. From the first and the second, we have

b + c = 0.

We may choose b = 1, c = −1 to satisfy this equation. Then from the second equa-
tion, we have a = 1. Our starting equation says that the third vector is the sum of the
first two. 
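The two examples can also be checked mechanically: stacking the vectors as columns, the equation has only the trivial solution exactly when the rank equals the number of columns. A NumPy sketch (our own cross-check, not part of the text):

```python
import numpy as np

# Example 2.1: columns [1,1,1], [2,1,1], [3,1,0]; full column rank.
M = np.column_stack([[1, 1, 1], [2, 1, 1], [3, 1, 0]])
print(np.linalg.matrix_rank(M))   # 3: linearly independent

# Example 2.2: columns [1,1,1], [2,1,1], [3,2,2]; the third is v1 + v2.
N = np.column_stack([[1, 1, 1], [2, 1, 1], [3, 2, 2]])
print(np.linalg.matrix_rank(N))   # 2: linearly dependent
```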

Be careful with the direction of implication here. Your workout must be in the
following form:
Assume α1 v1 + · · · + αm vm = 0. Then · · · Then α1 = · · · = αm = 0.
And that would prove linear independence.
To see how linear independence is helpful, consider the following system of linear
equations:
x1 +2x2 −3x3 = 2
2x1 −x2 +2x3 = 3
4x1 +3x2 −4x3 = 7

Here, we find that the third equation is redundant, since 2 times the first plus the
second gives the third. That is, the third one linearly depends on the first two. You can
of course choose any other equation here as linearly depending on the other two, but
that is not important and that may not be always possible. Now, take the row vectors
of coefficients of the unknowns along with the right-hand side, as in the following:

v1 = [1, 2, −3, 2], v2 = [2, −1, 2, 3], v3 = [4, 3, −4, 7].

We see that v3 = 2v1 + v2 , as it should be. That is, the vectors v1 , v2 , v3 are linearly
dependent. But the vectors v1 , v2 are linearly independent. Thus, solving the given
system of linear equations is the same thing as solving the system with only first two
equations.
For solving linear systems, it is of primary importance to find out which equations
linearly depend on others. Once determined, such equations can be thrown away, and
the rest can be solved.
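The redundancy in the system above can be confirmed mechanically (NumPy assumed; the vectors are those of the discussion):

```python
import numpy as np

v1 = np.array([1, 2, -3, 2])
v2 = np.array([2, -1, 2, 3])
v3 = np.array([4, 3, -4, 7])
# The third equation is 2 times the first plus the second:
assert np.array_equal(2 * v1 + v2, v3)
# Dropping it loses no information: the ranks agree.
assert np.linalg.matrix_rank(np.vstack([v1, v2, v3])) == \
       np.linalg.matrix_rank(np.vstack([v1, v2])) == 2
```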
As to our opening questions, now we know that in the RREF of a matrix, a column
that corresponds to a non-pivotal column is a linear combination of the columns
that correspond to the pivotal columns. Similarly, a zero row corresponds to one
(monitoring row exchanges) which is a linear combination of the nonzero (pivotal)
rows in the RREF. We will see this formally in the next section.
Exercises for Sect. 2.1
1. Check whether the given vectors are linearly independent:
(a) (1, 2, 6), (−1, 3, 4), (−1, −4, 2) in R3
(b) (1, 0, 2, 1), (1, 3, 2, 1), (4, 1, 2, 2) in C4

2. Suppose that u, v, w are linearly independent in C5 . Are the following vectors


linearly independent?
(a) u + v, v + w, w + u
(b) u − v, v − w, w − u
(c) u, v + αw, w, where α is a nonzero real number.
(d) u + v, v + 2w, u + αv, v − αw, where α is a complex number.
3. Give three linearly dependent vectors in R2 such that none of the three is a scalar
multiple of another.
4. Suppose S is a finite set of vectors, and some v ∈ S is not a linear combination
of other vectors in S. Is S linearly independent?
5. Prove that nonzero vectors v1 , . . . , vm ∈ Fn are linearly dependent iff there exists
k > 1 such that the vectors v1 , . . . , vk−1 are linearly independent and vk is a linear
combination of v1 , . . . , vk−1 .

2.2 Determining Linear Independence

If a vector w is a linear combination of vectors v1 , v2 and if v1 , v2 are linear combi-


nations of u 1 , u 2 , then w is a linear combination of u 1 , u 2 . We look at the reduction
to RREF via this principle.
The reduction to RREF is achieved by performing elementary row operations. The
row exchanges do not disturb linear dependence or independence except reordering
the row vectors. So, let us look at the other types of elementary row operations.
Suppose the kth row of an m × n matrix A is a linear combination of its first r
rows. To fix notation, let v1 , . . . , vm be the rows of A. We have scalars α1 , . . . , αr
and k > r such that
vk = α1 v1 + · · · + αr vr .

Now, the rows of E β [i] A are v1 , . . . , vi−1 , βvi , vi+1 , . . . , vm for some β ≠ 0.
If 1 ≤ i ≤ r, then vk = α1 v1 + · · · + αi−1 vi−1 + (αi /β)(βvi ) + αi+1 vi+1 + · · · + αr vr .
If i = k, then βvk = α1 βv1 + · · · + αr βvr .
If i > r, i ≠ k, then vk = α1 v1 + · · · + αr vr .
In any case, the kth row of E β [i] A is a linear combination of the first r rows of
E β [i] A.
Similarly, the rows of E β [i, j] A are v1 , . . . , vi−1 , vi + βv j , vi+1 , . . . , vm . Suppose
that 1 ≤ j ≤ r.
If 1 ≤ i ≤ r, then vk = α1 v1 + · · · + αi (vi + βv j ) + · · · + αr vr − αi βv j .
If i = k > r, then vk + βv j = α1 v1 + · · · + (α j + β)v j + · · · + αr vr .
If i > r, i ≠ k, then vk = α1 v1 + · · · + αr vr .
In any case, if 1 ≤ j ≤ r, then the kth row of E β [i, j] A is a linear combination of
the first r of its rows. We summarize these facts as follows.

Observation 2.1 Let A ∈ Fm×n . Let v1 , . . . , vm be the rows of A in that order. Let
E ∈ Fm×m be a product of elementary matrices of the types E β [i] and/or E β [i, j],
where 1 ≤ j ≤ r. Let w1 , . . . , wm be the rows of E A in that order. Let k > r. If the
kth row vk of A is a linear combination of the vectors v1 , . . . , vr , then the kth row
wk in E A is also a linear combination of w1 , . . . , wr .

In the notation of Observation 2.1, you can also show that if any row vector v is
a linear combination of v1 , . . . , vr , then it is a linear combination of w1 , . . . , wr and
vice versa.

Theorem 2.2 Let A ∈ Fm×n be a matrix of rank r. Let vi denote the ith row of
A. Suppose that the rows vi1 , . . . , vir have become the pivotal rows w1 , . . . , wr ,
respectively, in the RREF of A. Let v be any row of A other than vi1 , . . . , vir . Then,
the following are true:
(1) The vectors vi1 , . . . , vir are linearly independent.
(2) The row v of A has become a zero row in the RREF of A.
(3) The row v is a linear combination of vi1 , . . . , vir .
(4) The row v is a linear combination of w1 , . . . , wr .

Proof Monitoring the row exchanges, it can be found out which rows have become
zero rows and which rows have become the pivotal rows. Assume, without loss of
generality, that no row exchanges have been performed during reduction of A to
its RREF B. Then, B = E A, where E is a product of elementary matrices of the
forms E β [i] or E β [i, j]. The first r rows in B are the pivotal rows, i.e. i 1 = 1, i 2 =
2, . . . , ir = r. So, suppose that v1 , . . . , vr have become the pivotal rows w1 , . . . , wr
in B, respectively.
(1) If one of v1 , . . . , vr , say, vk is a linear combination of the others, then by Obser-
vation 2.1, the pivotal row wk is a linear combination of other pivotal rows. But
this is not possible. Hence among the vectors v1 , . . . , vr , none of them is a linear
combination of the others. Therefore, v1 , . . . , vr are linearly independent.
(2) Since rank(A) = r and v1 , . . . , vr have become the pivotal rows, no other row is
a pivotal row. That is, all other rows of A, including v, have become zero rows.
(3) The vectors wr +1 , . . . , wm are the zero rows in B. Each of them is a linear
combination of the pivotal rows w1 , . . . , wr . Now, the vectors w1 , . . . , wr are rows
in B, and v1 , . . . , vr are the corresponding rows in A = E −1 B, where E −1 is a
product of elementary matrices of the forms E β [i] or E β [i, j] with 1 ≤ j ≤ r. By
Observation 2.1, each of vr +1 , . . . , vm is a linear combination of v1 , . . . , vr .
(4) During row reduction, elementary row operations use the pivotal rows. Therefore,
each of the vectors v1 , . . . , vr is a linear combination of w1 , . . . , wr ; and each of the
vectors w1 , . . . , wr is a linear combination of v1 , . . . , vr . Then, it follows from (3)
that v is a linear combination of w1 , . . . , wr also. 

Theorem 2.2 can be used to determine linear independence of vectors. Given


vectors v1 , . . . , vm ∈ F1×n , we form a matrix A by taking these as its rows. During
the reduction of A to its RREF, if a zero row appears, then the vectors are linearly

dependent. Else, the rank of A turns out to be m; consequently, the vectors are linearly
independent.
Example 2.3 To determine whether the vectors [1, 1, 0, 1], [0, 1, 1, −1], and
[1, 3, 2, −5] are linearly independent or not, we form a matrix with the given vectors
as its rows and then reduce it to its RREF. It is as follows.

    ⎡ 1 1 0  1 ⎤ E −1 [3,1] ⎡ 1 1 0  1 ⎤  R1  ⎡ 1 0 −1  2 ⎤  R2  ⎡ 1 0 −1 0 ⎤
    ⎢ 0 1 1 −1 ⎥    −→      ⎢ 0 1 1 −1 ⎥  −→  ⎢ 0 1  1 −1 ⎥  −→  ⎢ 0 1  1 0 ⎥ .
    ⎣ 1 3 2 −5 ⎦            ⎣ 0 2 2 −6 ⎦      ⎣ 0 0  0 −4 ⎦      ⎣ 0 0  0 1 ⎦

Here, R1 = E −1 [1, 2], E −2 [3, 2] and R2 = E −1/4 [3], E −2 [1, 3], E 1 [2, 3].
The last matrix is in RREF in which there is no zero row; each row has a pivot.
So, the original vectors are linearly independent. 
Example 2.4 Are the vectors [1, 1, 0, 1], [0, 1, 1, −1] and [2, −1, −3, 5] linearly
independent?
We construct a matrix with the vectors as rows and reduce it to RREF.
    ⎡ 1  1  0  1 ⎤ E −2 [3,1] ⎡ 1  1  0  1 ⎤  R1  ⎡ 1 0 −1  2 ⎤
    ⎢ 0  1  1 −1 ⎥    −→      ⎢ 0  1  1 −1 ⎥  −→  ⎢ 0 1  1 −1 ⎥ .
    ⎣ 2 −1 −3  5 ⎦            ⎣ 0 −3 −3  3 ⎦      ⎣ 0 0  0  0 ⎦

Here, R1 = E −1 [1, 2], E 3 [3, 2]. Since a zero row has appeared, the original vectors
are linearly dependent. Also, notice that no row exchanges were carried out in the
reduction process. So, the third vector is a linear combination of the first two, which
are linearly independent. 
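Computer algebra systems perform this reduction directly. With SymPy (assumed available; the snippet is ours), `Matrix.rref` returns the RREF together with the indices of the pivotal columns; applied to the rows of Example 2.4:

```python
from sympy import Matrix

A = Matrix([[1, 1, 0, 1],
            [0, 1, 1, -1],
            [2, -1, -3, 5]])
R, pivots = A.rref()
print(R)         # a zero row appears: the vectors are linearly dependent
print(pivots)    # (0, 1): only two pivotal columns
```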
Does reduction to RREF change the linear dependence or linear independence of
columns of a matrix?
Theorem 2.3 Let E ∈ Fm×m be invertible. Let v1 , . . . , vk , v ∈ Fm×1 and
α1 , . . . , αk ∈ F.
(1) v = α1 v1 + · · · + αk vk iff Ev = α1 Ev1 + · · · + αk Evk .
(2) v1 , . . . , vk are linearly independent iff Ev1 , . . . , Evk are linearly independent.
Proof (1) If v = α1 v1 + · · · + αk vk , then multiplying E on the left, we have Ev =
α1 Ev1 + · · · + αk Evk . Conversely, if Ev = α1 Ev1 + · · · + αk Evk , then multiplying
E −1 on the left we have v = α1 v1 + · · · + αk vk .
(2) Vectors v1 , . . . , vk are linearly dependent iff there exist scalars β1 , . . . , βk not
all zero such that β1 v1 + · · · + βk vk = 0. By (1), this happens iff there exist scalars
β1 , . . . , βk not all zero such that β1 Ev1 + · · · + βk Evk = 0 iff the vectors Ev1 , . . . , Evk
are linearly dependent. 

Since a product of elementary matrices is invertible, Theorem 2.3 implies that


reduction to RREF does not change the linear dependence or linear independence of
column vectors of a matrix. This, along with Observation 1.2, yields the following
result.

Theorem 2.4 Let v1 , . . . , vn ∈ Fm×1 . Let A = [v1 · · · vn ] ∈ Fm×n be the matrix


whose jth column is v j . Then, the following are true:
(1) Vectors v1 , . . . , vn are linearly independent iff rank(A) = n iff each column is a
pivotal column in the RREF of A.
(2) If the kth column in the RREF of A is a non-pivotal column, then it has the form
[a1 , a2 , . . . , a j , 0, . . . 0]t where j is the number of pivotal columns to the left
of this kth column, j < k, and vk = a1 v1 + · · · + a j v j .

Given vectors v1 , . . . , vn ∈ Fm×1 , we form the matrix A = [v1 · · · vn ] and then


apply Theorem 2.4. If the vectors are linearly dependent, then it will help us in finding
out a subset of vectors which are linearly independent. It will also let us know how
the dependent ones could be expressed as linear combinations of the independent
ones. If the vectors are linearly independent, then all columns of A in its RREF
will turn out to be pivotal columns. The advantage in working with columns is that
the RREF shows explicitly how a linearly dependent vector depends on the linearly
independent ones.
Moreover, if instead of column vectors we are given row vectors, then we
may work with their transposes. We solve Example 2.4 once more to illustrate this
point.

Example 2.5 To determine whether u 1 = [1, 1, 0, 1], u 2 = [0, 1, 1, −1], and
u 3 = [2, −1, −3, 5] are linearly independent or not, we form the matrix [u t1 u t2 u t3 ]
and then reduce it to its RREF, as in the following:

    ⎡ 1  0  2 ⎤      ⎡ 1  0  2 ⎤      ⎡ 1 0  2 ⎤
    ⎢ 1  1 −1 ⎥  R1  ⎢ 0  1 −3 ⎥  R2  ⎢ 0 1 −3 ⎥
    ⎢ 0  1 −3 ⎥  −→  ⎢ 0  1 −3 ⎥  −→  ⎢ 0 0  0 ⎥ .
    ⎣ 1 −1  5 ⎦      ⎣ 0 −1  3 ⎦      ⎣ 0 0  0 ⎦

Here, R1 = E −1 [2, 1], E −1 [4, 1] and R2 = E −1 [3, 2] E 1 [4, 2]. The first and second
columns are pivotal, and the third is non-pivotal. The components of the third column
in the RREF show that u t3 = 2 u t1 − 3 u t2 . Thus, u 3 = 2 u 1 − 3 u 2 . 
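The column version is equally mechanical; the non-pivotal column of the RREF displays the coefficients of the dependence (SymPy assumed; the snippet is our illustration):

```python
from sympy import Matrix

# Columns are u1^t, u2^t, u3^t from Example 2.5.
A = Matrix([[1, 0, 2],
            [1, 1, -1],
            [0, 1, -3],
            [1, -1, 5]])
R, pivots = A.rref()
print(pivots)       # (0, 1): the third column is non-pivotal
print(R.col(2).T)   # [2, -3, 0, 0]: reads off u3 = 2*u1 - 3*u2
```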

We also use the phrases such as linear dependence or independence for sets of vec-
tors. Given a set of vectors A = {v1 , . . . , vm }, we say that A is a linearly independent
set iff the vectors v1 , . . . , vm are linearly independent. Our method of determining
linear independence gives rise to the following useful result.

Theorem 2.5 Any set containing more than n vectors from Fn is linearly dependent.

Proof Without loss of generality, let S ⊆ Fn×1 have more than n vectors. So, let
v1 , . . . , vn+1 be distinct vectors from S. Consider the (augmented) matrix

A = [v1 · · · vn | vn+1 ].

Let B be the RREF of A. The reduction to RREF shows that there can be at most n
pivots in B and the last column in B is a non-pivotal column. By Observation 1.3,
vn+1 is a linear combination of the pivotal columns, which are among v1 , . . . , vn .
Therefore, S is linearly dependent. 
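Theorem 2.5 is easy to observe in computation: any n + 1 vectors from Fn form a matrix with at most n pivots (NumPy assumed; the four vectors in R3 below are our own example):

```python
import numpy as np

# Four vectors in R^3 stacked as the columns of a 3x4 matrix.
V = np.column_stack([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 2, 3]])
r = np.linalg.matrix_rank(V)
print(r)                # 3, which is less than the number of columns
assert r < V.shape[1]   # so the four vectors are linearly dependent
```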

Exercises for Sect. 2.2


1. Using elementary row operations, determine whether the given vectors are linearly
dependent or independent in each of the following cases, by taking the given
vectors as rows of a matrix.
(a) [1, 2, 3], [4, 5, 6], [7, 8, 9]
(b) [1, 0, −1, 2, −3], [−2, 1, 2, 4, −1], [3, 0, −1, 1, 1], [2, 1, 1, −1, −2]
(c) [1, 0, −1, 2, −3], [−2, 1, 2, 4, −1], [3, 0, −1, 1, 1], [2, −1, 0, −7, 3]
(d) [1, i, −1, 1 − i], [i, −1, −i, 1 + i], [2, 0, 1, i], [1 + i, 1 − i, −1, −i]
2. Solve Exercise 1 by forming a matrix with its columns as the transposes of the
given vectors. If the vectors turn out to be linearly dependent, then express one
of them as a linear combination of the others.
3. Let A = [u 1 , u 2 , u 3 , u 4 , u 5 ] ∈ F4×5 . In each of the following cases, determine the
RREF of A.
(a) u 1 , u 2 , u 3 are linearly independent; u 4 = u 1 + 2u 2 , u 5 = u 1 − u 2 + 2u 3 .
(b) u 1 , u 2 , u 4 are linearly independent; u 3 = u 1 + 2u 2 , u 5 = u 1 − u 2 + 2u 4 .
(c) u 1 , u 3 , u 5 are linearly independent; u 2 = u 1 + 2u 3 , u 4 = u 1 − u 3 + u 5 .
4. Answer the following questions with justification:
(a) Is every subset of a linearly independent set linearly independent?
(b) Is every subset of a linearly dependent set linearly dependent?
(c) Is every superset of a linearly independent set linearly independent?
(d) Is every superset of a linearly dependent set linearly dependent?
(e) Is union of two linearly independent sets linearly independent?
(f) Is union of two linearly dependent sets linearly dependent?
(g) Is intersection of two linearly independent sets linearly independent?
(h) Is intersection of two linearly dependent sets linearly dependent?

2.3 Rank of a Matrix

Recall that the rank of a matrix A, denoted by rank(A), is the number of pivots in the
RREF of A. If rank(A) = r, then there are r number of linearly independent columns
in A and other columns are linear combinations of these r columns. The linearly

independent columns correspond to the pivotal columns. Also, there exist r number
of linearly independent rows of A such that other rows are linear combinations of
these r rows. The linearly independent rows correspond to the pivotal rows, assuming
that we have monitored the row exchanges during the reduction of A to its RREF.
It raises a question. Suppose for a matrix A, we find r number of linearly inde-
pendent rows such that other rows are linear combinations of these r rows. Can it
happen that there are also k rows which are linearly independent and other rows are
linear combinations of these k rows, and that k ≠ r ?

Theorem 2.6 Let A ∈ Fm×n . There exists a unique number r with 0 ≤ r ≤ min{m, n}
such that A has r number of linearly independent rows, and other rows of A are
linear combinations of these r ones. Moreover, such a number r is equal to the rank
of A.

Proof Let r = rank(A). Theorem 2.2 implies that there exist r rows of A which are
linearly independent and the other rows are linear combinations of these r rows.
Conversely, suppose there exist r rows of A which are linearly independent and
the other rows are linear combinations of these r rows. So, let i 1 , . . . , ir be the indices
of these r numbers of linearly independent rows of A. Consider the matrix

B = E[1, i 1 ] E[2, i 2 ] · · · E[r, ir ] A.

In the matrix B, those r linearly independent rows of A have become the first r
rows, and other rows are now placed as (r + 1)th row onwards. In the RREF of B,
the first r rows are pivotal rows, and other rows are zero rows. The matrix B has
been obtained from A by elementary row operations. By the uniqueness of RREF
(Theorem 1.1), A also has the same RREF as does B. The number of pivots in this
RREF is r. Therefore, rank(A) = r.
Moreover, when A = 0, the zero matrix, the number of pivots in A is 0. And if A is
a nonzero matrix, then in the RREF of A, there exists at least one pivot. The number
of pivots cannot exceed the number of rows; also, it cannot exceed the number of
columns. Therefore, r = rank(A) is a number between 0 and min{m, n}. 

Now if there are k number of linearly independent columns in an m × n matrix
A, where the other columns are linear combinations of these k ones, then the same
thing happens about rows in At . By Theorem 2.6, rank(At ) is equal to this k.
Conventionally, the maximum number of linearly independent rows of a matrix
is called the row rank of a matrix. Similarly, the maximum number of linearly
independent columns of a matrix is called its column rank. As we see, due to
Theorem 2.6, the row rank and the column rank of a matrix are well defined. To
connect the row rank and the column rank, we prove the following result.

Theorem 2.7 Let A ∈ Fm×n . Then, rank(At ) = rank(A). Consequently, both the
row rank of A and the column rank of A are equal to rank(A).

Proof Let B = E At be the RREF of At , where E is a suitable product of elementary
matrices. Let rank(At ) = r. Then, there are r number of pivots in B. The
pivotal columns in B are e1 , . . . , er ∈ Fn×1 . Since E is invertible, by Theorem 2.3,
E −1 e1 , . . . , E −1 er are linearly independent columns of At , and the other columns of
At are linear combinations of these r ones. Then, (E −1 e1 )t , . . . , (E −1 er )t are the r
number of linearly independent rows of A and other rows are linear combinations of
these r rows. By Theorem 2.6, rank(A) = r. This proves that if rank(At ) = r, then
rank(A) = r.
Conversely, suppose rank(A) = r. As (At )t = A, we have rank((At )t ) = r. By
what we have just proved, rank(At ) = r.
Therefore, rank(At ) = r iff rank(A) = r. That is, rank(At ) = rank(A).
Then, it follows from Theorem 2.6 that the row rank of A is equal to rank(A),
which is equal to rank(At ), and that is equal to the column rank of A. 
Example 2.6 Let A = \begin{bmatrix} 1 & 1 & 1 & 2 & 1 \\ 1 & 2 & 1 & 1 & 1 \\ 3 & 5 & 3 & 4 & 3 \\ -1 & 0 & -1 & -3 & -1 \end{bmatrix}. We compute its RREF as follows:

\begin{bmatrix} 1 & 1 & 1 & 2 & 1 \\ 1 & 2 & 1 & 1 & 1 \\ 3 & 5 & 3 & 4 & 3 \\ -1 & 0 & -1 & -3 & -1 \end{bmatrix} \xrightarrow{R1} \begin{bmatrix} 1 & 1 & 1 & 2 & 1 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 2 & 0 & -2 & 0 \\ 0 & 1 & 0 & -1 & 0 \end{bmatrix} \xrightarrow{R2} \begin{bmatrix} 1 & 0 & 1 & 3 & 1 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.

Here, R1 = E_{−1}[2, 1], E_{−3}[3, 1], E_{1}[4, 1]; R2 = E_{−1}[1, 2], E_{−2}[3, 2], E_{−1}[4, 2].
Thus, rank(A) = 2.
Also, we see that the first two rows of A are linearly independent, and
row(3) = row(1) + 2 × row(2), row(4) = row(2) − 2 × row(1).
Thus, the row rank of A is 2.
From the RREF of A, we observe that the first two columns of A are linearly
independent, and
col(3) = col(1), col(4) = 3 × col(1) − col(2), col(5) = col(1).
Therefore, the column rank of A is also 2. 
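The pivot count in such examples can be checked mechanically. Below is a small pure-Python sketch (our own illustration; the helper names `rref` and `rank` are not from the text) that row-reduces with exact rational arithmetic and counts the nonzero rows. Run on the matrix A of Example 2.6, it confirms rank(A) = rank(At) = 2, in agreement with Theorem 2.7.

```python
from fractions import Fraction

def rref(rows):
    """Row reduce a matrix (a list of lists) to its reduced row echelon form."""
    m = [[Fraction(x) for x in row] for row in rows]
    nrows, ncols = len(m), len(m[0])
    pivot_row = 0
    for col in range(ncols):
        # find a row at or below pivot_row with a nonzero entry in this column
        pr = next((r for r in range(pivot_row, nrows) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]       # row exchange
        piv = m[pivot_row][col]
        m[pivot_row] = [x / piv for x in m[pivot_row]]  # scale the pivot to 1
        for r in range(nrows):                          # zero out the rest of the column
            if r != pivot_row and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m

def rank(rows):
    """rank = number of nonzero (pivotal) rows in the RREF."""
    return sum(1 for row in rref(rows) if any(x != 0 for x in row))

A = [[1, 1, 1, 2, 1], [1, 2, 1, 1, 1], [3, 5, 3, 4, 3], [-1, 0, -1, -3, -1]]
At = [list(col) for col in zip(*A)]
print(rank(A), rank(At))  # 2 2, as in Example 2.6 and Theorem 2.7
```

The non-pivotal columns of the output of `rref` also hold the coefficients of the linear combinations discussed above.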
Example 2.7 Determine the rank of the matrix A in Example 1.4, and point out
which rows of A are linear combinations of other rows and which columns are linear
combinations of other columns, by reducing A to its RREF.
From Example 1.4, we have seen that
A = \begin{bmatrix} 1 & 1 & 2 & 0 \\ 3 & 5 & 7 & 1 \\ 1 & 5 & 4 & 5 \\ 2 & 8 & 7 & 9 \end{bmatrix} \xrightarrow{E} \begin{bmatrix} 1 & 0 & 3/2 & 0 \\ 0 & 1 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}.

The row operation E is given by

E = E_{−3}[2, 1], E_{−1}[3, 1], E_{−2}[4, 1], E_{1/2}[2], E_{−1}[1, 2], E_{−4}[3, 2], E_{−6}[4, 2],
E_{1/3}[3], E_{1/2}[1, 3], E_{−1/2}[2, 3], E_{−6}[4, 3].

We see that rank(A) = 3, the number of pivots in the RREF of A. In this reduction,
no row exchanges have been used. Thus, the first three rows of A are the required
linearly independent rows. The fourth row is a linear combination of these three
rows. In fact,
row(4) = 3 row(1) + (−1) row(2) + 2 row(3).

The RREF also says that the third column is a linear combination of first and second.
Notice that the coefficients in such a linear combination are given by the entries of
the third column in the RREF. As we have seen earlier,
col(3) = (3/2) col(1) + (1/2) col(2). 

Since each entry of A∗ is the complex conjugate of the corresponding entry of
At , Theorem 2.7 implies that

rank(A∗ ) = rank(At ) = rank(A).

Theorem 2.3 also implies that the column rank is well defined and is equal to the
rank. We generalize this theorem a bit.
Theorem 2.8 Let A ∈ Fm×n . Let P ∈ Fm×m and Q ∈ Fn×n be invertible matrices.
Then,
rank(P AQ) = rank(P A) = rank(AQ) = rank(A).
Proof Theorem 2.3 implies that the column rank of P AQ is same as the column
rank of AQ. Therefore, rank(P AQ) = rank(AQ). Also, since Q t is invertible, we
have rank(AQ) = rank(Q t At ) = rank(At ) = rank(A). 
In general, when the matrix product P AQ is well defined, we have

rank(P AQ) ≤ rank(A)

irrespective of whether P and Q are invertible or not.


Exercises for Sect. 2.3
1. Determine rank r of the following matrices. Find out the r linearly independent
rows and also the r linearly independent columns of the matrix. And then, express
4 − r rows as linear combinations of those r rows and 5 − r columns as linear
combinations of those r columns.

(a) \begin{bmatrix} 1 & 2 & 1 & 1 & 1 \\ 3 & 5 & 3 & 4 & 3 \\ 1 & 1 & 1 & 2 & 1 \\ 5 & 8 & 5 & 7 & 5 \end{bmatrix}  (b) \begin{bmatrix} 0 & 2 & 1 & 0 & 1 \\ 3 & 0 & 3 & 0 & 3 \\ 1 & 1 & 0 & 2 & 0 \\ 5 & 0 & 5 & 0 & 5 \end{bmatrix}  (c) \begin{bmatrix} 1 & 0 & 0 & 1 & 0 \\ 0 & 5 & 0 & 4 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 8 & 0 & 7 & 0 \end{bmatrix}

2. Let A ∈ Fn×n . Prove that A is invertible iff rank(A) = n iff det(A) ≠ 0.


3. Let A ∈ Fm×n . Let P ∈ Fm×m . When does the RREF of P A coincide with the
RREF of A?
4. Let u, v ∈ Fn×1 . What is the rank of the matrix uvt ?
5. Show that if A ∈ Fn×n has rank 1, then there exist u, v ∈ Fn×1 such that A = uvt .

2.4 Solvability of Linear Equations

We now use matrices to settle some issues regarding solvability of systems of lin-
ear equations, also called linear systems. A linear system with m equations in n
unknowns looks as follows:

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
⋮
am1 x1 + am2 x2 + · · · + amn xn = bm

Solving such a linear system amounts to determining the unknowns x1 , . . . , xn
with known scalars ai j and bi . Using the abbreviation x = [x1 , . . . , xn ]t , b =
[b1 , . . . , bm ]t and A = [ai j ], the system can be written as

Ax = b.

Here, A ∈ Fm×n , b ∈ Fm×1 , and x is an unknown vector in Fn×1 . We also say that
the matrix A is the system matrix of the linear system Ax = b.
There is a slight deviation from our accepted symbolism. In case of linear systems,
we write b as a column vector and xi are unknown scalars.
Notice that if the system matrix A ∈ Fm×n , then the linear system Ax = b has m
number of equations and n number of unknowns.
A solution of the system Ax = b is any vector y ∈ Fn×1 such that Ay = b. In
such a case, if y = [c1 , . . . , cn ]t , then ci is called the value of the unknown xi in
the solution y. A solution of the system is also written informally as

x1 = c1 , . . . , xn = cn .

If y = [c1 , . . . , cn ]t is a solution of Ax = b, and v1 , . . . , vn ∈ Fm×1 are the
columns of A, then from the above system, we see that

c1 v1 + · · · + cn vn = b.

Conversely, if b can be written this way for some scalars c1 , . . . , cn , then the vector
y = [c1 , . . . , cn ]t is a solution of Ax = b. So, we conclude that

Ax = b has a solution iff b can be written as a linear combination of the columns of A.

Corresponding to the linear system Ax = b is the homogeneous system

Ax = 0.

The homogeneous system always has a solution since y = 0 is a solution. If y is
a solution of Ax = 0, then so is αy for any scalar α. Therefore, the homogeneous
system has infinitely many solutions when it has a nonzero solution.

Theorem 2.9 Let A ∈ Fm×n , and let b ∈ Fm×1 . Then, the following statements are
true:
(1) If [A′ | b′ ] is obtained from [A | b] by applying a finite sequence of elementary
row operations, then each solution of Ax = b is a solution of A′ x = b′ and vice
versa.
(2) (Consistency) Ax = b has a solution iff rank([A | b]) = rank(A).
(3) If u is a (particular) solution of Ax = b, then each solution of Ax = b is given
by u + y, where y is a solution of the homogeneous system Ax = 0.
(4) If r = rank([A | b]) = rank(A) < n, then there are n − r unknowns which can
take arbitrary values and other r unknowns can be determined from the values
of these n − r unknowns.
(5) If m < n, then the homogeneous system has infinitely many solutions.
(6) Ax = b has a unique solution iff rank([A | b]) = rank(A) = n.
(7) If m = n, then Ax = b has a unique solution iff det(A) ≠ 0.
(8) (Cramer’s Rule) If m = n and det(A) ≠ 0, then the solution of Ax = b is given
by x j = det( A j (b) )/det(A) for each j ∈ {1, . . . , n}.

Proof (1) If [A′ | b′ ] has been obtained from [A | b] by a finite sequence of elementary
row operations, then A′ = E A and b′ = Eb, where E is the product of corresponding
elementary matrices. The matrix E is invertible. Now, A′ x = b′ iff E Ax = Eb iff
Ax = E −1 Eb = b.
(2) Due to (1), we assume that [A | b] is in RREF. Suppose Ax = b has a solution. If
there is a zero row in A, then the corresponding entry in b is also 0. Therefore, there
is no pivot in b. Hence, rank([A | b]) = rank(A).
Conversely, suppose that rank([A | b]) = rank(A) = r. Then, there is no pivot in
b. That is, b is a non-pivotal column in [A | b]. Thus, b is a linear combination of
pivotal columns, which are some columns of A. Therefore, Ax = b has a solution.
(3) Let u be a solution of Ax = b. Then, Au = b. Now, z is a solution of Ax = b
iff Az = b iff Az = Au iff A(z − u) = 0 iff z − u is a solution of Ax = 0. That is,
each solution z of Ax = b is expressed in the form z = u + y for a solution y of the
homogeneous system Ax = 0.
(4) Let rank([A | b]) = rank(A) = r < n. By (2), there exists a solution. Due to (3),
we consider solving the corresponding homogeneous system. Due to (1), assume
that A is in RREF. There are r number of pivots in A and m − r number of zero
rows. Omit all the zero rows; it does not affect the solutions. Write the system as
linear equations. Rewrite the equations by keeping the unknowns corresponding to
pivots on the left-hand side and taking every other term to the right-hand side. The
unknowns corresponding to pivots are now expressed in terms of the other n − r
unknowns. For obtaining a solution, we may arbitrarily assign any values to these
n − r unknowns, and the unknowns corresponding to the pivots get evaluated by the
equations.
(5) Let m < n. Then, r = rank(A) ≤ m < n. Consider the homogeneous system
Ax = 0. By (4), there are n − r ≥ 1 number of unknowns which can take arbitrary
values, and other r unknowns are determined accordingly. Each such assignment of
values to the n − r unknowns gives rise to a distinct solution resulting in infinite
number of solutions of Ax = 0.
We may generate infinite number of solutions as follows. Since r < n, there exists
an equation where on the left-hand side is an unknown corresponding to a pivot, and
on the right-hand side there is an unknown that does not correspond to a pivot having
a nonzero coefficient. Fix one such equation. Now, assign this unknown a nonzero
value and all other unknowns zero. By varying the assigned values, we get infinite
number of values for the unknown on the left-hand side.
(6) It follows from (3) and (4). In particular, when A is square and invertible, the unique solution is x = A−1 b.
(7) If A ∈ Fn×n , then it is invertible iff rank(A) = n iff det(A) ≠ 0. Then, the statement
follows from (6).
(8) Recall that A j (b) is the matrix obtained from A by replacing the jth column of
A with the vector b. Since det(A) ≠ 0, by (6), Ax = b has a unique solution, say
y ∈ Fn×1 . Write the identity Ay = b in the form:

y_1 \begin{bmatrix} a_{11} \\ \vdots \\ a_{n1} \end{bmatrix} + \cdots + y_j \begin{bmatrix} a_{1j} \\ \vdots \\ a_{nj} \end{bmatrix} + \cdots + y_n \begin{bmatrix} a_{1n} \\ \vdots \\ a_{nn} \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}.

This gives

y_1 \begin{bmatrix} a_{11} \\ \vdots \\ a_{n1} \end{bmatrix} + \cdots + \begin{bmatrix} y_j a_{1j} - b_1 \\ \vdots \\ y_j a_{nj} - b_n \end{bmatrix} + \cdots + y_n \begin{bmatrix} a_{1n} \\ \vdots \\ a_{nn} \end{bmatrix} = 0.

In this sum, the jth vector is a linear combination of the other vectors, where the −y_i 's
are the coefficients. Therefore,

\begin{vmatrix} a_{11} & \cdots & y_j a_{1j} - b_1 & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & y_j a_{nj} - b_n & \cdots & a_{nn} \end{vmatrix} = 0.

From Property (6) of the determinant, it follows that

y_j \begin{vmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{vmatrix} - \begin{vmatrix} a_{11} & \cdots & b_1 & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & b_n & \cdots & a_{nn} \end{vmatrix} = 0.

Therefore, y j = det( A j (b) )/det(A). 
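Cramer's rule translates directly into code. The sketch below is our own illustration (the names `det` and `cramer` are not from the text); it uses a recursive Laplace expansion for the determinant, which is fine for the small n where Cramer's rule is practical.

```python
from fractions import Fraction

def det(M):
    """Determinant by Laplace expansion along the first row (for small n only)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = Fraction(0)
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def cramer(A, b):
    """Solve Ax = b for square A with det(A) != 0 via x_j = det(A_j(b)) / det(A)."""
    A = [[Fraction(x) for x in row] for row in A]
    b = [Fraction(x) for x in b]
    d = det(A)
    assert d != 0, "Cramer's rule requires det(A) != 0"
    # A_j(b) is A with its jth column replaced by b
    return [det([row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(A)]) / d
            for j in range(len(A))]

print(cramer([[2, 1], [1, 3]], [5, 10]))  # solution x1 = 1, x2 = 3
```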

A linear system Ax = b is said to be consistent iff rank([A | b]) = rank(A). Theorem
2.9 says that only consistent systems have solutions. The homogeneous system
Ax = 0 is always consistent. It always has a solution, namely the zero solution,
x = 0, which is also called the trivial solution.
For a matrix A ∈ Fm×n , the number n − rank(A) is called the nullity of A and is
denoted by null(A).
The unknowns that correspond to the pivots are called the basic variables, and
the other unknowns are called the free variables. Thus, there are rank(A) number of
basic variables and null(A) number of free variables, which may be assigned arbitrary
values for obtaining a solution. The statement in Theorem 2.9(4) is informally stated
as follows:
A consistent system has null(A) number of linearly independent solutions.
This terminology of linearly independent solutions is meaningful when we con-
sider a solution of Ax = 0 as a vector y satisfying this equation. They are actually
linearly independent vectors; it will be obvious once we introduce the set of all solu-
tions in the next section. In fact, any solution of the homogeneous system is a linear
combination of these null(A) number of solutions.
We may further summarize our results for linear systems as follows.
Let a linear homogeneous system Ax = 0 have m equations and n unknowns.
1. If rank(A) = n, then Ax = 0 has a unique solution, the trivial solution.
2. If rank(A) < n, then Ax = 0 has infinitely many solutions.
Notice that if m ≥ n, then both the cases are possible, whereas if m < n then
rank(A) ≤ m < n; consequently, there must exist infinitely many solutions to the
homogeneous linear system.
For non-homogeneous linear systems, the same conclusion is drawn provided that
the system is consistent. To say it explicitly, let the linear system Ax = b have m
equations and n unknowns.
1. If rank([A | b]) > rank(A), then Ax = b has no solutions.
2. If rank([A | b]) = rank(A) = n, then Ax = b has a unique solution.
3. If rank([A | b]) = rank(A) < n, then Ax = b has infinitely many solutions.
Notice that the number of equations plays no role in the nature of solutions, but
the rank of the system matrix, which is equal to the number of linearly independent
equations, is important.
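The consistency criterion rank([A | b]) = rank(A) is easy to test mechanically. The sketch below (our own; the name `rank` is not from the text) computes the rank by forward elimination in exact Fraction arithmetic, and applies it to a system whose third equation contradicts the first two.

```python
from fractions import Fraction

def rank(rows):
    """Rank via forward elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pr = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]                       # bring a pivot row up
        for i in range(r + 1, len(m)):                  # eliminate below the pivot
            f = m[i][col] / m[r][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# row3 of A equals row1 - 2*row2, but 8 != 7 - 2*11, so the system is inconsistent
A = [[5, 2, -3, 1], [1, -3, 2, -2], [3, 8, -7, 5]]
b = [7, 11, 8]
Ab = [row + [bi] for row, bi in zip(A, b)]
print(rank(A), rank(Ab))  # 2 3: rank([A | b]) > rank(A), hence no solution
```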
Exercises for Sect. 2.4
1. Determine whether the following linear systems are consistent or not.
(a) 2x1 − x2 − x3 = 2, x1 + x3 − 4x4 = 1, x2 − x3 − 4x4 = 4
(b) 2x1 − x2 − x3 = 2, x1 + x3 − 4x4 = 1, 3x1 − x2 − 4x4 = 4


2. For each of the following augmented matrices, determine whether the correspond-
ing linear system has (i) no solutions, (ii) a unique solution, and (iii) infinitely
many solutions.

(a) \begin{bmatrix} 1 & -2 & 4 & 1 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}  (b) \begin{bmatrix} 1 & -2 & 2 & -2 \\ 0 & 1 & -1 & 3 \\ 0 & 0 & 1 & 2 \end{bmatrix}  (c) \begin{bmatrix} 1 & -1 & 3 & 8 \\ 0 & 1 & 2 & 7 \\ 0 & 0 & 1 & 2 \end{bmatrix}
3. In the following cases, what do you conclude about the number of solutions of
the linear system Ax = b?
(a) A ∈ F5×3 and b = A1 + A2 = A2 − A3
(b) A ∈ F3×4 and b = A1 + A2 + A3 + A4
(c) A ∈ F3×3 and b = 3A1 + 2 A2 + A3 = 0
4. Let A be a 7 × n matrix of rank r, and let b ∈ F7×1 . For each of the following
cases, determine, if possible, the number of solutions of the linear system Ax = b.
(a) n = 8, r = 5 (b) n = 8, r = 7 (c) n = 6, r = 6 (d) n = 6, r = 5

2.5 Gauss–Jordan Elimination

Gauss–Jordan elimination is an application of converting the augmented matrix to
its row reduced echelon form for solving a linear system. We start with determining
whether a system of linear equations has a solution or not. The consistency condition
implies that if an entry in the b portion of the RREF of [A | b] has become a pivot,
then the system is inconsistent; otherwise, the system is consistent.

Example 2.8 Does the following system of linear equations have a solution?

5x1 + 2x2 − 3x3 + x4 = 7
x1 − 3x2 + 2x3 − 2x4 = 11
3x1 + 8x2 − 7x3 + 5x4 = 8

We take the augmented matrix and reduce it to its row reduced echelon form by
elementary row operations.
\begin{bmatrix} 5 & 2 & -3 & 1 & 7 \\ 1 & -3 & 2 & -2 & 11 \\ 3 & 8 & -7 & 5 & 8 \end{bmatrix} \xrightarrow{R1} \begin{bmatrix} 1 & 2/5 & -3/5 & 1/5 & 7/5 \\ 0 & -17/5 & 13/5 & -11/5 & 48/5 \\ 0 & 34/5 & -26/5 & 22/5 & 19/5 \end{bmatrix} \xrightarrow{R2} \begin{bmatrix} 1 & 0 & -5/17 & -1/17 & 43/17 \\ 0 & 1 & -13/17 & 11/17 & -48/17 \\ 0 & 0 & 0 & 0 & 23 \end{bmatrix}.

Here, R1 = E_{1/5}[1], E_{−1}[2, 1], E_{−3}[3, 1] and R2 = E_{−5/17}[2], E_{−2/5}[1, 2], E_{−34/5}[3, 2].
Since an entry in the b portion has become a pivot, the system is inconsistent. In
fact, you can verify that the third row in A is simply first row minus twice the second
row, whereas the third entry in b is not the first entry minus twice the second entry.
Therefore, the system is inconsistent. 

We write the set of all solutions of the system Ax = b as Sol(A, b). That is,

Sol(A, b) = {y ∈ Fn×1 : Ay = b}.

As in Example 2.8, if there is no solution, then Sol(A, b) = ∅.

Example 2.9 To illustrate the proof of Theorem 2.9, we change the last equation in
the previous example to make it consistent. We consider the new system

5x1 + 2x2 − 3x3 + x4 = 7
x1 − 3x2 + 2x3 − 2x4 = 11
3x1 + 8x2 − 7x3 + 5x4 = −15

Computation of the row reduced echelon form of the augmented matrix goes as
follows:
\begin{bmatrix} 5 & 2 & -3 & 1 & 7 \\ 1 & -3 & 2 & -2 & 11 \\ 3 & 8 & -7 & 5 & -15 \end{bmatrix} \xrightarrow{R1} \begin{bmatrix} 1 & 2/5 & -3/5 & 1/5 & 7/5 \\ 0 & -17/5 & 13/5 & -11/5 & 48/5 \\ 0 & 34/5 & -26/5 & 22/5 & -96/5 \end{bmatrix} \xrightarrow{R2} \begin{bmatrix} 1 & 0 & -5/17 & -1/17 & 43/17 \\ 0 & 1 & -13/17 & 11/17 & -48/17 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.

Here, R1 = E_{1/5}[1], E_{−1}[2, 1], E_{−3}[3, 1] and R2 = E_{−5/17}[2], E_{−2/5}[1, 2], E_{−34/5}[3, 2], as
earlier. The third row in the RREF is a zero row. Thus, the third equation is redundant.
Now, solving the new system in row reduced echelon form is easier. Writing as linear
equations, we have

x1 − (5/17)x3 − (1/17)x4 = 43/17,    x2 − (13/17)x3 + (11/17)x4 = −48/17.

The unknowns corresponding to the pivots, that is, x1 and x2 , are the basic variables,
and the other unknowns, x3 , x4 , are the free variables. The number of basic variables is
equal to the number of pivots, which is the rank of the system matrix. By assigning the
free variables xi to any arbitrary values, say, αi , the basic variables can be evaluated
in terms of αi .
We assign x3 to α3 and x4 to α4 . Then, we have

x1 = 43/17 + (5/17)α3 + (1/17)α4 ,    x2 = −48/17 + (13/17)α3 − (11/17)α4 .

Therefore, any vector y ∈ F4×1 of the form

y = \begin{bmatrix} 43/17 + (5/17)\alpha_3 + (1/17)\alpha_4 \\ -48/17 + (13/17)\alpha_3 - (11/17)\alpha_4 \\ \alpha_3 \\ \alpha_4 \end{bmatrix} \quad \text{for } \alpha_3 , \alpha_4 \in F

is a solution of the linear system. Moreover, any solution of the linear system is in
the above form. That is, the set of all solutions is given by
Sol(A, b) = \left\{ \begin{bmatrix} 43/17 \\ -48/17 \\ 0 \\ 0 \end{bmatrix} + \alpha_3 \begin{bmatrix} 5/17 \\ 13/17 \\ 1 \\ 0 \end{bmatrix} + \alpha_4 \begin{bmatrix} 1/17 \\ -11/17 \\ 0 \\ 1 \end{bmatrix} : \alpha_3 , \alpha_4 \in F \right\}.

Here, the vector [43/17, −48/17, 0, 0]t is a particular solution of the original system.
The two vectors [5/17, 13/17, 1, 0]t and [1/17, −11/17, 0, 1]t are linearly independent
solutions of the corresponding homogeneous system. Notice that the nullity of the
system matrix is 2. 
Instead of writing the RREF as a linear system again, we can arrive at the set of
all solutions quite mechanically. See the following procedure.
Gauss–Jordan Elimination
1. Reduce the augmented matrix [A | b] to its RREF, say, [A′ | b′ ].
2. If a pivot has appeared in b′ , then Ax = b has no solutions.
3. Else, delete all zero rows from [A′ | b′ ].
4. Insert zero rows in [A′ | b′ ], if required, so that for each pivot, its row index is
equal to its column index.
5. Insert zero rows at the bottom, if required, to make the A′ portion a square matrix.
Call the updated matrix [ Ã | b̃].
6. Change the diagonal entries of the zero rows in Ã from 0 to −1.
7. If the non-pivotal columns in Ã are u 1 , . . . , u k , then the set of all solutions is
given by Sol(A, b) = {b̃ + α1 u 1 + · · · + αk u k : α1 , . . . , αk ∈ F}.
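Steps 1–7 can be sketched as code. The pure-Python implementation below is our own reading of the procedure (the names `rref` and `gauss_jordan` are assumptions, not the book's); it follows the −1 trick literally and stays exact by using Fraction arithmetic. On the system of Example 2.9 it returns the particular solution (43/17, −48/17, 0, 0) and two generators of the homogeneous solutions.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pr = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        m[r] = [x / m[r][c] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return m

def gauss_jordan(A, b):
    """Steps 1-7: return (particular solution, generators), or None if inconsistent."""
    n = len(A[0])
    R = rref([row + [bi] for row, bi in zip(A, b)])
    R = [row for row in R if any(x != 0 for x in row)]   # step 3: drop zero rows
    if any(all(x == 0 for x in row[:-1]) for row in R):  # step 2: pivot appeared in b'
        return None
    # steps 4-5: place each pivotal row so that its pivot lies on the diagonal,
    # padding with zero rows to make the A-portion an n x n matrix
    sq = [[Fraction(0)] * (n + 1) for _ in range(n)]
    pivots = set()
    for row in R:
        j = next(i for i, x in enumerate(row) if x != 0)  # the pivot's column index
        sq[j] = row
        pivots.add(j)
    for j in range(n):                                    # step 6: -1 on zero rows
        if j not in pivots:
            sq[j][j] = Fraction(-1)
    # step 7: particular solution from the b-column, non-pivotal columns as generators
    particular = [sq[i][n] for i in range(n)]
    generators = [[sq[i][j] for i in range(n)] for j in range(n) if j not in pivots]
    return particular, generators

A = [[5, 2, -3, 1], [1, -3, 2, -2], [3, 8, -7, 5]]
b = [7, 11, -15]
particular, generators = gauss_jordan(A, b)
print(particular)  # particular solution (43/17, -48/17, 0, 0)
print(generators)  # generators (-5/17, -13/17, -1, 0) and (-1/17, 11/17, 0, -1)
```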
Example 2.10 We apply Gauss–Jordan elimination on the linear system of
Example 2.9. The RREF of the augmented matrix as computed there is
[A′ | b′ ] = \begin{bmatrix} 1 & 0 & -5/17 & -1/17 & 43/17 \\ 0 & 1 & -13/17 & 11/17 & -48/17 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.

We delete the zero row at the bottom. For each pivot, the row index is equal to its
column index; so, no new zero row is to be inserted. Next, to make A a square matrix,
we adjoin two zero rows at the bottom. Next, we change the diagonal entries of all
zero rows to −1. It yields the following matrix:
[ Ã | b̃] = \begin{bmatrix} 1 & 0 & -5/17 & -1/17 & 43/17 \\ 0 & 1 & -13/17 & 11/17 & -48/17 \\ 0 & 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & -1 & 0 \end{bmatrix}.

The non-pivotal columns are the third and the fourth columns. According to Gauss–
Jordan elimination, the set of solutions is given by
Sol(A, b) = \left\{ \begin{bmatrix} 43/17 \\ -48/17 \\ 0 \\ 0 \end{bmatrix} + \alpha_1 \begin{bmatrix} -5/17 \\ -13/17 \\ -1 \\ 0 \end{bmatrix} + \alpha_2 \begin{bmatrix} -1/17 \\ 11/17 \\ 0 \\ -1 \end{bmatrix} : \alpha_1 , \alpha_2 \in F \right\}.

You may match this solution set with that in Example 2.9. 

There are variations of Gauss–Jordan elimination. Instead of reducing the augmented
matrix to its row reduced echelon form, if we reduce it to another intermediary
form, called the row echelon form, then we obtain the method of Gaussian elimina-
tion. In the row echelon form, we do not require the entries above a pivot to be 0;
also, the pivots need not be equal to 1. In that case, we will require back-substitution
in solving a linear system. To illustrate this process, we redo Example 2.9 starting
with the augmented matrix; it is as follows:
\begin{bmatrix} 5 & 2 & -3 & 1 & 7 \\ 1 & -3 & 2 & -2 & 11 \\ 3 & 8 & -7 & 5 & -15 \end{bmatrix} \xrightarrow{R1} \begin{bmatrix} 5 & 2 & -3 & 1 & 7 \\ 0 & -17/5 & 13/5 & -11/5 & 48/5 \\ 0 & 34/5 & -26/5 & 22/5 & -96/5 \end{bmatrix} \xrightarrow{E_2[3,2]} \begin{bmatrix} 5 & 2 & -3 & 1 & 7 \\ 0 & -17/5 & 13/5 & -11/5 & 48/5 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.

Here, R1 = E_{−1/5}[2, 1], E_{−3/5}[3, 1]. The augmented matrix is now in row echelon
form. It is a consistent system, since no entry in the b portion is a pivot. The pivots
say that x1 , x2 are basic variables and x3 , x4 are free variables. We assign x3 to α3
and x4 to α4 . Writing in equation form, we have
x1 = (1/5)(7 − 2x2 + 3α3 − α4),    x2 = −(5/17)(48/5 − (13/5)α3 + (11/5)α4).

First we determine x2 and then back-substitute. We obtain

x1 = 43/17 + (5/17)α3 + (1/17)α4 ,  x2 = −48/17 + (13/17)α3 − (11/17)α4 ,
x3 = α3 ,  x4 = α4 .

Thus, the solution set is given by

Sol(A, b) = \left\{ \begin{bmatrix} 43/17 \\ -48/17 \\ 0 \\ 0 \end{bmatrix} + \alpha_3 \begin{bmatrix} 5/17 \\ 13/17 \\ 1 \\ 0 \end{bmatrix} + \alpha_4 \begin{bmatrix} 1/17 \\ -11/17 \\ 0 \\ 1 \end{bmatrix} : \alpha_3 , \alpha_4 \in F \right\}.

As you see we end up with the same set of solutions as in Gauss–Jordan elimination.
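The back-substitution step of Gaussian elimination can be isolated as a tiny routine. The sketch below is our own illustration (`back_substitute` is an assumed name); applied to the 2 × 2 system in the basic variables x1 , x2 with the free variables fixed at α3 = α4 = 0, it reproduces the particular solution x1 = 43/17, x2 = −48/17.

```python
from fractions import Fraction

def back_substitute(U, c):
    """Solve Ux = c for upper triangular U with nonzero diagonal entries."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):           # solve the last equation first
        s = sum(Fraction(U[i][j]) * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(c[i]) - s) / Fraction(U[i][i])
    return x

# basic-variable system from the row echelon form, with alpha3 = alpha4 = 0:
#   5 x1 + 2 x2 = 7,   (-17/5) x2 = 48/5
U = [[5, 2], [0, Fraction(-17, 5)]]
c = [7, Fraction(48, 5)]
print(back_substitute(U, c))  # x1 = 43/17, x2 = -48/17
```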
Exercises for Sect. 2.5
1. Using Gauss–Jordan elimination, and also Gaussian elimination, solve the fol-
lowing linear systems:
(a) 3w + 2x + 2y − z = 2, 2x + 3y + 4z = −2, y − 6z = 6
(b) w + 4x + y + 3z = 1, 2x + y + 3z = 0, w + 3x + y + 2z = 1,
2x + y + 6z = 0
(c) w − x + y − z = 1, w + x − y − z = 1, w − x − y + z = 2,
4w − 2x − 2y = 1
2. For each of the following augmented matrices, determine the solution set of the
corresponding linear system:

(a) \begin{bmatrix} 1 & 0 & 0 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 3 \end{bmatrix}  (b) \begin{bmatrix} 1 & 4 & 0 & 2 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{bmatrix}  (c) \begin{bmatrix} 1 & -3 & 0 & 2 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 \end{bmatrix}

3. Let B = \begin{bmatrix} 1 & 0 & 2 & 0 & -1 \\ 0 & 1 & 3 & 0 & -1 \\ 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},  w = [3, 2, 0, 2, 0]t , and let b = [0, 5, 3, 4]t .
Suppose A is a matrix such that A1 = [2, 1, −3, 2]t , A2 = [−1, 2, 3, 1]t , B is
the RREF of A, and Aw = b. Determine the following:
(a) Sol(A, 0) (b) Sol(A, b) (c) A
4. Let A be a matrix with Sol(A, 0) = {α[1, 0, −1]t + β[0, 1, 3]t : α, β ∈ R}, and let
b = A1 + 2 A2 + A3 . Determine Sol(A, b).
5. Show that the linear system x + y + kz = 1, x − y − z = 2, 2x + y − 2z = 3
has no solution for k = −1 and has a unique solution for each k ≠ −1.
6. Determine, if possible, the values of a, b so that the following systems have (i)
no solutions, (ii) unique solution, and (iii) infinitely many solutions:
(a) x1 + 2x2 + x3 = 1, −x1 + 4x2 + 3x3 = 2, 2x1 − 2x2 + ax3 = 3
(b) x1 + x2 + 3x3 = 2, x1 + 2x2 + 4x3 = 3, x1 + 3x2 + ax3 = b

2.6 Problems

1. Let A be an m × n matrix of rank r = min{m, n}. Let u, v be vectors such that
Au = Av. In the following cases, can you conclude u = v? Explain.
(a) r = m (b) r = n (c) r < m (d) r < n
2. Let A j be the matrix obtained from the identity matrix of order n by replacing the
jth column by the vector [b1 , . . . , bn ]t . Using Cramer’s rule, show that det(A j ) =
b j for 1 ≤ j ≤ n.
3. Let A ∈ Fm×n , and let b ∈ Fm×1 . Prove the following:
(a) If the rows of A are linearly independent, then Ax = b has at least one solu-
tion.
(b) If the columns of A are linearly independent, then Ax = b has at most one
solution.
4. Let A ∈ Fn×n . Prove that the following are equivalent:
(a) A is invertible.
(b) Ax = 0 has no non-trivial solution.
(c) Ax = b has a unique solution for some b ∈ Fn×1 .
(d) Ax = b has at least one solution for each b ∈ Fn×1 .
(e) Ax = ei has at least one solution for each i ∈ {1, . . . , n}.
(f) Ax = b has at most one solution for each b ∈ Fn×1 .
(g) Ax = b has a unique solution for each b ∈ Fn×1 .
(h) rank(A) = n.
(i) The RREF of A is I.
(j) The rows of A are linearly independent.
(k) The columns of A are linearly independent.
(l) det(A) ≠ 0.
(m) For each B ∈ Cn×n , AB = 0 implies that B = 0.
(n) For each B ∈ Cn×n , B A = 0 implies that B = 0.
5. Let A ∈ Fn×n . Prove that adj(A) is invertible iff A is invertible. Further, if A is
invertible, then show that (adj(A))−1 = adj(A−1 ) = (det(A))−1 A.
6. Let A be an n × n invertible matrix with n > 1. Show that det(adj(A)) =
(det(A))n−1 .
7. Show that if det(A) = 1, then adj(adj(A)) = A.
8. Show that a consistent linear system cannot have n number of solutions, where
n > 1 is a natural number.
9. Let A, B ∈ Fm×n be in RREF. Show that if Ax = 0 and Bx = 0 have the same
set of solutions, then A = B.
10. Let A ∈ Fn×n . Using Gaussian elimination, show that there exists a matrix P,
which is a product of elementary matrices of Type 1, a lower triangular matrix L ,
and an upper triangular matrix U such that A = P LU.
Chapter 3
Matrix as a Linear Map

3.1 Subspace and Span

Recall that F stands for either R or C; and Fn is either F1×n or Fn×1 . Also, recall
that a typical row vector in F1×n is written as [a1 , . . . , an ] and a column vector
in Fn×1 is written as [a1 , . . . , an ]t . Both the row and column vectors are written
uniformly as (a1 , . . . , an ); these constitute the vectors in Fn . In Fn , we have a special
vector, called the zero vector, which we denote by 0; that is, 0 = (0, . . . , 0). And if
x = (a1 , . . . , an ) ∈ Fn , then its additive inverse is −x = (−a1 , . . . , −an ).
The operations of addition and scalar multiplication in Fn enjoy the following
properties:
For all u, v, w ∈ Fn , and for all α, β ∈ F,
1. u + v = v + u.
2. (u + v) + w = u + (v + w).
3. u + 0 = 0 + u = u.
4. u + (−u) = −u + u = 0.
5. α(βu) = (αβ)u.
6. α(u + v) = αu + αv.
7. (α + β)u = αu + βu.
8. 1 u = u.
9. (−1)u = −u.
10. If u + v = u + w, then v = w.
11. If αu = 0, then α = 0 or u = 0.
It so happens that the last three properties follow from the earlier ones. Any
nonempty set where the two operations of addition and scalar multiplication are
defined, and which enjoy the first eight properties above, is called a vector space
over F. In this sense, Fn , that is, both F1×n and Fn×1 , are vector spaces over F. In
such a general setting if a nonempty subset of a vector space is closed under both
the operations, then it is called a subspace. We may not need these general notions.
However, we define a subspace of our specific vector spaces.
Let V be a nonempty subset of Fn . We say that V is a subspace of Fn iff for
each scalar α ∈ F and for each pair of vectors u, v ∈ V, we have both u + v ∈ V and
αu ∈ V.
This is the meaning of the informal phrase: V is closed under addition and scalar
multiplication. It is easy to see that a nonempty subset V of Fn is a subspace of Fn
iff the following single condition is satisfied:
for each α ∈ F and for all vectors u, v from V, αu + v ∈ V.

Example 3.1 1. {0} and Fn are subspaces of Fn .


2. Let V = {(a, b, c) : 2a + 3b + 5c = 0, a, b, c ∈ F}. Clearly, (0, 0, 0) ∈ V. So,
V ≠ ∅. If (a1 , b1 , c1 ), (a2 , b2 , c2 ) ∈ V, then

2a1 + 3b1 + 5c1 = 0, 2a2 + 3b2 + 5c2 = 0.

Adding them, we have 2(a1 + a2 ) + 3(b1 + b2 ) + 5(c1 + c2 ) = 0. Therefore,
(a1 , b1 , c1 ) + (a2 , b2 , c2 ) ∈ V.
If α ∈ F, then 2(αa1 ) + 3(αb1 ) + 5(αc1 ) = 0. So, α(a1 , b1 , c1 ) ∈ V.
Hence, V is a subspace of F3 .
3. Let V = {(a, b, c) : 2a + 3b + 5c = 1, a, b, c ∈ F}. Clearly, (1/2, 0, 0) ∈ V. So,
V ≠ ∅. Also, (0, 1/3, 0) ∈ V.
We see that (1/2, 0, 0) + (0, 1/3, 0) = (1/2, 1/3, 0). And

2 × 1/2 + 3 × 1/3 + 5 × 0 = 2 ≠ 1.

That is, (1/2, 0, 0) + (0, 1/3, 0) ∉ V. Therefore, V is not a subspace of F3 .
Also, notice that 2 (1/2, 0, 0) ∉ V.
4. Let β1 , . . . , βn ∈ F. Let V = {[a1 , . . . , an ] : β1 a1 + · · · + βn an = 0}. It is easy
to check that V is a subspace of F1×n .
5. Let α, β1 , . . . , βn ∈ F with α ≠ 0. Then, the subset V = {[a1 , . . . , an ] : β1 a1 +
· · · + βn an = α} is not a subspace of F1×n . Why? 
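Closure can be spot-checked numerically; this is evidence, not a proof. The sketch below (our own; `in_V` is an assumed name) tests the single condition αu + v ∈ V for the subspace V = {(a, b, c) : 2a + 3b + 5c = 0} of item 2, over a small grid of scalars and vectors.

```python
from fractions import Fraction
from itertools import product

def in_V(v):
    """Membership test for V = {(a, b, c) : 2a + 3b + 5c = 0}."""
    a, b, c = v
    return 2 * a + 3 * b + 5 * c == 0

vectors = [(Fraction(3), Fraction(-2), Fraction(0)),
           (Fraction(5), Fraction(0), Fraction(-2)),
           (Fraction(0), Fraction(5), Fraction(-3))]
assert all(in_V(v) for v in vectors)
scalars = [Fraction(-1), Fraction(2), Fraction(1, 3)]
for alpha, u, v in product(scalars, vectors, vectors):
    w = tuple(alpha * ui + vi for ui, vi in zip(u, v))
    assert in_V(w)  # closure under alpha*u + v, as the subspace condition requires
print("closed under alpha*u + v for all sampled cases")
```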

Verify that in a subspace V, all the properties (1–8) above hold true. This is
the reason we call such a nonempty subset as a subspace. Further, if U and V are
subspaces of Fn and U ⊆ V, then we say that U is a subspace of V.
A singleton with a nonzero vector is not a subspace. For example, {(1, 1)} is not
a subspace of F2 since 2(1, 1) is not an element of this set.
What about the set {α(1, 1) : α ∈ F}? Take any two vectors from this set, say,
α(1, 1) and β(1, 1). Let γ ∈ F. Now,

γ (α(1, 1)) + β(1, 1) = (γ α + β)(1, 1)

is an element of the set. Therefore, the set is a subspace of F2 . Notice that this set is
the set of all linear combinations of the vector (1, 1).
3.1 Subspace and Span 55

Recall that a linear combination of vectors v1 , . . . , vm is any vector in the form

α1 v1 + · · · + αm vm

for scalars α1 , . . . , αm . If S is any nonempty subset of Fn , we define span(S) as the
set of all linear combinations of finite number of vectors from S. That is,

span(S) = {α1 v1 + · · · + αm vm : α1 , . . . , αm ∈ F, v1 , . . . , vm ∈ S for an m ∈ N}.

Also, we define span(∅) = {0}. We read span(S) as span of S.


When S = {v1 , . . . , vm }, we also write span(S) as span{v1 , . . . , vm }. Thus,

span{v1 , . . . , vm } = {α1 v1 + · · · + αm vm : α1 , . . . , αm ∈ F}.

For instance, v1 + · · · + vm and v1 + 5v2 are in span{v1 , . . . , vm }. In the first case,
each αi is equal to 1, whereas in the second case, α1 = 1, α2 = 5 and all other αi 's
are 0.
Also, for S ≠ ∅, we may write span(S) in the following way:

span(S) = ∪_{m=1}^{∞} {α1 v1 + · · · + αm vm : α1 , . . . , αm ∈ F, v1 , . . . , vm ∈ S}
= ∪_{m=1}^{∞} span{v1 , . . . , vm } for v1 , . . . , vm ∈ S
= ∪_{m=1}^{∞} span(A) for A ⊆ S with |A| = m.

Here, |A| means the number of elements of the set A. Further, when we speak of a
set of vectors, it is implicitly assumed that the set is a subset of Fn for some n.

Example 3.2 In R2 , U = span{(1, 1)} = {(a, a) : a ∈ R} is a subspace of R2 ; it is
the straight line that passes through the origin with slope 1.
Again, let V = span{(1, 1), (1, 2)} = {α(1, 1) + β(1, 2) : α, β ∈ R}, in R2 . Any
vector in V can be written as u + v, where u is a vector on the straight line y = x,
and v is vector on the straight line y = 2x. Here, a vector on the straight line means
vector directed from the origin to a point on the straight line. It seems any vector in the
plane can be written as u + v for some such u and v. To show that this is the case, we
try writing any point (a, b) ∈ R2 as u + v. So, suppose (a, b) = α(1, 1) + β(1, 2).
Then, a = α + β, b = α + 2β. Solving, we obtain β = b − a, α = 2a − b. We may
now verify that
(a, b) = (2a − b)(1, 1) + (b − a)(1, 2).

Therefore, R2 ⊆ V. On the other hand, V ⊆ R2 , since each linear combination of


(1, 1) and (1, 2) is in R2 . Therefore, V = R2 .
Notice that both U and V are subspaces of R2 . Further, U is a proper subspace
of R2 , whereas V = R2 . 
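The solving step above can be reproduced with a computer algebra system; here is a minimal sketch using Python's sympy (the use of sympy is an assumption of this illustration, not part of the text):

```python
from sympy import symbols, linsolve

# Solve (a, b) = alpha*(1, 1) + beta*(1, 2) for alpha and beta.
a, b, alpha, beta = symbols('a b alpha beta')
(sol,) = linsolve([alpha + beta - a, alpha + 2*beta - b], [alpha, beta])
assert tuple(sol) == (2*a - b, b - a)   # alpha = 2a - b, beta = b - a
```

Since the system is solvable for every (a, b), the two vectors (1, 1) and (1, 2) span R2.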
Theorem 3.1 Let S ⊆ Fn . Then, S ⊆ span(S), and span(S) is a subspace of Fn .
56 3 Matrix as a Linear Map
Proof Any vector u ∈ S is a linear combination of vectors from S, for u = 1 · u.
Thus, S ⊆ span(S).
Now, if S = ∅, then span(S) = {0}. And if S ≠ ∅, then S ⊆ span(S) implies
that span(S) ≠ ∅. In any case, span(S) is nonempty.
If u, v ∈ span(S), then both of them are linear combinations of vectors from S.
So, αu + v is also a linear combination of vectors from S for any scalar α. Therefore,
span(S) is a subspace of Fn . 
We see that span(span(S)) = span(S). (How?) In general, the span of any subspace
is the subspace itself.
Let V be a subspace of Fn , and let S ⊆ V. We say that S is a spanning subset of
V, or that S spans V iff V = span(S). In this case, each vector in V can be expressed
as a linear combination of vectors from S; we informally say that the vectors in S
span V.
Notice that (2, 2) ∈ span{(1, 1)}. So, span{(1, 1), (2, 2)} = span{(1, 1)}. As in
Example 3.2, we have in F2 ,

span{(1, 1), (2, 2)} = span{(1, 1)} ≠ F2 , span{(1, 1), (1, 2)} = F2 .

Further, the vectors (1, 1), (1, 2), (1, −1) also span F2 . In fact, since the first two
vectors span F2 , any list of vectors from F2 containing these two will also span F2 .
Similarly, the vectors e1 , . . . , en in Fn×1 span Fn×1 , where ei is the column vector
in Fn×1 whose ith component is 1 and all other components are 0.
In this terminology, vectors v1 , . . . , vn are linearly dependent iff one of the vectors
in this list is in the span of the rest. If no vector in the list is in the span of the rest,
then the vectors are linearly independent.
Exercises for Sect. 3.1

1. Let V be a nonempty subset of Fn . Prove the following:
(a) V is a subspace of Fn iff for all α, β ∈ F and for all vectors u, v ∈ V, αu + βv ∈ V.
(b) V is a subspace of Fn iff V with addition as in Fn and scalar multiplication
as in Fn satisfies the first eight properties of a vector space, as mentioned in
the text.
2. Let U be a subspace of V , and let V be a subspace of W, where W is a subspace
of Fn . Is U a subspace of W ?
3. Let u, v1 , v2 , . . . , vn be n + 1 distinct vectors in Fn , S1 = {v1 , v2 , . . . , vn }, and
let S2 = {u, v1 , v2 , . . . , vn }. Prove that span(S1 ) = span(S2 ) iff u ∈ span(S1 ).
4. Let A and B be subsets of Fn . Prove or disprove the following:
(a) A is a subspace of Fn if and only if span(A) = A.
(b) If A ⊆ B, then span(A) ⊆ span(B).
(c) span(A ∪ B) = {u + v : u ∈ span(A), v ∈ span(B)}.
(d) span(A ∩ B) ⊆ span(A) ∩ span(B).
(e) span(A) ∩ span(B) ⊆ span(A ∩ B).
5. Let S ⊆ U ⊆ Fn , where U is a subspace of Fn . Suppose for any subspace V of
Fn , if S ⊆ V then U ⊆ V. Prove that U = span(S).
6. Let A and B be subsets of Fn . Prove or disprove: A ∪ B is linearly independent
iff span(A) ∩ span(B) = {0}.
3.2 Basis and Dimension

Suppose we are given a subset S of a subspace V of Fn . The subset S may or
may not span V. If it spans V, it is possible that it has a proper subset which also
spans V. For instance,
S = {[1, 2, −3], [1, 0, −1], [2, −4, 2], [0, −2, 2]}

spans the subspace V = {[a, b, c] : a + b + c = 0} of F1×3 . Also, the subset

{[1, 2, −3], [1, 0, −1], [2, −4, 2]}

of S spans the same subspace V. Notice that S is linearly dependent. Reason:

[0, −2, 2] = (−1)[1, 2, −3] + [1, 0, −1].

On the other hand, the linearly independent subset {[1, 2, −3]} of S does not span
V. For instance,

[1, 0, −1] ∈ V, but [1, 0, −1] ≠ α[1, 2, −3] for any α ∈ F.
That is, a spanning subset may be superfluous and a linearly independent set may
be deficient. A linearly independent set which also spans a subspace may be just
adequate in spanning the subspace.
Let V be a subspace of Fn . Let B be a set of vectors from V. We say that the set
B is a basis of V iff B is linearly independent and B spans V. Also, we define ∅ as
the basis for the zero subspace {0}.
In what follows, we consider ordered sets, and the ordering of vectors in a set is
shown by the way they are written. For instance, in the ordered set {v1 , v2 , v3 }, the
vector v1 is the first vector, v2 is the second, and v3 is the third, whereas in {v2 , v3 , v1 },
the vector v2 is the first, v3 is the second, and v1 is the third. We assume implicitly
that each basis is an ordered set.
It follows from Theorem 2.5 that each basis for a subspace of Fn has at most n
vectors.
Example 3.3 1. It is easy to check that B = {e1 , . . . , en } is a basis of Fn×1 . Similarly,
E = {e1t , . . . , ent } is a basis of F1×n . These are the standard bases of these spaces.
2. We show that B = {[1, 2, −3], [1, 0, −1]} is a basis of

V = {[a, b, c] : a + b + c = 0, a, b, c ∈ F}.

First, B ⊆ V. Second, any vector in V is of the form [a, b, −a − b] for a, b ∈ F.
Now,

[a, b, −a − b] = (b/2) [1, 2, −3] + (a − b/2) [1, 0, −1]

shows that span(B) = V. For linear independence, suppose

α[1, 2, −3] + β[1, 0, −1] = [0, 0, 0].

Then, α + β = 0, 2α = 0, −3α − β = 0. It implies that α = β = 0.
3. Also, E = {[1, −1, 0], [0, 1, −1]} is another basis for the subspace V of F1×3 in
(2). 
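The two defining checks (membership, spanning, and linear independence) for part 2 of the example can be verified mechanically; a sketch with sympy (assumed available):

```python
from sympy import Matrix

# Rows are the vectors of B from part 2 of the example.
M = Matrix([[1, 2, -3], [1, 0, -1]])
assert all(sum(M.row(i)) == 0 for i in range(2))   # B is contained in V
assert M.rank() == 2                               # B is linearly independent
# V is the null space of [1 1 1], so dim(V) = 3 - 1 = 2 and B spans V.
assert len(Matrix([[1, 1, 1]]).nullspace()) == 2
```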
Let B be a basis of a subspace V of Fn . If C ⊆ V is any proper superset of B,
then any vector in C\B is a linear combination of vectors from B. So, C is linearly
dependent. On the other hand, if D is any proper subset of B, then each vector in
B\D fails to be a linear combination of vectors from D. Otherwise, B would be
linearly dependent. We thus say that

A basis is a maximal linearly independent set.
A basis is a minimal spanning set.

Of course, you can prove that a maximal linearly independent set is a basis, and a
minimal spanning set is a basis. This would guarantee that each subspace of Fn has
a basis. We take a more direct approach.
Theorem 3.2 Each subspace of Fn has a basis with at most n vectors.

Proof Let V be a subspace of Fn . If V = {0}, then ∅ is a basis of V. Otherwise,
choose a nonzero vector v1 from V. Take B = {v1 }. If V = span(B), then B is a
basis of V. Else, there exists a nonzero vector, say, v2 ∈ V \span(B). Update B to
B ∪ {v2 }. Notice that B is linearly independent. Continue this process to obtain larger
and larger linearly independent sets in V. By Theorem 2.5, a linearly independent
set in V cannot have more than n vectors. Thus, the process terminates with a basis
of V having at most n vectors. 
The zero subspace {0} has a single basis ∅. But other subspaces do not have a
unique basis. For instance, the subspace V in Example 3.3 has at least two bases.
However, something remains the same in all these bases. In that example, both the
bases have exactly two vectors.

Theorem 3.3 Let V be a subspace of Fn . All bases of V have the same number of
vectors.
Proof Without loss of generality, let V be a subspace of F1×n . By Theorem 2.5, a
basis of V can have at most n vectors. So, let {u 1 , . . . , u k } and {v1 , . . . , vm } be bases
for V, where k ≤ n and m ≤ n. Construct the matrix A ∈ F(k+m)×n by taking its rows
as u 1 , . . . , u k , v1 , . . . , vm in that order.
The first k rows of A are linearly independent, and the other rows are linear
combinations of these k rows. Also, the last m rows of A are linearly independent,
and the other rows are linear combinations of these m rows. By Theorem 2.6,
k = rank(A) = m. 
In view of Theorem 3.3, there exists a unique non-negative number associated with
each subspace of Fn , which is the number of vectors in any basis of the subspace.
Let V be a subspace of Fn . The number of vectors in some (or any) basis for V is
called the dimension of V. We write this number as dim(V ) and also as dim V.
Since {e1 , . . . , en } is a basis for Fn×1 , dim(Fn×1 ) = n. Similarly, dim(F1×n ) = n.
Remember that when we consider Cn×1 or C1×n , the scalars in any linear combination
are complex numbers, and for Rn×1 or R1×n , the scalars are real numbers. Notice
that dim({0}) = dim(span(∅)) = 0; the dimension of any subspace of Fn is at
most n.

Example 3.4 The subspace U := {[a, b, c, d] : a − 2b + 3c = 0 = d + a, a, b,


c, d ∈ F} may be written as

U = {[a, b, c, d] : [2b − 3c, b, c, −2b + 3c] : b, c ∈ F}


= {b [2, 1, 0, −2] + c [−3, 0, 1, 3] : b, c ∈ F}.

The vectors [2, 1, 0, −2] and [−3, 0, 1, 3] are linearly independent. Therefore, U
has a basis {[2, 1, 0, −2], [−3, 0, 1, 3]}. So, dim(U ) = 2. 
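Equivalently, U is the null space of the matrix of constraints, which makes the dimension count easy to verify in sympy (an illustrative check, not part of the text):

```python
from sympy import Matrix

# Rows encode the constraints a - 2b + 3c = 0 and d + a = 0.
C = Matrix([[1, -2, 3, 0], [1, 0, 0, 1]])
assert len(C.nullspace()) == 2                     # dim(U) = 4 - rank(C) = 2
for v in (Matrix([2, 1, 0, -2]), Matrix([-3, 0, 1, 3])):
    assert C * v == Matrix([0, 0])                 # the basis vectors lie in U
```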
Recall that |B| stands for the number of elements in a set B. For any subspace V
of Fn , and any subset B of Fn , the following statements should be obvious:
1. If |B| < dim(V ), then span(B) is a proper subspace of V.
2. If |B| > dim(V ), then B is linearly dependent.
3. If |B| = dim(V ) and span(B) = V, then B is a basis of V.
4. If |B| = dim(V ) and B is linearly independent, then B is a basis of V.
5. If U is a subspace of V , then dim(U ) ≤ dim(V ) ≤ n.
6. If B is a superset of a spanning set of V, then B is linearly dependent.
7. If B is a proper subset of a linearly independent subset of V, then B is linearly
independent, and span(B) is a proper subspace of V.
8. Each spanning set of V contains a basis of V.
9. Each linearly independent subset of V can be extended to a basis of V.
For (8)–(9), we may employ the same construction procedure as in the proof of
Theorem 3.2. A statement equivalent to (9) is proved below.
Theorem 3.4 (Basis Extension Theorem) Let V be a subspace of Fn . Each basis
of a subspace of V can be extended to a basis of V.

Proof Let B be a basis for a subspace U of V. If U = V, then B is a basis of V. Else,
let v1 ∈ V \U. Now, B ∪ {v1 } is linearly independent. If this set spans V, then it is a
basis of V. Otherwise, let v2 ∈ V \span(B ∪ {v1 }). If B ∪ {v1 , v2 } is not a basis of V,
then let v3 ∈ V \span(B ∪ {v1 , v2 }). Continue this process. The process terminates
since dim(V ) ≤ n. Upon termination, we obtain a basis for V. 
We can use elementary row operations to extract a basis for a subspace given as
the span of finitely many vectors. We write the vectors as rows of a matrix A and
convert it to its RREF. Then, the pivotal rows of the RREF form the required basis.
Also, those rows of A which have become the pivotal rows (monitoring row
exchanges) form a basis.
Example 3.5 Let U = span{(1, 1, 1, 1), (2, 1, 0, 3), (−1, 0, 1, −2), (0, 3, 2, 1)}.
Find a basis for the subspace U of F4 .
We start with the matrix with these vectors as its rows and convert it to its RREF
as follows:

[  1  1  1  1 ]      [ 1  1  1  1 ]      [ 1  0 −1  2 ]      [ 1  0  0  1 ]
[  2  1  0  3 ]  R1  [ 0 −1 −2  1 ]  R2  [ 0  1  2 −1 ]  R3  [ 0  1  0  1 ]
[ −1  0  1 −2 ]  −→  [ 0  1  2 −1 ]  −→  [ 0  0  0  0 ]  −→  [ 0  0  1 −1 ]
[  0  3  2  1 ]      [ 0  3  2  1 ]      [ 0  0 −4  4 ]      [ 0  0  0  0 ]

Here, R1 = E −2 [2, 1], E 1 [3, 1]; R2 = E −1 [2], E −1 [1, 2], E −1 [3, 2], E −3 [4, 2];
and R3 = E[3, 4], E −1/4 [3], E 1 [1, 3], E −2 [2, 3].
The pivotal rows show that {(1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, −1)} is a basis for
the given subspace. Notice that only one row exchange has been done in this reduction
process, which means that the third row in the RREF corresponds to the fourth vector
and the fourth row corresponds to the third vector. Thus, the pivotal rows correspond
to the first, second, and the fourth vector, originally. This says that a basis for the
subspace is also given by {(1, 1, 1, 1), (2, 1, 0, 3), (0, 3, 2, 1)}.
The reduction process confirms that the third vector is a linear combination of the
first, second, and the fourth. 
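The reduction above can be cross-checked with sympy's rref routine, which also reports the pivot columns (a sketch, assuming sympy is available):

```python
from sympy import Matrix

A = Matrix([[1, 1, 1, 1], [2, 1, 0, 3], [-1, 0, 1, -2], [0, 3, 2, 1]])
R, pivots = A.rref()
# The three pivotal rows of the RREF form a basis of the span of the rows.
assert R[:3, :] == Matrix([[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, -1]])
assert A.rank() == 3     # one of the four vectors is redundant
```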
The method illustrated in the above example can be used in another situation.
Suppose B is a basis for U, which is a subspace of V ⊆ Fn . Assume that a basis E
for V is also known. Then, we can extend B to a basis for V. Towards this, we may
form a matrix with the vectors in B as row vectors and then place the vectors from
E as rows below those from B. Then, reduction of this matrix to RREF produces a
basis for V, which is an extension of B.
Linear independence has something to do with the invertibility of a square matrix.
Suppose the rows of a matrix A ∈ Fn×n are linearly independent. Then, the RREF
of A has n pivots. That is, rank(A) = n. Consequently, A is invertible.
On the other hand, if a row of A is a linear combination of other rows, then this row
appears as a zero row in the RREF of A. That is, A is not invertible.
Considering At instead of A, we conclude that At is invertible iff the rows of A
are linearly independent. However, A is invertible iff At is invertible. Therefore, A
is invertible iff its columns are linearly independent.
Theorem 3.5 A square matrix is invertible iff its rows are linearly independent iff
its columns are linearly independent.

From Theorem 3.5, it follows that an n × n matrix is invertible iff its rows form
a basis for F1×n iff its columns form a basis for Fn×1 .
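Theorem 3.5 is easy to test numerically; a small sketch (sympy assumed, with a 2 × 2 instance of each case):

```python
from sympy import Matrix

A = Matrix([[1, 1], [1, 2]])              # rows are linearly independent
assert A.rank() == 2 and A.det() != 0     # so A is invertible
B = Matrix([[1, 1], [2, 2]])              # second row = 2 * first row
assert B.rank() == 1 and B.det() == 0     # so B is not invertible
```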
Exercises for Sect. 3.2

1. Let V be a subspace of Fn , and let B ⊆ V. Prove the following:
(a) B is a basis of V iff B is linearly independent and each proper superset of B
is linearly dependent.
(b) B is a basis of V iff B spans V and no proper subset of B spans V.
2. Prove statements in (1)–(7) listed after Example 3.4.
3. Let {x, y, z} be a basis for a vector space V. Is {x + y, y + z, z + x} also a basis
for V ?
4. Find a basis for the subspace {(a, b, c) ∈ R3 : a + b − 5c = 0} of R3 .
5. Find bases and dimensions of the following subspaces of R5 :
(a) {(a, b, c, d, e) ∈ R5 : a − c − d = 0}.
(b) {(a, b, c, d, e) ∈ R5 : b = c = d, a + e = 0}.
(c) span{(1, −1, 0, 2, 1), (2, 1, −2, 0, 0), (0, −3, 2, 4, 2),
(3, 3, −4, −2, −1), (5, 7, −3, −2, 0)}.
6. Extend the set {(1, 0, 1, 0), (1, 0, −1, 0)} to a basis of R4 .
7. Let u, v, w, x1 , x2 , x3 , x4 ∈ R5 satisfy x1 = u + 2v + 3w, x2 = 2u − 3v + 4w,
x3 = −3u + 4v − 5w, and x4 = −u + 6v + w. Is {x1 , x2 , x3 , x4 } linearly
dependent?
8. Prove that the only proper subspaces of R2 are the straight lines passing through
the origin.
9. Describe all subspaces of R3 .
10. Let A ∈ Fn×n . Let {v1 , . . . , vn } be any basis of Fn×1 . Prove that A is invertible iff
Ax = vi has at least one solution for each i = 1, . . . , n.
11. Let A be a matrix. Call a row (column) of A redundant if it can be expressed as
a linear combination of other rows (columns). Show the following:
(a) If a redundant row of A is deleted, then the column rank of A remains
unchanged.
(b) If a redundant column of A is deleted, then the row rank of A remains
unchanged.
(c) The row rank of A is equal to the column rank of A.
3.3 Linear Transformations

Let A ∈ Fm×n . If x ∈ Fn×1 , then Ax ∈ Fm×1 . Thus, we may view the matrix A as a
function from Fn×1 to Fm×1 which maps x to Ax. That is, the map A : Fn×1 → Fm×1
is given by

A(x) = Ax for x ∈ Fn×1 .
Notice that instead of using another symbol, we write the map obtained this way
from a matrix A as A itself. Due to the properties of matrix product, the following
are true for the map A:
1. A(u + v) = A(u) + A(v) for all u, v ∈ Fn×1 .
2. A(αv) = α A(v) for all v ∈ Fn×1 and for all α ∈ F.
In this manner, a matrix is considered as a linear transformation. In fact, any
function A from a vector space to another (both over the same field) satisfying the
above two properties is called a linear transformation or a linear map.
To see the connection between the matrix as a rectangular array and as a function,
consider the values of the matrix A at the standard basis vectors e1 , . . . , en in Fn×1 .
The column vector e j ∈ Fn×1 has the jth entry as 1 and all other entries 0. Let
A = [ai j ] ∈ Fm×n . The product Ae j is a vector in Fm×1 , whose ith entry is

ai1 · 0 + · · · + ai( j−1) · 0 + ai j · 1 + ai( j+1) · 0 + · · · + ain · 0 = ai j .

That is, Ae j = [a1 j , . . . , ai j , . . . , am j ]t = A j , the jth column of A.

Observation 3.1 A matrix A ∈ Fm×n is viewed as the linear transformation A :
Fn×1 → Fm×1 , where A(e j ) = A j , the jth column of A, for 1 ≤ j ≤ n, and A(v) =
Av for each v ∈ Fn×1 .
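Observation 3.1 rests on the identity Ae_j = A_j, which can be spot-checked on any matrix; a sympy sketch (the particular matrix is our choice for illustration):

```python
from sympy import Matrix, eye

A = Matrix([[5, 2, -3], [1, -3, 2]])   # an arbitrary 2x3 matrix
Id = eye(3)
for j in range(3):
    assert A * Id[:, j] == A[:, j]     # A e_j is the jth column of A
```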
The range of the linear transformation A is the set R(A) = {Ax : x ∈ Fn×1 }. If
y ∈ R(A), then there exists an x = [α1 , . . . , αn ]t ∈ Fn×1 such that y = Ax. The
vector x can be written as x = α1 e1 + · · · + αn en . Thus, such a y ∈ R(A) is written
as
y = Ax = α1 Ae1 + · · · + αn Aen .

Conversely, each vector α1 Ae1 + · · · + αn Aen = A(α1 e1 + · · · + αn en ) ∈ R(A).
Hence,
R(A) = {α1 Ae1 + · · · + αn Aen : α1 , . . . , αn ∈ F}
     = {α1 A1 + · · · + αn An : α1 , . . . , αn ∈ F}
     = span of the columns of A.
Therefore, R(A) is a subspace of Fm×1 . We refer to R(A) as the range space of
A. Since R(A) is the span of the columns of A, it is also called the column space of
A. Its dimension is the column rank of A. We know that

dim(R(A)) = the column rank of A = rank(A).
Similarly, the subspace of F1×n which is spanned by the rows of A is called the
row space of A. Notice that the nonzero rows in the RREF of A form a basis for the
row space of A. The dimension of the row space is the row rank of A, which we
know to be equal to rank(A) also.
For an m × n matrix A, viewed as a linear transformation, the set of all vectors
which map to the zero vector is denoted by N (A). That is,

N (A) = {x ∈ Fn×1 : Ax = 0}.
We find that N (A) is the set of all solutions of the linear system Ax = 0. Also,
if u, v ∈ N (A) and α ∈ F, then A(αu + v) = α Au + Av = 0.

Therefore, N (A) is a subspace of Fn×1 . We refer to N (A) as the null space of A.
The dimension of the null space is called the nullity of the matrix A and is denoted
by null(A). That is,

null(A) = dim(N (A)).
Theorem 2.9 (4) implies that dim(R(A)) + dim(N (A)) = n. Since this will be used
often, we mention it as a theorem. An alternate proof of this theorem is given in
Problem 12.

Theorem 3.6 (Rank Nullity) Let A ∈ Cm×n . Then, dim(R(A)) + dim(N (A)) =
rank(A) + null(A) = n.
The Gauss–Jordan elimination process gives us the following mechanical way of
constructing a basis for N (A). First, we reduce A to its RREF B. Next, we throw
away the zero rows at the bottom of B to obtain an r × n matrix C. If necessary, we
insert zero rows in C to obtain an n × n matrix D so that the pivots become diagonal
entries in D. Each diagonal entry of D which is on a non-pivotal column is now 0.
We change each such 0 to −1. Call the new matrix E. The non-pivotal columns
of E form a basis for N (A).
Example 3.6 Consider the system matrix in Example 2.9. We had its RREF with
(boxed) pivots as shown below:

A = [ 5  2 −3  1 ]       [ 1  0   −5/17  −1/17 ]
    [ 1 −3  2 −2 ]  −→   [ 0  1  −13/17  11/17 ] = RREF(A).
    [ 3  8 −7  5 ]       [ 0  0     0      0   ]

The first two columns in RREF(A) are the pivotal columns. So, the first two
columns in A form a basis for R(A). That is,

a basis for R(A) is { [5, 1, 3]t , [2, −3, 8]t }.

For a basis of N (A), notice that each pivot has its row index equal to its column
index; so, we do not need to insert zero rows between pivotal rows. To make it a
square matrix, we attach a zero row to the RREF at the bottom:

D = [ 1  0   −5/17  −1/17 ]
    [ 0  1  −13/17  11/17 ]
    [ 0  0     0      0   ]
    [ 0  0     0      0   ]

Then, we change the diagonal entries in the non-pivotal columns to −1. These
changed non-pivotal columns form a basis for N (A). That is,

a basis for N (A) is { [−5/17, −13/17, −1, 0]t , [−1/17, 11/17, 0, −1]t }.

Thus, dim(R(A)) + dim(N (A)) = 4 = the number of columns in A. 
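The insert-zero-rows-and-set-−1 procedure can be coded directly; here is a sketch with sympy, checked against this example (the helper construction is ours, not from the text):

```python
from sympy import Matrix, Rational, zeros

A = Matrix([[5, 2, -3, 1], [1, -3, 2, -2], [3, 8, -7, 5]])
R, pivots = A.rref()
n = A.cols
E = zeros(n, n)
# Place each pivotal row so that its pivot sits on the diagonal.
for row, col in enumerate(pivots):
    E[col, :] = R[row, :]
# Put -1 on the diagonal of every non-pivotal column.
for j in range(n):
    if j not in pivots:
        E[j, j] = -1
basis = [E[:, j] for j in range(n) if j not in pivots]
assert len(basis) == 2
assert all(A * v == zeros(3, 1) for v in basis)
assert basis[0] == Matrix([Rational(-5, 17), Rational(-13, 17), -1, 0])
```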
Example 3.7 Find bases for R(A) and N (A), where

A = [  5  2 −3  2 ]
    [ 10  4 −5  5 ] .
    [ −5 −2  3 −2 ]

We reduce A to its RREF as in the following:

[  5  2 −3  2 ]      [ 1  2/5 −3/5  2/5 ]      [ 1  2/5  0  1 ]
[ 10  4 −5  5 ] −→   [ 0   0    1    1  ] −→   [ 0   0   1  1 ] .
[ −5 −2  3 −2 ]      [ 0   0    0    0  ]      [ 0   0   0  0 ]

A basis for R(A) is provided by the columns of A that correspond to the pivotal
columns, that is, the first and the third columns of A. So,

a basis for R(A) is { [5, 10, −5]t , [−3, −5, 3]t }.

The second and fourth columns are linear combinations of these basis vectors,
where the coefficients are provided by the entries in the non-pivotal columns of the
RREF. That is, the second column of A is 2/5 times the first column, and the fourth
column is 1 times the first column plus 1 times the third column. Indeed, we may
verify that

[2, 4, −2]t = (2/5) [5, 10, −5]t , [2, 5, −2]t = [5, 10, −5]t + [−3, −5, 3]t .
Towards computing a basis for N (A), notice that the only zero row in the RREF
is the third row. So, we delete it. Next, the pivot on the second row has column
index 3. So, we insert a zero row between the first and second rows. Next, we adjoin
a zero row at the bottom to make it a square matrix. Finally, we change the diagonal
entries of these new rows to −1. We then obtain the matrix

[ 1  2/5  0   1 ]
[ 0  −1   0   0 ]
[ 0   0   1   1 ] .
[ 0   0   0  −1 ]

A basis for N (A) is provided by the non-pivotal columns. That is,

a basis for N (A) is { [2/5, −1, 0, 0]t , [1, 0, 1, −1]t }. 
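A direct check of this example, and of the rank-nullity count, with sympy (an illustrative aid):

```python
from sympy import Matrix, Rational, zeros

A = Matrix([[5, 2, -3, 2], [10, 4, -5, 5], [-5, -2, 3, -2]])
# Rank-nullity: dim R(A) + dim N(A) = number of columns.
assert len(A.columnspace()) + len(A.nullspace()) == A.cols == 4
for v in (Matrix([Rational(2, 5), -1, 0, 0]), Matrix([1, 0, 1, -1])):
    assert A * v == zeros(3, 1)        # both basis vectors are in N(A)
```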

Exercises for Sect. 3.3

1. Let
A = [ 1 2 1 1 1 ]
    [ 3 5 3 4 3 ]
    [ 1 1 1 2 1 ] .
    [ 5 8 5 7 5 ]
Write r = rank(A) and k = null(A).
(a) Find bases for R(A) and N (A), and then determine r and k.
(b) Express suitable 4 − r rows of A as linear combinations of other r rows.
(c) Express suitable 5 − r columns of A as linear combinations of other r columns.
2. Construct bases for (i) the row space, (ii) the column space, and (iii) the null space
of each of the following matrices:

(a) [ 1 2 4 ]    (b) [ −3  1 −3 ]    (c) [ 3 4  5 6 ]
    [ 3 1 7 ]        [  1  2  8 ]        [ 2 1  3 2 ]
    [ 2 4 8 ]        [  3 −1  4 ]        [ 1 3 −2 1 ]
                     [  4 −2  2 ]
3. Let A ∈ Fm×n . Consider A as the linear transformation A : Fn×1 → Fm×1 given
by A(x) = Ax for each x ∈ Fn×1 . Prove the following:
(a) A is one-one iff N (A) = {0}.
(b) A is one-one and onto iff A maps any basis onto a basis of Fn×1 .
(c) A is one-one iff A is onto.
4. Let A ∈ Fm×n . Let P ∈ Fm×m be invertible. Is it true that the RREF of P A is same
as the RREF of A?
5. Let A and B be matrices having the same RREF. Then which of the following may
be concluded? Explain.
(a) R(A) = R(B) (b) N (A) = N (B)
(c) The row space of A is equal to the row space of B.
6. For A ∈ Fm×n and B ∈ Fn×k , prove: rank(AB) ≤ min{rank(A), rank(B)}.

3.4 Coordinate Vectors

Let B = {v1 , . . . , vm } be a basis of a subspace V of Fn . Recall that a basis is assumed
to be an ordered set of vectors. Let v ∈ V. As B spans V, the vector v is a linear
combination of vectors from B. Can there be two distinct linear combinations?
Suppose there exist scalars a1 , . . . , am , b1 , . . . , bm ∈ F such that

v = a1 v1 + · · · + am vm = b1 v1 + · · · + bm vm .
Then, (a1 − b1 )v1 + · · · + (am − bm )vm = 0. Since B is linearly independent, a1 =
b1 , . . . , am = bm . That is, such a linear combination is unique.
Conversely, suppose that each vector in V is written uniquely as a linear combination
of v1 , . . . , vm . To show linear independence of these vectors, suppose that

α1 v1 + · · · + αm vm = 0

for scalars α1 , . . . , αm . We also have

0 v1 + · · · + 0 vm = 0.

From the uniqueness of writing the zero vector as a linear combination of
v1 , . . . , vm , we conclude that α1 = 0, . . . , αm = 0. Therefore, B is linearly
independent.
We note down the result we have proved.
Theorem 3.7 Let B = {v1 , . . . , vm } be an ordered set of vectors from a subspace V
of Fn . B is a basis of V iff for each v ∈ V, there exists a unique vector [α1 , . . . , αm ]t
in Fm×1 such that v = α1 v1 + · · · + αm vm .

Notice that in Theorem 3.7, dim(V ) = m.
Let B = {v1 , . . . , vm } be a basis for a subspace V of Fn . Let v ∈ V. For v =
α1 v1 + · · · + αm vm , the unique column vector [α1 , . . . , αm ]t ∈ Fm×1 is called the
coordinate vector of v with respect to the basis B; it is denoted by [v] B . In this
sense, we say that a basis provides a coordinate system in a subspace. Notice that
the ordering of the basis vectors is important in obtaining the coordinate vector.
Example 3.8 Let V = {[a, b, c]t : a + 2b + 3c = 0, a, b, c ∈ R}. This subspace
has a basis

B = {[0, 3, −2]t , [2, −1, 0]t }.

For v = [−3, 0, 1]t ∈ V, we have

v = [−3, 0, 1]t = −(1/2) [0, 3, −2]t − (3/2) [2, −1, 0]t .

Therefore, [v] B = [−1/2, −3/2]t .
If w ∈ V has the coordinate vector [w] B = [1, 1]t , then

w = 1 [0, 3, −2]t + 1 [2, −1, 0]t = [2, 2, −2]t .

Since dim(V ) = 2, all the coordinate vectors are elements of R2×1 . 
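Finding [v]_B amounts to solving a small linear system; a sketch of this example in sympy (assumed available):

```python
from sympy import Matrix, Rational, linsolve, symbols

b1, b2 = Matrix([0, 3, -2]), Matrix([2, -1, 0])
v = Matrix([-3, 0, 1])
x, y = symbols('x y')
# Solve x*b1 + y*b2 = v; the solution is the coordinate vector [v]_B.
(sol,) = linsolve((Matrix.hstack(b1, b2), v), [x, y])
assert tuple(sol) == (Rational(-1, 2), Rational(-3, 2))
assert b1 + b2 == Matrix([2, 2, -2])   # [w]_B = (1, 1) recovers w
```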
The coordinate vector induces a linear map from V onto Fm×1 . To see this, let
B = {v1 , . . . , vm } be a basis for a subspace V of Fn . Suppose u, v ∈ V have the
coordinate vectors

[u] B = [a1 , . . . , am ]t , [v] B = [b1 , . . . , bm ]t .

It means u = a1 v1 + · · · + am vm and v = b1 v1 + · · · + bm vm . Then, for α ∈ F,

αu + v = (αa1 + b1 )v1 + · · · + (αam + bm )vm .

That is, [αu + v] B = α[u] B + [v] B . In a way, this is the linearity property of the
coordinate vector map. Also,

v j = 0 · v1 + · · · + 0 · v j−1 + 1 · v j + 0 · v j+1 + · · · + 0 · vm .

Hence, [v j ] B = e j for each j ∈ {1, . . . , m}.
We summarize these simple facts about the coordinate vector.
Observation 3.2 Let B = {v1 , . . . , vm } be a basis for a subspace V of Fn . Let
α1 , . . . , αm ∈ F, and let u 1 , . . . , u m ∈ V. Then,
1. [α1 u 1 + · · · + αm u m ] B = α1 [u 1 ] B + · · · + αm [u m ] B .
2. [v j ] B = e j , the jth standard basis vector of Fm×1 for 1 ≤ j ≤ m.
Consider Fn×1 as a subspace of itself. It has many bases. If E is the standard basis
of Fn×1 and v = [a1 , . . . , an ]t ∈ Fn×1 , then [v] E = v. When we change the basis E
to another basis, say, B, what does [v] B look like?
Theorem 3.8 Let {e1 , . . . , en } be the standard basis of Fn×1 , and let B = {u 1 , . . . ,
u n } be any basis of Fn×1 . Write P = [u 1 · · · u n ]. Then, P[e j ] B = e j = [u j ] B for 1 ≤
j ≤ n, and for each u ∈ Fn×1 , P[u] B = u and [u] B = P −1 u.
Proof Let j ∈ {1, . . . , n}. By Observation 3.2, [u j ] B = e j . Since u j is the jth col-
umn of P, we have u j = Pe j = P[u j ] B .
Let u ∈ Fn×1 . Then, u = a1 u 1 + · · · + an u n for unique scalars a1 , . . . , an . Now,

P[u] B = P [a1 , . . . , an ]t = P(a1 e1 + · · · + an en )
       = a1 Pe1 + · · · + an Pen = a1 u 1 + · · · + an u n = u.

As the columns of P form a basis for Fn×1 , the matrix P is invertible. Therefore,
[u] B = P −1 u. 

Example 3.9 Consider the basis B = {u 1 , u 2 , u 3 } for R3×1 , where u 1 = [1, 1, 1]t ,
u 2 = [1, 0, 1]t , and u 3 = [1, 0, 0]t . The matrix P in Theorem 3.8 and its inverse are
given by

P = [u 1 u 2 u 3 ] = [ 1 1 1 ]        P −1 = [ 0  1  0 ]
                    [ 1 0 0 ] ,              [ 0 −1  1 ] .
                    [ 1 1 0 ]                [ 1  0 −1 ]

We see that

e1 = [1, 0, 0]t = 0 · u 1 + 0 · u 2 + u 3
e2 = [0, 1, 0]t = u 1 − u 2 + 0 · u 3
e3 = [0, 0, 1]t = 0 · u 1 + u 2 − u 3

Also, since P[e j ] B = e j , we have [e j ] B = P −1 e j . That is, the columns of P −1 are
the coordinate vectors of the e j with respect to the basis B. Either way,

[e1 ] B = [0, 0, 1]t , [e2 ] B = [1, −1, 0]t , [e3 ] B = [0, 1, −1]t .

It leads to

P[e1 ] B = P[0, 0, 1]t = [1, 0, 0]t = e1
P[e2 ] B = P[1, −1, 0]t = [0, 1, 0]t = e2
P[e3 ] B = P[0, 1, −1]t = [0, 0, 1]t = e3

That is, P[e j ] B = e j for each j, as stated in Theorem 3.8. We verify the other
conclusion of the theorem for u = [1, 2, 3]t . Here, u = 2u 1 + u 2 − 2u 3 . Therefore,

[u] B = [2, 1, −2]t = P −1 [1, 2, 3]t = P −1 u. 
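The matrix P of this example and the conclusion [u]_B = P^{-1}u can be confirmed mechanically (sympy assumed):

```python
from sympy import Matrix

u1, u2, u3 = Matrix([1, 1, 1]), Matrix([1, 0, 1]), Matrix([1, 0, 0])
P = Matrix.hstack(u1, u2, u3)
Pinv = P.inv()
assert Pinv == Matrix([[0, 1, 0], [0, -1, 1], [1, 0, -1]])
u = Matrix([1, 2, 3])
assert Pinv * u == Matrix([2, 1, -2])   # [u]_B, since u = 2u1 + u2 - 2u3
```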
It raises a more general problem: if a vector v has the coordinate vector [v] B with
respect to a basis B of Fn×1 , and also a coordinate vector [v]C with respect to another
basis C of Fn×1 , then how are [v] B and [v]C related? We will address this question
in Sect. 3.6.
Exercises for Sect. 3.4
In the following, a basis B for F3×1 and a vector u ∈ F3×1 are given. Compute the
coordinate vector [u] B .
1. B = {[0, 0, 1]t , [1, 0, 0]t , [0, 1, 0]t }, u = [1, 2, 3]t
2. B = {[0, 1, 1]t , [1, 0, 1]t , [1, 1, 0]t }, u = [2, 1, 3]t
3. B = {[1, 1, 1]t , [1, 2, 1]t , [1, 2, 3]t }, u = [3, 2, 1]t
3.5 Coordinate Matrices

Let A ∈ Fm×n . We view A as a linear transformation from Fn×1 to Fm×1 . Let D
and E be the standard bases of Fn×1 and Fm×1 , respectively. Then, [u] D = u and
[Au] E = Au. We thus obtain

[Au] E = A [u] D .
If we choose a different pair of bases, that is, a basis B for Fn×1 and a basis C for
Fm×1 , then u has the coordinate vector [u] B and Au has the coordinate vector [Au]C .
Looking at the above equation, we ask:
Does there exist a matrix M such that [Au]C = M[u] B ?
If so, is it unique?
The following theorem answers these questions.
Theorem 3.9 Let A ∈ Fm×n . Let B = {u 1 , . . . , u n } and C = {v1 , . . . , vm } be bases
for Fn×1 and Fm×1 , respectively. Then, there exists a unique matrix M ∈ Fm×n such
that [Au]C = M[u] B . Moreover, the entries of M = [αi j ] satisfy the following:

Au j = α1 j v1 + · · · + αm j vm for each j ∈ {1, . . . , n}.

Proof Let j ∈ {1, . . . , n}. Since Au j ∈ Fm×1 and C is a basis for Fm×1 , there exist
unique scalars α1 j , . . . , αm j such that

Au j = α1 j v1 + · · · + αm j vm .
Thus, we have unique mn scalars αi j so that the above equation is true for each
j. Construct the matrix M = [αi j ] ∈ Fm×n . By Observation 3.2,

[Au j ]C = [α1 j , . . . , αm j ]t = M e j = M [u j ] B for each j ∈ {1, . . . , n}.
We must verify that such an equality holds for each u ∈ Fn×1 . So, let u ∈ Fn×1 .
As B is a basis of Fn×1 , there exist unique scalars β1 , . . . , βn such that u = β1 u 1 +
· · · + βn u n . Now,

[Au]C = [A(β1 u 1 + · · · + βn u n )]C = β1 [Au 1 ]C + · · · + βn [Au n ]C
      = β1 M [u 1 ] B + · · · + βn M [u n ] B = M [β1 u 1 + · · · + βn u n ] B = M [u] B .

This proves the existence of a required matrix M.
To prove uniqueness, let P and Q be matrices such that

[Au]C = P [u] B = Q [u] B for each u ∈ Fn×1 .

In particular, with u = u j , we have

[Au j ]C = P [u j ] B = Q [u j ] B for each j ∈ {1, . . . , n}.

Since [u j ] B = e j , we obtain Pe j = Qe j . That is, the jth column of P is equal to
the jth column of Q for each j ∈ {1, . . . , n}. Therefore, P = Q. 
In view of Theorem 3.9, we denote the matrix M by [A]C,B and give it a name.
Let A ∈ Fm×n . Let B be a basis for Fn×1 , and let C be a basis for Fm×1 . The unique
m × n matrix [A]C,B that satisfies

[Au]C = [A]C,B [u] B for each u ∈ Fn×1

is called the coordinate matrix of A with respect to the bases B and C.
Theorem 3.9 provides a method to compute the coordinate matrix. Let A ∈ Fm×n ,
let B = {u 1 , . . . , u n } be a basis for Fn×1 , and let C = {v1 , . . . , vm } be a basis for
Fm×1 . We express the image of each basis vector u j in terms of the basis vectors vi
as in the following:

Au 1 = α11 v1 + α21 v2 + · · · + αm1 vm
Au 2 = α12 v1 + α22 v2 + · · · + αm2 vm
  ...                                                    (3.1)
Au n = α1n v1 + α2n v2 + · · · + αmn vm

Then, the coordinate matrix is (mark which coefficient αi j goes where)

         [ α11 α12 · · · α1n ]
[A]C,B = [ α21 α22 · · · α2n ]                           (3.2)
         [  ...             ]
         [ αm1 αm2 · · · αmn ]

Also, notice that if the coordinate matrix is given by (3.2), then the equalities in
(3.1) are satisfied.
Example 3.10 Compute the coordinate matrix [A]C,B of

A = [ 0  1 ]
    [ 1  1 ]
    [ 1 −1 ]

with respect to the bases B = {[1, 2]t , [3, 1]t } for F2×1 and C = {[1, 0, 0]t , [1, 1, 0]t ,
[1, 1, 1]t } for F3×1 .
Write u 1 = [1, 2]t , u 2 = [3, 1]t and v1 = [1, 0, 0]t , v2 = [1, 1, 0]t , v3 =
[1, 1, 1]t . We obtain

Au 1 = [2, 3, −1]t = −v1 + 4v2 − v3
Au 2 = [1, 4, 2]t = −3v1 + 2v2 + 2v3 .

Therefore,

[A]C,B = [ −1 −3 ]
         [  4  2 ] . 
         [ −1  2 ]
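Each column of [A]_{C,B} is obtained by solving Q x = Au_j, where Q has the vectors of C as its columns; a sympy sketch of this example (illustrative, not from the text):

```python
from sympy import Matrix

A = Matrix([[0, 1], [1, 1], [1, -1]])
us = [Matrix([1, 2]), Matrix([3, 1])]                # the basis B
Q = Matrix([[1, 1, 1], [0, 1, 1], [0, 0, 1]])        # columns v1, v2, v3
M = Matrix.hstack(*[Q.solve(A * u) for u in us])     # C-coordinates of A u_j
assert M == Matrix([[-1, -3], [4, 2], [-1, 2]])
```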
We use Theorem 3.8 in proving the following result, which shows another way
for computing the coordinate matrix.

Theorem 3.10 Let A ∈ F^{m×n}. Let B = {u_1, . . . , u_n} and C = {v_1, . . . , v_m} be bases
for F^{n×1} and F^{m×1}, respectively. Construct the matrices

    P = [u_1 · · · u_n],   Q = [v_1 · · · v_m],

by taking the column vectors u_j and v_i as columns of the respective matrices. Then,
[A]_{C,B} = Q^{−1} A P.

Proof Let u = a_1 u_1 + · · · + a_n u_n. By Theorem 3.8, we obtain P [u]_B = u. Also,
Q [Au]_C = Au.
    Since the columns of Q ∈ F^{m×m} are linearly independent, Q is invertible. There-
fore, [Au]_C = Q^{−1} Au = Q^{−1} A P [u]_B.
    It then follows that [A]_{C,B} = Q^{−1} A P.                                   □

To summarize, if A ∈ Fm×n , B is a basis for Fn×1 , and C is a basis for Fm×1 , then
the coordinate matrix [A]C,B can be obtained in two ways.
Due to Theorem 3.9, we express the A-image of the basis vectors in B as linear
combinations of basis vectors in C. The coefficients of each such image form the
corresponding columns of the matrix [A]C,B .
Alternatively, using Theorem 3.10, we construct P as the matrix whose jth column
is the jth vector in B and the matrix Q by taking its ith column as the ith vector in
C. Then, we have [A]C,B = Q −1 A P.
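As a quick numerical sanity check, the product formula Q^{−1}AP can be evaluated with NumPy. The data below are those of Example 3.10; this is an illustrative sketch and not part of the text.

```python
import numpy as np

# Data of Example 3.10 (assumed): columns of P are the basis B, columns of Q the basis C.
A = np.array([[0, 1], [1, 1], [1, -1]], dtype=float)
P = np.array([[1, 3], [2, 1]], dtype=float)
Q = np.array([[1, 1, 1], [0, 1, 1], [0, 0, 1]], dtype=float)

M = np.linalg.inv(Q) @ A @ P   # the coordinate matrix [A]_{C,B}
print(M)                       # rows: [-1, -3], [4, 2], [-1, 2]
```

The printed matrix agrees with the answer obtained by hand in Example 3.10.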
Since the inverse of a matrix is computed using reduction to RREF, the coordinate
matrix may be computed the same way. If B = {u_1, . . . , u_n} and C = {v_1, . . . , v_m}
are bases for F^{n×1} and F^{m×1}, respectively, then

    P = [u_1 · · · u_n],   Q = [v_1 · · · v_m].

Now, A P = [Au_1 · · · Au_n]. To compute Q^{−1} A P, we start with the augmented
matrix [Q | A P] and then proceed towards its RREF. Since Q^{−1} exists, the RREF
will look like [I | Q^{−1} A P]. Schematically, the computation may be written as in the
following:
                                          RREF
    [v_1 · · · v_m | Au_1 · · · Au_n]  −→  [I | [A]_{C,B}].
Example 3.11 Compute the coordinate matrix [A]_{C,B} of

        ⎡ 0   1 ⎤
    A = ⎢ 1   1 ⎥
        ⎣ 1  −1 ⎦

with respect to the bases B = {[1, 2]^t, [3, 1]^t} for F^{2×1}, and C = {[1, 0, 0]^t, [1, 1, 0]^t,
[1, 1, 1]^t} for F^{3×1}.
    Writing the vectors of B as u_1 and u_2 in that order, we have

    Au_1 = A [1, 2]^t = [2, 3, −1]^t,   Au_2 = A [3, 1]^t = [1, 4, 2]^t.

We then convert the augmented matrix, whose columns are the vectors from C followed
by Au_1 and Au_2, to its RREF. It is as follows:

    ⎡ 1  1  1 |  2  1 ⎤  R1   ⎡ 1  0  0 | −1 −3 ⎤
    ⎢ 0  1  1 |  3  4 ⎥  −→   ⎢ 0  1  0 |  4  2 ⎥ .
    ⎣ 0  0  1 | −1  2 ⎦       ⎣ 0  0  1 | −1  2 ⎦

Here, R1 = E_{−1}[1, 2], E_{−1}[2, 3]. Therefore,

                ⎡ −1  −3 ⎤
    [A]_{C,B} = ⎢  4   2 ⎥ .
                ⎣ −1   2 ⎦

It is easy to verify that Au_1 = −v_1 + 4v_2 − v_3 and Au_2 = −3v_1 + 2v_2 + 2v_3, as
required by (3.1)–(3.2).                                                         □
The linear property of coordinate vectors can be extended to coordinate matrices;
see Exercise 2. Moreover, the product formula [Au]_C = [A]_{C,B} [u]_B has a similar-
looking expression for products of matrices.
Theorem 3.11 Let A ∈ Fm×k and B ∈ Fk×n be matrices. Let C, D and E be bases
for Fn×1 , Fk×1 and Fm×1 , respectively. Then, [AB] E,C = [A] E,D [B] D,C .
Proof For each v ∈ F^{n×1},

    [A]_{E,D} [B]_{D,C} [v]_C = [A]_{E,D} [Bv]_D = [ABv]_E = [AB]_{E,C} [v]_C.

Therefore, [A]_{E,D} [B]_{D,C} = [AB]_{E,C}.                                       □
Notice that in Theorem 3.11, the matrices are the linear transformations

A : Fk×1 → Fm×1 , B : Fn×1 → Fk×1 , AB : Fn×1 → Fm×1 .

Since ABv = (AB)(v) = A(Bv), the linear transformation AB is the composition


of the maps A and B. The composition map AB transforms a vector from Fn×1 to
one in Fk×1 by using B first, and then it uses A to transform that vector from Fk×1 to
one in Fm×1 . Our result says that the coordinate matrix of the composition map AB
is simply the product of the coordinate matrices of the maps A and B.
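Theorem 3.11 can be checked numerically for randomly chosen matrices and bases. The helper `coord` below is our own shorthand for Q^{−1}XP; random square matrices are invertible with probability one, so they serve as bases here. This is an illustrative sketch only.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n = 3, 4, 2
A = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))
# Random invertible matrices; their columns serve as the bases E, D, C.
PE = rng.standard_normal((m, m))
PD = rng.standard_normal((k, k))
PC = rng.standard_normal((n, n))

def coord(X, Qmat, Pmat):
    # Coordinate matrix of X with respect to the bases formed by the columns
    # of Qmat (co-domain) and Pmat (domain), as in Theorem 3.10.
    return np.linalg.inv(Qmat) @ X @ Pmat

lhs = coord(A @ B, PE, PC)                  # [AB]_{E,C}
rhs = coord(A, PE, PD) @ coord(B, PD, PC)   # [A]_{E,D} [B]_{D,C}
print(np.allclose(lhs, rhs))                # True
```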
Exercises for Sect. 3.5
          ⎡ 1  1  1 ⎤
1. Let A = ⎢ 1  2  2 ⎥ . Consider the bases B = {[1, 0, 1]^t, [0, 1, 1]^t, [1, 1, 0]^t} and
          ⎣ 1  2  3 ⎦
   C = {[1, 1, 1]^t, [1, 2, 1]^t, [1, 1, 0]^t} for F^{3×1}. Compute [A]_{C,B} and [A]_{B,C} in the
   following ways:
(a) By expressing A-images of the basis vectors of B in terms of those of C and
vice versa.
(b) By computing the inverses of two matrices constructed from the basis vectors.
(c) By using the RREF conversion as mentioned in the text.
2. Let A, B ∈ Fm×n , C be a basis of Fn×1 , and D be a basis for Fm×1 , and let α ∈ F.
Show that [α A + B] D,C = α[A] D,C + [B] D,C .
3. Take a 2 × 3 real matrix A and a 3 × 2 real matrix B. Also, take the bases C =
   {[1, 1]^t, [0, 1]^t}, D = {[1, 0, 0]^t, [0, 1, 0]^t, [0, 0, 1]^t} and E = {[1, 0]^t, [1, 1]^t} for the
   spaces R^{2×1}, R^{3×1} and R^{2×1}, respectively. Verify Theorem 3.11.
4. Construct a matrix A ∈ R2×2 , a vector v ∈ R2×1 , and a basis B = {u 1 , u 2 } for
R2×1 satisfying [Av] B = A[v] B .

3.6 Change of Basis Matrix

Recall that [A]C,B is that matrix in Fm×n which satisfies the equation:

[Au]C = [A]C,B [u] B for each u ∈ Fn×1 .

Further, [A]C,B is given by Q −1 A P, where P is the matrix whose jth column is the
jth basis vector of B, and Q is the matrix whose ith column is the ith basis vector
of C. In particular, taking A ∈ Fn×n as the identity matrix, writing B as O, and C as
N , we see that

[u] N = [I ] N ,O [u] O = Q −1 P[u] O for each u ∈ Fn×1 . (3.3)

In (3.3), P is the matrix whose jth column is the jth basis vector of O, and Q is the
matrix whose ith column is the ith basis vector of N for 1 ≤ i, j ≤ n. Both O and
N are bases for Fn×1 .
The formula in (3.3) shows how the coordinate vector changes when a basis
changes. The matrix [I]_{N,O}, which is equal to Q^{−1} P, is called the change of basis
matrix or the transition matrix when the basis changes from an old basis O to a
new basis N.
    Observe that the change of basis matrix [I]_{N,O} can also be computed by expressing
the basis vectors of O as linear combinations of basis vectors of N as stipulated by
Theorem 3.9, or equivalently, in (3.1)–(3.2). Alternatively, if O = {u_1, . . . , u_n} and
N = {v_1, . . . , v_n}, the change of basis matrix [I]_{N,O}, which is equal to Q^{−1} P, may
be computed by using reduction to RREF. Schematically, it is given as follows:

                                     RREF
    [v_1 · · · v_n | u_1 · · · u_n]  −→  [I | [I]_{N,O}].
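A short NumPy sketch (using, as an assumed example, the bases of Example 3.12 below) confirms that Q^{−1}P maps O-coordinates to N-coordinates:

```python
import numpy as np

# Columns of P are the old basis O; columns of Q are the new basis N.
P = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]], dtype=float)
Q = np.array([[1, 1, -1], [-1, 1, 1], [1, -1, 1]], dtype=float)

T = np.linalg.inv(Q) @ P       # change of basis matrix [I]_{N,O}

u = np.array([1.0, 2.0, 3.0])
u_O = np.linalg.solve(P, u)    # coordinates of u in O
u_N = np.linalg.solve(Q, u)    # coordinates of u in N
print(np.allclose(T @ u_O, u_N))   # True
```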

Example 3.12 Consider the following bases for R^{3×1}:

    O = {[1, 0, 1]^t, [1, 1, 0]^t, [0, 1, 1]^t},   N = {[1, −1, 1]^t, [1, 1, −1]^t, [−1, 1, 1]^t}.

Find the change of basis matrix [I]_{N,O} when the basis changes from O to N. Also,
find the coordinate matrix [A]_{N,O} of the matrix

        ⎡  1  1  1 ⎤
    A = ⎢ −1  0  1 ⎥
        ⎣  0  1  0 ⎦

with respect to the pair of bases O and N. Verify that

    [[1, 2, 3]^t]_N = [I]_{N,O} [[1, 2, 3]^t]_O,   [A [1, 2, 3]^t]_N = [A]_{N,O} [[1, 2, 3]^t]_O.

    To solve this problem, we express the basis vectors of O as linear combinations
of those of N, as in Theorem 3.9. The details are as follows:

    [1, 0, 1]^t = 1 [1, −1, 1]^t + 1/2 [1, 1, −1]^t + 1/2 [−1, 1, 1]^t
    [1, 1, 0]^t = 1/2 [1, −1, 1]^t + 1 [1, 1, −1]^t + 1/2 [−1, 1, 1]^t
    [0, 1, 1]^t = 1/2 [1, −1, 1]^t + 1/2 [1, 1, −1]^t + 1 [−1, 1, 1]^t.

The coefficients from the first equation give the first column of [I]_{N,O}, and so on.
Thus, we obtain
                ⎡  1   1/2  1/2 ⎤
    [I]_{N,O} = ⎢ 1/2   1   1/2 ⎥ .
                ⎣ 1/2  1/2   1  ⎦

    Alternatively, by Theorem 3.10, the change of basis matrix is given by

                ⎡  1  1 −1 ⎤⁻¹ ⎡ 1  1  0 ⎤   ⎡ 1/2   0   1/2 ⎤ ⎡ 1  1  0 ⎤   ⎡  1   1/2  1/2 ⎤
    [I]_{N,O} = ⎢ −1  1  1 ⎥   ⎢ 0  1  1 ⎥ = ⎢ 1/2  1/2   0  ⎥ ⎢ 0  1  1 ⎥ = ⎢ 1/2   1   1/2 ⎥ .
                ⎣  1 −1  1 ⎦   ⎣ 1  0  1 ⎦   ⎣  0   1/2  1/2 ⎦ ⎣ 1  0  1 ⎦   ⎣ 1/2  1/2   1  ⎦

    Again, using the RREF conversion, we may obtain [I]_{N,O} as follows:

    ⎡  1  1 −1 | 1  1  0 ⎤  R1   ⎡ 1  1 −1 | 1  1  0 ⎤
    ⎢ −1  1  1 | 0  1  1 ⎥  −→   ⎢ 0  2  0 | 1  2  1 ⎥
    ⎣  1 −1  1 | 1  0  1 ⎦       ⎣ 0 −2  2 | 0 −1  1 ⎦

         R2   ⎡ 1  0 −1 | 1/2  0 −1/2 ⎤  R3   ⎡ 1  0  0 |  1   1/2  1/2 ⎤
         −→   ⎢ 0  1  0 | 1/2  1  1/2 ⎥  −→   ⎢ 0  1  0 | 1/2   1   1/2 ⎥ .
              ⎣ 0  0  2 |  1   1   2  ⎦       ⎣ 0  0  1 | 1/2  1/2   1  ⎦

Here, R1 = E_1[2, 1], E_{−1}[3, 1]; R2 = E_{1/2}[2], E_{−1}[1, 2], E_2[3, 2]; R3 = E_{1/2}[3], E_1[1, 3].
The matrix in the RREF, to the right of the vertical bar, is the required change of
basis matrix [I]_{N,O}.
    To verify our result for [1, 2, 3]^t, notice that

    [1, 2, 3]^t = 1 [1, 0, 1]^t + 0 [1, 1, 0]^t + 2 [0, 1, 1]^t
    [1, 2, 3]^t = 2 [1, −1, 1]^t + 3/2 [1, 1, −1]^t + 5/2 [−1, 1, 1]^t.

Therefore, [[1, 2, 3]^t]_O = [1, 0, 2]^t and [[1, 2, 3]^t]_N = [2, 3/2, 5/2]^t. Then,

    [I]_{N,O} [[1, 2, 3]^t]_O = ⎡  1   1/2  1/2 ⎤ ⎡ 1 ⎤   ⎡  2  ⎤
                                ⎢ 1/2   1   1/2 ⎥ ⎢ 0 ⎥ = ⎢ 3/2 ⎥ = [[1, 2, 3]^t]_N.
                                ⎣ 1/2  1/2   1  ⎦ ⎣ 2 ⎦   ⎣ 5/2 ⎦

    According to Theorem 3.10,

                ⎡  1  1 −1 ⎤⁻¹   ⎡ 1  1  0 ⎤       ⎡ 2  3  3 ⎤
    [A]_{N,O} = ⎢ −1  1  1 ⎥   A ⎢ 0  1  1 ⎥ = 1/2 ⎢ 2  1  3 ⎥ .
                ⎣  1 −1  1 ⎦     ⎣ 1  0  1 ⎦       ⎣ 0  0  2 ⎦

As to the verification,

      ⎡ 1 ⎤   ⎡ 6 ⎤     ⎡  1 ⎤     ⎡  1 ⎤     ⎡ −1 ⎤
    A ⎢ 2 ⎥ = ⎢ 2 ⎥ = 4 ⎢ −1 ⎥ + 4 ⎢  1 ⎥ + 2 ⎢  1 ⎥ .
      ⎣ 3 ⎦   ⎣ 2 ⎦     ⎣  1 ⎦     ⎣ −1 ⎦     ⎣  1 ⎦

We find that [A]_{N,O} [[1, 2, 3]^t]_O = [A]_{N,O} [1, 0, 2]^t = [4, 4, 2]^t = [A [1, 2, 3]^t]_N.   □

Exercises for Sect. 3.6

            ⎡  3   1   0 ⎤
    Let A = ⎢  0   1   3 ⎥ ,   V = {[a, b, c]^t : a + 2b + 3c = 0, a, b, c ∈ R},
            ⎣ −1  −1  −2 ⎦

    B = {[0, 3, −2]^t, [2, −1, 0]^t}, and let C = {[1, 1, −1]^t, [3, 0, −1]^t}.
1. Show that V is a two-dimensional subspace of R3×1 .
2. Show that A : V → V defined by A(v) = Av for v ∈ V is a well-defined function.
3. Show that B and C are bases for V.
4. Extend the basis B of V to a basis O for R^{3×1}.
5. Extend the basis C of V to a basis N for R^{3×1}.
6. Find the change of basis matrix I N ,O .
7. Verify that [v] N = I N ,O [v] O for v = [−4, −1, 2]t .
8. Find the matrix [A] N ,O .
9. Verify that [Av] N = [A] N ,O [v] O for v = [−4, −1, 2]t .

3.7 Equivalence and Similarity

In view of Theorem 3.10, two matrices A, B ∈ Fm×n are called equivalent iff there
exist invertible matrices P ∈ Fn×n and Q ∈ Fm×m such that B = Q −1 A P.
Notice that ‘is equivalent to’ is an equivalence relation on Fm×n . Equivalent matri-
ces represent the same matrix (linear transformation) with respect to possibly differ-
ent pairs of bases.
From Theorem 2.8, it follows that ranks of two equivalent matrices are equal.
    We can construct a matrix of rank r relatively easily. Let r ≤ min{m, n}. The
matrix R_r ∈ F^{m×n} whose first r columns are the standard basis vectors e_1, . . . , e_r of
F^{m×1} and all other columns are zero columns has rank r. This matrix, in block form,
looks as follows:
          ⎡ I_r  0 ⎤
    R_r = ⎣  0   0 ⎦ .

Such a matrix is called a rank echelon matrix. If R_r is an m × n matrix, then
necessarily r ≤ min{m, n}. For ease in writing, we do not show the size of a rank
echelon matrix in the notation R_r; we rather specify it in different contexts. From
Theorem 2.8, it follows that any matrix which is equivalent to R_r also has rank r.
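A rank echelon matrix is easy to build and test numerically. The small sketch below is our own illustration; the function name is an assumption.

```python
import numpy as np

def rank_echelon(m, n, r):
    """The m x n matrix R_r: an identity block I_r at the top-left, zeros elsewhere."""
    R = np.zeros((m, n))
    R[:r, :r] = np.eye(r)
    return R

R2 = rank_echelon(3, 4, 2)
print(R2)
print(np.linalg.matrix_rank(R2))   # 2
```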
    Conversely, if a row of a matrix is a linear combination of other rows, then the
rank of the matrix is the same as that of the matrix obtained by deleting such a row.
Similarly, deleting a column which is a linear combination of other columns does
not change the rank of the matrix. It is thus possible to perform elementary row and
column operations to bring a matrix of rank r to its rank echelon form Rr . We state
and prove this result rigorously.
Theorem 3.12 (Rank factorization) A matrix is of rank r iff it is equivalent to the
rank echelon matrix Rr of the same size.
Proof Let A ∈ Fm×n , and let Rr be the rank echelon matrix of size m × n. If A is
equivalent to Rr , then rank(A) = rank(Rr ) = r by Theorem 2.8.
For the converse, suppose rank(A) = r. Convert A to its RREF C = E 1 A, where
E 1 is a suitable product of elementary matrices. Now, each non-pivotal column of C
is a linear combination of its r pivotal columns.
Consider the matrix C t . The pivotal columns of C are now the rows eit in C t . Each
other row in C t is a linear combination of these rows. Exchange the rows of C t so
that the first r rows of the new matrix are e1t , . . . , ert in that order. That is, we have
a matrix E 2 , which is a product of elementary matrices of Type 1, such that E 2 C t
has the first r rows as e1t , . . . , ert and the next m − r rows are linear combinations of
these r rows.
Use suitable elementary row operations to zero-out the bottom m − r rows. This
is possible since each non-pivotal row is a linear combination of the pivotal rows
e1t , . . . , ert . Thus, we obtain the matrix E 3 E 2 C t , where E 3 is the suitable product of
elementary matrices used in reducing the bottom m − r rows to zero rows.
Then taking transpose, we see that C E 3t E 2t is a matrix whose first r columns are
e1 , . . . , er and all other columns are zero columns.
    To summarize, we have obtained three matrices E_1, E_2, and E_3, which are
products of elementary matrices, such that

    C E_3^t E_2^t = E_1 A E_3^t E_2^t = ⎡ I_r  0 ⎤ = R_r.
                                        ⎣  0   0 ⎦

Since E_1, E_2^t and E_3^t are invertible, A and R_r are equivalent.               □

    Theorem 3.12 asserts that given any matrix A ∈ F^{m×n} of rank r, there exist invert-
ible matrices P ∈ F^{n×n} and Q ∈ F^{m×m} such that A = Q^{−1} R_r P. Thus, it is named
the rank factorization.
The rank factorization can be used to characterize equivalence of matrices. If A
and B are equivalent matrices, then clearly, they have the same rank. Conversely, if
two m × n matrices have the same rank r, then both of them are equivalent to Rr .
Then, they are equivalent. We thus obtain the rank theorem:
Two matrices of the same size are equivalent iff they have the same rank.
   
                       ⎡ I_r  0 ⎤   ⎡ I_r ⎤
    Observe that R_r = ⎣  0   0 ⎦ = ⎣  0  ⎦ [I_r  0]. Due to rank factorization, there
exist invertible matrices P and Q such that

                              ⎡ I_r ⎤
    A = Q^{−1} R_r P = Q^{−1} ⎣  0  ⎦ [I_r  0] P = B C,

where
               ⎡ I_r ⎤
    B = Q^{−1} ⎣  0  ⎦ ∈ F^{m×r},   C = [I_r  0] P ∈ F^{r×n}.

Here, rank(B) = r = the number of columns in B. Similarly, rank(C) = r = the
number of rows in C. A matrix whose rank is equal to the number of rows or the
number of columns is called a full rank matrix. The above factorization A = BC
shows that
number of columns is called a full rank matrix. The above factorization A = BC
shows that
each matrix can be written as a product of full rank matrices.

This result is known as the full rank factorization.
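Full rank factorizations are not unique; one convenient way to produce one numerically is via the singular value decomposition. The function below is our own illustrative sketch (not the construction used in the proof): any choice of B with independent columns and C with independent rows satisfying A = BC would do.

```python
import numpy as np

def full_rank_factorization(A, tol=1e-10):
    """Return (B, C) with A = B @ C, where B is m x r of full column rank
    and C is r x n of full row rank; r = rank(A). Uses the SVD for illustration."""
    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > tol))
    B = U[:, :r] * s[:r]      # scale the first r left singular vectors
    C = Vt[:r, :]             # first r right singular vectors, as rows
    return B, C

A = np.array([[1., 2., 3.], [2., 4., 6.], [1., 0., 1.]])   # rank 2
B, C = full_rank_factorization(A)
print(B.shape, C.shape)        # (3, 2) (2, 3)
print(np.allclose(B @ C, A))   # True
```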
The notion of equivalence stems from the change of bases in both the domain
and the co-domain of a matrix viewed as a linear transformation. In case the matrix
is a square matrix of order n, it is considered as a linear transformation on Fn×1 . If
we change the basis in Fn×1 , we would have a corresponding representation of the
matrix in the new basis.
Let A ∈ Fn×n , a square matrix of order n. The matrix A is (viewed as) a linear
transformation from Fn×1 to Fn×1 . Let E = {e1 , . . . , en } be the standard basis of
Fn×1 . The matrix A acts in the usual way: Ae j is the jth column of A. Suppose we
change the basis of Fn×1 to C = {v1 , . . . , vn }. That is, in both the domain and the
co-domain space, we take the new basis as C. If the equivalent matrix of A is M,
then for each v ∈ Fn×1 ,

    [Av]_C = P^{−1} A P [v]_C,   P = [v_1 · · · v_n].

That is, the coordinate matrix of A is P^{−1} A P. This leads to similarity of two matrices.
    We say that two matrices A, B ∈ F^{n×n} are similar iff B = P^{−1} A P for some
invertible matrix P ∈ F^{n×n}. Observe that in this case, B is the coordinate matrix of
A with respect to the basis that comprises the columns of P.
Example 3.13 Consider the basis N = {[1, −1, 1]^t, [1, 1, −1]^t, [−1, 1, 1]^t} for
R^{3×1}. To determine the matrix similar to

        ⎡  1  1  1 ⎤
    A = ⎢ −1  0  1 ⎥
        ⎣  0  1  0 ⎦

when the basis has changed from the standard basis to N, we construct the matrix P
by taking the basis vectors of N as its columns, as in the following:

        ⎡  1  1 −1 ⎤
    P = ⎢ −1  1  1 ⎥ .
        ⎣  1 −1  1 ⎦

Then, the matrix similar to A when the basis changes to N is

                         ⎡ 1  0  1 ⎤ ⎡  1  1  1 ⎤ ⎡  1  1 −1 ⎤       ⎡  0  2  2 ⎤
    B = P^{−1} A P = 1/2 ⎢ 1  1  0 ⎥ ⎢ −1  0  1 ⎥ ⎢ −1  1  1 ⎥ = 1/2 ⎢  1 −1  3 ⎥ .
                         ⎣ 0  1  1 ⎦ ⎣  0  1  0 ⎦ ⎣  1 −1  1 ⎦       ⎣ −1 −1  3 ⎦

From Example 3.12, we know that for u = [1, 2, 3]^t,

            ⎡  2  ⎤            ⎡ 4 ⎤                 ⎡  0  2  2 ⎤ ⎡  2  ⎤   ⎡ 4 ⎤
    [u]_N = ⎢ 3/2 ⎥ , [Au]_N = ⎢ 4 ⎥ , B [u]_N = 1/2 ⎢  1 −1  3 ⎥ ⎢ 3/2 ⎥ = ⎢ 4 ⎥ .
            ⎣ 5/2 ⎦            ⎣ 2 ⎦                 ⎣ −1 −1  3 ⎦ ⎣ 5/2 ⎦   ⎣ 2 ⎦

This verifies the condition [Au]_N = B [u]_N for the vector u = [1, 2, 3]^t.      □
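The similarity computation above can be replayed in NumPy (data as in Example 3.13; an illustrative sketch only):

```python
import numpy as np

A = np.array([[1., 1., 1.], [-1., 0., 1.], [0., 1., 0.]])
P = np.array([[1., 1., -1.], [-1., 1., 1.], [1., -1., 1.]])  # columns: basis N

B = np.linalg.inv(P) @ A @ P      # matrix similar to A, in the basis N

u = np.array([1., 2., 3.])
u_N = np.linalg.solve(P, u)       # [u]_N
print(np.allclose(B @ u_N, np.linalg.solve(P, A @ u)))   # [Au]_N = B [u]_N: True
```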
We emphasize that if B = P −1 A P is a matrix similar to A, then the matrix A
as a linear transformation on Fn×1 with standard basis and the matrix B as a linear
transformation on Fn×1 with the basis consisting of columns of P are the same linear
transformation. Moreover, if C is the basis whose jth element is the jth column of
P, then for each vector v ∈ Fn×1 , [Av]C = P −1 A P [v]C .
Though equivalence is easy to characterize by the rank, similarity is much more
difficult. And we postpone this to a later chapter.
Exercises for Sect. 3.7
1. Let A ∈ Cm×n . Define T : C1×m → C1×n by T (x) = x A for x ∈ C1×m . Show that
T is a linear transformation. Identify T (etj ).
2. In each of the following cases, show that T is a linear transformation. Find the
matrix A so that T (x) = Ax. Determine rank(A). And then, construct the full rank
factorization of A.
(a) T : R3×1 → R2×1 , T ([a, b, c]t ) = [c, b + a]t
(b) Let T : R3×1 → R3×1 , T ([a, b, c]t ) = [a + b, 2a − b − c, a + b + c]t
3. Which matrices are equivalent to, and which are similar to
(a) the zero matrix (b) the identity matrix?
4. Consider the bases O = {u_1, u_2} and N = {v_1, v_2} for F^{2×1}, where u_1 = [1, 1]^t,
   u_2 = [−1, 1]^t, v_1 = [2, 1]^t, and v_2 = [1, 0]^t. Let A = ⎡ −1  0 ⎤ .
                                                                ⎣  0  1 ⎦
(a) Compute Q = [A] O,O and R = [A] N ,N .
(b) Find the change of basis matrix P = I N ,O .
(c) Compute S = P Q P −1 .
(d) Is it true that R = S? Why?
(e) If S = [si j ], verify that Av1 = s11 v1 + s21 v2 , Av2 = s12 v1 + s22 v2 .

3.8 Problems

1. Let A ∈ F^{n×n}. Show that the following are equivalent to det(A) ≠ 0:
(a) The columns of A are linearly independent vectors in Fn×1 .
(b) The rows of A are linearly independent vectors in F1×n .
2. Let x, y, z ∈ Fn be linearly independent. Determine whether the following are
linearly independent:
(a) x + y, y + z, z + x (b) x − y, y − z, z − x
3. Let U and V be subspaces of Fn . Show the following:
(a) U ∩ V is a subspace of Fn .
(b) U ∪ V need not be a subspace of Fn .
(c) U + V = {u + v : u ∈ U, v ∈ V } is a subspace of Fn .
(d) If W is any subspace of Fn , U ⊆ W , and V ⊆ W, then U + V ⊆ W.
(e) dim(U + V ) + dim(U ∩ V ) = dim(U ) + dim(V ).
4. Let A ∈ Fm×n . Show that if the columns of A are linearly independent, then
N (A) = {0}. What if the rows of A are linearly independent?
5. Let C = AB, where A ∈ Fm×k and B ∈ Fk×n . Show the following:
(a) If both A and B have linearly independent columns, then the columns of C
are linearly independent.
(b) If both A and B have linearly independent rows, then the rows of C are
linearly independent.
(c) If B has linearly dependent columns, then the columns of C are linearly
dependent.
(d) If A has linearly dependent rows, then the rows of C are linearly dependent.
6. Let A, B ∈ Fn×n be such that N (A − B) = Fn×1 . Show that A = B.
7. Let A, B ∈ Fn×n be such that AB = 0. Show the following:
(a) R(B) ⊆ N (A) (b) rank(A) + rank(B) ≤ n
8. Let A = x y t for nonzero vectors x ∈ Fm×1 and y ∈ Fn×1 . Show that
(a) span{x} = R(A) (b) span{y t } = the row space of A
9. Is it true that if A ∈ F^{m×n}, then both A and A^t have the same nullity?
10. Let B be the RREF of an m × n matrix A. Are the following true?
(a) The row spaces of A and B are equal.
(b) The column spaces of A and B are equal.
11. Let U be a subspace of Fn×1 .
(a) Does there exist a matrix in Fn×n such that U = R(A)?
(b) Does there exist a matrix in Fn×n such that U = N (A)?
12. Let A ∈ Fm×n . Let {u 1 , . . . , u k } be a basis for N (A). Argue that there exist vectors
v1 , . . . , vn−k so that {u 1 , . . . , u k , v1 , . . . , vn−k } is a basis for Fn×1 . Show that
{Av1 , . . . , Avn−k } is a basis for R(A). This will give an alternate proof of the rank
nullity theorem.
13. Let A ∈ Fm×n . Let P ∈ Fn×n and Q ∈ Fm×m be invertible matrices. Show the
following:
(a) N (Q A) = N (A) (b) R(A P) = R(A)
14. Using rank nullity theorem, deduce Theorem 2.8.
15. Let A, B ∈ Fn×n have the same rank. Is it true that rank(A2 ) = rank(B 2 )?
16. Let A, B ∈ Fn×n . Show that A is similar to B iff there exist matrices P, Q ∈ Fn×n ,
with P invertible, such that A = P Q and B = Q P.
17. Let A and B be similar matrices. Are the following pairs similar?
(a) At , B t (b) A−1 , B −1 (c) Ak , B k for k ∈ N (d) A − α I, B − α I
18. If A and B are similar matrices, show that tr(A) = tr(B).
19. Let A, B ∈ Fn×n have the same trace. Are A and B
(a) similar? (b) equivalent?
20. Let A, B ∈ Fn×n have the same determinant. Are A and B
(a) similar? (b) equivalent?
21. Is a rank factorization of a matrix unique?
22. Is a full rank factorization of a matrix unique?
23. Using rank factorization, show that for any m × k matrix A and k × n matrix B,
rank(AB) ≤ min{rank(A), rank(B)}.
24. Let A and B be m × n matrices. Using full rank factorization, show that rank(A +
B) ≤ rank(A) + rank(B).
25. Using the rank theorem, prove that the row rank of a matrix is equal to its column
rank.
Chapter 4
Orthogonality

4.1 Inner Products

The dot product in R3 is used to define the length of a vector and the angle between
two nonzero vectors. In particular, the dot product is used to determine when two
vectors become perpendicular to each other. This notion can be generalized to Fn .
For vectors u, v ∈ F^{1×n}, we define their inner product as

    ⟨u, v⟩ = u v*.

In case F = R, in the definition of the inner product, x* becomes x^t. For example, if
u = [1, 2, 3], v = [2, 1, 3], then ⟨u, v⟩ = 1 × 2 + 2 × 1 + 3 × 3 = 13.
    Similarly, for x, y ∈ F^{n×1}, we define their inner product as

    ⟨x, y⟩ = y* x.

    In numerical examples, if u = (a_1, . . . , a_n) and v = (b_1, . . . , b_n) ∈ F^n, then we
may also write their inner product with the ‘dot’ notation. That is,

    ⟨u, v⟩ = (a_1, . . . , a_n) · (b_1, . . . , b_n) = a_1 b̄_1 + · · · + a_n b̄_n,

where the bar denotes complex conjugation; for real vectors this is the usual dot product.

The inner product satisfies the following properties:
For all x, y, z ∈ F^n and for all α, β ∈ F,
1. ⟨x, x⟩ ≥ 0.
2. ⟨x, x⟩ = 0 iff x = 0.
3. ⟨x, y⟩ = the complex conjugate of ⟨y, x⟩.
4. ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.
5. ⟨z, x + y⟩ = ⟨z, x⟩ + ⟨z, y⟩.
6. ⟨αx, y⟩ = α ⟨x, y⟩.
7. ⟨x, βy⟩ = β̄ ⟨x, y⟩.
In any vector space V, a function ⟨·, ·⟩ : V × V → F that satisfies Properties (1)–
(4) and (6) is called an inner product. And any vector space with an inner product is
called an inner product space. Properties (5) and (7) follow from the others.
    The inner product gives rise to the length of a vector as in the familiar case of
R^{1×3}. We now call the generalized version of length the norm.
    For u ∈ F^n, we define its norm, denoted by ‖u‖, as the non-negative square root
of ⟨u, u⟩. That is,

    ‖u‖ = √⟨u, u⟩.
The norm satisfies the following properties:
For all x, y ∈ F^n and for all α ∈ F,
1. ‖x‖ ≥ 0.
2. ‖x‖ = 0 iff x = 0.
3. ‖αx‖ = |α| ‖x‖.
4. |⟨x, y⟩| ≤ ‖x‖ ‖y‖.  (Cauchy–Schwarz inequality)
5. ‖x + y‖ ≤ ‖x‖ + ‖y‖.  (Triangle inequality)
A proof of the Cauchy–Schwarz inequality goes as follows:
    If y = 0, then both sides of the inequality are equal to 0. Else, ⟨y, y⟩ ≠ 0. Write
α = ⟨x, y⟩/⟨y, y⟩. Then ᾱ⟨x, y⟩ = α⟨y, x⟩ = αᾱ⟨y, y⟩ = |⟨x, y⟩|²/‖y‖². So,

    0 ≤ ⟨x − αy, x − αy⟩ = ⟨x, x⟩ − ᾱ⟨x, y⟩ − α⟨y, x⟩ + αᾱ⟨y, y⟩ = ‖x‖² − |⟨x, y⟩|²/‖y‖².

The triangle inequality is proved using Cauchy–Schwarz:

    ‖x + y‖² = ⟨x + y, x + y⟩ = ‖x‖² + ‖y‖² + ⟨x, y⟩ + ⟨y, x⟩
             = ‖x‖² + ‖y‖² + 2 Re⟨x, y⟩ ≤ ‖x‖² + ‖y‖² + 2 |⟨x, y⟩|
             ≤ ‖x‖² + ‖y‖² + 2 ‖x‖ ‖y‖ = (‖x‖ + ‖y‖)².
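Both inequalities can be sanity-checked numerically for complex vectors. The lambdas below encode the column-vector convention ⟨x, y⟩ = y*x; the small tolerances guard against floating-point round-off. An illustrative sketch only.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)

inner = lambda a, b: b.conj() @ a            # <a, b> = b* a
norm = lambda a: np.sqrt(inner(a, a).real)   # ||a|| = sqrt(<a, a>)

print(abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12)   # Cauchy-Schwarz: True
print(norm(x + y) <= norm(x) + norm(y) + 1e-12)        # triangle inequality: True
```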

    Using these properties, the acute (non-obtuse) angle between any two nonzero
vectors can be defined.
    Let x and y be nonzero vectors in F^n. The angle between x and y, denoted by
θ(x, y), is defined by

    cos θ(x, y) = |⟨x, y⟩| / (‖x‖ ‖y‖).

    We single out a particular case. For vectors x, y we say that x is orthogonal to
y, written as x ⊥ y, iff ⟨x, y⟩ = 0.
    Notice that this definition allows x and y to be zero vectors. Also, the zero vector
is orthogonal to every vector. And, if x ⊥ y, then y ⊥ x; in this case, we say that x
and y are orthogonal vectors.
    It follows that for all vectors x, y, if x ⊥ y, then ‖x‖² + ‖y‖² = ‖x + y‖². This
is referred to as the Pythagoras law. The converse of the Pythagoras law holds in R^n.
That is, for all x, y ∈ R^n, if ‖x + y‖² = ‖x‖² + ‖y‖², then x ⊥ y. But it does not hold
in C^n, in general.
    A set of nonzero vectors in F^n is called an orthogonal set in F^n iff each vector in
the set is orthogonal to every other vector in the set. A singleton set with a nonzero
vector in it is assumed to be orthogonal.

Theorem 4.1 An orthogonal set of vectors is linearly independent.

Proof Let {v_1, . . . , v_m} be an orthogonal set of vectors in F^n. Then they are nonzero
vectors. Assume that α_1 v_1 + · · · + α_m v_m = 0. For 1 ≤ j ≤ m,

      m
      Σ  α_i ⟨v_i, v_j⟩ = ⟨α_1 v_1 + · · · + α_m v_m, v_j⟩ = ⟨0, v_j⟩ = 0.
     i=1

If i ≠ j, then ⟨v_i, v_j⟩ = 0; and the sum above evaluates to α_j ⟨v_j, v_j⟩. So,
α_j ⟨v_j, v_j⟩ = 0. As v_j ≠ 0, ⟨v_j, v_j⟩ ≠ 0. Hence α_j = 0.
    Therefore, {v_1, . . . , v_m} is linearly independent.                         □
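Theorem 4.1 can be illustrated numerically: stacking an orthogonal pair as the columns of a matrix, linear independence shows up as full column rank. A small sketch with assumed example vectors:

```python
import numpy as np

# An orthogonal pair, stacked as columns: (1, 2, 3) and (2, -1, 0).
V = np.array([[1., 2.], [2., -1.], [3., 0.]])
print(V[:, 0] @ V[:, 1])          # 0.0 -- the columns are orthogonal
print(np.linalg.matrix_rank(V))   # 2 -- hence linearly independent
```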

    A vector v with ‖v‖ = 1 is called a unit vector. An orthogonal set of unit vectors
is called an orthonormal set. For instance, in F³,

    {(1, 2, 3), (2, −1, 0)}

is an orthogonal set, but not orthonormal; whereas the following set is an orthonormal
set:
    { (1/√14, 2/√14, 3/√14),  (2/√5, −1/√5, 0) }.

The standard basis {e_1, . . . , e_n} is an orthonormal set in F^n.
Orthogonal and orthonormal sets enjoy nice properties; we will come across them
in due course.
Exercises for Sect. 4.1

1. Compute ⟨(2, 1, 3), (1, 3, 2)⟩ and ⟨(1, 2i, 3), (1 + i, i, −3i)⟩.
2. Is {[1, 2, 3, −1]^t, [2, −1, 0, 0]^t, [0, 0, 1, 3]^t} an orthogonal set in F^{4×1}?
3. Show that if x, y ∈ F^n, then x ⊥ y implies ‖x + y‖² = ‖x‖² + ‖y‖².
4. Show that if x, y ∈ R^n, then ‖x + y‖² = ‖x‖² + ‖y‖² implies x ⊥ y.
5. In C, the inner product is given by ⟨x, y⟩ = ȳx. Let x = 1 and y = i be two
   vectors in C. Show that ‖x‖² + ‖y‖² = ‖x + y‖² but ⟨x, y⟩ ≠ 0.
6. Show that the parallelogram law holds in F^n. That is, for all x, y ∈ F^n, we have
   ‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²).
7. If x and y are orthogonal vectors in F^n, then show that ‖x‖ ≤ ‖x + y‖.
8. Construct an orthonormal set from {[1, 2, 0]^t, [2, −1, 0]^t, [0, 0, 2]^t}.
4.2 Gram–Schmidt Orthogonalization

An orthogonal set of vectors is linearly independent. Conversely, given n linearly
independent vectors v_1, . . . , v_n (necessarily all nonzero), how do we construct an
orthogonal set from these? Let us consider a particular case.
    Consider two linearly independent vectors u_1, u_2 in R³. The span of {u_1, u_2} is
a plane passing through the origin. We would like to obtain two orthogonal vectors
in the plane whose span is the same plane. Now, the set {u_1} is as such orthogonal.
To construct a vector orthogonal to u_1 in the plane, we draw a perpendicular from
u_2 (from its end-point while its initial point is the origin) on u_1. Then, u_2 minus the
foot of the perpendicular is expected to be orthogonal to u_1.
    Suppose u_1 and u_2 have been drawn with their initial points at the origin O. Let
A be the end-point of u_2. Draw a perpendicular from A to the straight line (extend if
required) containing u_1. Suppose the foot of the perpendicular is B. Now, OBA is a
right-angled triangle. Let θ be the measure of the angle ∠AOB. Then

    OB / OA = cos θ = ⟨u_2, u_1⟩ / (‖u_2‖ ‖u_1‖).

The required vector OB is the length of OA times cos θ times the unit vector in the
direction of u_1. That is,

    OB = ‖u_2‖ · (⟨u_2, u_1⟩ / (‖u_2‖ ‖u_1‖)) · (u_1 / ‖u_1‖) = (⟨u_2, u_1⟩ / ⟨u_1, u_1⟩) u_1.

Next, we define v_2 as the vector u_2 − OB. Our construction says that

    v_1 = u_1,   v_2 = u_2 − (⟨u_2, v_1⟩ / ⟨v_1, v_1⟩) v_1.

Clearly, span{u_1, u_2} = span{v_1, v_2}; and here is a verification of v_2 ⊥ v_1:

    ⟨v_2, v_1⟩ = ⟨u_2 − (⟨u_2, v_1⟩/⟨v_1, v_1⟩) v_1, v_1⟩
               = ⟨u_2, v_1⟩ − (⟨u_2, v_1⟩/⟨v_1, v_1⟩) ⟨v_1, v_1⟩ = 0.

If more than two linearly independent vectors in Fn are given, we may continue
this process of taking away feet of the perpendiculars drawn from the last vector on all
the previous ones, assuming that the previous ones have already been orthogonalized.
It results in the process given in the following theorem.
Theorem 4.2 (Gram–Schmidt orthogonalization) Let u_1, . . . , u_m ∈ F^n. Let 1 ≤
k ≤ m. Define

    v_1 = u_1
    v_2 = u_2 − (⟨u_2, v_1⟩ / ⟨v_1, v_1⟩) v_1
        . . .
    v_k = u_k − (⟨u_k, v_1⟩ / ⟨v_1, v_1⟩) v_1 − · · · − (⟨u_k, v_{k−1}⟩ / ⟨v_{k−1}, v_{k−1}⟩) v_{k−1}.

Then the following are true:
(1) span{v_1, . . . , v_k} = span{u_1, . . . , u_k}.
(2) If {u_1, . . . , u_k} is linearly independent, then {v_1, . . . , v_k} is orthogonal.
(3) If u_k ∈ span{u_1, . . . , u_{k−1}}, then v_k = 0.

Proof We use induction on k. For k = 1, v_1 = u_1. First, span{u_1} = span{v_1}. Sec-
ond, if {u_1} is linearly independent, then u_1 ≠ 0. Thus, {v_1} = {u_1} is orthogonal.
Third, if u_1 ∈ span(∅) = {0}, then v_1 = u_1 = 0.
    Lay out the induction hypothesis that the statements in (1)–(3) are true for k =
j ≥ 1. We will show their truth for k = j + 1. For (1), we start with

    v_{j+1} = u_{j+1} − (⟨u_{j+1}, v_1⟩ / ⟨v_1, v_1⟩) v_1 − · · · − (⟨u_{j+1}, v_j⟩ / ⟨v_j, v_j⟩) v_j.

Clearly, v_{j+1} ∈ span{u_{j+1}, v_1, . . . , v_j} and u_{j+1} ∈ span{v_1, . . . , v_{j+1}}. By the induc-
tion hypothesis, we have span{v_1, . . . , v_j} = span{u_1, . . . , u_j}. Thus, any vector
which is a linear combination of u_1, . . . , u_j, u_{j+1} is a linear combination of
v_1, . . . , v_j, u_{j+1}; and then it is a linear combination of v_1, . . . , v_j, v_{j+1}. Similarly, any
vector which is a linear combination of v_1, . . . , v_j, v_{j+1} is also a linear combination
of u_1, . . . , u_j, u_{j+1}. Hence

    span{v_1, . . . , v_j, v_{j+1}} = span{u_1, . . . , u_j, u_{j+1}}.

    For (2), assume that {u_1, . . . , u_j, u_{j+1}} is linearly independent. Then as a subset
of a linearly independent set, {u_1, . . . , u_j} is linearly independent. The induction
hypothesis implies that {v_1, . . . , v_j} is orthogonal. For any i with 1 ≤ i ≤ j, we
have

    ⟨v_{j+1}, v_i⟩ = ⟨u_{j+1}, v_i⟩ − (⟨u_{j+1}, v_1⟩ / ⟨v_1, v_1⟩) ⟨v_1, v_i⟩ − · · · − (⟨u_{j+1}, v_j⟩ / ⟨v_j, v_j⟩) ⟨v_j, v_i⟩.

Now, ⟨v_1, v_i⟩ = · · · = ⟨v_{i−1}, v_i⟩ = ⟨v_{i+1}, v_i⟩ = · · · = ⟨v_j, v_i⟩ = 0 due to the
orthogonality of {v_1, . . . , v_j}. Thus
    ⟨v_{j+1}, v_i⟩ = ⟨u_{j+1}, v_i⟩ − (⟨u_{j+1}, v_i⟩ / ⟨v_i, v_i⟩) ⟨v_i, v_i⟩ = 0.

Therefore, {v_1, . . . , v_j, v_{j+1}} is an orthogonal set.
    For (3), suppose that u_{j+1} ∈ span{u_1, . . . , u_j}. Since v_{j+1} is a linear com-
bination of u_{j+1}, v_1, . . . , v_j, and span{u_1, . . . , u_j} = span{v_1, . . . , v_j}, we have
v_{j+1} ∈ span{v_1, . . . , v_j}. That is, there exist scalars α_1, . . . , α_j such that

    v_{j+1} = α_1 v_1 + · · · + α_j v_j.

It implies ⟨v_{j+1}, v_{j+1}⟩ = α_1 ⟨v_1, v_{j+1}⟩ + · · · + α_j ⟨v_j, v_{j+1}⟩. Due to the orthogonality
of {v_1, . . . , v_j, v_{j+1}}, we have ⟨v_{j+1}, v_1⟩ = 0, . . . , ⟨v_{j+1}, v_j⟩ = 0. So,
⟨v_{j+1}, v_{j+1}⟩ = 0. This implies v_{j+1} = 0.                                   □

    Theorem 4.2 helps in constructing an orthogonal basis, determining linear inde-
pendence, and also extracting a basis.
Given an ordered set (list) of vectors from Fn , we apply Gram–Schmidt orthogo-
nalization. If the zero vector has been obtained in the process, then the list is linearly
dependent. Further, discarding all zero vectors, we obtain an orthogonal set, which
is a basis for the span of the given vectors. Also, the vectors corresponding to the
(nonzero) orthogonal vectors form a basis for the same subspace.
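The recursion of Theorem 4.2 translates directly into code. The sketch below implements classical Gram–Schmidt for real vectors (complex vectors would need conjugation in the inner products); a zero output vector signals linear dependence, as in part (3). The function name is our own.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Classical Gram-Schmidt, as in Theorem 4.2.
    Returns the orthogonal vectors v_1, ..., v_m; zero vectors flag dependence."""
    vs = []
    for u in vectors:
        v = u.astype(float)
        for w in vs:
            denom = w @ w
            if denom > tol:                 # skip already-zero vectors
                v = v - ((u @ w) / denom) * w
        vs.append(v)
    return vs

# Data of Example 4.2 (assumed): u3 turns out to depend on u1, u2.
u1, u2, u3 = np.array([1., 1, 0, 1]), np.array([0., 1, 1, -1]), np.array([1., 3, 2, -1])
v1, v2, v3 = gram_schmidt([u1, u2, u3])
print(v1 @ v2, np.allclose(v3, 0))   # 0.0 True
```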

Example 4.1 Apply the Gram–Schmidt orthogonalization process on the set
{u_1, u_2, u_3}, where u_1 = (1, 0, 0, 0), u_2 = (1, 1, 0, 0) and u_3 = (1, 1, 1, 0).

    v_1 = (1, 0, 0, 0).
    v_2 = u_2 − (⟨u_2, v_1⟩ / ⟨v_1, v_1⟩) v_1 = (1, 1, 0, 0) − 1 (1, 0, 0, 0) = (0, 1, 0, 0).
    v_3 = u_3 − (⟨u_3, v_1⟩ / ⟨v_1, v_1⟩) v_1 − (⟨u_3, v_2⟩ / ⟨v_2, v_2⟩) v_2
        = (1, 1, 1, 0) − (1, 0, 0, 0) − (0, 1, 0, 0) = (0, 0, 1, 0).

The vectors (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0) are orthogonal; and they form
a basis for the subspace U = span{u_1, u_2, u_3} of F⁴. Also, {u_1, u_2, u_3} is linearly
independent, and it is a basis of U.                                             □

Example 4.2 Apply Gram–Schmidt orthogonalization on {u_1, u_2, u_3}, where u_1 =
[1, 1, 0, 1], u_2 = [0, 1, 1, −1] and u_3 = [1, 3, 2, −1].

    v_1 = [1, 1, 0, 1].
    v_2 = u_2 − (⟨u_2, v_1⟩ / ⟨v_1, v_1⟩) v_1 = [0, 1, 1, −1] − 0 [1, 1, 0, 1] = [0, 1, 1, −1].
    v_3 = u_3 − (⟨u_3, v_1⟩ / ⟨v_1, v_1⟩) v_1 − (⟨u_3, v_2⟩ / ⟨v_2, v_2⟩) v_2
        = [1, 3, 2, −1] − ([1, 3, 2, −1] · [1, 1, 0, 1] / [1, 1, 0, 1] · [1, 1, 0, 1]) [1, 1, 0, 1]
            − ([1, 3, 2, −1] · [0, 1, 1, −1] / [0, 1, 1, −1] · [0, 1, 1, −1]) [0, 1, 1, −1]
        = [1, 3, 2, −1] − [1, 1, 0, 1] − 2 [0, 1, 1, −1] = [0, 0, 0, 0].

Since v_3 = 0, we have U = span{u_1, u_2, u_3} = span{v_1, v_2}. Further, discarding
u_3 that corresponds to v_3 = 0, we see that U = span{u_1, u_2}. Indeed, the process
also revealed that u_3 = v_1 + 2v_2 = u_1 + 2u_2. Hence, {u_1, u_2} and the orthogonal
set {v_1, v_2} are bases for U.                                                  □

Observe that in the Gram–Schmidt process, if at any stage the computed vector turns
out to be the zero vector, it is discarded, and the process carries on to the next
stage.
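The whole procedure fits in a few lines of code. The following sketch (Python with NumPy; not from the text) runs the orthogonalization on the data of Example 4.2 and detects the linear dependence of u3 on u1, u2:

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Orthogonalize a list of vectors; a zero output vector signals dependence."""
    ortho = []
    for u in vectors:
        v = np.array(u, dtype=float)
        for w in ortho:
            if w @ w > tol:                      # skip vectors already reduced to zero
                v = v - ((v @ w) / (w @ w)) * w  # subtract the foot of the perpendicular
        ortho.append(v)
    return ortho

# Example 4.2: u3 = u1 + 2 u2, so v3 comes out as the zero vector.
v1, v2, v3 = gram_schmidt([[1, 1, 0, 1], [0, 1, 1, -1], [1, 3, 2, -1]])
print(v1, v2, v3)
```

Discarding the zero vectors from the output leaves an orthogonal basis for the span, exactly as described above.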
Exercises for Sect. 4.2
1. Orthogonalize the vectors (1, 1, 1), (1, 0, 1), and (0, 1, 2) using the Gram–Schmidt
   process.
2. How can the Gram–Schmidt process be used to compute the rank of a matrix?
3. Construct orthonormal sets that are bases for span{u1, u2, u3}, where u1, u2, u3
   are the vectors in Examples 4.1–4.2.
4. Show that the cross product u × v of two linearly independent vectors u, v in
   R1×3 is orthogonal to both u and v. How can this third vector be obtained, up to
   a scalar multiple, by the Gram–Schmidt process?

4.3 QR-Factorization

An orthogonal (ordered) set can be made orthonormal by dividing each vector by
its norm. Also, the Gram–Schmidt orthogonalization process can be modified to
directly output orthonormal vectors. The modified version is as follows.
Gram–Schmidt Orthonormalization:

v1 = u1;  w1 = v1/‖v1‖
v2 = u2 − ⟨u2, w1⟩w1;  w2 = v2/‖v2‖
  ⋮
vm = um − ⟨um, w1⟩w1 − · · · − ⟨um, w_{m−1}⟩w_{m−1};  wm = vm/‖vm‖.

If during computation, we find that vk = 0, then we take wk = 0. We know that


this can happen when u k ∈ span{u 1 , . . . , u k−1 }. Thus, linear dependence can also be
determined in this orthonormalization process, as earlier. Further, such u k , vk and
wk may be discarded from the list for carrying out the process further in order to
compute an orthonormal set that is a basis for the span of the vectors u 1 , . . . , u m .

Given a list of linearly independent vectors u 1 , . . . , u m , the orthonormalization


process computes a list of orthonormal vectors w1 , . . . , wm with the property that

span{u 1 , . . . , u k } = span{w1 , . . . , wk } for 1 ≤ k ≤ m.
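This span-preserving property can be checked numerically. The sketch below (Python with NumPy, not from the text, assuming the inputs are linearly independent) orthonormalizes the vectors u1, u2, u3 of Example 4.3 and verifies span{u1, . . . , uk} = span{w1, . . . , wk} by a rank computation:

```python
import numpy as np

def orthonormalize(vectors):
    """Gram-Schmidt orthonormalization of linearly independent vectors."""
    ws = []
    for u in vectors:
        v = np.array(u, dtype=float)
        for w in ws:
            v = v - (v @ w) * w           # subtract <u, w> w for the earlier w's
        ws.append(v / np.linalg.norm(v))  # independence guarantees v != 0
    return ws

us = [np.array(u, float) for u in ([1, 1, 0], [0, 1, 1], [1, 0, 1])]
ws = orthonormalize(us)
# Stacking u_1..u_k together with w_1..w_k never raises the rank beyond k,
# so the two nested spans coincide for every k.
for k in (1, 2, 3):
    assert np.linalg.matrix_rank(np.array(us[:k] + ws[:k])) == k
```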

Let V be a subspace of Fn . An orthogonal (ordered) subset of V which is also


a basis of V is called an orthogonal basis of V. Similarly, when an orthonormal
(ordered) set is a basis of V, the set is said to be an orthonormal basis of V.
For example, the standard basis {e1, . . . , en} of Fn×1 is an orthonormal basis of
Fn×1. Similarly, {e1^t, . . . , en^t} is an orthonormal basis of F1×n.

Since Gram–Schmidt processes construct orthogonal and orthonormal bases from


a given basis of any subspace of Fn , it follows that
every subspace of Fn has an orthogonal basis, and also an orthonormal basis.

Example 4.3 Applying Gram–Schmidt orthonormalization, construct an orthonormal
basis for U = span{u1, u2, u3, u4}, where u1 = (1, 1, 0), u2 = (0, 1, 1), u3 =
(1, 0, 1) and u4 = (2, 2, 2).

v1 = u1 = (1, 1, 0)
w1 = v1/‖v1‖ = (1/√2, 1/√2, 0)
v2 = u2 − ⟨u2, w1⟩w1 = (0, 1, 1) − ((0, 1, 1) · (1/√2, 1/√2, 0)) (1/√2, 1/√2, 0)
   = (0, 1, 1) − (1/√2)(1/√2, 1/√2, 0) = (−1/2, 1/2, 1)
w2 = v2/‖v2‖ = (√2/√3)(−1/2, 1/2, 1) = (−1/√6, 1/√6, √2/√3)
v3 = u3 − ⟨u3, w1⟩w1 − ⟨u3, w2⟩w2
   = (1, 0, 1) − ((1, 0, 1) · (1/√2, 1/√2, 0)) (1/√2, 1/√2, 0)
     − ((1, 0, 1) · (−1/√6, 1/√6, √2/√3)) (−1/√6, 1/√6, √2/√3)
   = (1, 0, 1) − (1/2, 1/2, 0) − (−1/6, 1/6, 1/3) = (2/3, −2/3, 2/3)
w3 = v3/‖v3‖ = (√3/2)(2/3, −2/3, 2/3) = (1/√3, −1/√3, 1/√3)
v4 = u4 − ⟨u4, w1⟩w1 − ⟨u4, w2⟩w2 − ⟨u4, w3⟩w3 = 0
w4 = 0.

The set {(1/√2, 1/√2, 0), (−1/√6, 1/√6, √2/√3), (1/√3, −1/√3, 1/√3)} is the required
orthonormal basis of U. Notice that U = F3. □

We have discussed two ways of extracting a basis for the span of a finite number
of vectors from Fn . One is the method of elementary row operations and the other is
Gram–Schmidt orthogonalization or orthonormalization. The latter is a superior tool
though computationally difficult. We now see one of its applications in factorizing a
matrix.

Theorem 4.3 (QR-factorization) Let the columns of A ∈ Fm×n be linearly indepen-


dent. Then there exist a matrix Q ∈ Fm×n with orthonormal columns, and an upper
triangular invertible matrix R ∈ Fn×n such that A = Q R. Further, if the diagonal
entries in R are real, and they are either all positive, or all negative, then both Q
and R are uniquely determined from A.

Proof Write the columns of A ∈ Fm×n as u 1 , u 2 , . . . , u n in that order. These vectors


are linearly independent in Fm×1 . So, m ≥ n. Using Gram–Schmidt orthonormal-
ization on the list of vectors u 1 , . . . , u n , we obtain an orthonormal list of vectors,
say, w1 , w2 , . . . , wn . Now, span{u 1 , . . . , u k } = span{w1 , . . . , wk } for 1 ≤ k ≤ n. In
particular, u k is a linear combination of w1 , . . . , wk .
Hence there exist scalars ai j , 1 ≤ i ≤ j ≤ n such that

u 1 = a11 w1
u 2 = a12 w1 + a22 w2
..
.
u k = a1k w1 + · · · + akk wk
..
.
u n = a1n w1 + · · · + akn wk + · · · + ann wn

We take ai j = 0 for i > j, and form the matrix


⎡ ⎤
a11 a12 · · · a1n
⎢ a22 · · · a2n ⎥
⎢ ⎥
R = [ai j ] = ⎢ .. ⎥.
⎣ . ... ⎦
ann

Since u1, . . . , uk are linearly independent,

uk ∉ span{u1, . . . , u_{k−1}} = span{w1, . . . , w_{k−1}}.

Thus, akk is nonzero for each k. Then, R is an upper triangular invertible matrix.
Construct Q = [w1 · · · wn ]. Since {w1 , . . . , wn } is an orthonormal set, Q ∗ Q =
I. Of course, if all entries of A are real, then so are the entries of Q and R. In that
case, Q t Q = I. Moreover, for 1 ≤ k ≤ n,
Q R ek = Q [a1k, · · · , akk, 0, · · · , 0]^t = Q(a1k e1 + · · · + akk ek)
       = a1k Q e1 + · · · + akk Q ek = a1k w1 + · · · + akk wk = uk.

That is, the kth column of Q R is same as the kth column of A for each k ∈
{1, . . . , n}. Therefore, Q R = A.
For uniqueness of the factorization, suppose that

A = Q 1 R1 = Q 2 R2 ,
Q 1 , Q 2 ∈ Fm×n satisfy Q ∗1 Q 1 = Q ∗2 Q 2 = I,
R1 = [ai j ], R2 = [bi j ] ∈ Fn×n are upper triangular, and
akk > 0, bkk > 0 for each k ∈ {1, . . . , n}.

We will see later what happens if the diagonal entries of R1 and of R2 are all
negative. Then

R1∗ R1 = R1∗ Q ∗1 Q 1 R1 = (Q 1 R1 )∗ Q 1 R1 = A∗ A = (Q 2 R2 )∗ Q 2 R2 = R2∗ R2 .

Notice that R1 , R2 , R1∗ and R2∗ are all invertible matrices. Multiplying (R2∗ )−1 on
the left, and (R1 )−1 on the right, we have

(R2∗ )−1 R1∗ R1 (R1 )−1 = (R2∗ )−1 R2∗ R2 (R1 )−1 .

It implies
(R2∗ )−1 R1∗ = R2 R1−1 .

Here, the matrix on the left is a lower triangular matrix and that on the right is an upper
triangular matrix. Therefore, both are diagonal. Comparing the diagonal entries in
the products we have

[(bii )−1 ]∗ aii∗ = bii (aii )−1 for 1 ≤ i ≤ n.

That is, |aii |2 = |bii |2 . Since aii > 0 and bii > 0, we see that aii = bii for 1 ≤ i ≤ n.
Hence (R2−1 )∗ R1∗ = R2 R1−1 = I. Therefore,

R2 = R1 , Q 2 = A R2−1 = A R1−1 = Q 1 .

Observe that in the factorization A = Q R, if all diagonal entries of R are negative,


then we have akk < 0 and bkk < 0 for all k. Now, |aii |2 = |bii |2 implies aii = bii , in
the above proof. Consequently, uniqueness of Q and R follows. 

The uniqueness of QR-factorization states that if Q is selected from the set of


all matrices with Q ∗ Q = I, and R is selected from the set of all upper triangular
matrices with positive (or negative) diagonal entries, then there is only one such Q
and there is only one such R that satisfy A = Q R.
Observe that the columns of the matrix Q in the QR-factorization of a
matrix A are computed by orthonormalizing the columns of A. And the matrix R
is computed by collecting the coefficients from the orthonormalization process. We
may also compute R by taking R = Q∗A.
Example 4.4 Consider the matrix

        ⎡ 1  1 ⎤
    A = ⎢ 0  1 ⎥
        ⎣ 1  1 ⎦

for QR-factorization.
We orthonormalize the columns of A to get Q, and then take R = Q^t A. So,

        ⎡ 1/√2  0 ⎤
    Q = ⎢  0    1 ⎥ ,     R = Q^t A = ⎡ √2  √2 ⎤ .
        ⎣ 1/√2  0 ⎦                   ⎣  0   1 ⎦

It is easy to check that A = Q R.


Further, if all diagonal entries of R are required to be positive, then the Q and R
we have computed are the only possible matrices satisfying A = Q R.
Observe that A = (−Q)(−R) is also another QR-factorization of A, where
the diagonal entries of R are negative. Again, this would be the only such QR-
factorization of A. 
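Numerical libraries compute this factorization directly. The sketch below (Python with NumPy; the sign-fixing step is our addition) reproduces Example 4.4 and normalizes the factors to the unique pair with positive diagonal entries in R:

```python
import numpy as np

A = np.array([[1., 1.],
              [0., 1.],
              [1., 1.]])
Q, R = np.linalg.qr(A)       # NumPy's reduced QR: Q is 3x2, R is 2x2
# NumPy may return R with negative diagonal entries; flipping signs of the
# matching column of Q and row of R leaves A = QR intact and yields the
# unique factorization with positive diagonal (Theorem 4.3).
s = np.sign(np.diag(R))
Q, R = Q * s, (R.T * s).T
print(R)                     # up to rounding: [[sqrt(2), sqrt(2)], [0, 1]]
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(2))
```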

Exercises for Sect. 4.3


1. Find u ∈ R1×3 so that {[1/√3, 1/√3, 1/√3], [1/√2, 0, −1/√2], u} is an orthonormal
   set. Form a matrix with the vectors as rows, in that order. Verify that the columns
   of the matrix are also orthonormal.
2. Show that {((1+i)/2, (1−i)/2), (i/√2, −1/√2)} is an orthonormal set in C2. Express
   (2 + 4i, −2i) as a linear combination of the given orthonormal vectors.
3. Let {u, v} be an orthonormal set in C2×1. Let x = (4 + i)u + (2 − 3i)v. Determine
   the values of u∗x, v∗x, x∗u, x∗v, and ‖x‖.
4. Let {u, v} be an orthonormal basis for R2×1. Let x ∈ R2×1 be a unit vector. If
   x^t u = 1/2, then what could be x^t v?
5. Let A ∈ Fm×n. Show the following:
   (a) For m ≤ n, A A∗ = I iff the rows of A are orthonormal.
   (b) For m ≥ n, A∗A = I iff the columns of A are orthonormal.
6. Find a QR-factorization of each of the following matrices:

       ⎡ 0  1 ⎤        ⎡ 1  0  2 ⎤        ⎡ 1  1   2 ⎤
   (a) ⎢ 1  1 ⎥    (b) ⎢ 0  1  1 ⎥    (c) ⎢ 0  1  −1 ⎥
       ⎣ 0  1 ⎦        ⎣ 1  2  0 ⎦        ⎢ 1  1   0 ⎥
                                          ⎣ 0  0   1 ⎦

4.4 Orthogonal Projection

Any vector v = (a, b, c) ∈ F3 can be expressed as v = ae1 + be2 + ce3. Taking the inner
product of v with the basis vectors, we find that ⟨v, e1⟩ = a, ⟨v, e2⟩ = b and ⟨v, e3⟩ = c.
Then v = ⟨v, e1⟩e1 + ⟨v, e2⟩e2 + ⟨v, e3⟩e3.
Such an equality holds for any orthonormal basis for a subspace of Fn .

Theorem 4.4 Let {v1, . . . , vm} be an orthonormal basis for a subspace V of Fn.
Then the following are true for each x ∈ V :
(1) (Fourier Expansion) x = ⟨x, v1⟩v1 + · · · + ⟨x, vm⟩vm.
(2) (Parseval Identity) ‖x‖^2 = |⟨x, v1⟩|^2 + · · · + |⟨x, vm⟩|^2.

Proof (1) Let x ∈ V. There exist scalars α1, . . . , αm such that

x = α1v1 + · · · + αmvm.

Let i ∈ {1, . . . , m}. Taking the inner product of x with vi, and using the fact that
⟨vi, vj⟩ = 0 for j ≠ i, and ⟨vi, vi⟩ = 1, we have

⟨x, vi⟩ = αi⟨vi, vi⟩ = αi.

It then follows that x = ⟨x, v1⟩v1 + · · · + ⟨x, vm⟩vm.

(2) Let x ∈ V. By (1), x = Σ_{j=1}^m ⟨x, vj⟩vj. Then

‖x‖^2 = ⟨x, x⟩ = ⟨Σ_{j=1}^m ⟨x, vj⟩vj, x⟩ = Σ_{j=1}^m ⟨x, vj⟩⟨vj, x⟩ = Σ_{j=1}^m |⟨x, vj⟩|^2. □
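Both identities are easy to verify numerically. A small check (Python with NumPy; the orthonormal pair below is our own illustrative choice):

```python
import numpy as np

# An orthonormal basis {v1, v2} for a subspace V of R^3 (illustrative choice).
v1 = np.array([1., 0., -1.]) / np.sqrt(2)
v2 = np.array([1., 1., 1.]) / np.sqrt(3)

x = 2.0 * v1 - 3.0 * v2              # an arbitrary x in V
c1, c2 = x @ v1, x @ v2              # Fourier coefficients <x, v_j>
assert np.allclose(c1 * v1 + c2 * v2, x)   # Fourier expansion recovers x
assert np.isclose(x @ x, c1**2 + c2**2)    # Parseval: ||x||^2 = sum |<x, v_j>|^2
```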

Recall that in Gram–Schmidt orthogonalization, we compute v2 by taking away
the foot of the perpendicular drawn from u2 on u1. For the computation of v_{k+1}, we
take away the feet of the perpendiculars drawn from u_{k+1} on the already constructed
vectors v1, . . . , vk. Let us use an abbreviation. For a subspace V of Fn, and any vector
w ∈ Fn, we abbreviate the statement

    w ⊥ v for each v ∈ V    to    w ⊥ V.

If V = span{v1, . . . , vm}, then corresponding to any v ∈ V, we have scalars
α1, . . . , αm such that v = α1v1 + · · · + αmvm. Now, if w is orthogonal to each vi,
then w ⊥ v. Hence, w ⊥ V iff w ⊥ vi for each i = 1, . . . , m.
Write Vk = span{v1, . . . , vk}, and denote the foot of the perpendicular drawn from
u_{k+1} on Vk by w. The Gram–Schmidt orthogonalization process shows that w ∈ Vk
and u_{k+1} − w ⊥ Vk. This raises a question. Does the orthogonality condition u_{k+1} − w ⊥ Vk
uniquely determine a vector w from Vk?

Theorem 4.5 Let V be a subspace of Fn. Corresponding to each x ∈ Fn, there
exists a unique vector y ∈ V such that x − y ⊥ V. Further, if {v1, . . . , vm} is any
orthonormal basis of V, then such a vector y is given by y = Σ_{j=1}^m ⟨x, vj⟩vj.

Proof The subspace V of Fn has an orthonormal basis. Let {v1, . . . , vm} be an
orthonormal basis of V. Write

y = ⟨x, v1⟩v1 + · · · + ⟨x, vm⟩vm.

Now, y ∈ V. For 1 ≤ i, j ≤ m, we have ⟨vj, vi⟩ = 0 for j ≠ i, and ⟨vi, vi⟩ = 1.
Then

⟨y, vi⟩ = ⟨Σ_{j=1}^m ⟨x, vj⟩vj, vi⟩ = Σ_{j=1}^m ⟨x, vj⟩⟨vj, vi⟩ = ⟨x, vi⟩.

That is, x − y ⊥ vi for each i ∈ {1, . . . , m}. It follows that x − y ⊥ V.



For uniqueness, let y ∈ V and z ∈ V be such that ⟨x − y, v⟩ = ⟨x − z, v⟩ = 0 for
each v ∈ V. Then

⟨y − z, v⟩ = ⟨(x − z) − (x − y), v⟩ = 0 for each v ∈ V.

In particular, with v = y − z, we have ⟨y − z, y − z⟩ = 0. It implies y = z. □


In view of Theorem 4.5, we give a name to such a vector y satisfying the orthog-
onality condition x − y ⊥ V. Intuitively, such a vector y is the foot of the perpen-
dicular drawn from x to the subspace V. Notice that if x ∈ V, then the foot of the
perpendicular drawn from x on V is x itself. We call this foot of the perpendicular
as the orthogonal projection of x onto the subspace V, and write it as projV(x).
That is,

    Given x ∈ Fn and a subspace V of Fn, projV(x) is the unique vector in V that satisfies
    the condition x − projV(x) ⊥ V. Further, projV(x) = Σ_{j=1}^m ⟨x, vj⟩vj, where {v1, . . . , vm}
    is any orthonormal basis of V.

Using this projection vector, we obtain the following useful inequality.


Theorem 4.6 (Bessel Inequality) Let {v1, . . . , vm} be an orthonormal basis for a
subspace V of Fn. Then for each x ∈ Fn, ‖x‖^2 ≥ ‖projV(x)‖^2 = Σ_{j=1}^m |⟨x, vj⟩|^2.

Proof Let x ∈ Fn. Write y = projV(x) = Σ_{j=1}^m ⟨x, vj⟩vj. Now, y ∈ V and x − y ⊥
v for each v ∈ V. In particular, x − y ⊥ y. Then the Pythagoras theorem and the Parseval
identity imply that

‖x‖^2 = ‖x − y‖^2 + ‖y‖^2 ≥ ‖y‖^2 = Σ_{j=1}^m |⟨x, vj⟩|^2. □

An alternate proof of the Bessel inequality may be constructed by using extension of
a basis of a subspace to the whole space. If x ∈ span{v1, . . . , vm}, then by the Parseval
identity, we have ‖x‖^2 = Σ_{j=1}^m |⟨x, vj⟩|^2. Otherwise, we extend the orthonormal set
{v1, . . . , vm} to a basis of Fn; and then, use Gram–Schmidt orthonormalization to
obtain an orthonormal basis {v1, . . . , vm, v_{m+1}, . . . , vn} for Fn. Then the Parseval identity
gives ‖x‖^2 = Σ_{j=1}^n |⟨x, vj⟩|^2 ≥ Σ_{j=1}^m |⟨x, vj⟩|^2.

When working with column vectors, the projection vector projV(x) can be seen
as a matrix product.
For this, let {v1, . . . , vm} be an orthonormal basis of a subspace V of Fn×1.
Let x ∈ Fn×1. Write z = [c1, · · · , cm]^t ∈ Fm×1 with cj = ⟨x, vj⟩ = vj∗x, and P =
[v1 · · · vm]. Then

z = [c1, . . . , cm]^t = [v1∗x, . . . , vm∗x]^t = P∗x,    y = Σ_{j=1}^m ⟨x, vj⟩vj = Σ_{j=1}^m cj vj = P z = P P∗x.

Notice that P P∗x = projV(x) for each x ∈ Fn×1. Due to this reason, the matrix
P P∗ ∈ Fn×n is called the projection matrix that projects vectors of Fn×1 onto the
subspace V.

Example 4.5 Let V = span{[1, 0, −1]^t, [1, 1, 1]^t} and let x = [1, 2, 3]^t. Compute
the projection matrix that projects vectors of F3×1 onto V, and projV(x).
Since the vectors [1, 0, −1]^t and [1, 1, 1]^t are orthogonal, an orthonormal basis
for V is given by {v1, v2}, where

v1 = [1/√2, 0, −1/√2]^t,    v2 = [1/√3, 1/√3, 1/√3]^t.

With P = [v1 v2], the projection matrix is given by

       ⎡  1/√2  1/√3 ⎤                           ⎡  5/6  1/3  −1/6 ⎤
P P∗ = ⎢   0    1/√3 ⎥ ⎡ 1/√2    0   −1/√2 ⎤  =  ⎢  1/3  1/3   1/3 ⎥ .
       ⎣ −1/√2  1/√3 ⎦ ⎣ 1/√3  1/√3   1/√3 ⎦    ⎣ −1/6  1/3   5/6 ⎦

Then projV(x) = P P∗x = [1, 2, 3]^t.
Also, directly from the orthonormal basis for V, we obtain

projV(x) = ⟨x, v1⟩v1 + ⟨x, v2⟩v2 = [1, 2, 3]^t.

Indeed, x = −1·[1, 0, −1]^t + 2·[1, 1, 1]^t ∈ V. Thus, projV(x) = x. □
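The computation of Example 4.5 can be replayed with a few matrix products (Python with NumPy; not part of the text). As expected of a projection matrix, P P∗ is symmetric and idempotent:

```python
import numpy as np

# Example 4.5: orthonormal basis of V as the columns of P.
v1 = np.array([1., 0., -1.]) / np.sqrt(2)
v2 = np.array([1., 1., 1.]) / np.sqrt(3)
P = np.column_stack([v1, v2])

proj = P @ P.T                          # the projection matrix P P*
x = np.array([1., 2., 3.])
print(proj @ x)                         # x lies in V, so proj_V(x) = x
assert np.allclose(proj, proj.T)        # P P* is symmetric ...
assert np.allclose(proj @ proj, proj)   # ... and idempotent
```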


Exercises for Sect. 4.4
1. Let {u, v, w} be an orthonormal basis for a subspace of R5×1 and let x = au +
   bv + cw. If ‖x‖ = 5, ⟨x, u⟩ = 4 and x ⊥ v, then what are the possible values of
   a, b and c?
2. Let A be the 4 × 2 matrix whose rows are [1/2, −1/2], [1/2, 1/2], [1/2, −1/2] and
   [1/2, 1/2].
   (a) Determine the projection matrix P that projects vectors onto R(A).
   (b) Find an orthonormal basis for N(A^t), and then determine the
       projection matrix Q that projects vectors onto N(A^t).
   (c) Is it true that P^2 = P^t = P, and Q^2 = Q^t = Q?
3. Let y be the orthogonal projection of a vector x ∈ Fn×1 onto a subspace V of
   Fn×1. Show that ‖y‖^2 = ⟨x, y⟩.

4.5 Best Approximation and Least Squares Solution

From the orthogonality condition, we guess that the length of x − projV(x) is the
smallest among the lengths of all vectors x − v, as v varies over V. If this intuition
is right, then projV(x) is closest to x compared to any other vector from
V. Further, we may think of projV(x) as an approximation of x from the subspace
V. We now show that our intuition is correct.
Let V be a subspace of Fn and let x ∈ Fn. A vector u ∈ V is called a best approx-
imation of x from V iff ‖x − u‖ ≤ ‖x − v‖ for each v ∈ V.
Theorem 4.7 Let V be a subspace of Fn and let x ∈ Fn . Then projV (x) is the unique
best approximation of x from V.

Proof Let y = projV(x). We show the following:
1. ‖x − y‖^2 ≤ ‖x − v‖^2 for each v ∈ V.
2. If u ∈ V satisfies ‖x − u‖ ≤ ‖x − v‖ for each v ∈ V, then u = y.
(1) Let v ∈ V. Since y ∈ V and y − v ∈ V, by Theorem 4.5, x − y ⊥ y − v. By
the Pythagoras theorem,

‖x − v‖^2 = ‖x − y + y − v‖^2 = ‖x − y‖^2 + ‖y − v‖^2 ≥ ‖x − y‖^2.

(2) Let u ∈ V satisfy ‖x − u‖ ≤ ‖x − v‖ for each v ∈ V. In particular, with v = y,
we have ‖x − u‖ ≤ ‖x − y‖. By (1), ‖x − y‖ ≤ ‖x − u‖. That is,

‖x − u‖ = ‖x − y‖.

Since y − u ∈ V, by Theorem 4.5, x − y ⊥ y − u. By the Pythagoras theorem,

‖x − u‖^2 = ‖x − y + y − u‖^2 = ‖x − y‖^2 + ‖y − u‖^2 = ‖x − u‖^2 + ‖y − u‖^2.

Hence ‖y − u‖^2 = 0. Therefore, u = y. □

Given a subspace V of Fn and a vector x ∈ Fn, Theorem 4.7 implies that a vector
y ∈ V is a best approximation of x from V iff the orthogonality condition is satisfied.
Due to the uniqueness results, this provides two ways of computing the best approxima-
tion, which correspond to computing projV(x) by employing an orthonormal basis,
or by using the orthogonality condition directly.
Starting from any basis for V, we employ Gram–Schmidt orthonormalization to
obtain an orthonormal basis {v1, . . . , vm} for V. Then the best approximation y of x
from V is given by

y = ⟨x, v1⟩v1 + · · · + ⟨x, vm⟩vm.

Also, the best approximation may be computed by using the projection matrix.
In the second approach, we look for a vector y that satisfies the orthogonality
condition:

x − y ⊥ v for each vector v ∈ V.

If {v1, . . . , vm} is a basis for V, then this condition is equivalent to

x − y ⊥ vj for each j = 1, . . . , m.

Since y ∈ V, we may write y = Σ_{j=1}^m αjvj. Then we determine the scalars αj
so that ⟨x − Σ_{j=1}^m αjvj, vi⟩ = 0 for i = 1, . . . , m. That is, we solve the following
linear system:

⟨v1, v1⟩α1 + · · · + ⟨vm, v1⟩αm = ⟨x, v1⟩
        ⋮
⟨v1, vm⟩α1 + · · · + ⟨vm, vm⟩αm = ⟨x, vm⟩

Theorem 4.7 guarantees that this linear system has a unique solution. Further, the
system matrix of this linear system is A = [aij], where aij = ⟨vj, vi⟩. Such a matrix,
which results from taking the inner products of basis vectors, is called a Gram matrix.
Theorem 4.7 implies that a Gram matrix is invertible. Can you prove directly that a
Gram matrix is invertible?

Example 4.6 Find the best approximation of x = (1, 0) ∈ R2 from V = {(a, a) :
a ∈ R}.
An orthonormal basis for V is {(1/√2, 1/√2)}. Thus, the best approximation of x
from V is given by

projV(x) = ⟨x, v1⟩v1 = ((1, 0) · (1/√2, 1/√2)) (1/√2, 1/√2) = (1/2, 1/2).

For illustration, we redo it using the projection matrix. Instead of R2, we now
work in R2×1. We have x = [1, 0]^t and V = span{[1, 1]^t}. An orthonormal basis for
V is {(1/√2)[1, 1]^t}. Then the projection matrix onto V is given by

P P∗ = (1/√2) ⎡ 1 ⎤ (1/√2) [1 1] = (1/2) ⎡ 1 1 ⎤ .
              ⎣ 1 ⎦                      ⎣ 1 1 ⎦

And the best approximation of x from V is y = P P∗x = [1/2, 1/2]^t.
In the second approach, we seek (α, α) so that (1, 0) − (α, α) ⊥ (β, β) for all
β. That is, we find (α, α) so that (1 − α, −α) · (1, 1) = 0. So, α = 1/2. The best
approximation is (1/2, 1/2), as obtained earlier. □
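The Gram-matrix route of the second approach is also mechanical. The following sketch (Python with NumPy; the helper name is ours) sets up and solves the Gram system for the data of Example 4.6:

```python
import numpy as np

def best_approximation(basis, x):
    """Coefficients of proj_V(x) from the Gram system  G a = (<x, v_i>)_i."""
    B = np.column_stack(basis)
    G = B.T @ B                    # Gram matrix: G[i, j] = <v_j, v_i> (real case)
    a = np.linalg.solve(G, B.T @ x)
    return B @ a

# Example 4.6: V = span{(1, 1)} in R^2 and x = (1, 0).
y = best_approximation([np.array([1., 1.])], np.array([1., 0.]))
print(y)    # [0.5 0.5]
```

No orthonormalization is needed here; the invertibility of the Gram matrix (Problem 9) is what makes the solve step legitimate.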
We may use the technique of taking the best approximation for approximating a
solution of a linear system.
Let A ∈ Fm×n and let b ∈ Fm×1. A vector u ∈ Fn×1 is called a least squares
solution of the linear system Ax = b iff ‖Au − b‖ ≤ ‖Az − b‖ for all z ∈ Fn×1.
Theorem 4.7 implies that u ∈ Fn×1 is a least squares solution of Ax = b iff Au is
the best approximation of b from R(A). This best approximation Au can be computed
uniquely from the orthogonality condition Au − b ⊥ R(A). However, the vector u
can be uniquely determined when A is one-one, that is, when the homogeneous
system Ax = 0 has a unique solution. We summarize these facts in the following
theorem.

Theorem 4.8 Let A ∈ Fm×n and let b ∈ Fm×1 .


(1) The linear system Ax = b has a least squares solution.
(2) A vector u ∈ Fn×1 is a least squares solution of Ax = b iff Au − b ⊥ v for each
v ∈ R(A).
(3) A vector u ∈ Fn×1 is a least squares solution of Ax = b iff it is a solution of
A∗ Ax = A∗ b.
(4) A least squares solution is unique iff N (A) = {0}.

Proof We prove only (3); others are obvious from the discussion we had. For this,
let u 1 , . . . , u n be the n columns of A. Since the range space R(A) is equal to
span{u 1 , . . . , u n }, by (2), we obtain

u is a least squares solution of Ax = b
iff ⟨Au − b, ui⟩ = 0 for i = 1, . . . , n
iff ui∗(Au − b) = 0 for i = 1, . . . , n
iff A∗(Au − b) = 0
iff A∗Au = A∗b. □

If A ∈ Rm×n and b ∈ Rm×1 , then a least squares solution u of the system Ax = b


satisfies At Au = At b.
Least squares solutions are helpful in those cases where some errors in data lead
to an inconsistent system.
     
1 1 0 1
Example 4.7 Let A = , b= , and let u = .
0 0 1 −1
We see that At Au = At b. Hence u is a least squares solution of Ax = b.
Notice that Ax = b does not have a solution. 
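The normal equations of Theorem 4.8(3) can be checked directly on this example (Python with NumPy; not from the text). NumPy's `lstsq` routine picks the minimum-norm least squares solution, which differs from u by an element of N(A):

```python
import numpy as np

# Example 4.7: Ax = b is inconsistent, yet least squares solutions exist.
A = np.array([[1., 1.],
              [0., 0.]])
b = np.array([0., 1.])
u = np.array([1., -1.])

assert np.allclose(A.T @ A @ u, A.T @ b)   # u solves the normal equations

# lstsq returns the minimum-norm least squares solution; here it is [0, 0],
# and u differs from it by the null-space vector [1, -1].
u_min, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(u_min, [0., 0.])
```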

Alternatively, we may use QR-factorization in computing a least squares solution


of a linear system.

Theorem 4.9 Let A ∈ Fm×n have linearly independent columns. Let A = Q R be


a QR-factorization of A. Then, the least squares solution of Ax = b is given by
u = R −1 Q ∗ b.

Proof As columns of A are linearly independent, null(A) = n − rank(A) = 0. That


is, N (A) = {0}. By Theorem 4.8, there exists a unique least squares solution u
of Ax = b. Moreover, the least squares solution u of Ax = b satisfies A∗ Au =
A∗ b. Plugging in A = Q R, we have R ∗ Q ∗ Q Ru = R ∗ Q ∗ b. As Q ∗ Q = I and R ∗ is
invertible, we obtain Ru = Q ∗ b. Thus, u = R −1 Q ∗ b. 

However, u = R −1 Q ∗ b need not be a solution of Ax = b. The reason is, Q has


orthonormal columns, but it need not have orthonormal rows. Consequently, Q Q ∗
need not be equal to I. Then Au = Q R R −1 Q ∗ b = Q Q ∗ b need not be equal to b.
On the other hand, if a solution v exists for Ax = b, then Av = b. It implies

0 = ‖Av − b‖ ≤ ‖Aw − b‖ for each w ∈ Fn×1.

Therefore, every solution of Ax = b is a least squares solution.


Observe that when a QR-factorization of A has already been obtained, one computes
the least squares solution by solving the linear system Rx = Q∗b, which is an
easy task since R is upper triangular.
However, for such a computation of the least squares solution of Ax = b, the
columns of A must be linearly independent. This implies that m ≥ n. That is, the
number of equations must not be less than the number of unknowns.
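A sketch of this QR-based computation (Python with NumPy, not from the text; for brevity a general solver stands in for back substitution on the triangular system), using the matrix of Exercise 2(a):

```python
import numpy as np

# A with linearly independent columns, and a right-hand side b.
A = np.array([[3., 1.],
              [1., 2.],
              [2., -1.]])
b = np.array([1., 0., -2.])

Q, R = np.linalg.qr(A)              # reduced QR-factorization of A
u = np.linalg.solve(R, Q.T @ b)     # solve R x = Q^t b (R is upper triangular)
assert np.allclose(A.T @ A @ u, A.T @ b)   # u satisfies the normal equations
```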
Exercises for Sect. 4.5
1. Find the best approximation of x ∈ U from V, where
   (a) U = R3, x = (1, 2, 1), V = span{(3, 1, 2), (1, 0, 1)}
   (b) U = R3, x = (1, 2, 1), V = {(α, β, γ) ∈ R3 : α + β + γ = 0}
   (c) U = R4, x = (1, 0, −1, 1), V = span{(1, 0, −1, 1), (0, 0, 1, 1)}
2. Find least squares solution(s) of the system Ax = b, where

           ⎡ 3   1 ⎤      ⎡  1 ⎤            ⎡  1   1   1 ⎤      ⎡  0 ⎤
   (a) A = ⎢ 1   2 ⎥, b = ⎢  0 ⎥    (b) A = ⎢ −1   0   1 ⎥, b = ⎢  1 ⎥
           ⎣ 2  −1 ⎦      ⎣ −2 ⎦            ⎢  1  −1   0 ⎥      ⎢ −1 ⎥
                                            ⎣  0   1  −1 ⎦      ⎣ −2 ⎦

4.6 Problems

1. For a nonzero vector v and any vector u, show that ⟨u − (⟨u, v⟩/‖v‖^2)v, v⟩ = 0. Then
   use the Pythagoras theorem to derive the Cauchy–Schwartz inequality.
2. Let n > 1. Using a unit vector u ∈ Rn×1 construct infinitely many matrices
A ∈ Rn×n so that A2 = I.
3. Let x and y be linearly independent vectors in Fn×1 . Let A = x y ∗ + yx ∗ . Show
that rank(A) = 2.
4. Fundamental subspaces: Let A ∈ Fm×n . Prove the following:
(a) N (A) = {x ∈ Fn×1 : x ⊥ y for each y ∈ R(A∗ )}.
(b) N (A∗ ) = {y ∈ Fm×1 : y ⊥ z for each z ∈ R(A)}.
(c) R(A) = {y ∈ Fm×1 : y ⊥ z for each z ∈ N (A∗ )}.
(d) R(A∗ ) = {x ∈ Fn×1 : x ⊥ u for each u ∈ N (A)}.
5. Let A ∈ Fm×n . Let B and E be bases for the subspaces N (A) and R(A∗ ) of
Fn×1 , respectively. Show that B ∪ E is a basis of Fn×1 .
6. Find a basis for N(A), where A has rows [1, 1, 1, −1] and [1, 1, 3, 5]. Using
   this basis, extend the orthonormal set {(1/2)[1, 1, 1, −1]^t, (1/6)[1, 1, 3, 5]^t} to an
   orthonormal basis for R4×1.
7. Let A ∈ Fm×n . Show that N (A∗ A) = N (A).
8. Let x̂ be a least squares solution of the linear system Ax = b for an m × n matrix
   A. Show that an n × 1 vector y is a least squares solution of Ax = b iff y = x̂ + z
   for some z ∈ N(A).
9. Let {v1, . . . , vm} be a basis for a subspace V of Fn. Let A = [aij] be the Gram
   matrix, where aij = ⟨vj, vi⟩. Show that the Gram matrix A satisfies A∗ = A,
   and x∗Ax > 0 for each nonzero x ∈ Fm. Conclude that the Gram matrix is
   invertible.
10. Let A ∈ Fm×n have orthogonal columns and let b ∈ Fm×1. Suppose that
    y = [y1, · · · , yn]^t is a least squares solution of Ax = b. Show that ‖Ai‖^2 yi =
    b∗Ai for i = 1, . . . , n.
11. Let A ∈ Fm×n have rank n. Let Q and R be the matrices obtained by the Gram–
    Schmidt process applied on the columns of A, as in the QR-factorization. If
    v = a1Q1 + · · · + anQn is the projection of a vector b ∈ Fm×1 onto R(A),
    and u = [a1, · · · , an]^t, then show the following:
    (a) u = Q∗b    (b) v = QQ∗b    (c) QQ∗ = A(A∗A)^{−1}A∗
12. Let A ∈ Fm×n, let P be the projection matrix that projects vectors of Fm×1 onto
    R(A), and let Q be the projection matrix that projects vectors of Fn×1 onto R(A∗).
    Show the following:
    (a) I − P is the projection matrix that projects vectors of Fm×1 onto N(A∗).
    (b) I − Q is the projection matrix that projects vectors of Fn×1 onto N(A).
13. Let A ∈ F7×5 have rank 4. Let P and Q be projection matrices that project
vectors from F7×1 onto R(A) and N (A∗ ), respectively. Show that P Q = 0 and
P + Q = I.
14. Let A ∈ Rn×n and let b ∈ Rn×1 be a nonzero vector. Show that if b is orthogonal
to each column of A, then the linear system Ax = b is inconsistent. What are
least squares solutions of Ax = b?
15. Fredholm Alternative: Let A ∈ Fm×n and let b ∈ Fm×1. Prove that
    exactly one of the following is true:
    (a) There exists a solution of Ax = b.
    (b) There exists y ∈ Fm×1 such that A∗y = 0 and y∗b ≠ 0.
Chapter 5
Eigenvalues and Eigenvectors

5.1 Invariant Line


 
Let A = ⎡ 0 1 ⎤ . We view A as the linear transformation A : R2×1 → R2×1. It
        ⎣ 1 0 ⎦
transforms straight lines to straight lines or points. Does there exist a straight line
which is transformed to itself? We see that

A ⎡ x ⎤ = ⎡ 0 1 ⎤ ⎡ x ⎤ = ⎡ y ⎤ .
  ⎣ y ⎦   ⎣ 1 0 ⎦ ⎣ y ⎦   ⎣ x ⎦

Thus, the line {[x, x]^t : x ∈ R} never moves. So also the line {[x, −x]^t : x ∈ R}.
Observe that

A ⎡ x ⎤ = 1 ⎡ x ⎤ ,    A ⎡  x ⎤ = (−1) ⎡  x ⎤ .
  ⎣ x ⎦     ⎣ x ⎦        ⎣ −x ⎦        ⎣ −x ⎦

Let A ∈ Fn×n . A scalar λ ∈ F is called an eigenvalue of A iff there exists a nonzero


vector v ∈ Fn×1 such that Av = λv. Such a nonzero vector v is called an eigenvector
of A for (associated with, corresponding to) the eigenvalue λ.
Example 5.1 Let

    ⎡ 1  1  1 ⎤
A = ⎢ 0  1  1 ⎥ .
    ⎣ 0  0  1 ⎦

We see that A[1, 0, 0]^t = 1·[1, 0, 0]^t. Therefore, A has an eigenvector [1, 0, 0]^t
associated with the eigenvalue 1. Is [c, 0, 0]^t also an eigenvector associated with the
same eigenvalue 1?
What are the other eigenvalues, and what are the corresponding eigenvectors of
A? If v = [a, b, c]t = 0 is an eigenvector for an eigenvalue λ, then the equation
Av = λv implies that

a + b + c = λa, b + c = λb, c = λc.


The last equation says that either λ = 1 or c = 0. If c = 0, then the second equation
implies that either λ = 1 or b = 0. Then the first equation yields either λ = 1 or
a = 0. Now, c = 0, b = 0, a = 0 is not possible, since it would lead to v = 0. In
any case, λ = 1. Then the equations are simplified to

a + b + c = a, b + c = b, c = c.

It implies that b = c = 0. Then v = [a, 0, 0]t for any nonzero a. That is, such a
vector v is an eigenvector for the only eigenvalue 1. 
Example 5.2 Let

    ⎡ 1  1  1 ⎤
A = ⎢ 2  2  2 ⎥ .
    ⎣ 3  3  3 ⎦

We see that A[a, b, c]^t = (a + b + c)[1, 2, 3]^t.
Therefore, when a + b + c = 0 with [a, b, c]^t nonzero, we have 0 as an eigenvalue with
eigenvector [a, b, c]^t. Verify that the vectors [1, 0, −1]^t, [0, 1, −1]^t are eigenvectors
for the eigenvalue 0.
For a = 1, b = 2, c = 3, we see that an eigenvalue is a + b + c = 6 with a
corresponding eigenvector [1, 2, 3]^t.
Does A have eigenvalues other than 0 and 6? □
Corresponding to an eigenvalue, there are infinitely many eigenvectors.
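The defining equation Av = λv is easy to check numerically; the sketch below (Python with NumPy, not from the text) confirms the eigenvalues 0 and 6 found in Example 5.2:

```python
import numpy as np

# Example 5.2's matrix: every column equals (1, 2, 3)^t.
A = np.array([[1., 1., 1.],
              [2., 2., 2.],
              [3., 3., 3.]])

# Direct check of the eigenvalue equation A v = lambda v:
v = np.array([1., 2., 3.])
assert np.allclose(A @ v, 6 * v)                     # eigenvalue 6
assert np.allclose(A @ np.array([1., 0., -1.]), 0)   # eigenvalue 0

eigvals = np.linalg.eigvals(A)                       # numerically: 0, 0, 6
assert np.isclose(max(eigvals.real), 6.0)
```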
Exercises for Sect. 5.1
1. Suppose A ∈ Fn×n , λ ∈ F, and b ∈ Fn×1 are such that (A − λI )x = b has a
unique solution. Show that λ is not an eigenvalue of A.
2. Formulate a converse of the statement in Exercise 1 and prove it.
3. Let A ∈ Fn×n . Show that a scalar λ ∈ F is an eigenvalue of A iff the map A − λI :
Fn×1 → Fn×1 is not one-one.
4. Let A ∈ Fn×n . Show that A is invertible iff 0 is not an eigenvalue of A.
5. Let A ∈ Fn×n . If the entries in each column of A add up to a scalar λ, then find
an eigenvector of A.

5.2 The Characteristic Polynomial

Eigenvalues of small matrices can be computed using the following theorem.


Theorem 5.1 Let A ∈ Fn×n. A scalar λ ∈ F is an eigenvalue of A iff
det(A − λI) = 0.
Proof A scalar λ ∈ F is an eigenvalue of A
iff Av = λv for some v ≠ 0
iff (A − λI)v = 0 for some v ≠ 0
iff the linear system (A − λI)x = 0 has a nonzero solution
iff rank(A − λI) < n
iff det(A − λI) = 0. □

The polynomial (−1)n det(A − t I ), which is also equal to det(t I − A), is called
the characteristic polynomial of the matrix A; and it is denoted by χA (t). Each
eigenvalue of A is a zero of the characteristic polynomial of A. Conversely, each
zero of the characteristic polynomial is said to be a complex eigenvalue of A.
If A is a complex matrix of order n, then χA (t) is a polynomial of degree n
in t. The fundamental theorem of algebra states that any polynomial of degree n,
with complex coefficients, can be written as a product of n linear factors each with
complex coefficients. Thus, χA (t) has exactly n, not necessarily distinct, zeros in C.
And these are the eigenvalues (complex eigenvalues) of the matrix A.
If A is a matrix with real entries, some of the zeros of χA (t) may turn out to be
complex numbers with nonzero imaginary parts. Considering A as a linear transfor-
mation on Rn×1 , the scalars are now real numbers. Thus each zero of the characteristic
polynomial may not be an eigenvalue; only the real zeros are.
However, if we regard a real matrix A as a matrix with complex entries, then A
is a linear transformation on Cn×1 . Each complex eigenvalue, that is, a zero of the
characteristic polynomial of A, is an eigenvalue of A.
Due to this obvious advantage, we consider a matrix in Rn×n as one in Cn×n so that
each root of the characteristic polynomial of a matrix is considered an eigenvalue of
the matrix. In this sense, an eigenvalue is taken as a complex eigenvalue, by default.
Observe that when λ is a (complex) eigenvalue of A ∈ Fn×n, a corresponding
eigenvector x is a vector in Cn×1 , in general.

Example 5.3 Find the eigenvalues and corresponding eigenvectors of the matrix

    ⎡ 1  0  0 ⎤
A = ⎢ 1  1  0 ⎥ .
    ⎣ 1  1  1 ⎦

Here, χA(t) = (−1)^3 det(A − tI) = − det ⎡ 1−t   0    0  ⎤ = (t − 1)^3.
                                         ⎢  1   1−t   0  ⎥
                                         ⎣  1    1   1−t ⎦

The eigenvalues of A are its zeros, that is, 1, 1, 1. To get an eigenvector, we solve
A[a, b, c]^t = [a, b, c]^t, or that

a = a,    a + b = b,    a + b + c = c.

It gives a = b = 0, and c ∈ F can be arbitrary. Since an eigenvector is nonzero, all
the eigenvectors are given by (0, 0, c)^t, for c ≠ 0. □
 
Example 5.4 For the matrix $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \in \mathbb{R}^{2\times 2}$, the characteristic polynomial is χA(t) = (−1)² det(A − tI) = t² + 1. It has no real zeros. Thus, A has no real eigenvalues. However, i and −i are its complex eigenvalues. That is, the same matrix A ∈ C2×2 has eigenvalues i and −i. The corresponding eigenvectors are obtained by solving

A[a, b]^t = i[a, b]^t  and  A[a, b]^t = −i[a, b]^t.

For λ = i, we have b = ia, −a = ib. Thus, [a, ia]^t is an eigenvector for a ≠ 0. For the eigenvalue −i, the eigenvectors are [a, −ia]^t for a ≠ 0.

We consider A as a matrix with complex entries. With this convention, the matrix A has (complex) eigenvalues i and −i. □
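Example 5.4 can be replayed with Python's built-in complex numbers (the helper `matvec` is our own):

```python
# For A = [[0, 1], [-1, 0]], the vectors (a, ia)^t and (a, -ia)^t are
# eigenvectors for the eigenvalues i and -i, respectively.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A = [[0, 1], [-1, 0]]

a = 2 + 0j                                   # any nonzero a works
v = [a, 1j * a]
assert matvec(A, v) == [1j * x for x in v]   # eigenvalue i

w = [a, -1j * a]
assert matvec(A, w) == [-1j * x for x in w]  # eigenvalue -i
```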
A polynomial with the coefficient of the highest degree as 1 is called a monic
polynomial. In the characteristic polynomial, the factor (−1)n is multiplied with the
determinant to make the result a monic polynomial. We see that

$$\chi_A(t) = (-1)^n \det(A - tI) = \det(tI - A) = t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0$$

for some scalars a0 , . . . , an−1 . If λ1 , . . . , λn are the complex eigenvalues of A, count-


ing multiplicities, then
χA (t) = (t − λ1 ) · · · (t − λn ).

Exercises for Sect. 5.2


1. Find all eigenvalues and corresponding eigenvectors of the following matrices:

   (a) $\begin{bmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & -2 \\ 0 & 0 & 2 & 0 \end{bmatrix}$   (b) $\begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 2 & 3 & 4 & 5 \\ 1 & 2 & 3 & 4 & 5 \\ 1 & 2 & 3 & 4 & 5 \\ 1 & 2 & 3 & 4 & 5 \end{bmatrix}$
2. Find all eigenvalues and their corresponding eigenvectors of the n × n matrix A
whose jth row has each entry j.
3. Let p(t) = t³ − αt² − (α + 3)t − 1 for a real number α. Consider

   $$A = \begin{bmatrix} \alpha & \alpha+3 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \qquad P = \begin{bmatrix} -1 & 2 & -\alpha-3 \\ 1 & -1 & \alpha+2 \\ -1 & 1 & -\alpha-1 \end{bmatrix}.$$

(a) Compute the characteristic polynomials of A and P −1 A P.


(b) Show that all zeros of p(t) are real regardless of the value of α.

5.3 The Spectrum

If the characteristic polynomial of a 3 × 3 matrix A is χA (t) = (t − 1)2 (t − 2), then


its eigenvalues are 1 and 2. Thus, the set of eigenvalues {1, 2} does not give infor-
mation as to how many times an eigenvalue is a repeated zero of the characteristic
polynomial. In this case, the notion of a multiset comes of help. The sets {1, 1, 2} and
{1, 2, 2} are both equal to the set {1, 2}, but the multiset {1, 1, 2} is different from
the multiset {1, 2, 2}. Again, the multiset {1, 1, 2} is the same as the multiset {1, 2, 1}. That is, the number of times an element occurs in a multiset is significant, whereas the ordering of elements in a multiset is ignored.

The multiset of all complex eigenvalues of a matrix A ∈ Fn×n is called the spec-
trum of A; and we denote it by σ (A). If A ∈ Fn×n has the (complex) eigenvalues
λ1 , . . . , λn , counting multiplicities, then the spectrum of A is σ (A) = {λ1 , . . . , λn };
and vice versa. Notice that the spectrum of A ∈ Fn×n always has n elements; though
they may not be distinct. The following theorem lists some important facts about
eigenvalues, using the notion of spectrum.
Theorem 5.2 Let A ∈ Fn×n . Then the following are true.
1. σ (At ) = σ (A).
2. If B is similar to A, then σ (B) = σ (A).
3. If A = [ai j ] is a diagonal or an upper triangular or a lower triangular matrix,
then σ (A) = {a11 , . . . , ann }.
4. If σ (A) = {λ1 , . . . , λn }, then det(A) = λ1 · · · λn and tr(A) = λ1 + · · · + λn .
Proof (1) χAt (t) = det(At − t I ) = det((A − t I )t ) = det(A − t I ) = χA (t).
(2) χP −1 A P (t) = det(P −1 A P − t I ) = det(P −1 (A − t I )P) = det(P −1 ) det(A − t I )
det(P) = det(A − t I ) = χA (t).
(3) In all these cases, χA (t) = det(A − t I ) = (a11 − t) · · · (ann − t).
(4) Let σ (A) = {λ1 , . . . , λn }. Then

χA (t) = det(A − t I ) = (λ1 − t) · · · (λn − t). (5.1)

Substituting t = 0, we obtain det(A) = λ1 · · · λn .


For the other equality, we compute the coefficient of t^{n−1} using (5.1). Expanding det(A − tI) along its first row, we find that the first term in the expansion is (a11 − t) det(A11), where A11 is the minor corresponding to the (1, 1)th entry. All other terms are polynomials of degree at most n − 2. Next, we expand det(A11), and continue in a similar way to obtain the following:

Coefficient of t^{n−1} in det(A − tI)
  = Coefficient of t^{n−1} in (a11 − t) det(A11)
  = ··· = Coefficient of t^{n−1} in (a11 − t)(a22 − t) ··· (ann − t)
  = (−1)^{n−1}(a11 + ··· + ann).

On the other hand, using (5.1),

Coefficient of t^{n−1} in det(A − tI)
  = Coefficient of t^{n−1} in (λ1 − t) ··· (λn − t)
  = (−1)^{n−1}(λ1 + ··· + λn).

Therefore, λ1 + ··· + λn = a11 + ··· + ann = tr(A). □
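Theorem 5.2(4) is easy to check on a small example; the matrix below is ours, and its eigenvalues 1 and 3 are the roots of t² − 4t + 3, found by hand:

```python
# det(A) should equal the product of the eigenvalues, and tr(A) their sum.

A = [[2, 1], [1, 2]]
eigenvalues = [1, 3]                  # roots of chi_A(t) = t^2 - 4t + 3

det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
trace_A = A[0][0] + A[1][1]

assert det_A == eigenvalues[0] * eigenvalues[1]    # 3
assert trace_A == eigenvalues[0] + eigenvalues[1]  # 4
```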
Theorem 5.3 Eigenvectors associated with distinct eigenvalues of a matrix are lin-
early independent.
Proof Let λ1 , . . . , λk be the distinct eigenvalues of a matrix A, and let v1 , . . . , vk be
corresponding eigenvectors. We use induction on j ∈ {1, . . . , k}.
For j = 1: since v1 ≠ 0, the set {v1} is linearly independent. As induction hypothesis (for j = m), assume that the set {v1 , . . . , vm } is linearly independent. To show (for j = m + 1) that the set {v1 , . . . , vm , vm+1 } is linearly independent, let

α1 v1 + α2 v2 + · · · + αm vm + αm+1 vm+1 = 0. (5.2)



Then, A(α1 v1 + α2 v2 + · · · + αm vm + αm+1 vm+1 ) = 0 gives (since Avi = λi vi )

α1 λ1 v1 + α2 λ2 v2 + · · · + αm λm vm + αm+1 λm+1 vm+1 = 0.

Multiply (5.2) with λm+1 and subtract from the last equation to get

α1 (λ1 − λm+1 )v1 + · · · + αm (λm − λm+1 )vm = 0.

By the induction hypothesis, αi (λi − λm+1 ) = 0 for i = 1, 2, . . . , m. Since λi ≠ λm+1 , we get αi = 0 for each such i. Then (5.2) yields αm+1 vm+1 = 0. Since vm+1 ≠ 0, it follows that αm+1 = 0. This completes the proof of linear independence of eigenvectors associated with distinct eigenvalues. □
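Theorem 5.3 can be illustrated numerically; matrix and vectors below are our own choice:

```python
# A = [[2, 1], [1, 2]] has distinct eigenvalues 1 and 3 with eigenvectors
# (1, -1)^t and (1, 1)^t; the matrix with these columns has nonzero
# determinant, so the eigenvectors are linearly independent.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A = [[2, 1], [1, 2]]
v1, v2 = [1, -1], [1, 1]

assert matvec(A, v1) == [1 * x for x in v1]   # eigenvalue 1
assert matvec(A, v2) == [3 * x for x in v2]   # eigenvalue 3

det = v1[0] * v2[1] - v2[0] * v1[1]
assert det != 0                               # linear independence
```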

Exercises for Sect. 5.3


1. Let A be a 4 × 4 non-invertible matrix with all diagonal entries 1. If 2 + 3i is an
eigenvalue of A, what are its other eigenvalues?
2. Show that if each eigenvalue of A ∈ Fn×n has absolute value less than 1, then
both I − A and I + A are invertible.
3. Show that if rank of an n × n matrix is 1, then its trace is one of its eigenvalues.
What are its other eigenvalues?
4. Find the spectrum of a matrix in which the sum of the entries in each row is the same.
5. Let A, B, P ∈ Cn×n be such that B = P −1 A P. Let λ be an eigenvalue of A.
Show that a vector v is an eigenvector of B corresponding to the eigenvalue λ iff
Pv is an eigenvector of A corresponding to the same eigenvalue λ.
6. Do equivalent matrices have the same eigenvalues?

5.4 Special Types of Matrices

A square matrix A is called hermitian iff A∗ = A; skew hermitian iff A∗ = −A;


symmetric iff At = A; and skew symmetric iff At = −A.
Thus, a real symmetric matrix is real hermitian; and a real skew symmetric matrix
is real skew hermitian. Hermitian matrices are also called self-adjoint matrices. In
the following, B is symmetric, C is skew symmetric, D is hermitian, and E is skew
hermitian; B is also hermitian and C is also skew hermitian.
$$B = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 2 & -3 \\ -2 & 0 & 4 \\ 3 & -4 & 0 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & -2i & 3 \\ 2i & 3 & 4 \\ 3 & 4 & 5 \end{bmatrix}, \quad E = \begin{bmatrix} 0 & 2+i & 3 \\ -2+i & i & 4i \\ -3 & 4i & 0 \end{bmatrix}.$$

Notice that all diagonal entries of a hermitian matrix are real, since $\overline{a_{ii}} = a_{ii}$. Similarly, each diagonal entry of a skew symmetric matrix must be zero, since $a_{ii} = -a_{ii}$. And each diagonal entry of a skew hermitian matrix must be 0 or purely imaginary, as $\overline{a_{ii}} = -a_{ii}$ implies $2\,\mathrm{Re}(a_{ii}) = 0$.

Let A be a square matrix. Since A + At is symmetric and A − At is skew sym-


metric, every square matrix can be written as a sum of a symmetric matrix and a
skew symmetric matrix:

$$A = \tfrac12(A + A^t) + \tfrac12(A - A^t).$$

Similar rewriting is possible with hermitian and skew hermitian matrices:

$$A = \tfrac12(A + A^*) + \tfrac12(A - A^*).$$
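The splitting into symmetric and skew symmetric parts can be checked directly; a minimal sketch with a 2×2 example matrix of our own:

```python
# A = S + K with S = (1/2)(A + A^t) symmetric and K = (1/2)(A - A^t)
# skew symmetric.

def transpose(M):
    return [list(row) for row in zip(*M)]

def add(M, N):
    return [[M[i][j] + N[i][j] for j in range(len(M[0]))] for i in range(len(M))]

def scale(c, M):
    return [[c * x for x in row] for row in M]

A = [[1.0, 4.0], [2.0, 3.0]]
S = scale(0.5, add(A, transpose(A)))               # symmetric part
K = scale(0.5, add(A, scale(-1.0, transpose(A))))  # skew symmetric part

assert S == transpose(S)               # S^t = S
assert K == scale(-1.0, transpose(K))  # K^t = -K
assert add(S, K) == A                  # S + K = A
```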

A square matrix A is called unitary iff A∗ A = I = A A∗ . In addition, if A is real,


then it is called an orthogonal matrix. That is, an orthogonal matrix is a matrix with
real entries satisfying At A = I = A At .
Thus, a square matrix is unitary iff it is invertible and its inverse is equal to its
adjoint. Similarly, a real matrix is orthogonal iff it is invertible and its inverse is its
transpose. In the following, B is a unitary matrix of order 2, and C is an orthogonal
matrix (also unitary) of order 3:
$$B = \frac12\begin{bmatrix} 1+i & 1-i \\ 1-i & 1+i \end{bmatrix}, \qquad C = \frac13\begin{bmatrix} 2 & 1 & 2 \\ -2 & 2 & 1 \\ 1 & 2 & -2 \end{bmatrix}.$$

Example 5.5 For any θ ∈ R, the following are orthogonal matrices of order 2:
   
$$O_1 := \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad O_2 := \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}.$$

Let u be the vector in the plane that starts at the origin and ends at the point (a, b).
Writing the point (a, b) as a column vector [a, b]t , we see that
       
$$O_1\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a\cos\theta - b\sin\theta \\ a\sin\theta + b\cos\theta \end{bmatrix}, \qquad O_2\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a\cos\theta + b\sin\theta \\ a\sin\theta - b\cos\theta \end{bmatrix}.$$

Thus, O1 [a, b]t is the end-point of the vector obtained by rotating the vector u by
an angle θ. Similarly, O2 [a, b]t is a point obtained by reflecting u along a straight
line that makes an angle θ/2 with the x-axis. Accordingly, O1 is called a rotation by
an angle θ, and O2 is called a reflection along a line making an angle of θ/2 with
the x-axis.
If A = [aij] is an orthogonal matrix of order 2, then A^t A = I implies

$$a_{11}^2 + a_{21}^2 = 1 = a_{12}^2 + a_{22}^2, \qquad a_{11}a_{12} + a_{21}a_{22} = 0.$$

Then there exist α, β ∈ R such that a11 = cos α, a21 = sin α, a12 = cos β, a22 = sin β, and cos(α − β) = 0. Thus α − β = ±π/2. It then follows that A is of one of the forms O1 or O2. That is, an orthogonal matrix of order 2 is either a rotation or a reflection. □
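Example 5.5 can also be verified numerically: both O1 and O2 are orthogonal, with determinants +1 (rotation) and −1 (reflection). The angle below is an arbitrary choice of ours:

```python
import math

theta = 0.7                    # arbitrary angle
c, s = math.cos(theta), math.sin(theta)

O1 = [[c, -s], [s, c]]         # rotation by theta
O2 = [[c, s], [s, -c]]         # reflection

def transpose(M):
    return [list(r) for r in zip(*M)]

def mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

for O in (O1, O2):             # O^t O = I in both cases
    P = mul(transpose(O), O)
    assert all(abs(P[i][j] - (1.0 if i == j else 0.0)) < 1e-12
               for i in range(2) for j in range(2))

assert abs(det2(O1) - 1.0) < 1e-12   # rotation: det = +1
assert abs(det2(O2) + 1.0) < 1e-12   # reflection: det = -1
```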

Theorem 5.4 Let A be a unitary or an orthogonal matrix of order n. Then the


following are true:
1. For each pair of vectors x, y ∈ Fn×1 , ⟨Ax, Ay⟩ = ⟨x, y⟩.
2. For each x ∈ Fn×1 , ‖Ax‖ = ‖x‖.
3. The columns of A are orthonormal.
4. The rows of A are orthonormal.
5. |det(A)| = 1.

Proof (1) ⟨Ax, Ay⟩ = (Ay)∗ Ax = y∗ A∗ Ax = y∗ x = ⟨x, y⟩.
(2) Take x = y in (1).
(3) Let the columns of A be u1 , . . . , un , in that order. Then ⟨u_j , u_i⟩ = u_i∗ u_j is the (i, j)th entry of A∗ A. Since A∗ A = I, ⟨u_j , u_i⟩ = δij . Therefore, {u1 , . . . , un } is an orthonormal set.
(4) Considering A A∗ = I, we get this result.
(5) Notice that det(A∗ ) = det(A) = det(A). Thus

det(A∗ A) = det(A∗ )det(A) = det(A)det(A) = |det(A)|2 .

However, det(A∗ A) = det(I ) = 1. Therefore, |det(A)| = 1. 
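Theorem 5.4(2) can be checked on the orthogonal matrix C of order 3 shown earlier in this section; the test vector is our own choice:

```python
import math

# C = (1/3) [[2, 1, 2], [-2, 2, 1], [1, 2, -2]] is orthogonal.
C = [[2/3, 1/3, 2/3],
     [-2/3, 2/3, 1/3],
     [1/3, 2/3, -2/3]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

x = [1.0, -2.0, 3.0]   # arbitrary test vector
assert abs(norm(matvec(C, x)) - norm(x)) < 1e-12   # ||Cx|| = ||x||
```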

Theorem 5.4(1)–(2) show that unitary and orthogonal matrices preserve the inner
product and the norm. Thus they are also called isometries. The
condition A∗ A = I is equivalent to the condition that the columns of A are orthonor-
mal. Similarly, the rows of A are orthonormal iff A A∗ = I.
Theorem 5.4 implies that the determinant of an orthogonal matrix is either 1
or −1. It follows that the product of all eigenvalues, counting multiplicities, of an
orthogonal matrix is ±1.
The determinant of a hermitian matrix is a real number, since A = A∗ implies

$$\det(A) = \det(A^*) = \overline{\det(A)}.$$

We prove some interesting facts about the eigenvalues and eigenvectors of these
special types of matrices.

Theorem 5.5 Let A ∈ Fn×n and let λ be any eigenvalue of A.


1. If A is hermitian, then λ ∈ R; and eigenvectors corresponding to distinct eigen-
values of A are orthogonal.
2. If A is real symmetric, then λ ∈ R; and there exists a real eigenvector corre-
sponding to λ.
3. If A is skew hermitian or real skew symmetric, then λ is purely imaginary or
zero.
4. If A is unitary or orthogonal, then |λ| = 1.

Proof (1) Let A be hermitian; that is, A∗ = A. Let v ∈ Cn×1 be an eigenvector


corresponding to the eigenvalue λ of A. Then

$$\lambda v^*v = v^*(\lambda v) = v^*Av = v^*A^*v = (Av)^*v = (\lambda v)^*v = \bar{\lambda}\, v^*v.$$

Since v ≠ 0, we have v∗v ≠ 0; consequently, $\bar\lambda = \lambda$. Therefore, λ ∈ R.


For the second conclusion, let λ ≠ μ be two distinct eigenvalues of A with corresponding eigenvectors x and y. Both λ ∈ R and μ ∈ R. Then

$$\lambda y^*x = y^*(\lambda x) = y^*Ax = y^*A^*x = (Ay)^*x = (\mu y)^*x = \bar{\mu}\, y^*x = \mu\, y^*x.$$

Hence (λ − μ) y∗x = 0. As λ ≠ μ, we conclude that y∗x = 0. That is, x ⊥ y.


(2) Let A be real symmetric. Then λ ∈ R follows from (1). For the second statement,
let v = x + i y ∈ Cn×1 be an eigenvector of A corresponding to λ, with x, y ∈ Rn×1 .
Then
A(x + i y) = λ(x + i y).

Comparing the real and imaginary parts, we have

Ax = λx, Ay = λy.

Since x + iy ≠ 0, at least one of x or y is nonzero. One such nonzero vector is a real eigenvector corresponding to the (real) eigenvalue λ of A.
(3) Let A be skew hermitian. That is, A∗ = −A. Let v ∈ Cn×1 be an eigenvector
corresponding to the eigenvalue λ of A. Then

$$\lambda v^*v = v^*(\lambda v) = v^*Av = -v^*A^*v = -(Av)^*v = -(\lambda v)^*v = -\bar{\lambda}\, v^*v.$$

Since v∗v ≠ 0, $\bar\lambda = -\lambda$. That is, 2 Re(λ) = 0. This shows that λ is purely imaginary or zero.
When A is real skew symmetric, we take transpose instead of adjoint
everywhere in the above proof.
(4) Suppose A is unitary, i.e., A∗ A = I. Let v be an eigenvector corresponding to
the eigenvalue λ. Now, Av = λv implies $v^*A^* = (\lambda v)^* = \bar\lambda v^*$. Then

$$v^*v = v^*Iv = v^*A^*Av = \bar\lambda\lambda\, v^*v = |\lambda|^2 v^*v.$$

Since v∗v ≠ 0, |λ| = 1.
Replacing A∗ with At in the above proof yields the same conclusion when A is
an orthogonal matrix. 

In general, the existence of a real eigenvector corresponding to an eigenvalue of a hermitian matrix cannot be guaranteed.
 
Example 5.6 Consider the matrix $A = \begin{bmatrix} 1 & 1-i \\ 1+i & -1 \end{bmatrix}$. It is a hermitian matrix. Its characteristic polynomial is χA(t) = t² − 3. Its eigenvalues are ±√3. To compute an eigenvector corresponding to the eigenvalue √3, we set up the linear system (A − √3 I)[a, b]^t = 0. That is,

(1 − √3) a + (1 − i) b = 0,  (1 + i) a − (1 + √3) b = 0.

This system has a nonzero solution a = 1 − i, b = √3 − 1. So, the vector v = [1 − i, √3 − 1]^t is an eigenvector corresponding to the eigenvalue √3.

If both a, b ∈ R, then comparing the imaginary parts in the equations, we get −b = 0 and a = 0. But that is impossible since v ≠ 0. Hence, there does not exist a real eigenvector for the eigenvalue √3 of A. □
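A short computation confirms Example 5.6, using Python's complex arithmetic (names and tolerance are ours):

```python
import math

r3 = math.sqrt(3)
A = [[1, 1 - 1j], [1 + 1j, -1]]     # hermitian
v = [1 - 1j, r3 - 1]                # claimed eigenvector for sqrt(3)

Av = [A[0][0] * v[0] + A[0][1] * v[1],
      A[1][0] * v[0] + A[1][1] * v[1]]

assert all(abs(Av[k] - r3 * v[k]) < 1e-12 for k in range(2))  # A v = sqrt(3) v
```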

Similarly, a real skew symmetric matrix can have purely imaginary eigenvalues; its only possible real eigenvalue is 0. Also, an orthogonal matrix can have complex eigenvalues, so that each eigenvalue need not be ±1. For instance, the matrix

$$\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$

has eigenvalues ±i. However, all real eigenvalues of an orthogonal matrix are ±1; and any eigenvalue of an orthogonal matrix is of the form $e^{i\theta}$ for some θ ∈ R.
Exercises for Sect. 5.4
1. Let A ∈ Fn×n . Show that ⟨Ax, Ay⟩ = ⟨x, y⟩ for all x, y ∈ Fn×1 iff ‖Ax‖ = ‖x‖ for all x ∈ Fn×1 .
2. Construct a 3 × 3 hermitian matrix with no zero entries whose eigenvalues are
1, 2 and 3.
3. Construct a 2 × 2 real skew symmetric matrix whose eigenvalues are purely imag-
inary.
4. Show that if an invertible matrix is real symmetric, then so is its inverse.
5. Show that if an invertible matrix is hermitian, then so is its inverse.
6. Construct an orthogonal 2 × 2 matrix whose determinant is 1.
7. Construct an orthogonal 2 × 2 matrix whose determinant is −1.

5.5 Problems

1. Let a + bi be an eigenvalue of A ∈ Rn×n with an associated eigenvector x + i y,


where a, b ∈ R and x, y ∈ Rn×1 . Show the following:
(a) The vectors x + i y and x − i y are linearly independent in Cn×1 .
(b) The vectors x and y are linearly independent in Rn×1 .


2. Let A = [aij] ∈ R2×2 , where a12 a21 > 0. Write B = diag(√(a12/a21), 1), and let C = B^{-1}AB. What do you conclude about the eigenvalues and eigenvectors of C, and those of A?
3. Show that if λ is a nonzero eigenvalue of an n × n matrix A, then N (A − λI ) ⊆
R(A).
4. An n × n matrix A is said to be idempotent if A2 = A. Show that the only
possible eigenvalues of an idempotent matrix are 0 or 1.
5. An n × n matrix A is said to be nilpotent if Ak = 0 for some natural number k.
Show that 0 is the only eigenvalue of a nilpotent matrix.
6. Let A ∈ Fn×n have an eigenvalue λ with an associated eigenvector x. Suppose
A∗ has an eigenvalue μ associated with y. If λ = μ, then show that x ⊥ y.
7. Let A, B ∈ Fn×n . Show that

   $$\begin{bmatrix} I & A \\ 0 & I \end{bmatrix} \begin{bmatrix} 0 & 0 \\ B & BA \end{bmatrix} \begin{bmatrix} I & -A \\ 0 & I \end{bmatrix} = \begin{bmatrix} AB & 0 \\ B & 0 \end{bmatrix}.$$

Then deduce that χAB (t) = χB A (t).


8. Let a0 , a1 , . . . , an−1 ∈ F. Let p(t) = t n − an−1 t n−1 − · · · − a0 be a polyno-
mial. Construct the matrix
$$C = \begin{bmatrix} a_{n-1} & a_{n-2} & \cdots & a_1 & a_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}.$$

The matrix C is called the companion matrix of the polynomial p(t). Show the
following:
(a) χC (t) = p(t).
(b) If p(λ) = 0 for some λ ∈ F, then λ is an eigenvalue of the matrix C with an
associated eigenvector [λn−1 , λn−2 , · · · , λ, 1]t .
9. Let A ∈ Fn×n and let B = I − 2 A + A2 . Show the following:
(a) If 1 is an eigenvalue of A, then B is not invertible.
(b) If v is an eigenvector of A, then v is an eigenvector of B. Are the corresponding
eigenvalues equal?
10. For which scalars α are the n × n matrices A and A + αI similar?
11. Show that there do not exist n × n matrices A and B with AB − B A = I.
12. Let A, B ∈ Fn×n . Show that if 1 is not an eigenvalue of A, then the matrix
equation AX + B = X has a unique solution in Fn×n .
13. Let A and B be hermitian matrices. Show that AB is hermitian iff AB = B A.
14. Let A and B be hermitian matrices. Determine whether the matrices A +
B, AB A, AB + B A, and AB − B A are hermitian or not.

15. Let A be a non-hermitian matrix. Determine whether the matrices A + A∗ , A −


A∗ , A∗ A − A A∗ , (I + A)(A + A∗ ), and (I + A)(I − A∗ ) are hermitian or
not.
16. Show that a square matrix A is invertible iff det(A∗ A) > 0.
17. Let A be a skew hermitian matrix. Show that I − A is invertible.
18. Let A ∈ Fm×n have rank n. Let P = A(A∗ A)−1 A∗ . Show the following:
(a) A∗ A is invertible so that the construction of P is meaningful.
(b) P is hermitian.
(c) P y = y for each y ∈ R(A).
(d) P k y = P y for each y ∈ Fm×1 and for each k ∈ N.
(e) Let y ∈ Fm×1 satisfy y ⊥ Ax for each x ∈ Fn×1 . Then P y = 0.
19. Let B ∈ Fn×n . Suppose that I + B is invertible. Let C = (I + B)−1 (I − B).
Show the following:
(a) If B is skew hermitian, then C is unitary.
(b) If B is unitary, then C is skew hermitian.
20. Show that in the plane,
(a) a rotation following a rotation is a rotation.
(b) a rotation following a reflection is a reflection.
(c) a reflection following a rotation is a reflection.
(d) a reflection following a reflection is a rotation.
21. Show that the eigenvalues of a reflection in the plane are ±1. What could be the
eigenvalues of a rotation?
22. Let A ∈ R2×2 be an orthogonal matrix. Suppose that A has a non-trivial fixed
point; that is, there exists a nonzero vector v ∈ R2×1 such that Av = v. Show
that with respect to any orthonormal basis B of R2×1 , the coordinate matrix
$[A]_{B,B}$ is of the form $\begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$.

23. Show that 1 is an eigenvalue of any 3 × 3 orthogonal matrix.


24. Using the previous problem, show that an orthogonal matrix of order 3 is similar to a matrix of the form $\begin{bmatrix} 1 & 0 \\ 0 & R \end{bmatrix}$, where R is either a reflection or a rotation in the plane.
25. Let A be an orthogonal matrix of order 3. Show that either σ (A) = {1, 1, 1} or
σ (A) = {1, eiθ , e−iθ } for some θ ∈ R.
26. Show that the ith column of an orthogonal upper triangular matrix is either ei
or −ei .
27. Let A be an orthogonal matrix with an eigenvalue 1 and a corresponding eigen-
vector v. Show that v is also an eigenvector of At .
28. Let u 1 , . . . , u n be the columns of an orthogonal matrix of order n. Show that
u 1 u t1 + · · · + u n u tn = I.

29. Let {v1 , . . . , vn } be an orthonormal set in Fn×1 . Let λ1 , . . . , λn ∈ R. Show that


the matrix A = λ1 v1 v1∗ + · · · + λn vn vn∗ is hermitian. Can you find eigenvalues
and eigenvectors of A?
30. A permutation matrix of order n is an n × n matrix obtained from the identity
matrix In by reordering its rows. Show the following:
(a) A permutation matrix may be obtained by reordering the columns of the
identity matrix.
(b) A permutation matrix is a product of elementary matrices of the first type.
(c) Let P be a permutation matrix of order m whose ith row is the ki th row of
Im . Let A be an m × n matrix. Then P A is the m × n matrix whose ith row
is the ki th row of A.
(d) Let Q be a permutation matrix of order n whose jth column is the k j th column
of In . Let A be an m × n matrix. Then AQ is the m × n matrix whose jth
column is the k j th column of A.
(e) A permutation matrix is orthogonal.
(f) If a permutation matrix P is symmetric, then P 2k = I and P 2k+1 = P for
k = 0, 1, 2, . . ..
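Problem 8 above can be explored numerically. The sketch below uses our own cubic p(t) = (t − 1)(t − 2)(t − 3) and the companion-matrix layout given in the problem:

```python
# p(t) = t^3 - a2 t^2 - a1 t - a0 with a2 = 6, a1 = -11, a0 = 6, i.e.
# p(t) = t^3 - 6t^2 + 11t - 6 = (t - 1)(t - 2)(t - 3).

a2, a1, a0 = 6, -11, 6

C = [[a2, a1, a0],
     [1,  0,  0],
     [0,  1,  0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

for lam in (1, 2, 3):                  # the roots of p
    assert lam**3 - a2 * lam**2 - a1 * lam - a0 == 0
    v = [lam**2, lam, 1]               # (lam^2, lam, 1)^t
    assert matvec(C, v) == [lam * x for x in v]   # C v = lam v
```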
Chapter 6
Canonical Forms

6.1 Schur Triangularization

Eigenvalues and eigenvectors can be used to bring a matrix to nice forms using
similarity transformations. A very general result in this direction is Schur’s unitary
triangularization. It says that using a suitable similarity transformation, we can rep-
resent a square matrix by an upper triangular matrix. Thus, the diagonal entries of
the upper triangular matrix must be the eigenvalues of the given matrix.

Theorem 6.1 (Schur Triangularization) Let A ∈ Cn×n . Then there exists a unitary
matrix P ∈ Cn×n such that P ∗ A P is upper triangular. Moreover, if A ∈ Rn×n and
all eigenvalues of A are real, then P can be chosen to be an orthogonal matrix.

Proof We use induction on n. If n = 1, then clearly A is an upper triangular matrix,


and we take P = [1], the identity matrix with a single entry as 1, which is both
unitary and orthogonal.
Assume that for all B ∈ Cm×m , m ≥ 1, we have a unitary matrix Q ∈ Cm×m such
that Q ∗ B Q is upper triangular. Let A ∈ C(m+1)×(m+1) . Let λ ∈ C be an eigenvalue of
A with an associated eigenvector u. Recall that w, z = z ∗ w defines an inner product
on C(m+1)×1 . Then the unit vector v = u/u is an eigenvector of A associated with
the eigenvalue λ.
Extend the set {v} to an orthonormal (ordered) basis E = {v, v1 , . . . , vm } for
C(m+1)×1 . Here, you may have to use an extension of a basis, and then Gram–Schmidt
orthonormalization process. Next, construct the matrix R ∈ C(m+1)×(m+1) by taking
these basis vectors as its columns, in that order. That is, let
 
R = v v1 · · · vm .

Since E is an orthonormal set, R is unitary. Consider the coordinate matrix of


A with respect to the basis E. It is given by [A] E,E = R −1 A R = R ∗ A R. The first
column of R ∗ A R is

R ∗ A Re1 = R ∗ Av = R −1 λv = λR −1 v = λR −1 Re1 = λe1 ,

where e1 ∈ C(m+1)×1 has first component as 1 and all other components 0. Then
R ∗ A R can be written in the following block form:
 
$$R^*AR = \begin{bmatrix} \lambda & x \\ 0 & C \end{bmatrix},$$

where 0 ∈ Cm×1 , C ∈ Cm×m , and x = [v∗ Av1 v∗ Av2 · · · v∗ Avm ] ∈ C1×m .


Notice that if m = 1, the proof is complete. For m > 1, by induction hypothesis,
we have a matrix S ∈ Cm×m such that S ∗ C S is upper triangular. Then take
 
$$P = R\begin{bmatrix} 1 & 0 \\ 0 & S \end{bmatrix}.$$

Since R and S are unitary, P∗P = PP∗ = I; that is, P is unitary. Moreover,


$$P^*AP = \begin{bmatrix} 1 & 0 \\ 0 & S \end{bmatrix}^* R^*AR\, \begin{bmatrix} 1 & 0 \\ 0 & S \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & S^* \end{bmatrix} \begin{bmatrix} \lambda & x \\ 0 & C \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & S \end{bmatrix} = \begin{bmatrix} \lambda & xS \\ 0 & S^*CS \end{bmatrix}.$$

Of course, x S ∈ C1×m . Since S ∗ C S is upper triangular, the induction proof is com-


plete.
When A ∈ Rn×n , and all eigenvalues of A are real, we use the transpose instead of
the adjoint everywhere in the above proof. Thus, P can be chosen to be an orthogonal
matrix. 
To avoid possible misunderstanding, we recall that A has only real eigenvalues
means that when we consider this A as a matrix in Cn×n , all its complex eigenvalues
turn out to be real numbers. This again means that all zeros of the characteristic
polynomial of A are real.
We may unfold the inductive proof of Schur’s triangularization as follows. Once
we obtain a matrix similar to A in the form
 
λ w
,
0 S∗C S

we look for whether λ is still an eigenvalue of S ∗ C S. If so, we choose this eigenvalue


over others for further reduction. In the next step, we obtain a matrix similar to A in
the form

$$\begin{bmatrix} \lambda & w & y \\ 0 & \lambda & z \\ 0 & 0 & M \end{bmatrix}$$

where M is an (n − 2) × (n − 2) matrix. Continuing further this way, we see that a


Schur triangularization of A exists, where on the diagonal of the
final upper triangular matrix, equal eigenvalues occur together. This particular form

will be helpful later. In general, the eigenvalues can occur on the diagonal of the
Schur form in any prescribed order, depending on our choice of eigenvalue in each
step.
Example 6.1 Consider the matrix $A = \begin{bmatrix} 2 & 1 & 0 \\ 2 & 3 & 0 \\ -1 & -1 & 1 \end{bmatrix}$ for Schur triangularization.
We find that χA (t) = (t − 1)2 (t − 4). All eigenvalues of A are real; thus, there
exists an orthogonal matrix P such that P t A P is upper triangular. To determine such
a matrix P, we take one of the eigenvalues, say 1. An associated eigenvector of norm
1 is v = [0, 0, 1]t . We extend {v} to an orthonormal basis for C3×1 . For convenience,
we take the orthonormal basis as

{[0, 0, 1]t , [1, 0, 0]t , [0, 1, 0]t }.

Taking the basis vectors as columns, we form the matrix R as


$$R = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$

We then find that

$$R^tAR = \begin{bmatrix} 1 & -1 & -1 \\ 0 & 2 & 1 \\ 0 & 2 & 3 \end{bmatrix}.$$
 
Next, we triangularize the matrix $C = \begin{bmatrix} 2 & 1 \\ 2 & 3 \end{bmatrix}$. It has eigenvalues 1 and 4. The eigenvector of norm 1 associated with the eigenvalue 1 is $[1/\sqrt2, -1/\sqrt2]^t$. We extend it to an orthonormal basis

$$\{[1/\sqrt2, -1/\sqrt2]^t,\ [1/\sqrt2, 1/\sqrt2]^t\}$$

for C2×1 . Then we construct the matrix S by taking these basis vectors as its columns, that is,

$$S = \begin{bmatrix} 1/\sqrt2 & 1/\sqrt2 \\ -1/\sqrt2 & 1/\sqrt2 \end{bmatrix}.$$

We find that $S^tCS = \begin{bmatrix} 1 & -1 \\ 0 & 4 \end{bmatrix}$, which is an upper triangular matrix. Then

$$P = R\begin{bmatrix} 1 & 0 \\ 0 & S \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/\sqrt2 & 1/\sqrt2 \\ 0 & -1/\sqrt2 & 1/\sqrt2 \end{bmatrix} = \begin{bmatrix} 0 & 1/\sqrt2 & 1/\sqrt2 \\ 0 & -1/\sqrt2 & 1/\sqrt2 \\ 1 & 0 & 0 \end{bmatrix}.$$
Consequently, $P^tAP = \begin{bmatrix} 1 & 0 & -\sqrt2 \\ 0 & 1 & -1 \\ 0 & 0 & 4 \end{bmatrix}$. □
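The triangularization of Example 6.1 can be verified numerically (helpers and tolerance are our own):

```python
import math

s = 1.0 / math.sqrt(2.0)
A = [[2, 1, 0], [2, 3, 0], [-1, -1, 1]]
P = [[0, s, s], [0, -s, s], [1, 0, 0]]   # the P computed above

def transpose(M):
    return [list(r) for r in zip(*M)]

def mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

T = mul(mul(transpose(P), A), P)

# Strictly lower part is (numerically) zero; diagonal holds 1, 1, 4.
assert all(abs(T[i][j]) < 1e-12 for i in range(3) for j in range(3) if i > j)
assert all(abs(T[i][i] - d) < 1e-12 for i, d in enumerate([1, 1, 4]))
```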

Since P ∗ = P −1 , Schur triangularization is informally stated as follows:


Every square matrix is unitarily similar to an upper triangular matrix.
Further, there is nothing sacred about being upper triangular. For, given a matrix
A ∈ Cn×n , consider using Schur triangularization of A∗ . There exists a unitary matrix
Q such that Q ∗ A∗ Q is upper triangular. Then taking adjoint, we have Q ∗ AQ is lower
triangular. Thus, the following holds:

Every square matrix is unitarily similar to a lower triangular matrix.

Analogously, a real square matrix having only real eigenvalues is also orthogonally
similar to a lower triangular matrix. We remark that the lower triangular form of a
matrix need not be the transpose or the adjoint of its upper triangular form.
Moreover, neither the unitary matrix P nor the upper triangular matrix P ∗ A P in
Schur triangularization is unique. That is, there can be unitary matrices P and Q such
that both P ∗ A P and Q ∗ AQ are upper triangular, and P = Q, P ∗ A P = Q ∗ AQ. The
non-uniqueness stems from the choice of eigenvalues, their associated eigenvectors,
and in extending those to an orthonormal basis. For instance, in Example 6.1, if you
extend {[0, 0, 1]t } to the orthonormal basis {[0, 0, 1]t , [0, 1, 0]t , [1, 0, 0]t }, then you
end up with (Verify.)
$$P = \begin{bmatrix} 0 & -1/\sqrt2 & 1/\sqrt2 \\ 0 & 1/\sqrt2 & 1/\sqrt2 \\ 1 & 0 & 0 \end{bmatrix}, \qquad P^tAP = \begin{bmatrix} 1 & 0 & -\sqrt2 \\ 0 & 1 & 1 \\ 0 & 0 & 4 \end{bmatrix}.$$

Exercises for Sect. 6.1


     
1. Let $A = \begin{bmatrix} 1 & 2 \\ -1 & -2 \end{bmatrix}$, $B = \begin{bmatrix} -1 & 3 \\ 0 & 0 \end{bmatrix}$, and $C = \begin{bmatrix} 0 & 3 \\ 0 & -1 \end{bmatrix}$. Show that both B and C are Schur triangularizations of A.
2. Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}$, $B = \begin{bmatrix} 0 & 1 \\ 0 & 7 \end{bmatrix}$, and $C = \begin{bmatrix} 0 & 2 \\ 0 & 7 \end{bmatrix}$. Show that B is a Schur triangularization of A but C is not.
3. Determine a matrix A such that $A^*A = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 1 & i \\ 0 & -i & 1 \end{bmatrix}$.
4. Using Schur triangularization prove the Spectral mapping theorem: for any poly-
nomial p(t), σ ( p(A)) = p(σ (A)). In other words, for any polynomial p(t), if
λ is an eigenvalue of A, then p(λ) is an eigenvalue of p(A); and all (complex)
eigenvalues of p(A) are of this form.

6.2 Annihilating Polynomials

Schur triangularization brings a square matrix to an upper triangular form by a


similarity transformation. For an upper triangular matrix A ∈ Fn×n with diagonal
entries d1 , . . . , dn , we wish to compute the product

(A − d1 I )(A − d2 I ) · · · (A − dn I ).
For example, let $A = \begin{bmatrix} d_1 & * & * \\ 0 & d_2 & * \\ 0 & 0 & d_3 \end{bmatrix}$, where ∗ stands for any entry, possibly nonzero. We see that

$$A - d_1I = \begin{bmatrix} 0 & * & * \\ 0 & * & * \\ 0 & 0 & * \end{bmatrix}$$

$$(A - d_1I)(A - d_2I) = \begin{bmatrix} 0 & * & * \\ 0 & * & * \\ 0 & 0 & * \end{bmatrix}\begin{bmatrix} * & * & * \\ 0 & 0 & * \\ 0 & 0 & * \end{bmatrix} = \begin{bmatrix} 0 & 0 & * \\ 0 & 0 & * \\ 0 & 0 & * \end{bmatrix}$$

$$(A - d_1I)(A - d_2I)(A - d_3I) = \begin{bmatrix} 0 & 0 & * \\ 0 & 0 & * \\ 0 & 0 & * \end{bmatrix}\begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

What do you observe?

Theorem 6.2 Let A ∈ Fn×n be an upper triangular matrix with diagonal entries
d1 , . . . , dn , in that order. Let 1 ≤ k ≤ n. Then the first k columns of the product
(A − d1 I ) · · · (A − dk I ) are zero columns.

Proof For k = 1, we see that the first column of A − d1 I is a zero column. It means
(A − d1 I )e1 = 0, where e1 , . . . , en are the standard basis vectors for Fn×1 . Assume
that the result is true for k = m < n. That is,

(A − d1 I ) · · · (A − dm I )e1 = 0, . . . , (A − d1 I ) · · · (A − dm I )em = 0.

Notice that

(A − d1 I ) · · · (A − dm I )(A − dm+1 I ) = (A − dm+1 I )(A − d1 I ) · · · (A − dm I ).

It then follows that for 1 ≤ j ≤ m,

(A − d1 I ) · · · (A − dm I )(A − dm+1 I )e j
= (A − dm+1 I )(A − d1 I ) · · · (A − dm I )e j = 0.

Next, (A − dm+1 I )em+1 is the (m + 1)th column of A − dm+1 I, which has all
zero entries beyond the first m entries. That is, there are scalars α1 , . . . , αm such that
(A − dm+1 I )em+1 = α1 e1 + · · · + αm em . Then

(A − d1 I ) · · · (A − dm I )(A − dm+1 I )em+1


= (A − d1 I ) · · · (A − dm I )(α1 e1 + · · · + αm em ) = 0.

Therefore, the first m + 1 columns of (A − d1 I ) · · · (A − dm I )(A − dm+1 I ) are


zero columns. By induction, the proof is complete. 
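Theorem 6.2 can be watched in action on a concrete upper triangular matrix of our own choosing:

```python
# After multiplying the first k factors (A - d1 I)...(A - dk I),
# the first k columns of the product are zero.

d = [2, 5, 7]
A = [[d[0], 3, 1],
     [0, d[1], 4],
     [0, 0, d[2]]]

def mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def shift(M, c):   # M - c*I
    return [[M[i][j] - (c if i == j else 0) for j in range(3)] for i in range(3)]

prod = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
for k in range(3):
    prod = mul(prod, shift(A, d[k]))
    assert all(prod[i][j] == 0 for j in range(k + 1) for i in range(3))

assert prod == [[0] * 3 for _ in range(3)]   # the full product vanishes
```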

In what follows, we will be substituting the variable t by a square matrix in a


polynomial q(t). In such a substitution, we will interpret the constant term a0 as a0 I.
For instance,
if A ∈ Fn×n and q(t) = 2 + t + 5t 2 , then q(A) = 2In + A + 5A2 .
If q(A) turns out to be the zero matrix, then we say that the polynomial q(t)
annihilates the matrix A; we also say that q(t) is an annihilating polynomial of
A; and that A is annihilated by q(t). The zero polynomial annihilates every matrix.
Does every matrix have an annihilating nonzero polynomial?

Theorem 6.3 (Cayley–Hamilton) Each square matrix is annihilated by its charac-


teristic polynomial.

Proof Let A ∈ Fn×n . We consider A as a matrix in Cn×n . By Schur triangularization,


there exist a unitary matrix P ∈ Cn×n and an upper triangular matrix U ∈ Cn×n such
that P ∗ A P = U. That is, A = PU P ∗ . Then, for all scalars α and β, we have

α I + β A = α I + β PU P ∗ = P(α I )P ∗ + P(βU )P ∗ = P(α I + βU )P ∗ .

And A2 = PU P ∗ PU P ∗ = PU 2 P ∗ . By induction, it follows that for any polynomial


q(t), q(A) = P q(U ) P ∗ . In particular,

χA (A) = P χA (U ) P ∗ .

Suppose the diagonal entries of U are λ1 , . . . , λn . Since A and U are similar


matrices,
χA (t) = χU (t) = (t − λ1 ) · · · (t − λn ).

Theorem 6.2 implies that the n columns of (U − λ1 I ) · · · (U − λn I ) are zero


columns. That is, χA (U ) = 0. Therefore, χA (A) = P χA (U ) P ∗ = 0. 

Cayley–Hamilton theorem helps us in computing powers of matrices and also the


inverse of a matrix if it exists. Suppose A ∈ Fn×n has the characteristic polynomial

$$\chi_A(t) = t^n + a_{n-1}t^{n-1} + \cdots + a_1t + a_0.$$



By the Cayley–Hamilton theorem, $A^n + a_{n-1}A^{n-1} + \cdots + a_1A + a_0I = 0$. Then

$$A^n = -(a_0I + a_1A + \cdots + a_{n-1}A^{n-1}).$$

Thereby, computation of An , An+1 , . . . can be reduced to that of A, . . . , An−1 .


Next, suppose that A is invertible. Then det(A) ≠ 0. Since det(A) is the product of all eigenvalues of A, λ = 0 is not an eigenvalue of A. It implies that t, which equals (t − λ) for λ = 0, is not a factor of the characteristic polynomial of A. Therefore, the constant term a0 in the characteristic polynomial of A is nonzero. By the Cayley–Hamilton theorem,

$$a_0I = -(a_1A + a_2A^2 + \cdots + a_{n-1}A^{n-1} + A^n).$$

Multiplying by A^{-1} and simplifying, we obtain

$$A^{-1} = -\frac{1}{a_0}\,(a_1I + a_2A + \cdots + a_{n-1}A^{n-2} + A^{n-1}).$$

This way, A−1 can also be computed from A, A2 , . . . , An−1 .
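The inverse formula is easy to try for n = 2, where χA(t) = t² − tr(A) t + det(A) gives A⁻¹ = (tr(A) I − A)/det(A); the matrix below is our own example:

```python
A = [[4, 7], [2, 6]]
tr = A[0][0] + A[1][1]                        # 10
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # 10

# A^{-1} = (tr(A) I - A) / det(A), from chi_A(A) = 0.
Ainv = [[(tr * (1 if i == j else 0) - A[i][j]) / det for j in range(2)]
        for i in range(2)]

# Check A * Ainv = I.
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
assert all(abs(prod[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(2) for j in range(2))
```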


Example 6.2 Let $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$. Its characteristic polynomial is χA(t) = (t − 1)³. The Cayley–Hamilton theorem says that (A − I)³ = 0. By direct computation, we find that

$$(A - I)^2 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}^2 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

Thus, the degree of an annihilating polynomial of A need not equal the order of A. □
Let A ∈ Fn×n . A monic polynomial of least degree that annihilates A is called the
minimal polynomial of A.
In Example 6.2, no polynomial of degree 1 annihilates A. But the monic polynomial (t − 1)2 annihilates it. Therefore, (t − 1)2 is the minimal polynomial of A.
Observe that the degree of a minimal polynomial is unique. Moreover, if q(t) is
a minimal polynomial of A, and q(t) has degree k, then no monic polynomial of
degree less than k annihilates A. Therefore, no nonzero polynomial of degree less
than k annihilates A. In other words,
if a polynomial of degree less than the degree of the minimal polynomial of A annihilates
A, then this polynomial must be the zero polynomial.

The Cayley–Hamilton theorem implies that the degree of the minimal polynomial does not exceed the order of the matrix. However, any multiple of an annihilating polynomial also annihilates the given matrix. For instance, the polynomials

10(t − 1)2 , t (t − 1)2 and (1 + 2t + t 2 + 3t 3 )(t − 1)2 annihilate the matrix A in


Example 6.2. Is the characteristic polynomial one among the multiples of the mini-
mal polynomial?
Theorem 6.4 The minimal polynomial of a square matrix is unique; and it divides all
annihilating polynomials of the matrix.
Proof Let A ∈ Fn×n . Since the degree of a minimal polynomial is unique, let p(t) =
t k + p1 (t) and q(t) = t k + q1 (t) be two minimal polynomials of A, where p1 (t) and
q1 (t) are polynomials of degree less than k. Then p(A) − q(A) = p1 (A) − q1 (A) =
0.
The polynomial r (t) = p1 (t) − q1 (t) has degree less than k and it annihilates A.
Thus, r (t) is the zero polynomial. In that case, p(t) = q(t). This proves that the
minimal polynomial of A is unique.
For the second statement, let q(t) be the minimal polynomial of A. Let p(t) be
an annihilating polynomial of A. The degree of p(t) is at least as large as that of
q(t). So, suppose the degree of q(t) is k and the degree of p(t) is m ≥ k. By the
division algorithm, we have p(t) = s(t)q(t) + r (t), where s(t) is a polynomial of
degree m − k, and r (t) is a polynomial of degree less than k. Then

0 = p(A) = s(A) q(A) + r (A) = 0 + r (A) = r (A).

That is, r (t) is a polynomial of degree less than k that annihilates A. Hence, r (t)
is the zero polynomial. Therefore, q(t) divides p(t). 
Theorem 6.3 implies that
the minimal polynomial of a matrix divides its characteristic polynomial.
This fact is sometimes mentioned as the Cayley–Hamilton theorem.
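One way to compute a minimal polynomial numerically is to look for the first power Ak that is a linear combination of I, A, . . . , Ak−1. The sketch below assumes NumPy, and the function name is ours:

```python
import numpy as np

def minimal_poly_coeffs(A, tol=1e-9):
    """Monic coefficients (highest degree first) of the minimal polynomial of A,
    found at the first linear dependence among I, A, A^2, ..."""
    n = A.shape[0]
    powers = [np.eye(n)]
    for k in range(1, n + 1):
        powers.append(powers[-1] @ A)                 # powers[j] = A^j
        M = np.column_stack([P.ravel() for P in powers[:-1]])
        b = powers[-1].ravel()
        c, *_ = np.linalg.lstsq(M, b, rcond=None)     # try A^k = sum_j c_j A^j
        if np.linalg.norm(M @ c - b) < tol:
            # Minimal polynomial: t^k - c_{k-1} t^{k-1} - ... - c_0.
            return np.concatenate(([1.0], -c[::-1]))
    raise RuntimeError("unreachable: chi_A always annihilates A")

# The matrix of Example 6.2: minimal polynomial (t - 1)^2 = t^2 - 2t + 1.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
```

This is essentially the algorithm asked for in Exercise 4 below, implemented with least squares to detect the dependence.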
Exercises for Sect. 6.2
1. Compute A−1 and A10 where A is the matrix of Example 6.2.
2. Use the formula (A − t I ) adj(A − t I ) = det(A − t I )I to give another proof of
Cayley–Hamilton theorem. [Hint: For matrices B0 , . . . , Bn−1 , write adj(A −
t I ) = B0 + B1 t + · · · + Bn−1 t n−1 . ]
3. Let A ∈ Fn×n . Let p(t) be a monic polynomial of degree n. If p(A) = 0, then
does it follow that p(t) is the characteristic polynomial of A?
4. Based on Theorem 6.4, describe an algorithm for computing the minimal polynomial of a matrix.

6.3 Diagonalizability

Schur triangularization implies that each square matrix with complex entries is simi-
lar to an upper triangular matrix. Moreover, a square matrix with real entries is similar
to an upper triangular real matrix provided all zeros of its characteristic polynomial

are real. The upper triangular matrix similar to a given square matrix takes a better
form when the matrix is hermitian.

Theorem 6.5 (Spectral theorem for hermitian matrices) Each hermitian matrix is
unitarily similar to a real diagonal matrix. And, each real symmetric matrix is orthog-
onally similar to a real diagonal matrix.

Proof Let A ∈ Cn×n be a hermitian matrix. Due to Schur triangularization, we have a unitary matrix P such that D = P ∗ A P is upper triangular. Now,

D ∗ = P ∗ A∗ P = P ∗ A P = D.

Since D is upper triangular and D ∗ = D, we see that D is a diagonal matrix.


Comparing the diagonal entries in D ∗ = D, we see that all diagonal entries in D are
real.
Further, if A is real symmetric, then all its eigenvalues are real. Due to Schur
triangularization, the matrix P above can be chosen to be an orthogonal matrix so
that D = P t A P is upper triangular. By a similar argument as above, it follows that
D is a diagonal matrix with real entries. 

A matrix A ∈ Fn×n is called diagonalizable if there exists an invertible matrix


P ∈ Fn×n such that P −1 A P is a diagonal matrix. When P −1 A P is a diagonal matrix,
we say that A is diagonalized by P. In this language, the spectral theorem for
hermitian matrices may be stated as follows:
Every hermitian matrix is unitarily diagonalizable; and every real symmetric matrix is orthog-
onally diagonalizable.
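Numerically, this is what np.linalg.eigh does for a hermitian matrix: it returns real eigenvalues together with a unitary matrix of orthonormal eigenvectors. A sketch assuming NumPy; the sample matrix is ours:

```python
import numpy as np

A = np.array([[2.0, 1.0j],
              [-1.0j, 2.0]])
assert np.allclose(A, A.conj().T)     # A is hermitian

# eigh is tailored to hermitian input: real eigenvalues, orthonormal eigenvectors.
evals, P = np.linalg.eigh(A)

D = P.conj().T @ A @ P                # P* A P, a real diagonal matrix
```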

To see how the eigenvalues and eigenvectors are involved in the diagonalization
process, let A, P, D ∈ Fn×n be matrices, where

P = [v1 · · · vn ], D = diag(λ1 , . . . , λn ).

Then Pe j = v j and De j = λ j e j for each 1 ≤ j ≤ n. Suppose A P = P D. Then

Av j = A Pe j = P De j = P(λ j e j ) = λ j (Pe j ) = λ j v j for 1 ≤ j ≤ n.

Conversely, suppose Av j = λ j v j for 1 ≤ j ≤ n. Then

A Pe j = Av j = λ j v j = λ j Pe j = P (λ j e j ) = P De j for 1 ≤ j ≤ n.

That is, A P = P D. We summarize this as follows.

Observation 6.1 Let A, P, D ∈ Fn×n . For j = 1, . . . , n, let v j be the jth column


of P. Let D = diag(λ1 , . . . , λn ). Then, A P = P D iff Av j = λ j v j for each j =
1, . . . , n.

For diagonalization of a matrix A ∈ Fn×n , we find the eigenvalues of A and the


associated eigenvectors.
If F = R and there exists an eigenvalue with nonzero imaginary part, then A
cannot be diagonalized by a matrix with real entries.
Else, suppose we have n number of eigenvalues in F counting multiplicities.
If n linearly independent eigenvectors cannot be found, then A cannot be diago-
nalized.
Otherwise, we put the eigenvectors together as columns to form the matrix P;
and P −1 A P is a diagonalization of A. Further, by taking orthonormal eigenvectors,
the matrix P can be made unitary or orthogonal.
Example 6.3 The matrix

A = [ 1 −1 −1
      −1 1 −1
      −1 −1 1 ]

is real symmetric. It has eigenvalues −1, 2 and 2, with associated orthonormal eigenvectors

[ 1/√3         [ −1/√2        [ −1/√6
  1/√3    ,      1/√2    ,      −1/√6
  1/√3 ]         0 ]            2/√6 ].

Thus, the diagonalizing orthogonal matrix is given by


P = [ 1/√3  −1/√2  −1/√6
      1/√3   1/√2  −1/√6
      1/√3   0      2/√6 ].

We see that P −1 = P t and

P −1 A P = P t A P = [ −1 0 0
                        0 2 0
                        0 0 2 ]. 
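The computation of Example 6.3 can be verified numerically (a sketch assuming NumPy):

```python
import numpy as np

A = np.array([[ 1.0, -1.0, -1.0],
              [-1.0,  1.0, -1.0],
              [-1.0, -1.0,  1.0]])

s3, s2, s6 = np.sqrt(3.0), np.sqrt(2.0), np.sqrt(6.0)
P = np.array([[1/s3, -1/s2, -1/s6],
              [1/s3,  1/s2, -1/s6],
              [1/s3,  0.0,   2/s6]])

# P is orthogonal, so P^{-1} = P^t, and P^t A P is the diagonal of eigenvalues.
D = P.T @ A @ P
```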
In fact, the spectral theorem holds for a bigger class of matrices. A matrix
A ∈ Cn×n is called a normal matrix iff A∗ A = A A∗ . All unitary matrices and all
hermitian matrices are normal matrices. All diagonal matrices are normal matrices.
In addition, a weak converse to the last statement holds.
Theorem 6.6 Each upper triangular normal matrix is diagonal.
Proof Let U ∈ Cn×n be an upper triangular matrix. If n = 1, then clearly U is a
diagonal matrix. Suppose that each upper triangular normal matrix of order k is
diagonal. Let U be an upper triangular normal matrix of order k + 1. Write U in a
partitioned form as in the following:
 
U = [ R  u
      0  a ]

where R ∈ Ck×k , u ∈ Ck×1 , 0 is the zero row vector in C1×k , and a ∈ C. Since U
is normal,
U ∗U = [ R∗R    R∗u
         u∗R    u∗u + |a|2 ] ,    UU ∗ = [ RR∗ + uu∗    ā u
                                           a u∗          |a|2 ] .

It implies that u ∗ u + |a|2 = |a|2 . That is, u = 0. Plugging u = 0 in the above


equation, we see that R ∗ R = R R ∗ . Since R is upper triangular, by the induction
hypothesis, R is a diagonal matrix. Then with u = 0, U is also a diagonal matrix.
The proof is complete by induction. 

Using this result on upper triangular normal matrices, we can generalize the
spectral theorem to normal matrices.

Theorem 6.7 (Spectral theorem for normal matrices) A square matrix is unitarily
diagonalizable iff it is a normal matrix.

Proof Let A ∈ Cn×n . Let A be unitarily diagonalizable. Then there exist a unitary
matrix P and a matrix D = diag(λ1 , . . . , λn ) such that A = P D P ∗ . Then A∗ A =
P D ∗ D P ∗ and A A∗ = P D D ∗ P ∗ . However,

D ∗ D = diag(|λ1 |2 , . . . , |λn |2 ) = D D ∗ .

Therefore, A∗ A = A A∗ . So, A is a normal matrix.


Conversely let A be a normal matrix; so A∗ A = A A∗ . Due to Schur triangu-
larization, let Q be a unitary matrix such that Q ∗ AQ = U, an upper triangular
matrix. Since Q ∗ = Q −1 , the condition A∗ A = A A∗ implies that U ∗ U = UU ∗ . By
Theorem 6.6, U is a diagonal matrix. 
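A numerical illustration (assuming NumPy; it also relies on eig returning orthonormal eigenvectors when the eigenvalues of a normal matrix are distinct): the real matrix below is normal but neither symmetric nor hermitian, yet it is unitarily diagonalizable over C.

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
assert np.allclose(A.conj().T @ A, A @ A.conj().T)   # A*A = AA*: A is normal

evals, P = np.linalg.eig(A)          # eigenvalues are +i and -i

unitary = np.allclose(P.conj().T @ P, np.eye(2))
reconstructed = np.allclose(P @ np.diag(evals) @ P.conj().T, A)
```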

There can be non-normal matrices which are diagonalizable. For example, with
A = [ 1 0 0            P = [ 0 −1 0
      4 3 −2                 1 5 2
      2 1 0 ],               1 3 1 ]

we see that A∗ A ≠ A A∗ and P ∗ P ≠ I, but

P −1 A P = [ 1 −1 2      [ 1 0 0       [ 0 −1 0      [ 1 0 0
             −1 0 0   ×    4 3 −2   ×    1 5 2    =    0 1 0
             2 1 −1 ]      2 1 0 ]       1 3 1 ]       0 0 2 ].

Observe that in such a case, the diagonalizing matrix P is non-unitary.
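The claim can be checked numerically (a sketch assuming NumPy):

```python
import numpy as np

A = np.array([[1.0, 0.0,  0.0],
              [4.0, 3.0, -2.0],
              [2.0, 1.0,  0.0]])
P = np.array([[0.0, -1.0, 0.0],
              [1.0,  5.0, 2.0],
              [1.0,  3.0, 1.0]])

is_normal = np.allclose(A.T @ A, A @ A.T)        # False: A is not normal
is_unitary = np.allclose(P.T @ P, np.eye(3))     # False: P is not orthogonal
D = np.linalg.inv(P) @ A @ P                     # yet P diagonalizes A
```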


In general, we have the following characterization of diagonalizability.

Theorem 6.8 A matrix A ∈ Fn×n is diagonalizable iff there exists a basis of Fn×1
consisting of eigenvectors of A.

Proof Let A ∈ Fn×n . Suppose A is diagonalizable. Then there exist an invertible


matrix P and a diagonal matrix D such that A P = P D. Let v j be the jth column of

P. By Observation 6.1, Av j = λ j v j . As P is invertible, its columns v1 , . . . , vn are


nonzero; thus they are eigenvectors of A, and they form a basis for Fn×1 .
Conversely, suppose that {v1 , . . . , vn } is a basis of Fn×1 and that for each j =
1, . . . , n, v j is an eigenvector of A. Then there exists λ j ∈ F such that Av j = λ j v j .
Construct n × n matrices
 
P = v1 · · · vn , D = diag(λ1 , . . . , λn ).

Observation 6.1 implies that A P = P D. Since the columns of P form a basis for
Fn×1 , P is invertible. Therefore, A is diagonalizable. 

In case we have a basis B = {v1 , . . . , vn } for Fn×1 with Av j = λ j v j , due to Theorems 3.9 and 3.10, the co-ordinate matrix [A] B,B is the diagonal matrix
diag(λ1 , . . . , λn ). The question is, when are there n linearly independent eigen-
vectors of A? The spectral theorem for normal matrices provides a partial answer.
Another partial answer on diagonalizability is as follows.

Theorem 6.9 If an n × n matrix has n distinct eigenvalues, then it is diagonalizable.

Proof Suppose A ∈ Cn×n has n distinct eigenvalues λ1 , . . . , λn with corresponding


eigenvectors v1 , . . . , vn . By Theorem 5.3, the vectors v1 , . . . , vn are linearly inde-
pendent, and thus form a basis for Cn×1 . Therefore, A is diagonalizable.
More directly, take P = [v1 · · · vn ]. Then P is invertible. By Observation 6.1,
P −1 A P = diag(λ1 , . . . , λn ). 

If λ is an eigenvalue of a matrix A with an associated eigenvector u, then Au = λu;


that is, u ∈ N (A − λI ). The number of linearly independent solution vectors of
Au = λu is dim(N (A − λI )), the nullity of A − λI. This number has a certain relation with the diagonalizability of A.
Let λ be an eigenvalue of a matrix A ∈ Fn×n . The geometric multiplicity of λ is
dim(N (A − λI )); and the algebraic multiplicity of λ is the largest natural number
k such that (t − λ)k divides χA (t).
Observe that if λ is an eigenvalue of A, then its geometric multiplicity is the
maximum number of linearly independent eigenvectors associated with λ; and its
algebraic multiplicity is the number of times λ is a zero of the characteristic polyno-
mial. Thus the algebraic multiplicity of an eigenvalue is often called its multiplicity.
   
Example 6.4 Let A = [ 1 0        and let B = [ 1 1
                      0 1 ]                    0 1 ].
The characteristic polynomials of both A and B are equal to (t − 1)2 . The eigen-
value λ = 1 has algebraic multiplicity 2 for both A and B.
For geometric multiplicities, we solve Ax = x and By = y.
Now, Ax = x gives x = x, which is satisfied by any vector in F2×1 . Thus, N (A −
I ) = F2×1 ; consequently, the geometric multiplicity of the only eigenvalue 1 of A
is dim(N (A − I )) = 2.

For the matrix B, take y = [a, b]t . Then By = y gives a + b = a and b = b.
That is, b = 0; and a can be any scalar. For example, [1, 0]t is a solution. Then the
geometric multiplicity of the eigenvalue 1 of B is dim(N (B − I )) = 1. 
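The two multiplicities of Example 6.4 can be computed numerically (a sketch assuming NumPy):

```python
import numpy as np

B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
n = B.shape[0]

# Geometric multiplicity of the eigenvalue 1: the nullity of B - I.
geometric = n - np.linalg.matrix_rank(B - np.eye(n))

# Algebraic multiplicity: how often 1 occurs among the eigenvalues.
algebraic = int(np.sum(np.isclose(np.linalg.eigvals(B), 1.0)))
```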

Theorem 6.10 The geometric multiplicity of an eigenvalue of a matrix is less than


or equal to the algebraic multiplicity of that eigenvalue.

Proof Let λ be an eigenvalue of a matrix A. Let ℓ be the geometric multiplicity
and let k be the algebraic multiplicity of the eigenvalue λ. We have ℓ linearly
independent eigenvectors of A associated with the eigenvalue λ, and no more.
Extend the set of these eigenvectors to an ordered basis B of Fn×1. Let P be
the matrix whose columns are the vectors in B. Due to Theorems 3.9 and 3.10, the
matrix M = P −1 A P = [A] B,B may be written as

M = [ λI  C
      0   D ]

for some matrices C ∈ Cℓ×(n−ℓ) and D ∈ C(n−ℓ)×(n−ℓ). Since A and M are similar,
they have the same characteristic polynomial χ(t) = (λ − t)^ℓ p(t) for some polynomial p(t) of degree n − ℓ.
But the zero λ of χ(t) is repeated k times. That is, χ(t) = (λ − t)^k q(t) for some
polynomial q(t) of which (λ − t) is not a factor.
Notice that λ − t may or may not be a factor of p(t). In any case, ℓ ≤ k. 

Theorem 6.11 An n × n matrix A is diagonalizable iff the geometric multiplicity of


each eigenvalue of A is equal to its algebraic multiplicity iff the sum of geometric
multiplicities of all eigenvalues of A is n.

Proof Let λ be an eigenvalue of an n × n matrix A with geometric multiplicity
ℓ and algebraic multiplicity k. Let A be diagonalizable. Then we have a basis E
of Fn×1 which consists of eigenvectors of A, with respect to which the matrix of
A is diagonal. In this diagonal matrix, there are exactly k entries equal to λ. So
in the basis E, there are k eigenvectors associated with λ. These eigenvectors
corresponding to λ are linearly independent. There may be more linearly independent
eigenvectors associated with λ, but no fewer. So, ℓ ≥ k. Then
Theorem 6.10 implies that ℓ = k.
Conversely, suppose that the geometric multiplicity of each eigenvalue is equal to
its algebraic multiplicity. Then corresponding to each eigenvalue λ, we have exactly
that many linearly independent eigenvectors as its algebraic multiplicity. Moreover,
eigenvectors corresponding to distinct eigenvalues are linearly independent. Thus,
collecting together the eigenvectors associated with all eigenvalues, we get n linearly
independent eigenvectors which form a basis for Fn×1 . (See Problem 7.) Therefore,
A is diagonalizable.
The second ‘iff’ statement follows since geometric multiplicity of each eigenvalue
is at most its algebraic multiplicity. 

In Example 6.4, we see that the geometric multiplicity of each (the only) eigen-
value of the matrix A is equal to its algebraic multiplicity; both are 2. So, A is
diagonalizable; in fact, it is already diagonal. But the geometric multiplicity of the
(only) eigenvalue 1 of B is 1 while the algebraic multiplicity is 2. Therefore, B is
not diagonalizable.
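Theorem 6.11 suggests a direct numerical test for diagonalizability: sum the geometric multiplicities over the distinct eigenvalues and compare with n. A sketch (assuming NumPy; the function name is ours):

```python
import numpy as np

def is_diagonalizable(A, tol=1e-8):
    """Does the sum of geometric multiplicities over distinct eigenvalues equal n?"""
    n = A.shape[0]
    distinct = []
    for lam in np.linalg.eigvals(A):
        if all(abs(lam - mu) > tol for mu in distinct):
            distinct.append(lam)
    geo_sum = sum(n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)
                  for lam in distinct)
    return geo_sum == n
```

For the matrices of Example 6.4, is_diagonalizable(np.eye(2)) is True while is_diagonalizable of the matrix B is False. Clustering nearly equal eigenvalues by a tolerance is a crude heuristic; deciding diagonalizability in floating point is delicate.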
Exercises for Sect. 6.3
1. Diagonalize the given matrix, and then compute its fifth power:

   (a) [ 0 1 1        (b) [ 7 −2 0        (c) [ 7 −5 15
         1 0 1              −2 6 −2             6 −4 15
         1 1 0 ]             0 −2 5 ]           0 0 1 ]

2. Show that the following matrices are diagonalized by matrices in R3×3.

   (a) [ 3/2 −1/2 0      (b) [ 3 −1/2 −3/2     (c) [ 2 −1 0
         −1/2 3/2 0            1 3/2 3/2             −1 2 0
         1/2 −1/2 1 ]          −1 1/2 5/2 ]          2 2 3 ]

3. If possible, diagonalize the following matrices:

   (a) [ 2 0 0       (b) [ 1 −2 1      (c) [ 1 2 3
         2 1 0             0 1 1             2 4 6
         1 2 −1 ]          0 3 −1 ]          −1 −2 −3 ]

4. Are the following matrices diagonalizable?

   (a) [ 2 3         (b) [ 1 −10 0     (c) [ 2 1 0 0
         6 −1 ]            −1 3 1            0 2 0 0
                           −1 0 4 ]          0 0 2 0
                                             0 0 0 5 ]

5. Check whether each of the following matrices is diagonalizable. If diagonalizable,
   find a basis of eigenvectors for C3×1:

   (a) [ 1 1 1       (b) [ 1 1 1       (c) [ 1 0 1
         1 −1 1            0 1 1             1 1 0
         1 1 −1 ]          0 0 1 ]           0 1 1 ]

6. Find orthogonal or unitary diagonalizing matrices for the following:

   (a) [ 2 1         (b) [ 1    3+i    (c) [ −4 −2 2
         1 2 ]             3−i  4 ]          −2 −1 1
                                             2 1 −1 ]

7. Determine A5 and a matrix B such that B 2 = A, where

   (a) A = [ 2 1         (b) A = [ 9 −5 3
             −2 −1 ]               0 4 3
                                   0 0 1 ]
8. If A is a normal matrix, then show that A∗ , I + A and A2 are also normal.

6.4 Jordan Form

Not all matrices can be diagonalized, since corresponding to an eigenvalue there may not be a sufficient number of linearly independent eigenvectors. Non-diagonalizability
of a matrix A ∈ Fn×n means that we cannot have a basis consisting of vectors v j for
Fn×1 so that Av j = λ j v j for scalars λ j . In that case, we would like to have a basis
which would bring the matrix to a nearly diagonal form. Specifically, if possible, we
would try to construct a basis {v1 , . . . , vn } such that

Av j = λ j v j or Av j = λ j v j + v j−1 for each j.

Notice that the matrix similar to A with respect to such a basis would have λ j s on
the diagonal, and possibly nonzero entries on the super diagonal (entries above the
diagonal); all other entries being 0.
We will show that it is possible, by proving that there exists an invertible matrix
P such that
P −1 A P = diag(J1 , J2 , . . . , Jk ),

where each Ji is a block diagonal matrix of the form

Ji = diag( J˜1 (λi ), J˜2 (λi ), . . . , J˜si (λi )),

for some si . Each matrix J˜j (λi ) of order j here has the form
J˜j (λi ) = [ λi  1
                  λi  1
                      ⋱  ⋱
                          λi  1
                              λi ] .

The missing entries are all 0. Such a matrix J˜j (λi ) is called a Jordan block with
diagonal entries λi . The order of the Jordan block is its order as a square matrix.
Any matrix which is in the block diagonal form diag(J1 , J2 , . . . , Jk ) is said to be in
Jordan form.
In writing Jordan blocks and Jordan forms, we do not show the zero entries for
improving legibility. For instance, the following are possible Jordan form matrices
of order 3 with all diagonal entries as 1:
[ 1          [ 1 1        [ 1          [ 1 1
     1            1            1 1          1 1
       1 ]          1 ]          1 ]          1 ]

Example 6.5 The following matrix is in Jordan form:


[ 1
    1
      1 1
        1
          2 1
            2 1
              2 ]

It has three Jordan blocks for the eigenvalue 1 of which two are of order 1 and
one of order 2; and it has one block of order 3 for the eigenvalue 2.
The eigenvalue 1 has geometric multiplicity 3, algebraic multiplicity 4, and the
eigenvalue 2 has geometric multiplicity 1 and algebraic multiplicity 3. 
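The multiplicities claimed in Example 6.5 can be checked via nullities (a sketch assuming NumPy): the geometric multiplicity of λ equals the number of Jordan blocks with diagonal entry λ.

```python
import numpy as np

# The 7x7 Jordan form of Example 6.5: blocks of orders 1, 1, 2 for the
# eigenvalue 1, and one block of order 3 for the eigenvalue 2.
J = np.diag([1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0])
J[2, 3] = 1.0                # the order-2 block for eigenvalue 1
J[4, 5] = J[5, 6] = 1.0      # the order-3 block for eigenvalue 2

def geometric_multiplicity(lam):
    return 7 - np.linalg.matrix_rank(J - lam * np.eye(7))
```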

In what follows, we will be using similarity transformations resulting from elementary matrices. A similarity transformation that uses an elementary matrix E[i, j]
on a matrix A transforms A to (E[i, j])−1 A E[i, j]. Since (E[i, j])−1 = (E[i, j])t =
E[i, j], the net effect of this transformation is described as follows:
E[i, j]−1 A E[i, j] = E[i, j] A E[i, j] exchanges the ith and jth rows, and then exchanges
the ith and the jth columns of A.

We will refer to this type of similarity transformations by the name permutation


similarity.
Using the second type of elementary matrices, we have a similarity transformation (E α [i])−1 A E α [i] for α ≠ 0. Since (E α [i])−1 = E 1/α [i] and (E α [i])t = E α [i], this similarity transformation has the following effect:
(E α [i])−1 A E α [i] = E 1/α [i] A E α [i] multiplies all entries in the ith row with 1/α, and then
multiplies all entries in the ith column with α; thus keeping (i, i)th entry intact.

We will refer to this type of similarity transformation as dilation similarity. In


particular, if A is such a matrix that its ith row has all entries 0 except the (i, i)th entry,
and there is another entry on the ith column which is α ≠ 0, then (E α [i])−1 AE α [i]
is the matrix in which this α changes to 1 and all other entries are as in A.
The third type of similarity transformation applied on A yields the matrix
(E α [i, j])−1 AE α [i, j]. Notice that (E α [i, j])−1 = E −α [i, j] and (E α [ j, i])t =
E α [i, j]. This similarity transformation changes a matrix A as described below:
(E α [i, j])−1 AE α [i, j] = E −α [i, j] A E α [i, j] is obtained from A by subtracting α times
the jth row from the ith row, and then adding α times the ith column to the jth column.

We name this type of similarity as a combination similarity.
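The effect of a combination similarity can be seen on a small example (a sketch assuming NumPy; the helper E is ours). For an upper triangular matrix with distinct diagonal entries, a single combination similarity zeroes-out the off-diagonal entry, exactly as in the proof of Theorem 6.12 below:

```python
import numpy as np

def E(n, i, j, alpha):
    """Elementary matrix E_alpha[i, j] = I + alpha * e_i e_j^t (0-based, i != j)."""
    M = np.eye(n)
    M[i, j] = alpha
    return M

A = np.array([[2.0, 5.0],
              [0.0, 3.0]])

# Choose alpha = -x / (a_rr - a_ss) for the entry x = 5 at position (1, 2).
alpha = -A[0, 1] / (A[0, 0] - A[1, 1])

# E_{-alpha}[1, 2] A E_alpha[1, 2]: subtract alpha times row 2 from row 1,
# then add alpha times column 1 to column 2.  The (1, 2) entry becomes 0.
B = E(2, 0, 1, -alpha) @ A @ E(2, 0, 1, alpha)
```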


In the formula for m k below, we use the convention that for any matrix B of order n, B 0 is the identity matrix of order n.

Theorem 6.12 (Jordan form) Each matrix A ∈ Cn×n is similar to a matrix in Jordan
form J, where the diagonal entries are the eigenvalues of A. For 1 ≤ k ≤ n, if m k (λ)
is the number of Jordan blocks of order k with diagonal entry λ, in J, then

m k (λ) = rank((A − λI )k−1 ) − 2 rank((A − λI )k ) + rank((A − λI )k+1 ).

The Jordan form of A is unique up to a permutation of the blocks.

Proof First, we will show the existence of a Jordan form, and then we will come
back to the formula m k , which will show the uniqueness of a Jordan form up to a
permutation of Jordan blocks.
Due to Schur triangularization, we assume that A is an upper triangular matrix,
where the eigenvalues of A occur on the diagonal, and equal eigenvalues occur
together. If λ1 , . . . , λk are the distinct eigenvalues of A, then our assumption means
that A is an upper triangular matrix with diagonal entries, read from top left to bottom
right, appear as
λ 1 , . . . , λ 1 ; λ2 , . . . , λ 2 ; . . . ; λk , . . . , λ k .

Let n i denote the number of times λi occurs in this list. First, we show that by
way of a similarity transformation, A can be brought to the form

diag(A1 , A2 , . . . , Ak ),

where each Ai is an upper triangular matrix of size n i × n i and each diagonal entry
of Ai is λi . Our requirement is shown schematically as follows, where each such
element marked x that is not inside the blocks Ai needs to be zeroed-out by a similarity
transformation.
[ A1         x  ]        [ A1         0  ]
[    A2         ]   −→   [    A2         ]
[        ⋱     ]        [        ⋱     ]
[           Ak  ]        [           Ak  ]

If such an x occurs as the (r, s)th entry in A, then r < s. Moreover, the corre-
sponding diagonal entries arr and ass are eigenvalues of A occurring in different
blocks Ai and A j . Thus arr ≠ ass . Further, all entries below the diagonals of Ai and
of A j are 0. We use a combination similarity to obtain

E −α [r, s] A E α [r, s]  with  α = −x/(arr − ass ).

This similarity transformation subtracts α times the sth row from the r th row and
then adds α times the r th column to the sth column. Since r < s, it changes the
entries of A in the r th row to the right of the sth column, and the entries in the sth
column above the r th row. Thus, the upper triangular nature of the matrix does not
change. Further, it replaces the (r, s)th entry x with

ar s + α(arr − ass ) = x + (−x/(arr − ass ))(arr − ass ) = 0.

We use a sequence of such similarity transformations starting from the last row of
Ak−1 with smallest column index and ending in the first row with largest column
index. Observe that an entry beyond the blocks, which was 0 previously can become
nonzero after a single such similarity transformation. Such an entry will eventually be
zeroed-out. Finally, each position which is not inside any of the k blocks A1 , . . . , Ak
contains only 0. On completion of this stage, we end up with a matrix

diag(A1 , A2 , . . . , Ak ).

In the second stage, we focus on bringing each block Ai to the Jordan form. For
notational convenience, write λi as a. If n i = 1, then such an Ai is already in Jordan
form. We use induction on the order n i of Ai . Lay out the induction hypothesis that
each such matrix of order m − 1 has a Jordan form. Suppose Ai has order m. Look
at Ai in the following partitioned form:
 
Ai = [ B  u
       0  a ],

where B is the first (m − 1) × (m − 1) block, 0 is the zero row vector in C1×(m−1) ,


and u is a column vector in C(m−1)×1 . By the induction hypothesis, there exists an
invertible matrix Q such that Q −1 B Q is in Jordan form; it looks like
Q −1 B Q = [ B1
                B2
                   ⋱
                      Bℓ ]   where each B j = [ a 1
                                                  a 1
                                                    ⋱ ⋱
                                                       a 1
                                                         a ] .

Then
[ Q 0 ]−1      [ Q 0 ]     [ Q −1 B Q   Q −1 u ]
[ 0 1 ]   Ai   [ 0 1 ]  =  [ 0          a      ] ,

which equals

[ a ∗            b1
    a ∗          b2
      ⋱ ⋱       ⋮
        a ∗      bm−2
          a      bm−1
                 a    ] .

Call the above matrix C. In the matrix C, the sequence of ∗’s on the super-
diagonal, read from top left to right bottom, comprise a block of 1s followed by a
0, and then a block of 1s followed by a 0, and so on. The number of 1s depends on
the sizes of B1 , B2 , etc. That is, when B1 is over, and B2 starts, we have a 0. Also,
we have shown Q −1 u as [b1 · · · bm−1 ]t . Our goal is to zero-out all b j s except bm−1
which may be made a 0 or 1.

In the next sub-stage, call it the third stage, we apply similarity transformations
to zero-out (all or except one of) the entries b1 , . . . , bm−2 . In any row of C, the entry
above the diagonal (the ∗ there) is either 0 or 1. The ∗ is a 0 at the last row of each
block B j . We leave all such b’s right now; they are to be tackled separately. So,
suppose in the r th row, br ≠ 0 and the (r, r + 1)th entry (the ∗ above the diagonal
entry) is a 1. We wish to zero-out each such br which is in the (r, m) position. For
this purpose, we use a combination similarity to transform C to

E br [r + 1, m] C (E br [r + 1, m])−1 = E br [r + 1, m] C E −br [r + 1, m].

Observe that this matrix is obtained from C by adding br times the last row to
the (r + 1)th row, and then subtracting br times the (r + 1)th column from the last
column. Its net result is replacing the (r, m)th entry by 0, and keeping all other
entries intact. Continuing this process of applying a suitable combination similarity
transformation, each nonzero bi with a corresponding 1 on the super-diagonal on
the same row is reduced to 0. We then obtain a matrix, where all entries in the last
column of C have been zeroed-out, without touching the entries at the last row of
any of the blocks B j . Write such entries as c1 , . . . , cℓ . Thus, at the end of the third stage, Ai has been brought to the following form by similarity transformations:
F := [ B1               c1
          B2            c2
             ⋱         ⋮
                Bℓ      cℓ
                        a  ]

Notice that if B j is a 1 × 1 block, then the corresponding entry c j on the last


column is already 0. In the next sub-stage, call it the fourth stage, we keep the
nonzero c corresponding to the last block (the c entry with highest column index),
and zero-out all other c’s. Let Bq be the last block so that its corresponding c entry is
cq = 0 in the sth row. (It may not be c ; in that case, all of cq+1 , . . . , c are already
0.) We first make cq a 1 by using a dilation similarity:

G := E 1/cq [s] F E cq [s].

In G, the earlier cq at (s, m)th position is now 1. Let B p be any block other than
Bq with c p ≠ 0 in the r th row. Our goal in this sub-stage, call it the fifth stage, is to
zero-out c p . We use two combination similarity transformations as shown below:

H := E −c p [r − 1, s − 1] E −c p [r, s] G E c p [r, s] E c p [r − 1, s − 1].

This similarity transformation brings c p to 0 and keeps other entries intact. We do


this for each such c p . Thus in the mth column of H, we have only one nonzero entry

1 at (s, m)th position. If this happens to be at the last row, then we have obtained a
Jordan form. Otherwise, in this sub-stage (call it the sixth stage), we move this 1
to the (s, s + 1)th position by the following sequence of permutation similarities:

E[m − 1, m] · · · E[s + 2, m]E[s + 1, m] H E[s + 1, m]E[s + 2, m] · · · E[m − 1, m].

This transformation exchanges the rows and columns beyond the sth so that the 1 in
(s, m)th position moves to (s, s + 1)th position making up a block; and other entries
remain as they were earlier.
Here ends the proof by induction that each block Ai can be brought to a Jordan form
by similarity transformations. From a similarity transformation for Ai , a similarity
transformation can be constructed for the block diagonal matrix

à := diag(A1 , A2 , . . . , Ak )

by putting identity matrices of suitable order and the similarity transformation for Ai
in a block form. As these transformations do not affect any other rows and columns
of Ã, a sequence of such transformations brings à to its Jordan form, proving the
existence part in the theorem.
Toward the formula for m k , let λ be an eigenvalue of A, and let 1 ≤ k ≤ n.
Observe that A − λI is similar to J − λI. Thus,

rank((A − λI )i ) = rank((J − λI )i ) for each i.

Therefore, it is enough to prove the formula for J instead of A.


We use induction on n. In the basis case, J = [λ]. Here, k = 1 and m k = m 1 = 1.
On the right hand side,

(J − λI )k−1 = I = [1], (J − λI )k = [0]1 = [0], (J − λI )k+1 = [0]2 = [0].

So, the formula holds for n = 1.


Lay out the induction hypothesis that for all matrices in Jordan form of order less
than n, the formula holds. Let J be a matrix of order n, which is in Jordan form. We
consider two cases.
Case 1: Let J have a single Jordan block corresponding to λ. That is,
⎡ ⎤ ⎡ ⎤
λ 1 0 1
⎢ λ 1 ⎥ ⎢ 0 1 ⎥
⎢ ⎥ ⎢ ⎥
⎢ .. .. .. ..
J =⎢ . . ⎥⎥,

J − λI = ⎢ . . ⎥
⎥.
⎢ ⎥ ⎢ ⎥
⎣ 1⎦ ⎣ 1⎦
λ 0

Here m 1 = 0, m 2 = 0, . . . , m n−1 = 0 and m n = 1. We see that (J − λI )2 has 1s on


the super-super-diagonal, and 0 elsewhere. Proceeding similarly for higher powers
of J − λI , we see that their ranks are given by

rank(J − λI ) = n − 1,  rank((J − λI )2 ) = n − 2, . . . ,  rank((J − λI )i ) = n − i, . . . ,
rank((J − λI )n ) = 0,  rank((J − λI )n+1 ) = 0, . . .

Then for k < n,
rank((J − λI )k−1 ) − 2 rank((J − λI )k ) + rank((J − λI )k+1 )
= (n − (k − 1)) − 2(n − k) + (n − k − 1) = 0.
And for k = n,
rank((J − λI )k−1 ) − 2 rank((J − λI )k ) + rank((J − λI )k+1 )
= (n − (n − 1)) − 2 × 0 + 0 = 1 = m n .
Case 2: Suppose J has more than one Jordan block corresponding to λ. The first
Jordan block in J corresponds to λ and has order r for some r < n. Then J − λI
can be written in block form as
 
J − λI = [ C  0
           0  D ] ,

where C is the Jordan block of order r with diagonal entries as 0, and D is the matrix
of order n − r in Jordan form consisting of other blocks of J − λI. Then, for any j,

(J − λI ) j = [ C j   0
               0     D j ] .

Therefore, rank(J − λI ) j = rank(C j ) + rank(D j ). Write m k (C) and m k (D) for


the number of Jordan blocks of order k for the eigenvalue λ that appear in C, and in
D, respectively. Then
m k = m k (C) + m k (D).

By the induction hypothesis,

m k (C) = rank(C k−1 ) − 2 rank(C k ) + rank(C k+1 ),
m k (D) = rank(D k−1 ) − 2 rank(D k ) + rank(D k+1 ).

It then follows that

m k = rank((J − λI )k−1 ) − 2 rank((J − λI )k ) + rank((J − λI )k+1 ).

Since the number of Jordan blocks of order k corresponding to each eigenvalue


of A is uniquely determined, the Jordan form of A is also uniquely determined up to
a permutation of blocks. 

To obtain a Jordan form of a given matrix, we may use the construction of similarity
transformations as used in the proof of Theorem 6.12, or we may use the formula
for m k as given there. We illustrate these methods in the following examples.
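The formula for m k can be tried out on the Jordan form of Example 6.5 (a numerical sketch assuming NumPy; the helper names are ours):

```python
import numpy as np

def jordan_block(lam, size):
    return lam * np.eye(size) + np.eye(size, k=1)

# Assemble the Jordan form of Example 6.5: blocks (1,1), (1,1), (1,2), (2,3).
J = np.zeros((7, 7))
pos = 0
for lam, size in [(1.0, 1), (1.0, 1), (1.0, 2), (2.0, 3)]:
    J[pos:pos + size, pos:pos + size] = jordan_block(lam, size)
    pos += size

def m(lam, k):
    """Number of Jordan blocks of order k with diagonal entry lam (Theorem 6.12)."""
    def r(i):
        return np.linalg.matrix_rank(np.linalg.matrix_power(J - lam * np.eye(7), i))
    return r(k - 1) - 2 * r(k) + r(k + 1)
```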
Example 6.6 Let A be the 9 × 9 upper triangular matrix

A = [ 2 1 0 0 0 1 0 2 0
        2 0 0 0 3 0 0 1
          2 1 0 0 2 0 0
            2 0 2 0 0 0
              2 0 0 0 0
                2 0 0 0
                  3 1 1
                    3 1
                      3 ] ,

where only the entries on and above the diagonal are shown; all entries below the diagonal are 0.
This is an upper triangular matrix. Following the proof of Theorem 6.12, we first zero-out the nonzero entries lying outside the two diagonal blocks (the entries at positions (1, 8), (2, 9) and (3, 7)), starting from the entry on the third row. Here, the row
index is r = 3, the column index is s = 7, the eigenvalues are arr = 2, ass = 3,
and the entry to be zeroed-out is x = 2. Thus, α = −2/(2 − 3) = 2. We use an
appropriate combination similarity to obtain

M1 = E −2 [3, 7] A E 2 [3, 7].

That is, in A, we replace r ow(3) with r ow(3) − 2 × r ow(7) and then replace col(7)
with col(7) + 2 × col(3). It leads to
M1 = [ 2 1 0 0 0 1 0 2 0
         2 0 0 0 3 0 0 1
           2 1 0 0 0 −2 0
             2 0 2 0 0 0
               2 0 0 0 0
                 2 0 0 0
                   3 1 1
                     3 1
                       3 ] .

Notice that the similarity transformation brought in a new nonzero entry such
as −2 in (3, 8) position. But its column index has increased. Looking at the matrix
afresh, we must zero-out this entry first. The suitable combination similarity yields

M2 = E 2 [3, 8] M1 E −2 [3, 8]

which replaces row(3) with row(3) + 2 × row(8) and then replaces col(8) with
col(8) − 2 × col(3). Verify that it zeroes-out the entry −2 but introduces 2 at the
(3, 9) position. Once more, we use a combination similarity to obtain

M3 = E −2 [3, 9] M2 E 2 [3, 9]

replacing row(3) with row(3) − 2 × row(9) and then replacing col(9) with col(9) +
2 × col(3). Now,
6.4 Jordan Form 137
         ⎡ 2 1 0 0 0 1 0 2 0 ⎤
         ⎢ 0 2 0 0 0 3 0 0 1 ⎥
         ⎢ 0 0 2 1 0 0 0 0 0 ⎥
         ⎢ 0 0 0 2 0 2 0 0 0 ⎥
    M3 = ⎢ 0 0 0 0 2 0 0 0 0 ⎥.
         ⎢ 0 0 0 0 0 2 0 0 0 ⎥
         ⎢ 0 0 0 0 0 0 3 1 1 ⎥
         ⎢ 0 0 0 0 0 0 0 3 1 ⎥
         ⎣ 0 0 0 0 0 0 0 0 3 ⎦

Similar to the above, we use the combination similarities to reduce M3 to M4 ,


where
M4 = E −1 [2, 9] M3 E 1 [2, 9].

To zero-out the encircled 2, we use a suitable combination similarity, and get

M5 = E −2 [1, 8] M4 E 2 [1, 8].

It zeroes-out the encircled 2 but introduces −2 at (1, 9) position. Once more, we


use a suitable combination similarity to obtain
                                     ⎡ 2 1 0 0 0 1 0 0 0 ⎤
                                     ⎢ 0 2 0 0 0 3 0 0 0 ⎥
                                     ⎢ 0 0 2 1 0 0 0 0 0 ⎥
                                     ⎢ 0 0 0 2 0 2 0 0 0 ⎥
    M6 = E2[1, 9] M5 E−2[1, 9] =     ⎢ 0 0 0 0 2 0 0 0 0 ⎥.
                                     ⎢ 0 0 0 0 0 2 0 0 0 ⎥
                                     ⎢ 0 0 0 0 0 0 3 1 1 ⎥
                                     ⎢ 0 0 0 0 0 0 0 3 1 ⎥
                                     ⎣ 0 0 0 0 0 0 0 0 3 ⎦

Now, the matrix M6 is in block diagonal form. We focus on each of the blocks,
though we will be working with the whole matrix. We consider the block corresponding
to the eigenvalue 2 first. Since this step is inductive, we scan this block
from the top left corner. The 2 × 2 principal sub-matrix of this block is already in
Jordan form. The 3 × 3 principal sub-matrix is also in Jordan form. We see that the
principal sub-matrices of sizes 4 × 4 and 5 × 5 are also in Jordan form, but the 6 × 6
sub-matrix, which is the block itself, is not in Jordan form.
We wish to bring the sixth column to its proper shape. Recall that our strategy
is to zero out all those entries on the sixth column which are opposite to a 1 on the
super-diagonal of this block. There is only one such entry, which is encircled in M6
above.
The row index of this entry is r = 1, its column index is m = 6, and the entry
itself is br = 1. We use a combination similarity to obtain
                                     ⎡ 2 1 0 0 0 0 0 0 0 ⎤
                                     ⎢ 0 2 0 0 0 5 0 0 0 ⎥
                                     ⎢ 0 0 2 1 0 0 0 0 0 ⎥
                                     ⎢ 0 0 0 2 0 2 0 0 0 ⎥
    M7 = E1[2, 6] M6 E−1[2, 6] =     ⎢ 0 0 0 0 2 0 0 0 0 ⎥.
                                     ⎢ 0 0 0 0 0 2 0 0 0 ⎥
                                     ⎢ 0 0 0 0 0 0 3 1 1 ⎥
                                     ⎢ 0 0 0 0 0 0 0 3 1 ⎥
                                     ⎣ 0 0 0 0 0 0 0 0 3 ⎦

Next, among the nonzero entries 5 and 2 at the positions (2, 6) and (4, 6), we
wish to zero-out the 5 and keep the 2, since the row index of 2 is higher. First, we
use a dilation similarity to make this entry 1, as in the following:

M8 = E 1/2 [4] M7 E 2 [4].

It replaces row(4) with 1/2 times itself and then replaces col(4) with 2 times
itself, thus making the (4, 6)th entry 1 and keeping all other entries intact. Next, we
zero-out the 5 at the (2, 6) position by using two combination similarities. Here,
c_p = 5, r = 2, s = 4; thus
    M9 = E−5[1, 3] E−5[2, 4] M8 E5[2, 4] E5[1, 3]

         ⎡ 2 1 0 0 0 0 0 0 0 ⎤
         ⎢ 0 2 0 0 0 0 0 0 0 ⎥
         ⎢ 0 0 2 1 0 0 0 0 0 ⎥
         ⎢ 0 0 0 2 0 1 0 0 0 ⎥
       = ⎢ 0 0 0 0 2 0 0 0 0 ⎥.
         ⎢ 0 0 0 0 0 2 0 0 0 ⎥
         ⎢ 0 0 0 0 0 0 3 1 1 ⎥
         ⎢ 0 0 0 0 0 0 0 3 1 ⎥
         ⎣ 0 0 0 0 0 0 0 0 3 ⎦

Here, M9 has been obtained from M8 by replacing row(2) with row(2) − 5 ×
row(4), col(4) with col(4) + 5 × col(2), row(1) with row(1) − 5 × row(3), and
then col(3) with col(3) + 5 × col(1).
Next, we move this encircled 1 to the (4, 5) position by similarity. Here, s = 4,
m = 6. Thus, the sequence of permutation similarities boils down to only one:
exchanging row(5) with row(6) and then exchanging col(6) with col(5). Observe
that we would need more permutation similarities if the difference between m and s
were more than 2. It gives
                                   ⎡ 2 1 0 0 0 0 0 0 0 ⎤
                                   ⎢ 0 2 0 0 0 0 0 0 0 ⎥
                                   ⎢ 0 0 2 1 0 0 0 0 0 ⎥
                                   ⎢ 0 0 0 2 1 0 0 0 0 ⎥
    M10 = E[5, 6] M9 E[5, 6] =     ⎢ 0 0 0 0 2 0 0 0 0 ⎥.
                                   ⎢ 0 0 0 0 0 2 0 0 0 ⎥
                                   ⎢ 0 0 0 0 0 0 3 1 1 ⎥
                                   ⎢ 0 0 0 0 0 0 0 3 1 ⎥
                                   ⎣ 0 0 0 0 0 0 0 0 3 ⎦

Now, the diagonal block corresponding to the eigenvalue 2 is in Jordan form. We
focus on the other block, corresponding to 3. Here, the (7, 9)th entry, which contains
a 1, is to be zeroed-out. This entry is opposite to a 1 on the super-diagonal. We use a
combination similarity. Here, the row index is r = 7, the column index is m = 9, and
the entry is br = 1. Thus, we have the Jordan form as
                                       ⎡ 2 1 0 0 0 0 0 0 0 ⎤
                                       ⎢ 0 2 0 0 0 0 0 0 0 ⎥
                                       ⎢ 0 0 2 1 0 0 0 0 0 ⎥
                                       ⎢ 0 0 0 2 1 0 0 0 0 ⎥
    M11 = E1[8, 9] M10 E−1[8, 9] =     ⎢ 0 0 0 0 2 0 0 0 0 ⎥.   □
                                       ⎢ 0 0 0 0 0 2 0 0 0 ⎥
                                       ⎢ 0 0 0 0 0 0 3 1 0 ⎥
                                       ⎢ 0 0 0 0 0 0 0 3 1 ⎥
                                       ⎣ 0 0 0 0 0 0 0 0 3 ⎦

Example 6.7 Consider the matrix A of Example 6.6. Here, we compute the number
m k of Jordan blocks of size k corresponding to each eigenvalue. For this purpose, we
require the ranks of the matrices (A − λI )k for successive k, and for each eigenvalue
λ of A. We see that A has two eigenvalues 2 and 3.
For the eigenvalue 2,

    rank((A − 2I)^0) = rank(I) = 9,   rank(A − 2I) = 6,   rank((A − 2I)^2) = 4,
    rank((A − 2I)^{3+k}) = 3 for k = 0, 1, 2, . . .

For the eigenvalue 3,

    rank((A − 3I)^0) = rank(I) = 9,   rank(A − 3I) = 8,   rank((A − 3I)^2) = 7,
    rank((A − 3I)^{3+k}) = 6 for k = 0, 1, 2, . . .

Using the formula for m_k(λ), we obtain



    m_1(2) = 9 − 2 × 6 + 4 = 1,   m_2(2) = 6 − 2 × 4 + 3 = 1,
    m_3(2) = 4 − 2 × 3 + 3 = 1,   m_{3+k}(2) = 3 − 2 × 3 + 3 = 0.
    m_1(3) = 9 − 2 × 8 + 7 = 0,   m_2(3) = 8 − 2 × 7 + 6 = 0,
    m_3(3) = 7 − 2 × 6 + 6 = 1,   m_{3+k}(3) = 6 − 2 × 6 + 6 = 0.

Therefore, in the Jordan form of A, there is one Jordan block of size 1, one of
size 2 and one of size 3 with eigenvalue 2, and one block of size 3 with eigenvalue
3. From this information, we see that the Jordan form of A is uniquely determined
up to any rearrangement of the blocks. Check that M11 as obtained in Example 6.6
is one such Jordan form of A. 
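The rank computation of Example 6.7 is easy to mechanize. The following sketch (assuming NumPy is available; the function name is ours, not the text's) applies the formula m_k = rank(B^{k−1}) − 2 rank(B^k) + rank(B^{k+1}), with B = A − λI, to the matrix A of Example 6.6:

```python
import numpy as np

# The 9 x 9 upper triangular matrix A of Example 6.6
# (entries below the diagonal are zero).
A = np.array([
    [2, 1, 0, 0, 0, 1, 0, 2, 0],
    [0, 2, 0, 0, 0, 3, 0, 0, 1],
    [0, 0, 2, 1, 0, 0, 2, 0, 0],
    [0, 0, 0, 2, 0, 2, 0, 0, 0],
    [0, 0, 0, 0, 2, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 2, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 3, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 3, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 3],
], dtype=float)

def block_counts(A, lam, kmax):
    """m_k = rank(B^(k-1)) - 2 rank(B^k) + rank(B^(k+1)) for k = 1, ..., kmax."""
    B = A - lam * np.eye(A.shape[0])
    rank = [np.linalg.matrix_rank(np.linalg.matrix_power(B, k))
            for k in range(kmax + 2)]
    return [int(rank[k - 1] - 2 * rank[k] + rank[k + 1])
            for k in range(1, kmax + 1)]

print(block_counts(A, 2, 4))   # [1, 1, 1, 0]: blocks of sizes 1, 2, 3 for eigenvalue 2
print(block_counts(A, 3, 4))   # [0, 0, 1, 0]: one block of size 3 for eigenvalue 3
```

The output agrees with the Jordan form M11 found in Example 6.6.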

Suppose that a matrix A ∈ Cn×n has a Jordan form J = P −1 A P, in which the


first Jordan block is of size k with diagonal entries as λ. If P = [v1 · · · vn ], then
A P = P J implies that

    Av1 = λv1,   Av2 = v1 + λv2,   . . . ,   Avk = vk−1 + λvk.

If the next Jordan block in J has diagonal entries as μ (which may or may not be
equal to λ), then we have Avk+1 = μvk+1 , Avk+2 = vk+1 + μvk+2 , . . . , and so on.
The list of vectors v1 , . . . , vk above is called a Jordan string that starts with v1
and ends with vk . The number k is called the length of the Jordan string. In such a
Jordan string, we see that

    v1 ∈ N(A − λI),   v2 ∈ N((A − λI)^2),   . . . ,   vk ∈ N((A − λI)^k).

Any vector in N ((A − λI ) j ), for some j, is called a generalized eigenvector


corresponding to the eigenvalue λ of A.
The columns of P are all generalized eigenvectors of A corresponding to the
eigenvalues of A. These generalized eigenvectors form a basis for Fn×1 . Such a basis
is called a Jordan basis. The coordinate matrix of A with respect to the Jordan basis
is the Jordan form J of A.
The Jordan basis consists of Jordan strings. Each Jordan string starts with an
eigenvector of A, such as v1 above. If λ is an eigenvalue of A having geometric
multiplicity γ , then there are exactly γ number of Jordan strings in the Jordan basis
corresponding to the eigenvalue λ. Thus, there are exactly γ number of Jordan blocks
in J with diagonal entries as λ. The size of any such block is equal to the length of
the corresponding Jordan string.
The uniqueness of a Jordan form can be made exact by first ordering the eigen-
values of A and then arranging the blocks corresponding to each eigenvalue (which
now appear together on the diagonal) in some order, say in ascending order of their
size. In doing so, the Jordan form of any matrix becomes unique. Such a Jordan form
is called the Jordan canonical form of a matrix. It then follows that if two matrices
are similar, then they have the same Jordan canonical form. Moreover, uniqueness

also implies that two dissimilar matrices will have different Jordan canonical forms.
Therefore, Jordan form characterizes similarity of matrices.
As an application of Jordan form, we will show that each matrix is similar to its
transpose. Suppose J = P −1 A P. Now, J t = P t At (P −1 )t = P t At (P t )−1 . That is,
At is similar to J t . Thus, it is enough to show that J t is similar to J. First, let us
see it for a single Jordan block. For a Jordan block Jλ , consider the matrix Q of the
same order as in the following:
         ⎡ λ 1         ⎤         ⎡         1 ⎤
         ⎢   λ 1       ⎥         ⎢       1   ⎥
    Jλ = ⎢     ⋱  ⋱    ⎥,    Q = ⎢     ⋰     ⎥.
         ⎢        λ  1 ⎥         ⎢   1       ⎥
         ⎣           λ ⎦         ⎣ 1         ⎦

In the matrix Q, the entries on the anti-diagonal are all 1 and all other entries are
0. We see that Q 2 = I. Thus, Q −1 = Q. Further,

Q −1 Jλ Q = Q Jλ Q = (Jλ )t .

Therefore, each Jordan block is similar to its transpose. Now, construct a matrix
R by putting matrices such as Q as its blocks matching the orders of each Jordan
block in J. Then it follows that R −1 J R = J t .
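For a single Jordan block, the similarity Q⁻¹JλQ = (Jλ)ᵗ via the reversal matrix Q is easy to check numerically; a small sketch assuming NumPy:

```python
import numpy as np

def jordan_block(lam, k):
    """k x k Jordan block: lam on the diagonal, 1 on the super-diagonal."""
    return lam * np.eye(k) + np.diag(np.ones(k - 1), 1)

k = 4
J = jordan_block(5.0, k)
Q = np.fliplr(np.eye(k))    # 1's on the anti-diagonal, 0 elsewhere

print(np.allclose(Q @ Q, np.eye(k)))   # True: Q is its own inverse
print(np.allclose(Q @ J @ Q, J.T))     # True: Q^{-1} Jλ Q = (Jλ)^t
```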
Jordan form guarantees that one can always choose m linearly independent generalized
eigenvectors corresponding to the eigenvalue λ, where m is the algebraic
multiplicity of λ. Moreover, the following is guaranteed:
If the linear system (A − λI)^k x = 0 has r < m linearly independent solutions,
then (A − λI)^{k+1} x = 0 has at least r + 1 linearly independent solutions.

This result is more useful in computing the exponential of a square matrix rather
than using the Jordan form explicitly. See Sect. 7.6 for details.
Exercises for Sect. 6.4
1. Determine the Jordan forms of the following matrices:
    (a) ⎡0 0 0⎤   (b) ⎡0 0 1⎤   (c) ⎡−2 −1 −3⎤
        ⎢1 0 0⎥       ⎢0 1 0⎥       ⎢ 4  3  3⎥
        ⎣2 1 0⎦       ⎣1 0 0⎦       ⎣−2  1 −1⎦
2. Determine the matrix P ∈ C3×3 such that P −1 A P is in Jordan form, where A is
the matrix in Exercise 1(c).
3. Let A be a 7 × 7 matrix with characteristic polynomial (t − 2)⁴(3 − t)³. It is
known that in the Jordan form of A, the largest blocks for both the eigenvalues
are of order 2. Show that there are only two possible Jordan forms for A; and
determine those Jordan forms.
4. Let A be a 5 × 5 matrix whose first and second rows are, respectively,
[0, 1, 1, 0, 1] and [0, 0, 1, 1, 1]; and all other rows are zero rows. What is the
Jordan form of A?

5. Let A be an n × n lower triangular matrix with each diagonal entry 1, each
   sub-diagonal entry (just below each diagonal entry) 1, and all other entries 0.
   What is the Jordan form of A?
6. Let A be an n × n matrix, where the diagonal entries are 1, 2, . . . , n, from top
left to right bottom, the super-diagonal entries are all 1, and all other entries are
0. What is the Jordan form of A?
7. What is the Jordan form of the n × n matrix whose each row is equal to
[1, 2, · · · , n]?
8. Let λ be an eigenvalue of the n × n matrix A. Suppose for each k ∈ N, we know
   the number m_k. Show that for each j, both rank((A − λI)^j) and null((A − λI)^j)
   are uniquely determined.
9. Prove that two matrices A, B ∈ Cn×n are similar iff they have the same eigenvalues,
   and for each eigenvalue λ, rank((A − λI)^k) = rank((B − λI)^k) for each
   k ∈ N.
10. Show that two matrices A, B ∈ Cn×n are similar iff they have the same eigenvalues,
   and for each eigenvalue λ, null((A − λI)^k) = null((B − λI)^k) for each k ∈ N.

6.5 Singular Value Decomposition

Given an m × n matrix A with complex entries, there are two hermitian matrices
that can be constructed naturally from it, namely A∗ A and A A∗ . We wish to study
the eigenvalues and eigenvectors of these matrices and their relations to certain
parameters associated with A. We will see that these concerns yield a factorization
of A.
All eigenvalues of the hermitian matrix A∗A ∈ Cn×n are real. If λ ∈ R is such an
eigenvalue with an associated eigenvector v ∈ Cn×1, then A∗Av = λv implies that

    λ‖v‖² = λv∗v = v∗(λv) = v∗A∗Av = (Av)∗(Av) = ‖Av‖².

Since ‖v‖ > 0, we see that λ ≥ 0. The eigenvalues of A∗A can thus be arranged
in a decreasing list

λ1 ≥ λ2 ≥ · · · ≥ λr > 0 = λr +1 = · · · = λn

for some r with 0 ≤ r ≤ n. Notice that λ1 , . . . , λr are all positive and the rest are all
equal to 0. In the following we relate this r with rank(A). Of course, we could have
considered A A∗ instead of A∗ A.
Let A ∈ Cm×n. Let λ1 ≥ · · · ≥ λn ≥ 0 be the n eigenvalues of A∗A. The non-negative
square roots of these real numbers are called the singular values of A.
Conventionally, we denote the singular values of A by si. The eigenvalues of A∗A
are then denoted by s1² ≥ · · · ≥ sn² ≥ 0.

Theorem 6.13 Let A ∈ Cm×n . Then rank(A) = rank(A∗ A) = rank(A A∗ ) =


rank(A∗ ) = the number of positive singular values of A.

Proof Let v ∈ Cn×1. If Av = 0, then A∗Av = 0. Conversely, if A∗Av = 0, then
v∗A∗Av = 0. It implies that ‖Av‖² = 0, giving Av = 0. Therefore, N(A∗A) =
N(A). It follows that null(A∗A) = null(A). By the rank-nullity theorem, we conclude
that rank(A∗A) = rank(A).
Consider A∗ instead of A to obtain rank(AA∗) = rank((A∗)∗A∗) = rank(A∗).
Since rank(Aᵗ) = rank(A), and conjugating a matrix does not change its rank, we
get rank(A∗) = rank(A).
Next, let s1 ≥ s2 ≥ · · · ≥ sr > 0 = sr+1 = · · · = sn be the singular values of A.
That is, there are exactly r positive singular values of A. By the spectral
theorem, the hermitian matrix A∗A is unitarily diagonalizable. So, there exists a
unitary matrix Q such that

    Q∗(A∗A)Q = diag(s1², . . . , sr², 0, . . . , 0).

Therefore, rank(A∗A) is equal to the rank of the above diagonal matrix; and that is
equal to r. This completes the proof.   □

We remark that the number of nonzero eigenvalues of a matrix, counting multiplicities,
need not be the same as the rank of the matrix. For instance, the matrix

    ⎡0 1⎤
    ⎣0 0⎦

has no nonzero eigenvalues, but its rank is 1.
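Theorem 6.13 and the remark above are easy to probe numerically; a sketch assuming NumPy, with a rank-deficient matrix built on purpose:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))  # rank 2, generically
r = np.linalg.matrix_rank

# All four ranks of Theorem 6.13 agree.
print(r(A), r(A.conj().T @ A), r(A @ A.conj().T), r(A.conj().T))

# The nilpotent matrix [[0, 1], [0, 0]] has rank 1 but no nonzero eigenvalues.
N = np.array([[0.0, 1.0], [0.0, 0.0]])
print(int(r(N)), np.linalg.eigvals(N))
```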


Suppose λ > 0 is an eigenvalue of A∗A with an associated eigenvector v. Then
A∗Av = λv implies (AA∗)(Av) = λ(Av). Since λv ≠ 0, we have Av ≠ 0. Thus, λ is also an
eigenvalue of AA∗ with an associated eigenvector Av.


Similarly, if λ > 0 is an eigenvalue of A A∗ , then it follows that the same λ is an
eigenvalue of A∗ A. That is,
a positive real number is an eigenvalue of A∗ A iff it is an eigenvalue of A A∗ .

From Theorem 6.13, it follows that A and A∗ have the same r number of positive
singular values, where r = rank(A) = rank(A∗ ). Further, A has n − r number of
zero singular values, whereas A∗ has m − r number of zero singular values. In
addition, if A ∈ Cn×n is hermitian and has eigenvalues λ1 , . . . , λn , then its singular
values are |λ1 |, . . . , |λn |.
Analogous to the factorization of A∗ A, we have one for A itself.

Theorem 6.14 (SVD) Let A ∈ Cm×n be of rank r. Let s1 ≥ · · · ≥ sr be the positive
singular values of A. Write S = diag(s1, . . . , sr) ∈ Cr×r and

    Σ = ⎡S 0⎤ ∈ Cm×n.
        ⎣0 0⎦

Then there exist unitary matrices P = [u1 · · · um] ∈ Cm×m and Q =
[v1 · · · vn] ∈ Cn×n such that the following are true:
(1) A = PΣQ∗ = s1u1v1∗ + · · · + sr ur vr∗.
(2) (a) For 1 ≤ i ≤ r, AA∗ui = si²ui; and for r < i ≤ m, AA∗ui = 0.
    (b) For 1 ≤ j ≤ r, A∗Avj = sj²vj; and for r < j ≤ n, A∗Avj = 0.

(3) For 1 ≤ i ≤ r, u i = (si )−1 Avi and vi = (si )−1 A∗ u i .


(4) (a) {u 1 , . . . , u r } is an orthonormal basis of R(A).
(b) {v1 , . . . , vr } is an orthonormal basis of R(A∗ ).
(c) {vr +1 , . . . , vn } is an orthonormal basis of N (A).
(d) {u r +1 , . . . , u m } is an orthonormal basis of N (A∗ ).

Proof All positive singular values of A are s1 ≥ · · · ≥ sr. Thus, the eigenvalues of
A∗A are s1² ≥ · · · ≥ sr², and 0 repeated n − r times. Since A∗A ∈ Cn×n is hermitian,
there exists a unitary matrix Q ∈ Cn×n such that

    Q∗(A∗A)Q = diag(s1², . . . , sr², 0, . . . , 0) = ⎡S² 0⎤,
                                                     ⎣ 0 0⎦

where the columns of Q are eigenvectors of A∗A. Define

    D = diag(s1⁻¹, . . . , sr⁻¹, 1, . . . , 1) = ⎡S⁻¹  0  ⎤.
                                                ⎣ 0  In−r⎦

Then D∗ = D, and

    (AQD)∗(AQD) = D∗(Q∗A∗AQ)D = ⎡S⁻¹  0  ⎤ ⎡S² 0⎤ ⎡S⁻¹  0  ⎤ = ⎡Ir 0⎤.
                                ⎣ 0  In−r⎦ ⎣ 0 0⎦ ⎣ 0  In−r⎦   ⎣ 0 0⎦

Therefore, the first r columns, say, u 1 , . . . , u r , of the m × n matrix AQ D form an


orthonormal set in Cm×1 ; and the other columns of AQ D are zero columns. Extend
this orthonormal set {u 1 , . . . , u r } to an orthonormal basis {u 1 , . . . , u r , u r +1 , . . . , u m }
for Cm×1 . Construct the matrices
     
P1 = u 1 · · · u r , P2 = u r +1 · · · u m , P = P1 P2 .
 
(1) We find that P ∈ Cm×m is unitary, and AQ D = P1 0 . We already have the
unitary matrix Q ∈ Cn×n . Hence
   
−1 ∗
  S 0 ∗
  S 0
A = (AQ D)D Q = P1 0 Q = P1 P2 Q∗ = P Q∗.
0 In−r 0 0

Also, we see that

                          ⎡s1           ⎤ ⎡v1∗⎤
    PΣQ∗ = [u1 · · · um]  ⎢   ⋱         ⎥ ⎢ ⋮ ⎥ = s1u1v1∗ + · · · + sr ur vr∗.
                          ⎢     sr      ⎥ ⎣vn∗⎦
                          ⎣        0    ⎦

(2) (a) AA∗P = PΣQ∗QΣ∗P∗P = PΣΣ∗. The matrix ΣΣ∗ is a diagonal
matrix with diagonal entries s1², . . . , sr², 0, . . . , 0. Therefore, AA∗ui = si²ui for
1 ≤ i ≤ r; and AA∗ui = 0 for i > r.
(b) As in (a), Q∗A∗AQ = diag(s1², . . . , sr², 0, . . . , 0) proves analogous facts about
the vectors vj.
(3) Let 1 ≤ i ≤ r. The vector u i is the ith column of AQ D. Thus

u i = AQ Dei = AQ(si )−1 ei = (si )−1 AQei = (si )−1 Avi .

Using this and (2a), we obtain

(si )−1 A∗ u i = (si )−2 A∗ Avi = (si )−2 (si )2 vi = vi .

(4) (a) For 1 ≤ i ≤ r, ui = (si)⁻¹Avi implies that ui ∈ R(A). The vectors u1, . . . , ur
are orthonormal and dim(R(A)) = r. Therefore, {u1, . . . , ur} is an orthonormal basis
of R(A).
(b) As in (a), {v1, . . . , vr} is an orthonormal basis of R(A∗).
(c) Let r < j ≤ n. Now, A∗Avj = 0 implies vj∗A∗Avj = 0. So, ‖Avj‖² = 0; that is,
Avj = 0. Then vj ∈ N(A). But dim(N(A)) = n − r. Therefore, the n − r orthonormal
vectors vr+1, . . . , vn form an orthonormal basis for N(A).
(d) As in (c), {ur+1, . . . , um} is an orthonormal basis for N(A∗).   □
Theorem 6.14 (2) and (4) imply that the columns of P are eigenvectors of A A∗ ,
and the columns of Q are eigenvectors of A∗ A. Accordingly, the columns of P
are called the left singular vectors of A; and the columns of Q are called the right
singular vectors of A. Notice that computing both sets of left and right singular
vectors independently will not serve the purpose since they may not satisfy the
equations in Theorem 6.14 (3).
 
Example 6.8 To determine the SVD of the matrix

    A = ⎡1 0 1 0⎤,
        ⎣0 1 0 1⎦

we compute

    AA∗ = ⎡2 0⎤,   A∗A = ⎡1 0 1 0⎤
          ⎣0 2⎦          ⎢0 1 0 1⎥
                         ⎢1 0 1 0⎥
                         ⎣0 1 0 1⎦.

Since A A∗ is smaller, we compute its eigenvalues and eigenvectors. The eigenval-


ues of A A∗ with multiplicities are 2, 2; thus the eigenvalues of A∗ A are 2, 2, 0, 0.
Choosing simpler eigenvectors of A A∗ , we have

s12 = 2, u 1 = e1 ; s22 = 2, u 2 = e2 .

Here, u1 and u2 are the left singular vectors. The corresponding right singular vectors
are:

    v1 = (1/s1)A∗u1 = (1/√2)[1, 0, 1, 0]ᵗ,   v2 = (1/s2)A∗u2 = (1/√2)[0, 1, 0, 1]ᵗ.

These eigenvectors are associated with the eigenvalues 2 and 2 of A∗ A. We need


the eigenvectors v3 and v4 associated with the remaining eigenvalues 0 and 0, which
should also form an orthonormal set along with v1 and v2 . Thus, we solve A∗ Ax = 0.
With x = [a, b, c, d]t , the equations are

a + c = 0 = b + d.

Two linearly independent solutions of these equations are obtained by setting a =


1, b = 0, c = −1, d = 0 and a = 0, b = 1, c = 0, d = −1. The corresponding
vectors are w3 = [1, 0 − 1, 0]t and w4 = [0, 1, 0, −1]t . We find that {v1 , v2 , w3 , w4 }
is an orthogonal set. We then orthonormalize w3 and w4 to obtain
    v3 = w3/‖w3‖ = (1/√2) ⎡ 1⎤,   v4 = w4/‖w4‖ = (1/√2) ⎡ 0⎤
                          ⎢ 0⎥                          ⎢ 1⎥
                          ⎢−1⎥                          ⎢ 0⎥
                          ⎣ 0⎦                          ⎣−1⎦.

We set P = [u1 u2], Q = [v1 v2 v3 v4], and Σ as the 2 × 4 matrix with the
singular values √2, √2 on its first 2 × 2 block and all other entries 0. We see that

    A = ⎡1 0 1 0⎤ = ⎡1 0⎤ ⎡√2  0 0 0⎤ (1/√2) ⎡1 0  1  0⎤ = PΣQ∗.   □
        ⎣0 1 0 1⎦   ⎣0 1⎦ ⎣ 0 √2 0 0⎦        ⎢0 1  0  1⎥
                                              ⎢1 0 −1  0⎥
                                              ⎣0 1  0 −1⎦
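The computation of Example 6.8 can be cross-checked against a numerical SVD routine; a sketch assuming NumPy, whose `svd` returns P, the singular values, and Q∗ (possibly with different sign choices for the singular vectors):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])

P, s, Qh = np.linalg.svd(A)            # A = P Sigma Q*, with Qh = Q*
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)

print(np.allclose(s, np.sqrt(2)))      # True: both singular values are sqrt(2)
print(np.allclose(P @ Sigma @ Qh, A))  # True: the factorization reproduces A
```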

In the product PΣQ∗, there are possibly many zero rows or columns of Σ that do not
contribute to the end result. Thus some simplifications can be done in the SVD. Let
A ∈ Cm×n, where m ≤ n. Suppose A = PΣQ∗ is an SVD of A. Let the ith column
of Q be denoted by vi ∈ Cn×1. Write

    P1 = P,   Q1 = [v1 · · · vm] ∈ Cn×m,   Σ1 = diag(s1, . . . , sr, 0, . . . , 0) ∈ Cm×m.

Notice that P1 is unitary and the m columns of Q1 are orthonormal. In block form,
we have

    Q = [Q1 Q3],   Σ = [Σ1 0],   Q3 = [vm+1 · · · vn].

Then

    A = PΣQ∗ = P1 [Σ1 0] ⎡Q1∗⎤ = P1Σ1Q1∗   for m ≤ n.   (6.1)
                         ⎣Q3∗⎦

Similarly, when m ≥ n, we may curtail P accordingly. That is, suppose the ith
column of P is denoted by ui ∈ Cm×1. Write

    P2 = [u1 · · · un] ∈ Cm×n,   Σ2 = diag(s1, . . . , sr, 0, . . . , 0) ∈ Cn×n,   Q2 = Q.

Here, the n columns of P2 are orthonormal, and Q2 is unitary. We write

    P = [P2 P3],   Σ = ⎡Σ2⎤,   P3 = [un+1 · · · um].
                       ⎣ 0⎦

Then

    A = PΣQ∗ = [P2 P3] ⎡Σ2⎤ Q2∗ = P2Σ2Q2∗   for m ≥ n.   (6.2)
                       ⎣ 0⎦

The two forms of SVD in (6.1)–(6.2), one for m ≤ n and the other for m ≥ n,
are called the thin SVD of A. Of course, for m = n, both thin SVDs coincide
with the SVD. For a unified approach to the thin SVDs, take k = min{m, n}. Then a
matrix A ∈ Cm×n of rank r can be written as the product

    A = PΣQ∗,

where P ∈ Cm×k and Q ∈ Cn×k have orthonormal columns, and Σ ∈ Ck×k is the
diagonal matrix diag(s1, . . . , sr, 0, . . . , 0), with s1, . . . , sr being the positive singular
values of A.
It is possible to simplify the thin SVDs further by deleting the zero rows. Observe
that in the product P∗AQ, the first r columns of P and the first r columns of Q
produce S; the other columns of P and of Q give the zero blocks. Thus, taking

    P̃ = [u1 · · · ur],   Q̃ = [v1 · · · vr],

a simplified decomposition of A is given by

    A = P̃SQ̃∗,

where P̃ ∈ Cm×r and Q̃ ∈ Cn×r have orthonormal columns. Such a decomposition
is called the tight SVD of the matrix A. In the tight SVD, each of the matrices
A, P̃, S, and Q̃∗ is of rank r.
Write B = P̃ S and C = S Q̃ ∗ to obtain

A = B Q̃ ∗ = P̃ C,

where both B ∈ Cm×r and C ∈ Cr ×n are of rank r. It shows that each m × n matrix
of rank r can be written as a product of an m × r matrix of rank r and an r × n
matrix, also of rank r. We recognize it as the full rank factorization of A.
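The tight SVD and the resulting full rank factorization can be sketched as follows (NumPy assumed; the cut-off 1e-10 for the numerical rank is our choice), using the matrix of the next example:

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [-2.0, 1.0],
              [4.0, -2.0]])

P, s, Qh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))             # numerical rank; here r = 1

# Tight SVD: keep only the first r singular triples.
P_tilde, S, Q_tilde_h = P[:, :r], np.diag(s[:r]), Qh[:r, :]

# Full rank factorization A = B C, with B of size m x r and C of size r x n.
B, C = P_tilde @ S, Q_tilde_h

print(r, np.isclose(s[0], np.sqrt(30)))                                # 1 True
print(np.allclose(P_tilde @ S @ Q_tilde_h, A), np.allclose(B @ C, A))  # True True
```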

Example 6.9 Obtain SVD, tight SVD, and a full rank factorization of

    A = ⎡ 2 −1⎤.
        ⎢−2  1⎥
        ⎣ 4 −2⎦

Here,

    A∗A = ⎡ 24 −12⎤.
          ⎣−12   6⎦

It has eigenvalues 30 and 0. Thus s1 = √30. Notice that AA∗ is a 3 × 3 matrix with
eigenvalues 30, 0, and 0. We see that r = rank(A) = the number of positive singular
values of A = 1.
For the eigenvalue 30, we solve the equation A∗A[a, b]ᵗ = 30[a, b]ᵗ, that is,

    24a − 12b = 30a,   −12a + 6b = 30b.

It has the solution a = −2, b = 1. So, a unit eigenvector of A∗A corresponding
to the eigenvalue 30 is v1 = (1/√5)[−2, 1]ᵗ.
For the eigenvalue 0, the equations are

    24a − 12b = 0,   −12a + 6b = 0.

Thus a unit eigenvector orthogonal to v1 is v2 = (1/√5)[1, 2]ᵗ. Then,

    u1 = (1/√30) Av1 = (1/√30) ⎡ 2 −1⎤ ⎡−2/√5⎤ = (1/√6) ⎡−1⎤
                               ⎢−2  1⎥ ⎣ 1/√5⎦          ⎢ 1⎥
                               ⎣ 4 −2⎦                  ⎣−2⎦.

Notice that ‖u1‖ = 1. We extend {u1} to an orthonormal basis of C3×1. It is

    u1 = (1/√6) ⎡−1⎤,   u2 = (1/√2) ⎡1⎤,   u3 = (1/√3) ⎡ 1⎤
                ⎢ 1⎥                ⎢1⎥                ⎢−1⎥
                ⎣−2⎦                ⎣0⎦                ⎣−1⎦.

Next, we take u1, u2, u3 as the columns of P and v1, v2 as the columns of Q to
obtain the SVD of A as

    ⎡ 2 −1⎤          ⎡−1/√6  1/√2   1/√3⎤ ⎡√30 0⎤
    ⎢−2  1⎥ = PΣQ∗ = ⎢ 1/√6  1/√2  −1/√3⎥ ⎢ 0  0⎥ ⎡−2/√5  1/√5⎤∗.
    ⎣ 4 −2⎦          ⎣−2/√6   0    −1/√3⎦ ⎣ 0  0⎦ ⎣ 1/√5  2/√5⎦

For the tight SVD, we construct P̃ with its r columns as the first r columns
of P, Q̃ with its r columns as the first r columns of Q, and S as the r × r block
consisting of the first r singular values of A as the diagonal entries. With r = 1, we
thus have the tight SVD as

    ⎡ 2 −1⎤            ⎡−1/√6⎤
    ⎢−2  1⎥ = P̃SQ̃∗ = ⎢ 1/√6⎥ [√30] ⎡−2/√5⎤∗.
    ⎣ 4 −2⎦            ⎣−2/√6⎦      ⎣ 1/√5⎦

In the tight SVD, using associativity of the matrix product, we get the full rank
factorizations

    ⎡ 2 −1⎤   ⎡ −√5⎤                   ⎡−1/√6⎤
    ⎢−2  1⎥ = ⎢  √5⎥ (1/√5)[−2  1]  =  ⎢ 1/√6⎥ [−2√6  √6].
    ⎣ 4 −2⎦   ⎣−2√5⎦                   ⎣−2/√6⎦

It may be checked that the columns of P are eigenvectors of AA∗.   □

A singular value decomposition of a matrix is not unique. For, orthonormal bases


that comprise the columns of P and Q can always be chosen differently. For instance,
by multiplying ±1 to an already constructed orthonormal basis, we may obtain
another.
Also, it is easy to see that when A ∈ Rm×n , the matrices P and Q can be chosen
to have real entries.
Singular value decomposition is, perhaps, the most important result for scientists and
engineers next to the theory of linear equations. It shows clearly the power of
eigenvalues and eigenvectors. The SVD in the summation form, in Theorem 6.14 (1),
looks like
A = s1 u 1 v1∗ + · · · + sr u r vr∗ .

Each matrix u i vi∗ is of rank 1. This means that if we know the first r singular values
of A and we know their corresponding left and right singular vectors, we know A
completely. This is particularly useful when A is a very large matrix of low rank. No
wonder, SVD is used in image processing, various compression algorithms, and in
principal components analysis.
Let A = PΣQ∗ be an SVD of A ∈ Cm×n, and let x ∈ Cn×1 be a unit vector. Let
s1 ≥ · · · ≥ sr be the positive singular values of A. Write y = Q∗x = [α1, . . . , αn]ᵗ.
Then

    ‖y‖² = |α1|² + · · · + |αn|² = ‖Q∗x‖² = x∗QQ∗x = x∗x = ‖x‖² = 1,
    ‖Ax‖² = ‖PΣQ∗x‖² = x∗QΣ∗P∗PΣQ∗x = x∗QΣ∗ΣQ∗x = y∗Σ∗Σy
          = s1²|α1|² + · · · + sr²|αr|² ≤ s1²(|α1|² + · · · + |αr|²) ≤ s1².

Also, for x = v1, ‖Av1‖² = s1²‖u1‖² = s1². Therefore, we conclude that

    s1 = max{‖Ax‖ : x ∈ Cn×1, ‖x‖ = 1}.


That is, the first singular value s1 gives the maximum magnification that a vector
experiences under the linear transformation A. Similarly, from the above workout it
follows that

    ‖Ax‖² = s1²|α1|² + · · · + sr²|αr|² ≥ sr²(|α1|² + · · · + |αr|²),

and the right-hand side equals sr² whenever r = n. With x = vr, we have ‖Avr‖² =
sr²‖ur‖² = sr². Hence, when rank(A) = n,

    sr = min{‖Ax‖ : x ∈ Cn×1, ‖x‖ = 1}.

That is, the minimum positive magnification is given by the smallest positive
singular value sr.
Notice that if x is a unit vector and rank(A) = n, then sr ≤ ‖Ax‖ ≤ s1.
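These extremal characterizations can be probed numerically; a sketch assuming NumPy, using a random matrix of full column rank so that every singular value is positive:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))      # full column rank with probability 1
_, s, Qh = np.linalg.svd(A)
s1, s_min = s[0], s[-1]

# The right singular vectors attain the extreme magnifications.
print(np.isclose(np.linalg.norm(A @ Qh[0]), s1))
print(np.isclose(np.linalg.norm(A @ Qh[-1]), s_min))

# Every unit vector is magnified by a factor between s_min and s1.
ok = True
for _ in range(200):
    x = rng.standard_normal(3)
    x /= np.linalg.norm(x)
    ok = ok and s_min - 1e-12 <= np.linalg.norm(A @ x) <= s1 + 1e-12
print(ok)
```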

Exercises for Sect. 6.5


1. Let A ∈ Cm×n . Show that the positive singular values of A are also the positive
singular values of A∗ .
2. Compute the singular value decompositions of the following matrices:
    (a) ⎡ 2 −2⎤   (b) ⎡ 2 1 2⎤   (c) ⎡1 2  2⎤
        ⎢ 1 −1⎥       ⎣−2 1 2⎦       ⎢2 0 −5⎥
        ⎣−1  1⎦                      ⎣3 0  0⎦
   
3. Show that the matrices

       ⎡1 0⎤   and   ⎡2 −1⎤
       ⎣1 1⎦         ⎣1  0⎦

   are similar but they have different singular values.
4. Prove that if λ1, . . . , λn are the eigenvalues of an n × n hermitian matrix, then
   its singular values are |λ1|, . . . , |λn|.
5. Let A ∈ Cn×n and let λ ∈ C. Show that A − λI is invertible iff A∗ − λ̄I is
   invertible.
6. Let A ∈ Fn×n . If A has eigenvalues λ1 , . . . , λn and singular values s1 , . . . , sn ,
then show that |λ1 · · · λn | = s1 · · · sn .

6.6 Polar Decomposition

Square matrices behave like complex numbers in many ways. One such example
is a powerful representation of square matrices using a stretch and a rotation. This
mimics the polar representation of a complex number as z = re^{iθ}, where r is a
non-negative real number representing the stretch, and e^{iθ} is a rotation. Similarly, a square
matrix can be written as a product of a positive semi-definite matrix, representing
the stretch, and a unitary matrix representing the rotation. We slightly generalize it
to any m × n matrix.
A hermitian matrix P ∈ Fn×n is called positive semidefinite iff x ∗ P x ≥ 0 for
each x ∈ Fn×1 . We use such a matrix in the following matrix factorization.

Theorem 6.15 (Polar decomposition) Let A ∈ Cm×n . Then there exist positive semi-
definite matrices P ∈ Cm×m , Q ∈ Cn×n , and a matrix U ∈ Cm×n such that

A = PU = U Q,

where P 2 = A A∗ , Q 2 = A∗ A, and U satisfies the following:


(1) If m = n, then the n × n matrix U is unitary.
(2) If m < n, then the rows of U are orthonormal.
(3) If m > n, then the columns of U are orthonormal.

Proof Let A ∈ Cm×n be a matrix of rank r with positive singular values s1 ≥
· · · ≥ sr. Write k = min{m, n}. Let A = BΣE∗ be the thin SVD of A, where
B ∈ Cm×k and E ∈ Cn×k have orthonormal columns, and Σ ∈ Ck×k has its first r
diagonal entries s1, . . . , sr and all other entries 0. Since B∗B = E∗E = I, we have

    A = BΣE∗ = (BΣB∗)(BE∗) = (BE∗)(EΣE∗).

With U = BE∗, P = BΣB∗ and Q = EΣE∗, we obtain the polar decompositions
A = PU = UQ. Moreover,

    P² = BΣB∗BΣB∗ = BΣ²B∗ = BΣE∗EΣB∗ = AA∗,
    Q² = EΣE∗EΣE∗ = EΣ²E∗ = EΣB∗BΣE∗ = A∗A.

Notice that Σ∗ = Σ. Thus P∗ = (BΣB∗)∗ = BΣ∗B∗ = P. That is, P is hermitian.
Write D = diag(√s1, . . . , √sr, 0, . . . , 0) ∈ Ck×k. Then D∗D = Σ. For each
x ∈ Cm×1,

    x∗Px = x∗BΣB∗x = x∗BD∗DB∗x = ‖DB∗x‖² ≥ 0.

Therefore, P is positive semi-definite. Similarly, Q is shown to be positive semi-definite.
(1) If m = n, then both B and E are unitary, i.e., B∗B = BB∗ = I = E∗E = EE∗.
Then U∗U = (BE∗)∗BE∗ = EB∗BE∗ = EE∗ = I, and UU∗ = BE∗(BE∗)∗ =
BE∗EB∗ = BB∗ = I. So, U is unitary.
(2) If m < n, then k = m; B is a square matrix with orthonormal columns, thus unitary;
and E∗E = I. We have UU∗ = BE∗EB∗ = BB∗ = I. Thus U has orthonormal
rows.
(3) If m > n, then k = n. We see that E is unitary, and B∗B = I. Therefore, U∗U =
EB∗BE∗ = EE∗ = I. That is, U has orthonormal columns.   □

Recall that a thin SVD is obtained from an SVD A = PΣQ∗ by keeping the
first k columns of the larger of the two matrices P and Q, and then restricting Σ to
its first k × k block, where k = min{m, n}. Thus, the polar decomposition of A may
be constructed directly from the SVD. It is as follows:
If A ∈ Cm×n has SVD as A = B D E ∗ , then A = PU = U Q, where
m = n: U = B E ∗, P = B D B∗, Q = E D E ∗.
m < n: U = B E 1∗ , P = B D1 B ∗ , Q = E 1 D1 E 1∗ .
m > n: U = B1 E ∗ , P = B1 D2 B1∗ , Q = E D2 E ∗ .
Here, E 1 is constructed from E by taking its first m columns; D1 is constructed
from D by taking its first m columns; B1 is constructed from B by taking its first n
columns; and D2 is constructed from D by taking its first n rows.
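The recipe above translates directly into code; a sketch assuming NumPy, where `polar` is our own helper (not a library routine), tried on the m > n case of Example 6.10:

```python
import numpy as np

def polar(A):
    """Polar decomposition A = P U = U Q built from the SVD as described above."""
    B, d, Eh = np.linalg.svd(A)       # A = B D E*
    k = min(A.shape)
    B1, E1h = B[:, :k], Eh[:k, :]     # curtail the larger of the two factors
    D = np.diag(d[:k])
    U = B1 @ E1h
    P = B1 @ D @ B1.conj().T
    Q = E1h.conj().T @ D @ E1h
    return P, U, Q

A = np.array([[2.0, -1.0], [-2.0, 1.0], [4.0, -2.0]])   # matrix of Example 6.10
P, U, Q = polar(A)

print(np.allclose(P @ U, A), np.allclose(U @ Q, A))              # True True
print(np.allclose(P @ P, A @ A.T), np.allclose(Q @ Q, A.T @ A))  # True True
print(np.allclose(U.T @ U, np.eye(2)))   # True: m > n, orthonormal columns
```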
Example 6.10 Consider the matrix

    A = ⎡ 2 −1⎤
        ⎢−2  1⎥
        ⎣ 4 −2⎦

of Example 6.9. We had obtained its SVD as A = BDE∗, where

    B = ⎡−1/√6  1/√2   1/√3⎤,   D = ⎡√30 0⎤,   E = ⎡−2/√5  1/√5⎤.
        ⎢ 1/√6  1/√2  −1/√3⎥       ⎢ 0  0⎥       ⎣ 1/√5  2/√5⎦
        ⎣−2/√6   0    −1/√3⎦       ⎣ 0  0⎦

Here, A ∈ C3×2. Thus, Theorem 6.15 (3) is applicable; see the discussion following
the proof of the theorem. We construct the matrices B1 by taking the first two columns
of B, and D2 by taking the first two rows of D, as in the following:

    B1 = ⎡−1/√6  1/√2⎤,   D2 = ⎡√30 0⎤.
         ⎢ 1/√6  1/√2⎥         ⎣ 0  0⎦
         ⎣−2/√6   0  ⎦

Then

    U = B1E∗ = (1/√6) ⎡−1 √3⎤ (1/√5) ⎡−2 1⎤ = (1/√30) ⎡ 2+√3   −1+2√3⎤,
                      ⎢ 1 √3⎥        ⎣ 1 2⎦           ⎢−2+√3    1+2√3⎥
                      ⎣−2  0⎦                         ⎣   4      −2  ⎦

    P = B1D2B1∗ = (√5/√6) ⎡ 1 −1  2⎤,   Q = ED2E∗ = (√6/√5) ⎡ 4 −2⎤.
                          ⎢−1  1 −2⎥                        ⎣−2  1⎦
                          ⎣ 2 −2  4⎦
As expected, we find that

    PU = (√5/√6) ⎡ 1 −1  2⎤ (1/√30) ⎡ 2+√3   −1+2√3⎤ = ⎡ 2 −1⎤ = A,
                 ⎢−1  1 −2⎥         ⎢−2+√3    1+2√3⎥   ⎢−2  1⎥
                 ⎣ 2 −2  4⎦         ⎣   4      −2  ⎦   ⎣ 4 −2⎦

    UQ = (1/√30) ⎡ 2+√3   −1+2√3⎤ (√6/√5) ⎡ 4 −2⎤ = ⎡ 2 −1⎤ = A.   □
                 ⎢−2+√3    1+2√3⎥         ⎣−2  1⎦   ⎢−2  1⎥
                 ⎣   4      −2  ⎦                   ⎣ 4 −2⎦

In A = PU = UQ, the matrices P and Q satisfy P² = AA∗ and Q² = A∗A. If
A ∈ Cm×n, then AA∗ ∈ Cm×m and A∗A ∈ Cn×n are hermitian matrices with eigenvalues
s1², . . . , sr², 0, . . . , 0. The diagonalization of AA∗ yields m linearly independent
eigenvectors associated with these eigenvalues. If these eigenvectors are
taken as the columns (in that order) of a matrix C, then

    AA∗ = C diag(s1², . . . , sr², 0, . . . , 0) C∗,   P = C diag(s1, . . . , sr, 0, . . . , 0) C∗.

Similarly, the matrix Q is equal to M diag(s1, . . . , sr, 0, . . . , 0) M∗, where M
consists of orthonormal eigenvectors of A∗A corresponding to the eigenvalues
s1², . . . , sr², 0, . . . , 0.
Now, using the matrix P as computed above, we can determine U so that A = PU.
Similarly, the matrix U in A = U Q can also be determined. In this approach, SVD
is not used; however, the two instances of U may differ since they depend on the
choices of orthonormal eigenvectors of A A∗ and A∗ A. If A is invertible, you would
end up with the same U.
Exercises for Sect. 6.6
1. Let A be an upper triangular matrix with distinct diagonal entries. Show that there
   exists an upper triangular matrix that diagonalizes A.
2. Let A = ⎡1 −1⎤. Determine whether A and A² are positive definite.
           ⎣0  1⎦
3. Determine the polar decompositions of the matrix A of Example 6.10 by diago-
nalizing A A∗ and A∗ A as mentioned in the text.
4. Let A ∈ Cm×n with m < n. Prove that A can be written as A = PU, where
P ∈ Cm×n , and U ∈ Cn×n is unitary.
5. Let A ∈ Cm×n with m > n. Prove that A can be written as A = U Q, where
U ∈ Cm×m is a unitary matrix and Q ∈ Cm×n .

6.7 Problems

1. Let $P^*AP$ be the upper triangular matrix in the Schur triangularization of A ∈ Fn×n. If A has n distinct eigenvalues, then show that there exists an upper triangular matrix Q such that PQ diagonalizes A.
2. If A ∈ Fn×n is both diagonalizable and invertible, then how do you compute its
inverse from its diagonalization?
3. Let A ∈ Fn×n be diagonalizable with eigenvalues ±1. Is A−1 = A?
4. How to diagonalize A∗ if diagonalization of A ∈ Fn×n is known?
5. Let A ∈ Rn×n have real eigenvalues λ1 > λ2 > · · · > λn with corresponding eigenvectors x1, . . . , xn. Let $x = \sum_{i=1}^{n} \alpha_i x_i$ for real numbers α1, . . . , αn. Show the following:
   (a) $A^m x = \sum_{i=1}^{n} \alpha_i \lambda_i^m x_i$   (b) If λ1 = 1, then $\lim_{m\to\infty} A^m x = \alpha_1 x_1$.
6. Prove that if a normal matrix has only real eigenvalues, then it is hermitian.
Conclude that if a real normal matrix has only real eigenvalues, then it is real
symmetric.
7. Let A ∈ Fn×n have distinct eigenvalues λ1, . . . , λk. For 1 ≤ j ≤ k, let the linearly independent eigenvectors associated with λj be $v_j^1, \dots, v_j^{i_j}$. Prove that the set $\{v_1^1, \dots, v_1^{i_1}, \dots, v_k^1, \dots, v_k^{i_k}\}$ is linearly independent. [Hint: See the proof of Theorem 5.3.]
8. Let A = [aij] ∈ F3×3 be such that a31 ≠ 0 and a32 ≠ 0. Show that each eigenvalue of A has geometric multiplicity 1.
9. Suppose A ∈ F4×4 has an eigenvalue λ with algebraic multiplicity 3, and
rank(A − λI ) = 1. Is A diagonalizable?
10. Show that there exists only one n × n diagonalizable matrix with an eigenvalue
λ of algebraic multiplicity n.
11. Show that a nonzero nilpotent matrix is never diagonalizable.
[Hint: $A \ne 0$ but $A^m = 0$ for some m ≥ 2.]
12. Let A ∈ Fn×n and let P −1 A P be a diagonal matrix. Show that the columns of
P that are eigenvectors associated with nonzero eigenvalues of A form a basis
for R(A).
13. Let A be a diagonalizable matrix. Show that the number of nonzero eigenvalues
of A is equal to rank(A).
14. Construct non-diagonalizable matrices A and B satisfying
(a) rank(A) is equal to the number of nonzero eigenvalues of A;
(b) rank(B) is not equal to the number of nonzero eigenvalues of B.
15. Let x, y ∈ Fn×1 and let A = xy*. Show the following:
    (a) A has an eigenvalue y*x with an associated eigenvector x.
    (b) 0 is an eigenvalue of A with geometric multiplicity at least n − 1.
    (c) If y*x ≠ 0, then A is diagonalizable.
16. Using the Jordan form of a matrix show that a matrix A is diagonalizable iff
for each eigenvalue of A, its geometric multiplicity is equal to its algebraic
multiplicity.
17. Let A ∈ Cn×n have an eigenvalue λ with algebraic multiplicity m. Prove that $\mathrm{null}((A - \lambda I)^m) = m$.
18. Let λ be an eigenvalue of a matrix A ∈ Cn×n having algebraic multiplicity m. Prove that for each k ∈ N, if $\mathrm{null}((A - \lambda I)^k) < m$, then $\mathrm{null}((A - \lambda I)^k) < \mathrm{null}((A - \lambda I)^{k+1})$.
[Hint: Show that $N((A - \lambda I)^i) \subseteq N((A - \lambda I)^{i+1})$. Then use Exercise 17.]
19. Let λ be an eigenvalue of a matrix A and let J be the Jordan form of A. Prove
that the number of Jordan blocks with diagonal entry λ in J is the geometric
multiplicity of λ.
20. Let A be a hermitian n × n matrix with eigenvalues λ1, . . . , λn. Show that there exists an orthonormal set {x1, . . . , xn} in Fn×1 such that $x^*Ax = \sum_{i=1}^{n} \lambda_i\,|x^*x_i|^2$ for each x ∈ Fn×1.
21. Let A ∈ Rn×n. Show that $AA^t$ and $A^tA$ are similar matrices.
22. Let A and B be hermitian matrices of the same order. Are the following statements true?
(a) All eigenvalues of AB are real.
(b) All eigenvalues of AB A are real.
23. Let n > 1 and let u ∈ Fn×1 be a unit vector. Let H = I − 2uu*. Show the following:
    (a) H is both hermitian and unitary; thus H⁻¹ = H.
    (b) If λ and μ are two distinct eigenvalues of H, then |λ − μ| = 2.
    (c) Hu = −u.
    (d) The trace of H is n − 2.
    (e) If v ≠ 0 and v ⊥ u, then Hv = v.
    (f) The eigenvalue 1 has algebraic multiplicity n − 1.
24. Let A ∈ Cm×n be a matrix of rank r. Let s1 ≥ · · · ≥ sr be the positive singular values of A. Let $A = P\Sigma_1Q^*$ be a singular value decomposition of A, with $S = \mathrm{diag}(s_1, \dots, s_r)$ and $\Sigma_1 = \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{C}^{m\times n}$. Define the matrices $\Sigma_2 = \begin{bmatrix} S^{-1} & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{C}^{n\times m}$ and $A^\dagger = Q\Sigma_2P^*$. Prove the following:
(a) The matrix A† satisfies the following properties:

$$(AA^\dagger)^* = AA^\dagger, \quad (A^\dagger A)^* = A^\dagger A, \quad AA^\dagger A = A, \quad A^\dagger AA^\dagger = A^\dagger.$$

(b) There exists a unique matrix A† ∈ Fn×m satisfying the four equations mentioned in (a). The matrix A† is called the generalized inverse of A.
(c) For any b ∈ Fm×1, A†b is the least squares solution of Ax = b.
25. Show that if s is a singular value of a matrix A, then there exists a nonzero vector x such that ‖Ax‖₂ = s‖x‖₂.
26. Let $A = P\Sigma Q^t$ be the SVD of a real n × n matrix A. Let $u_i$ be the ith column of P, and let $v_i$ be the ith column of Q. Define the matrix B and the vectors $x_i, y_i$ as follows:

$$B = \begin{bmatrix} 0 & A^t \\ A & 0 \end{bmatrix}, \quad x_i = \begin{bmatrix} v_i \\ u_i \end{bmatrix}, \quad y_i = \begin{bmatrix} -v_i \\ u_i \end{bmatrix} \quad \text{for } 1 \le i \le n.$$

Show that $x_i$ and $y_i$ are eigenvectors of B. How are the eigenvalues of B related to the singular values of A?
27. Derive the polar decomposition from the SVD. Also, derive singular value
decomposition from the polar decomposition.
28. A positive definite matrix is a hermitian matrix A such that for each x ≠ 0, x*Ax > 0. Show that a hermitian matrix is positive definite iff all its eigenvalues are positive.
29. Show that the square of a real symmetric invertible matrix is positive definite.
30. Show that if A is positive definite, then so is A−1 . Give an example of a 2 × 2
invertible matrix which is not positive definite.
31. Show that A∗ A is positive semi-definite for any A ∈ Fm×n . Give an example of
a matrix A where A∗ A is not positive definite.
32. Show that if Q is unitary and A is positive definite, then Q AQ ∗ is positive
definite.
33. For a matrix A ∈ Fn×n, the principal submatrices are obtained by deleting its last r rows and last r columns for r = 0, 1, . . . , n − 1. Show that all principal submatrices of a positive definite matrix are positive definite. Further, verify that all principal submatrices of $A = \begin{bmatrix} 1 & 1 & -3 \\ 1 & 1 & -3 \\ -3 & -3 & 5 \end{bmatrix}$ have non-negative determinants but A is not positive semi-definite.
34. Let A be a real symmetric matrix. Show that the following are equivalent:
(a) A is positive definite.
(b) All principal submatrices of A have positive determinant.
(c) A can be reduced to an upper triangular form using only elementary row operations of Type 3, where all pivots are positive.
(d) A = U t U, where U is upper triangular with positive diagonal entries.
(e) A = B t B for some invertible matrix B.
35. Let A be a real symmetric positive definite n × n matrix. Show the following:
(a) All diagonal entries of A are positive.
(b) For any invertible n × n matrix P, P t A P is positive definite.
(c) There exists an n × n orthogonal matrix Q such that A = Q t Q.
(d) There exist unique n × n matrices U and D where U is upper triangular
with all diagonal entries 1, and D is a diagonal matrix with positive entries
on the diagonal such that A = U t DU.
(e) Cholesky factorization: There exists a unique upper triangular matrix U with positive diagonal entries such that $A = U^tU$.
Chapter 7
Norms of Matrices

7.1 Norms

Recall that the norm of a vector is the non-negative square root of the inner product of the vector with itself. Norms give an idea of the length of a vector. We wish to generalize this theme so that we may be able to measure the length of a vector without resorting to an inner product. We keep the essential properties that are commonly associated with length.
Let V be a subspace of Fn. A norm on V is a function from V to R, which we denote by ‖·‖, satisfying the following properties:
1. For each v ∈ V, ‖v‖ ≥ 0.
2. For each v ∈ V, ‖v‖ = 0 iff v = 0.
3. For each v ∈ V and for each α ∈ F, ‖αv‖ = |α| ‖v‖.
4. For all u, v ∈ V, ‖u + v‖ ≤ ‖u‖ + ‖v‖.
Once a norm is defined on V, we call V a normed linear space. Though norms can be defined on any vector space, we require only subspaces of Fn. In what follows, a finite dimensional normed linear space V will mean a subspace of some Fn on which a norm ‖·‖ has been defined.
Recall that Property (4) of a norm is called the triangle inequality. As we had seen earlier, in R²,

$$\|(a,b)\| = \big(|a|^2 + |b|^2\big)^{1/2} \quad \text{for } a, b \in \mathbb{R}$$

defines a norm. This norm comes from the usual inner product on R². Some of the useful norms on Fn are discussed in the following example.
useful norms on Fn are discussed in the following example.

Example 7.1 Let V = Fn. Let v = (a1, . . . , an) ∈ Fn.

1. The function ‖·‖∞ : V → R given by ‖v‖∞ = max{|a1|, . . . , |an|} defines a norm on V. It is called the ∞-norm or the Cartesian norm.
2. The function ‖·‖1 : V → R given by ‖v‖1 = |a1| + · · · + |an| defines a norm on V. It is called the 1-norm or the taxicab norm.
3. The function ‖·‖2 : V → R given by $\|v\|_2 = \big(|a_1|^2 + \cdots + |a_n|^2\big)^{1/2}$ is a norm. This norm is called the 2-norm or the Euclidean norm. In matrix form, it may be written as follows: for v ∈ Fn×1, $\|v\|_2 = \sqrt{v^*v}$; and for v ∈ F1×n, $\|v\|_2 = \sqrt{vv^*}$.
4. Let p > 1. Then, the function ‖·‖p : V → R defined by $\|v\|_p = \big(|a_1|^p + \cdots + |a_n|^p\big)^{1/p}$ is a norm; it is called the p-norm.
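These norms are straightforward to compute directly; the sketch below (plain Python, with an ad hoc helper `p_norm` that is not from the text) evaluates the ∞-, 1-, and 2-norms of v = (3, −4).

```python
from math import inf

def p_norm(v, p):
    # p-norm for p >= 1; p = inf gives the infinity (Cartesian) norm.
    if p == inf:
        return max(abs(a) for a in v)
    return sum(abs(a) ** p for a in v) ** (1.0 / p)

v = [3.0, -4.0]
norms = (p_norm(v, inf), p_norm(v, 1), p_norm(v, 2))  # (4.0, 7.0, 5.0)
```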




Consider the ∞-norm on F². With x = (1, 0), y = (0, 1), we see that

$$\|(1,0)+(0,1)\|_\infty^2 + \|(1,0)-(0,1)\|_\infty^2 = (\max\{1,1\})^2 + (\max\{1,1\})^2 = 2,$$
$$2\big(\|(1,0)\|_\infty^2 + \|(0,1)\|_\infty^2\big) = 2\big((\max\{1,0\})^2 + (\max\{0,1\})^2\big) = 4.$$

Hence, there exist x, y ∈ Fn such that

$$\|x+y\|_\infty^2 + \|x-y\|_\infty^2 \ne 2\big(\|x\|_\infty^2 + \|y\|_\infty^2\big).$$
This violates the parallelogram law. Therefore, the ∞-norm does not come from
an inner product. Similarly, it is easy to check that the 1-norm does not come from
an inner product.
The normed linear space Fn with the Euclidean norm is different from Fn with
the ∞-norm.
Notice that a norm behaves like the absolute value function in R or C. With this analogy, we see that the reverse triangle inequality holds for norms also. It is as follows. Let x, y ∈ V, a normed linear space. Then,

$$\|x\| - \|y\| = \|(x-y)+y\| - \|y\| \le \|x-y\| + \|y\| - \|y\| = \|x-y\|.$$

Similarly, ‖y‖ − ‖x‖ ≤ ‖y − x‖ = ‖x − y‖. Therefore,

$$\big|\,\|x\| - \|y\|\,\big| \le \|x-y\| \quad \text{for all } x, y \in V.$$

This inequality becomes helpful in showing that the norm is a continuous functional. Recall that a functional on a subspace V of Fn is a function that maps vectors in V to the scalars in F.
Any functional f : V → R is continuous at v ∈ V iff for each ε > 0, there exists a δ > 0 such that for each x ∈ V, if ‖x − v‖ < δ then |f(x) − f(v)| < ε.
To show that ‖·‖ : V → R is continuous, let ε > 0. We take δ = ε and then verify the requirements. So, if ‖x − v‖ < ε, then $\big|\,\|x\| - \|v\|\,\big| \le \|x - v\| < \varepsilon$. Therefore, the norm ‖·‖ is a continuous function.
Sometimes an estimate becomes easy by using one particular norm rather than another. On Fn, the following relations hold between the ∞-norm, 1-norm, and 2-norm:

$$\|v\|_\infty \le \|v\|_2 \le \|v\|_1, \qquad \|v\|_1 \le \sqrt{n}\,\|v\|_2 \le n\,\|v\|_\infty.$$

You should be able to prove these facts on your own.
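A quick numerical spot check of these inequalities (a sketch, not part of the text; the helper functions are ad hoc):

```python
from math import sqrt

def norm_inf(v): return max(abs(a) for a in v)
def norm_1(v): return sum(abs(a) for a in v)
def norm_2(v): return sqrt(sum(a * a for a in v))

v = [1.0, -2.0, 2.0]
n = len(v)
chain1 = norm_inf(v) <= norm_2(v) <= norm_1(v)                # 2 <= 3 <= 5
chain2 = norm_1(v) <= sqrt(n) * norm_2(v) <= n * norm_inf(v)  # 5 <= 3*sqrt(3) <= 6
```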


In fact, a generalization of the above inequalities exists. Its proof uses a fact from analysis: any continuous function from the closed unit sphere to R attains its minimum and maximum. If V is a normed linear space, then the closed unit sphere in V is defined as

$$S = \{v \in V : \|v\| = 1\}.$$

We use this fact in proving the following theorem.

Theorem 7.1 Let V be a subspace of Fn. Let ‖·‖a and ‖·‖b be two norms on V and let v ∈ V. Then, there exist positive constants α and β, independent of v, such that ‖v‖a ≤ α‖v‖b and ‖v‖b ≤ β‖v‖a.

Proof Let Sb = {x ∈ V : ‖x‖b = 1} be the unit sphere in V with respect to the norm ‖·‖b. Since ‖·‖a is a continuous function, it attains a maximum on Sb. So, let

$$\alpha = \max\{\|x\|_a : x \in S_b\}.$$

Then, for each y ∈ Sb, we have ‖y‖a ≤ α.
Let v ∈ V, v ≠ 0. Take y = v/‖v‖b. Now, y ∈ Sb. Then,

$$\|v\|_a = \big\|\,\|v\|_b\,y\,\big\|_a = \|v\|_b\,\|y\|_a \le \alpha\,\|v\|_b.$$

Similarly, considering the continuous function ‖·‖b on the closed sphere Sa = {x ∈ V : ‖x‖a = 1}, we obtain the positive constant

$$\beta = \max\{\|x\|_b : x \in S_a\},$$

so that for each z ∈ Sa, ‖z‖b ≤ β. Then, with z = v/‖v‖a, we have

$$\|v\|_b = \big\|\,\|v\|_a\,z\,\big\|_b = \|v\|_a\,\|z\|_b \le \beta\,\|v\|_a.$$

If v = 0, then clearly both inequalities hold with α = 1 = β. □

Whenever the conclusion of Theorem 7.1 holds for two norms ‖·‖a and ‖·‖b, we say that these two norms are equivalent. We thus see that on any subspace of Fn, any two norms are equivalent. We will use the equivalence of norms later for defining a
particular type of norms for matrices. We remark that on infinite dimensional normed
linear spaces, any two norms need not be equivalent.
Exercises for Sect. 7.1

1. Show that the 1-norm on Fn does not come from an inner product.
2. Let p ∈ N. Show that the p-norm is indeed a norm.
3. Let v ∈ Fn. Show the following inequalities:

(a) ‖v‖∞ ≤ ‖v‖2 ≤ ‖v‖1   (b) ‖v‖1 ≤ √n ‖v‖2 ≤ n ‖v‖∞

7.2 Matrix Norms

Norms provide a way to quantify the vectors. It is easy to verify that the addition of
m × n matrices and multiplying a matrix with a scalar satisfy the properties required
of a vector space. Thus, Fm×n is a vector space. We can view Fm×n as a normed linear
space by providing a norm on it.

Example 7.2 The following functions ‖·‖ : Fm×n → R define norms on Fm×n. Let A = [aij] ∈ Fm×n.

1. ‖A‖c = max{|aij| : 1 ≤ i ≤ m, 1 ≤ j ≤ n}. It is called the Cartesian norm on matrices.
2. $\|A\|_t = \sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|$. It is called the taxicab norm on matrices.
3. $\|A\|_F = \Big(\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^2\Big)^{1/2} = (\mathrm{tr}(A^*A))^{1/2}$. It is called the Frobenius norm on matrices.
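The three matrix norms of Example 7.2 are elementwise computations; a plain-Python sketch (the helper names are ad hoc, not from the text):

```python
def cartesian_norm(A):
    # Largest absolute entry of the matrix.
    return max(abs(x) for row in A for x in row)

def taxicab_norm(A):
    # Sum of absolute values of all entries.
    return sum(abs(x) for row in A for x in row)

def frobenius_norm(A):
    # Square root of the sum of squared absolute entries.
    return sum(abs(x) ** 2 for row in A for x in row) ** 0.5

A = [[1, -2], [3, 4]]
# Cartesian: 4, taxicab: 10, Frobenius: sqrt(30)
```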


Notice that the names of the norms in Example 7.2 follow the pattern in
Example 7.1. However, the notation uses the subscripts c, t, F instead of ∞, 1, 2.
The reason is that we are reserving the latter notation for some other matrix norms,
which we will discuss soon.
For matrices, it will be especially useful to have a norm which satisfies ‖Av‖ ≤ ‖A‖ ‖v‖. It may quite well happen that an arbitrary vector norm and an arbitrary matrix norm do not satisfy this property. Thus, given a vector norm, we require to define a corresponding matrix norm so that this property is satisfied. In order to do that, we will first prove a result.

Theorem 7.2 Let A ∈ Fm×n. Let ‖·‖n and ‖·‖m be norms on Fn×1 and Fm×1, respectively. Then, $\Big\{\dfrac{\|Av\|_m}{\|v\|_n} : v \in \mathbb{F}^{n\times1},\ v \ne 0\Big\}$ is a bounded subset of R.

Proof Clearly, 0 is a lower bound for the given set. We need to show that the given set has an upper bound.
Let {e1, . . . , en} be the standard basis of Fn×1. For each i, ‖Aei‖m is a real number. Write $\alpha = \sum_{i=1}^{n}\|Ae_i\|_m$.
Let v ∈ Fn×1, v ≠ 0. We have unique scalars β1, . . . , βn, not all zero, such that $v = \sum_{i=1}^{n}\beta_ie_i$. Then, ‖v‖∞ = max{|βi| : 1 ≤ i ≤ n}. And,

$$\|Av\|_m = \Big\|A\sum_{i=1}^{n}\beta_ie_i\Big\|_m = \Big\|\sum_{i=1}^{n}\beta_iAe_i\Big\|_m \le \sum_{i=1}^{n}|\beta_i|\,\|Ae_i\|_m \le \|v\|_\infty\sum_{i=1}^{n}\|Ae_i\|_m = \alpha\,\|v\|_\infty.$$

Consider the norms ‖·‖∞ and ‖·‖n on Fn×1. Due to Theorem 7.1, there exists a positive constant γ such that ‖v‖∞ ≤ γ‖v‖n; the constant γ does not depend on the particular vector v. Then, it follows that ‖Av‖m ≤ αγ‖v‖n. That is, for each nonzero vector v ∈ Fn×1,

$$\frac{\|Av\|_m}{\|v\|_n} \le \alpha\gamma.$$

Therefore, the given set is bounded above by αγ. Also, the set is bounded below by 0. □

The axiom of completeness of R asserts that every nonempty bounded subset of R has a least upper bound (lub) and a greatest lower bound (glb) in R. Thus, the least upper bound of the set in Theorem 7.2 is a real number. Using this, we define a type of norm on matrices.
Let ‖·‖m and ‖·‖n be norms on Fm×1 and Fn×1, respectively. The norm on Fm×n given by

$$\|A\|_{m,n} = \mathrm{lub}\Big\{\frac{\|Av\|_m}{\|v\|_n} : v \in \mathbb{F}^{n\times1},\ v \ne 0\Big\} \quad \text{for } A \in \mathbb{F}^{m\times n}$$

is called the matrix norm induced by the vector norms ‖·‖m and ‖·‖n.
We use the phrase induced norm for matrices, for short. Verify that the induced norm is, in fact, a norm on Fm×n.
For square matrices, the induced norm takes an alternate form. For, suppose ‖·‖ is a norm on Fn×1. The induced norm on Fn×n is then given by

$$\|A\| = \mathrm{lub}\Big\{\frac{\|Av\|}{\|v\|} : v \in \mathbb{F}^{n\times1},\ v \ne 0\Big\} = \mathrm{lub}\big\{\|Ax\| : x \in \mathbb{F}^{n\times1},\ \|x\| = 1\big\}.$$

The induced norms on Fn×n satisfy the desired properties with respect to the product of a matrix with a vector and also that of a matrix with another matrix.

Theorem 7.3 Let ‖·‖m, ‖·‖k, and ‖·‖n be norms on Fm×1, Fk×1, and Fn×1, respectively. Let ‖·‖m,k and ‖·‖k,n be the corresponding induced norms on Fm×k and Fk×n, respectively. Let A ∈ Fm×k, B ∈ Fk×n, and let v ∈ Fk×1. Then, ‖Av‖m ≤ ‖A‖m,k ‖v‖k and ‖AB‖m,n ≤ ‖A‖m,k ‖B‖k,n.

Proof To keep the notation simple, let us write all the norms involved as ‖·‖; the subscripts may be supplied appropriately.
If v = 0, then ‖Av‖ = 0 = ‖A‖ ‖v‖. If v ≠ 0, then

$$\frac{\|Av\|}{\|v\|} \le \mathrm{lub}\Big\{\frac{\|Av\|}{\|v\|} : v \in \mathbb{F}^{k\times1},\ v \ne 0\Big\} = \|A\|.$$

Hence, ‖Av‖ ≤ ‖A‖ ‖v‖.
For the second inequality, first suppose that Bx ≠ 0 for any x ≠ 0 in Fn×1. Then,

$$\begin{aligned}
\|AB\| &= \mathrm{lub}\Big\{\frac{\|ABx\|}{\|x\|} : x \in \mathbb{F}^{n\times1},\ x \ne 0\Big\}\\
&= \mathrm{lub}\Big\{\frac{\|ABx\|}{\|Bx\|}\,\frac{\|Bx\|}{\|x\|} : x \in \mathbb{F}^{n\times1},\ x \ne 0,\ Bx \ne 0\Big\}\\
&\le \mathrm{lub}\Big\{\frac{\|ABx\|}{\|Bx\|} : x \in \mathbb{F}^{n\times1},\ Bx \ne 0\Big\}\ \mathrm{lub}\Big\{\frac{\|Bx\|}{\|x\|} : x \in \mathbb{F}^{n\times1},\ x \ne 0\Big\}\\
&= \|A\|\,\|B\|.
\end{aligned}$$

Next, if Bx = 0 for some x ≠ 0, then ABx = 0. Thus, in the first line of the above calculation, restricting to the set {x : Bx ≠ 0} will not change the least upper bound. Further, if Bx = 0 for each x, then ‖B‖ = 0 = ‖AB‖. □

A matrix norm that satisfies the property ‖AB‖ ≤ ‖A‖ ‖B‖ for all matrices for which the product AB is well defined is called a sub-multiplicative norm. Thus, the induced norm on matrices is sub-multiplicative.

Example 7.3 Define the function ‖·‖∞ : Fm×n → R by

$$\|A\|_\infty = \max\{|a_{i1}| + |a_{i2}| + \cdots + |a_{in}| : 1 \le i \le m\} \quad \text{for } A = [a_{ij}] \in \mathbb{F}^{m\times n}.$$

Then, ‖·‖∞ is a norm on Fm×n. It is called the maximum absolute row sum norm, or the row sum norm, for short.
The row sum norm on matrices is induced by the ∞-norm on vectors. To see this, suppose that
‖·‖∞ is the ∞-norm on the spaces Fn×1 and Fm×1,
‖·‖ is the norm on matrices induced by the vector norm ‖·‖∞,
‖·‖∞ is the row sum norm on matrices, as defined above,
A = [aij] ∈ Fm×n, and v = (b1, . . . , bn)ᵗ ∈ Fn×1, v ≠ 0.
Then (write down in full and multiply)

$$\|Av\|_\infty = \max\Big\{\Big|\sum_{j=1}^{n}a_{1j}b_j\Big|, \dots, \Big|\sum_{j=1}^{n}a_{mj}b_j\Big|\Big\} \le \max\Big\{\sum_{j=1}^{n}|a_{1j}|\,|b_j|, \dots, \sum_{j=1}^{n}|a_{mj}|\,|b_j|\Big\} \le \max\Big\{\sum_{j=1}^{n}|a_{1j}|, \dots, \sum_{j=1}^{n}|a_{mj}|\Big\}\,\|v\|_\infty = \|A\|_\infty\,\|v\|_\infty.$$

That is, for each nonzero vector v ∈ Fn×1, ‖Av‖∞/‖v‖∞ ≤ ‖A‖∞. Then,

$$\|A\| = \mathrm{lub}\Big\{\frac{\|Av\|_\infty}{\|v\|_\infty} : v \in \mathbb{F}^{n\times1},\ v \ne 0\Big\} \le \|A\|_\infty. \tag{7.1}$$

To show that ‖A‖ = ‖A‖∞, we construct a vector v ∈ Fn×1 such that ‖Av‖∞/‖v‖∞ = ‖A‖∞. If A = 0, then clearly, ‖Ae1‖∞/‖e1‖∞ = 0 = ‖A‖∞. So, suppose A ≠ 0. Now, the maximum of the sums of absolute values of entries in the rows of A occurs at some row. Choose one such row index, say, k. Take the vector u = (c1, . . . , cn)ᵗ, where

$$c_j = \begin{cases} \overline{a_{kj}}/|a_{kj}| & \text{if } a_{kj} \ne 0\\ 0 & \text{if } a_{kj} = 0. \end{cases}$$

Then, |cj| = 0 when akj = 0; otherwise, |cj| = 1. Notice that since A ≠ 0, there exists at least one j such that |cj| = 1. Then, ‖u‖∞ = 1 and ‖Au‖∞ = ‖A‖∞.
Using ‖A‖ as the lub of the set as written in (7.1), we have

$$\|A\| \ge \frac{\|Au\|_\infty}{\|u\|_\infty} = \|Au\|_\infty = \|A\|_\infty.$$

Therefore, with (7.1), we conclude that ‖A‖ = ‖A‖∞. □
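A small numerical illustration of this construction (a sketch, not part of the text; helper names are ad hoc): compute the row sum norm and build the maximizing vector u from the kth row. For real entries, cj is just the sign of akj.

```python
def row_sum_norm(A):
    # Maximum absolute row sum.
    return max(sum(abs(x) for x in row) for row in A)

def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

A = [[1, -2, 3], [0, 4, -1]]
norm = row_sum_norm(A)  # row sums 6 and 5, so the norm is 6
k = max(range(len(A)), key=lambda i: sum(abs(x) for x in A[i]))
u = [0 if x == 0 else (1 if x > 0 else -1) for x in A[k]]
attained = max(abs(x) for x in mat_vec(A, u))  # equals row_sum_norm(A)
```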

Example 7.4 Define the function ‖·‖1 : Fm×n → R by

$$\|A\|_1 = \max\{|a_{1j}| + |a_{2j}| + \cdots + |a_{mj}| : 1 \le j \le n\} \quad \text{for } A = [a_{ij}] \in \mathbb{F}^{m\times n}.$$

Then, ‖·‖1 is a norm on Fm×n. It is called the maximum absolute column sum norm, or the column sum norm, for short.
The column sum norm is induced by the 1-norm on vectors. To see this, suppose A = [aij] ∈ Fm×n and v = (b1, . . . , bn)ᵗ ∈ Fn×1. Denote by A1, . . . , An the columns of A. Write the matrix norm induced by the vector norm ‖·‖1 as ‖·‖. Using the triangle inequality, we obtain

$$\|Av\|_1 = \Big\|\sum_{j=1}^{n}b_jA_j\Big\|_1 \le \sum_{j=1}^{n}\|b_jA_j\|_1 = \sum_{j=1}^{n}|b_j|\,\|A_j\|_1 \le \max\{\|A_j\|_1 : 1 \le j \le n\}\sum_{j=1}^{n}|b_j| = \max\{\|A_j\|_1 : 1 \le j \le n\}\,\|v\|_1 = \|A\|_1\,\|v\|_1.$$

That is, for each nonzero vector v ∈ Fn×1,

$$\frac{\|Av\|_1}{\|v\|_1} \le \max\{\|A_j\|_1 : 1 \le j \le n\} = \|A\|_1.$$

It shows that ‖A‖ ≤ ‖A‖1.
For showing equality, suppose the maximum of the sums of absolute values of entries in the columns occurs at the kth column. Again, this k may not be unique; but we choose one among the columns and fix such a column index as k. That is, ‖Ak‖1 = ‖A‖1. Now, ‖ek‖1 = 1 and

$$\|A\| \ge \frac{\|Ae_k\|_1}{\|e_k\|_1} = \|A_k\|_1 = \|A\|_1.$$

Therefore, ‖A‖ = ‖A‖1. □

Example 7.5 Let A ∈ Fm×n. Let ‖·‖2 denote the matrix norm on Fm×n induced by the Euclidean norm, or the 2-norm, ‖·‖2 on vectors. Then,

$$\|A\|_2^2 = \mathrm{lub}\Big\{\frac{\|Av\|_2^2}{\|v\|_2^2} : v \in \mathbb{F}^{n\times1},\ v \ne 0\Big\} = \mathrm{lub}\Big\{\frac{v^*A^*Av}{v^*v} : v \in \mathbb{F}^{n\times1},\ v \ne 0\Big\}.$$

The matrix A*A is hermitian. Due to the spectral theorem, it is unitarily diagonalizable. Its eigenvalues can be ordered as $s_1^2 \ge \cdots \ge s_n^2 \ge 0$. So, let B = {v1, . . . , vn} be an orthonormal basis of Fn×1, where vj is an eigenvector of A*A corresponding to the eigenvalue $s_j^2$. Let v ∈ Fn×1, v ≠ 0. We have scalars αj such that $v = \sum_{j=1}^{n}\alpha_jv_j$. Using the orthonormality of the vj's, we see that

$$v^*v = \sum_{i=1}^{n}\sum_{j=1}^{n}\overline{\alpha_i}\,\alpha_j\,v_i^*v_j = \sum_{i=1}^{n}|\alpha_i|^2,$$
$$v^*A^*Av = \sum_{i=1}^{n}\sum_{j=1}^{n}\overline{\alpha_i}\,v_i^*\,s_j^2\,\alpha_jv_j = \sum_{i=1}^{n}\sum_{j=1}^{n}\overline{\alpha_i}\,\alpha_j\,s_j^2\,v_i^*v_j = \sum_{i=1}^{n}|\alpha_i|^2s_i^2,$$
$$s_1^2 - \frac{v^*A^*Av}{v^*v} = s_1^2 - \frac{\sum_{i=1}^{n}|\alpha_i|^2s_i^2}{\sum_{i=1}^{n}|\alpha_i|^2} = \frac{\sum_{i=1}^{n}|\alpha_i|^2(s_1^2 - s_i^2)}{\sum_{i=1}^{n}|\alpha_i|^2} \ge 0.$$

This is true for all v ≠ 0. In particular, for v = v1, we have α1 = 1 and αi = 0 for each i > 1. We thus obtain

$$s_1^2 - \frac{v_1^*A^*Av_1}{v_1^*v_1} = s_1^2 - \frac{v_1^*\,s_1^2\,v_1}{v_1^*v_1} = 0.$$

Therefore,

$$s_1^2 = \mathrm{lub}\Big\{\frac{v^*A^*Av}{v^*v} : v \in \mathbb{F}^{n\times1},\ v \ne 0\Big\}.$$

It shows that ‖A‖2 = s1 is the required induced norm, where s1 is the largest singular value of A. This induced norm is called the 2-norm, and also the spectral norm, on matrices. □
In general, the quotient $\rho_A(v) = \dfrac{v^*Av}{v^*v}$ is called the Rayleigh quotient of the matrix A with the nonzero vector v. The Rayleigh quotients ρA(vj) for eigenvectors vj give the corresponding eigenvalues of A. It comes of help in computing an eigenvalue if the associated eigenvector is known. It can be shown that the Rayleigh quotient of a hermitian matrix with any nonzero vector is a real number. Moreover, if a hermitian matrix A has eigenvalues λ1 ≥ · · · ≥ λn, then λ1 ≥ ρA(v) ≥ λn for any nonzero vector v.
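For a real symmetric matrix, the Rayleigh quotient is easy to evaluate; a sketch (the helper `rayleigh` is ad hoc, not from the text):

```python
def rayleigh(A, v):
    # rho_A(v) = (v^t A v) / (v^t v) for a real matrix A and nonzero v.
    Av = [sum(a * b for a, b in zip(row, v)) for row in A]
    return sum(x * y for x, y in zip(v, Av)) / sum(x * x for x in v)

A = [[2, 1], [1, 2]]         # symmetric, eigenvalues 3 and 1
r_eig = rayleigh(A, [1, 1])  # (1,1) is an eigenvector for 3, so the quotient is 3
r_mid = rayleigh(A, [1, 0])  # a generic vector: the quotient lies between 1 and 3
```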
We see that the induced norms on Fm×n with respect to the norms ‖·‖∞, ‖·‖1, and ‖·‖2 on both Fn×1 and Fm×1 are the row sum norm ‖·‖∞, the column sum norm ‖·‖1, and the spectral norm ‖·‖2, respectively.
If ‖·‖ is any induced norm on Fn×n, then ‖Iv‖/‖v‖ = 1 says that ‖I‖ = 1. For n > 1, the taxicab norm ‖I‖t = n; so, it is not an induced norm.
The Cartesian norm is not sub-multiplicative. For instance, take A and B as the n × n matrix with each of its entries equal to 1. Then, ‖A‖c = ‖B‖c = 1 and ‖AB‖c = n. Thus, it is not an induced norm for n > 1.
If n > 1, then the Frobenius norm of the identity is ‖I‖F = √n > 1. Thus, the Frobenius norm is not induced by any vector norm. However, it satisfies the sub-multiplicative property. For, the Cauchy–Schwarz inequality implies that

$$\|AB\|_F^2 = \sum_{i=1}^{n}\sum_{j=1}^{n}\Big|\sum_{k=1}^{n}a_{ik}b_{kj}\Big|^2 \le \sum_{i=1}^{n}\sum_{j=1}^{n}\Big(\sum_{k=1}^{n}|a_{ik}|^2\Big)\Big(\sum_{k=1}^{n}|b_{kj}|^2\Big) = \sum_{i=1}^{n}\sum_{k=1}^{n}|a_{ik}|^2\,\sum_{j=1}^{n}\sum_{k=1}^{n}|b_{kj}|^2 = \|A\|_F^2\,\|B\|_F^2.$$

The spectral norm and the Frobenius norms are mostly used in applications since
both of them satisfy the sub-multiplicative property. The other norms are sometimes
used for deriving easy estimates due to their computational simplicity.
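Both observations about sub-multiplicativity are easy to check numerically on the all-ones matrix; a sketch with ad hoc helpers (not from the text):

```python
def frob(A):
    return sum(x * x for row in A for x in row) ** 0.5

def cart(A):
    return max(abs(x) for row in A for x in row)

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

n = 3
A = [[1] * n for _ in range(n)]  # every entry equal to 1
AB = matmul(A, A)                # every entry equals n
# Cartesian norm fails sub-multiplicativity: ||AB||_c = n > 1 = ||A||_c ||A||_c.
# Frobenius norm satisfies it: ||AB||_F <= ||A||_F ||A||_F.
frob_ok = frob(AB) <= frob(A) * frob(A) + 1e-12
```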
Exercises for Sect. 7.2

1. Let x ∈ R2. Show that ‖x‖2 = 1 iff x = (cos θ, sin θ) for some θ ∈ R.


2. Prove that the induced norm as defined in the text is indeed a norm. That is, it
satisfies the required four properties of a norm.
3. Let ‖·‖ be a norm on Fn×1. Let A ∈ Fn×n. Prove that

$$\mathrm{lub}\Big\{\frac{\|Av\|}{\|v\|} : v \in \mathbb{F}^{n\times1},\ v \ne 0\Big\} = \mathrm{lub}\big\{\|Ax\| : x \in \mathbb{F}^{n\times1},\ \|x\| = 1\big\}.$$

7.3 Contraction Mapping

We wish to discuss the use of norms in a special circumstance. Suppose it is required to solve the equation x² − x − 1 = 0 numerically, given that it has a root between 1 and 2. (Pretend that we do not know the formula for solving a quadratic equation.) We may then rewrite the equation as x = √(1 + x). As an initial guess, we take x₀ = 1. Using the iteration xₙ₊₁ = √(1 + xₙ), we find that

$$x_0 = 1,\quad x_1 = \sqrt{2},\quad x_2 = \sqrt{1+\sqrt{2}},\quad x_3 = \sqrt{1+\sqrt{1+\sqrt{2}}},\ \dots$$

We compute the absolute differences between the successive approximants:

$$|x_1-x_0| = \sqrt{2}-1,\quad |x_2-x_1| = \sqrt{1+\sqrt{2}}-\sqrt{2},\quad |x_3-x_2| = \sqrt{1+\sqrt{1+\sqrt{2}}}-\sqrt{1+\sqrt{2}},\ \dots$$

These seem to decrease. So, we conjecture that this iteration can be used to approximate the root of the equation that lies between 1 and 2. Our intuition relies on the fact that if the sequence {xn} converges to a real number x, then the limit x will satisfy the equation x = √(1 + x).
Moreover, we require the successive approximants to come closer to the root, or at least, the differences between successive approximants to decrease to 0. This requirement may put some restrictions on the function we use for the iteration, such as f(x) = √(1 + x) above.
Let S be a nonempty subset of a normed linear space V. Let f : S → S be any function. We say that v ∈ S is a fixed point of f iff f(v) = v. The function f : S → S is called a contraction mapping, or a contraction, iff there exists a positive number c < 1 such that

$$\|f(u) - f(v)\| \le c\,\|u - v\| \quad \text{for all } u, v \in S.$$


Example 7.6 1. Let V = R. Let S be the closed interval {x ∈ R : 1 ≤ x ≤ 2}. Define f : S → R by f(x) = √(1 + x).
If 1 ≤ x ≤ 2, then √2 ≤ f(x) ≤ √3. That is, f : S → S. If a is a fixed point of f, then it satisfies a = √(1 + a), or a² − a − 1 = 0.
Now, the derivative of f is continuous on S, and it satisfies

$$|f'(x)| = \Big|\frac{1}{2\sqrt{1+x}}\Big| \le \frac{1}{2\sqrt{2}} < 1.$$

Then, for all x, y ∈ S,

$$|f(x)-f(y)| \le \max\{|f'(t)| : t \in S\}\,|x-y| \le \frac{1}{2\sqrt{2}}\,|x-y|.$$

Therefore, f is a contraction.
2. Let V = R, and S = R also. Define f : S → R by f(x) = x² − 1. Its fixed point a satisfies a = a² − 1, or a² − a − 1 = 0.
For x = 2, y = 3, we have |x − y| = 1 and |f(x) − f(y)| = |2² − 1 − (3² − 1)| = 5. Thus,

$$|f(x)-f(y)| \not\le c\,|x-y| \quad \text{for any } c < 1.$$

Hence, f is not a contraction.
3. Let V = Rn×1 for a fixed n > 1. Let S = V. We use the 2-norm ‖·‖2 on V. Define f : S → S by

$$f(v) = Av, \quad A = \mathrm{diag}\big(0, \tfrac{1}{n}, \tfrac{1}{n}, \dots, \tfrac{1}{n}\big), \quad \text{for } v \in S.$$

Then, v = 0 is a fixed point of f.
Notice that ‖A‖2 = the largest singular value of A = 1/n < 1. Then,

$$\|f(u)-f(v)\|_2 = \|Au - Av\|_2 \le \|A\|_2\,\|u-v\|_2 = \tfrac{1}{n}\|u-v\|_2 \quad \text{for } u, v \in S.$$

Hence, f is a contraction.
4. With S = V = Rn×1 and f(v) = v defined on S, we see that each vector v ∈ Rn×1 is a fixed point of f. For any norm on V, we have ‖f(u) − f(v)‖ = ‖u − v‖. Therefore, f is not a contraction.
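The first map of Example 7.6 can be iterated directly; a sketch (the tolerance and the iteration cap are arbitrary choices, not from the text):

```python
from math import sqrt

def fixed_point(f, x0, tol=1e-12, max_iter=100):
    # Iterate x_{n+1} = f(x_n) until successive terms are within tol.
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

root = fixed_point(lambda x: sqrt(1 + x), 1.0)
# The limit satisfies x^2 - x - 1 = 0; it is the golden ratio (1 + sqrt(5))/2.
```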


We will show that a fixed point can be approximated by using a contraction in an


iterative way. For a contraction f : S → S, we will use the fixed-point iteration

x0 , xn+1 = f (xn ) for n ≥ 0


Here, x0 ∈ S is chosen initially, and we say that the iteration function is f. This defines a sequence {xn}ₙ₌₀^∞ in S. We require this sequence to converge to a point in S. In this regard, we quote some relevant notions and facts from analysis.
Let S be a subset of a normed linear space V. We say that a sequence {yn}ₙ₌₀^∞ of vectors yn is in S iff yn ∈ S for each n.
A sequence {yn}ₙ₌₀^∞ in S is said to converge to a vector y ∈ V iff for each ε > 0, there exists a natural number N such that if m > N, then ‖ym − y‖ < ε. In such a case, the vector y ∈ V is called a limit of the (convergent) sequence. If for all convergent sequences the corresponding limit vectors happen to be in S, then S is said to be a closed subset of V.
Further, a sequence {yn}ₙ₌₀^∞ in S is called a Cauchy sequence iff for each ε > 0, there exists a natural number N such that if m > N and k > 0, then ‖ym+k − ym‖ < ε. In case V is a finite dimensional normed linear space, each Cauchy sequence in S is convergent; however, the limit vector may not be in S. For our purpose, we then require S to be a closed subset of a finite dimensional normed linear space V, so that each Cauchy sequence in S will have its limit vector in S.
With this little background from analysis, we essentially show that the fixed-
point iteration with a contraction map defines a Cauchy sequence. But we rephrase
it keeping its applications in mind.

Theorem 7.4 (Contraction mapping principle) Let S be a nonempty closed subset of a finite dimensional normed linear space V. If f : S → S is a contraction, then f has a unique fixed point in S.

Proof Denote the norm on V by ‖·‖. Let f : S → S be a contraction. Then, there exists a positive constant c < 1 such that

$$\|f(u) - f(v)\| \le c\,\|u - v\| \quad \text{for all } u, v \in S.$$

Choose any vector x0 ∈ S. Define the fixed-point iteration

$$x_0, \quad x_{n+1} = f(x_n) \ \text{ for } n \ge 0.$$

Since f : S → S, each xj ∈ S. Also, for any m ≥ 1,

$$\|x_{m+1} - x_m\| = \|f(x_m) - f(x_{m-1})\| \le c\,\|x_m - x_{m-1}\|.$$

Then,

$$\|x_{m+1} - x_m\| \le c\,\|x_m - x_{m-1}\| \le c^2\|x_{m-1} - x_{m-2}\| \le \cdots \le c^m\|x_1 - x_0\|.$$

By the triangle inequality,

$$\|x_{m+k} - x_m\| \le \|x_{m+k} - x_{m+k-1}\| + \cdots + \|x_{m+1} - x_m\| \le (c^{m+k-1} + \cdots + c^m)\,\|x_1 - x_0\| = c^m\,\frac{1-c^k}{1-c}\,\|x_1 - x_0\| \le \frac{c^m}{1-c}\,\|x_1 - x_0\|. \tag{7.2}$$

As 0 < c < 1 implies 0 < cᵏ < 1, we have 0 < 1 − cᵏ < 1; thus, the last inequality holds.
Suppose ε > 0. Since $\lim_{m\to\infty}\frac{c^m}{1-c}\,\|x_1-x_0\| = 0$, we have a natural number N such that for each m > N, $\frac{c^m}{1-c}\,\|x_1-x_0\| < \varepsilon$. That is,

$$\|x_{m+k} - x_m\| < \varepsilon \quad \text{for all } m > N,\ k > 0.$$

Therefore, {xn}ₙ₌₀^∞ is a Cauchy sequence in S. Since S is a closed subset of a finite dimensional normed linear space, this sequence converges, and the limit vector, say u, is in S.
Observe that since ‖f(x) − f(y)‖ ≤ c‖x − y‖, the function f is continuous. Thus, taking the limit of both sides in the fixed-point iteration xₙ₊₁ = f(xₙ), we see that u = f(u). Therefore, u is a fixed point of f.
For uniqueness of such a fixed point, suppose u and v are two fixed points of f. Then, u = f(u) and v = f(v).
If u ≠ v, then 0 < ‖u − v‖ = ‖f(u) − f(v)‖ ≤ c‖u − v‖ < ‖u − v‖, as c < 1. This is a contradiction. Therefore, u = v. □

Contraction mapping principle can be used to solve linear or nonlinear equations


as we have mentioned earlier. The trick is to write the equation in the form x = f (x)
so that such an f is a contraction on its domain. Next, we use the fixed-point iteration
to solve the equation approximately.
It will be useful to have some idea of the possible error we may commit when we declare that a certain xm in the fixed-point iteration is an approximation of the fixed point x. The two types of error estimates are as follows.

A priori error estimate: $\|x_m - x\| \le \dfrac{c^m}{1-c}\,\|x_1 - x_0\|$.
A posteriori error estimate: $\|x_m - x\| \le \dfrac{c}{1-c}\,\|x_m - x_{m-1}\|$.

The first one follows from the inequality $\|x_{m+k} - x_m\| \le \frac{c^m}{1-c}\|x_1 - x_0\|$ in (7.2), by taking k → ∞. This error estimate is called a priori since, without computing xm explicitly, it gives information as to how much error we may commit by approximating x with xm.
For the second one, observe that

$$\|x_m - x\| = \|f(x_{m-1}) - f(x)\| \le c\,\|x_{m-1} - x\|. \tag{7.3}$$

Using the triangle inequality, we have

$$\|x - x_{m-1}\| \le \|x - x_m\| + \|x_m - x_{m-1}\| \le c\,\|x - x_{m-1}\| + \|x_m - x_{m-1}\|.$$

This implies

$$(1-c)\,\|x - x_{m-1}\| \le \|x_m - x_{m-1}\|.$$

Using (7.3), we obtain

$$\|x - x_m\| \le c\,\|x - x_{m-1}\| \le \frac{c}{1-c}\,\|x_m - x_{m-1}\|.$$

This error estimate is called a posteriori since it gives information about the error in approximating x with xm only after xm has been computed.
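Both estimates can be checked on the iteration xₙ₊₁ = √(1 + xₙ) from the opening of this section; a sketch (not from the text), taking c = 1/(2√2) as the contraction constant from Example 7.6:

```python
from math import sqrt

def f(x):
    return sqrt(1 + x)

c = 1 / (2 * sqrt(2))       # contraction constant for f on [1, 2]
xs = [1.0]                  # x_0 = 1
for _ in range(10):
    xs.append(f(xs[-1]))    # x_1, ..., x_10

x_true = (1 + sqrt(5)) / 2  # the exact fixed point (golden ratio)
m = 10
a_priori = c ** m / (1 - c) * abs(xs[1] - xs[0])
a_posteriori = c / (1 - c) * abs(xs[m] - xs[m - 1])
err = abs(xs[m] - x_true)   # actual error; bounded by both estimates
```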

Example 7.7 A commonly used method to approximate a root of a nonlinear equation f(x) = 0 is Newton's method, where the iteration is given by

    x^0,  x^{n+1} = x^n − (f′(x^n))^{−1} f(x^n)  for n ≥ 0.

We may consider f(x) as a vector function, say, f : R^{n×1} → R^{n×1}. That is,

    f(x) = (f_1(x_1, ..., x_n), ..., f_n(x_1, ..., x_n)),

where f_j : R^{n×1} → R is a real valued function of n variables. Since the components
of x are written as x_i with subscripts, we write the mth approximation to x as
x^m, using the superscript notation.

Newton's method can be posed as a fixed-point iteration with the iteration function
g given by

    g(x) = x − (f′(x))^{−1} f(x).

Here, f′(x) is the Jacobian matrix

    f′(x) = [a_{ij}] ∈ R^{n×n}  with  a_{ij} = ∂f_i/∂x_j  for i, j = 1, ..., n.

The resulting iteration x^{n+1} = x^n − (f′(x^n))^{−1} f(x^n) is carried out by setting
h^n = x^{n+1} − x^n, which satisfies the linear system

    f′(x^n) h^n = −f(x^n).

To prove that Newton's method converges to a solution, we should find a set S on
which this g is a contraction under suitable conditions on f.

Denote a root of the equation f(x) = 0 by u. Assume that f and its first and
second partial derivatives are continuous, that f′(x) is invertible within a neighbourhood
of radius δ > 0 of the vector u, and that in the same neighbourhood, ‖(f′(x))^{−1}‖ ≤ α
for some α > 0. Using Taylor's formula and then taking the norm, we have

    ‖f(x) − f(u) − f′(x)(x − u)‖ ≤ β ‖u − x‖²  for some β > 0.

As f(u) = 0, it implies that

    ‖g(x) − u‖ = ‖x − u − (f′(x))^{−1} f(x)‖ ≤ β ‖(f′(x))^{−1}‖ ‖u − x‖².

We choose ε < min{δ, (αβ)^{−1}} and define S := {x : ‖u − x‖ ≤ ε}. Then, for x ∈ S,

    ‖g(x) − u‖ ≤ αβ ‖u − x‖² ≤ αβ ε ‖u − x‖ ≤ ‖u − x‖.

Therefore, g : S → S. Also, if ‖u − x‖ ≤ ε, then c = αβ ‖u − x‖ ≤ αβ ε < 1. So, g is
a contraction on S. We conclude that Newton's method starting with any x^0 within
the ε-neighbourhood of the root u will converge to u.

Observe that this feature of Newton's method requires a good initial guess x^0. If
the initial guess is away from the desired root, then the method may not converge to
the root.
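As a small illustration of the vector case, the sketch below applies Newton's method to the hypothetical 2 × 2 system x² + y² = 4, xy = 1 (our own choice of f, not from the text), solving the linear step f′(x^n) h^n = −f(x^n) by Cramer's rule:

```python
def newton2(f, jac, x, steps=20):
    """Newton iteration x <- x + h, where jac(x) h = -f(x), for a 2x2 Jacobian."""
    for _ in range(steps):
        f1, f2 = f(x)
        (a, b), (c, d) = jac(x)
        det = a * d - b * c
        h1 = (-f1 * d + f2 * b) / det      # Cramer's rule on jac(x) h = -f(x)
        h2 = (-f2 * a + f1 * c) / det
        x = (x[0] + h1, x[1] + h2)
    return x

# Illustrative system: f(x, y) = (x^2 + y^2 - 4, x*y - 1).
f = lambda v: (v[0] ** 2 + v[1] ** 2 - 4.0, v[0] * v[1] - 1.0)
jac = lambda v: ((2 * v[0], 2 * v[1]), (v[1], v[0]))

x = newton2(f, jac, (2.0, 0.5))          # a good initial guess near a root
r1, r2 = f(x)
assert abs(r1) < 1e-10 and abs(r2) < 1e-10
```

Starting far from a root, the iteration may fail to converge or may land on a different root, which matches the remark above about needing a good initial guess x^0.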

Exercises for Sect. 7.3


1. Determine the largest interval S = {x ∈ R : a ≤ x ≤ b} satisfying the property:
   for each x_0 ∈ S, the fixed-point iteration x_{i+1} = x_i(2 − x_i) converges.
2. Show that the fixed-point iteration x_{i+1} = x_i − x_i³ converges if the initial guess
   x_0 satisfies −1 < x_0 < 1.
3. Let S = {x ∈ R : x ≥ 0}. Define f : S → S by f(x) = x + (1 + x)^{−1}. Show
   that for all x, y ∈ S with x ≠ y, |f(x) − f(y)| < |x − y|. Also, show that f is not a
   contraction.
4. Show that Newton's method for the function f(x) = x^n − a, for x > 0, n > 1,
   a > 0, converges to a^{1/n}.

7.4 Iterative Solution of Linear Systems

Let A ∈ C^{n×n} be an invertible matrix. To solve the linear system Ax = b, we rewrite
it as x = x + C(b − Ax) for an invertible matrix C. We see that the solution of the
linear system is a fixed point of f, where

    f(x) = x + C(b − Ax).


By using this f (x) as an iteration function, we may obtain an approximate solution


for the linear system.
Notice that we could have rewritten Ax = b as x = x + (b − Ax). However,
keeping an arbitrary invertible matrix C helps in applying the contraction mapping
principle; see the following theorem.

Theorem 7.5 (Fixed-Point Iteration for Linear Systems) Let ‖·‖ be a
sub-multiplicative norm on matrices. Let A ∈ F^{n×n}, b ∈ F^{n×1}, and let C ∈ F^{n×n}
be such that ‖I − CA‖ < 1. Then, the following iteration converges to the solution
of the system Ax = b:

    x^0,  x^{m+1} = x^m + C(b − Ax^m)  for m ≥ 0.    (7.4)

Proof First, we show that both A and C are invertible. On the contrary, if at least
one of A or C is not invertible, then there exists a nonzero vector x ∈ C^{n×1} such that
CAx = 0. Then,

    ‖x‖ = ‖x − CAx‖ = ‖(I − CA)x‖ ≤ ‖I − CA‖ ‖x‖ < ‖x‖,

a contradiction. Hence, both A and C are invertible.


Consequently, there exists a unique solution of the linear system Ax = b.
Let f : Cn×1 → Cn×1 be defined by

f (x) = x + C(b − Ax) for x ∈ Cn×1 .

If x, y ∈ C^{n×1}, then

    ‖f(x) − f(y)‖ = ‖x + C(b − Ax) − y − C(b − Ay)‖
                  = ‖(I − CA)(x − y)‖ ≤ ‖I − CA‖ ‖x − y‖.

Since ‖I − CA‖ < 1, f is a contraction on C^{n×1}. By the contraction mapping
principle, f has a unique fixed point in C^{n×1}. If this fixed point is u, then u = f(u)
implies that u = u + C(b − Au), or C(b − Au) = 0. Since C is invertible, we have
Au = b. That is, the fixed point u is the solution of Ax = b.
Notice that the fixed-point iteration with the iteration function as f (x) is nothing
but Iteration (7.4). This completes the proof. 

Iteration (7.4) for solving a linear system is generically named as the fixed-point
iteration for linear systems. However, it is not just a method; it is a scheme of methods.
By fixing the matrix C and a sub-multiplicative norm, we obtain a corresponding
method to approximate the solution of Ax = b.
For a given linear system, Iteration (7.4) requires choosing a suitable matrix C
such that ‖I − CA‖ < 1 in some induced norm or the Frobenius norm. It may quite
well happen that in one induced norm, ‖I − CA‖ is greater than 1, but in another
induced norm, the same is smaller than 1. For example, consider the matrix

    B = [ 2/3  2/3 ]
        [  0    0  ]

Its row sum norm, column sum norm, the Frobenius norm, and the spectral norm tell
different stories:

    ‖B‖_∞ = 2/3 + 2/3 = 4/3 > 1,  ‖B‖_1 = 2/3 < 1,  ‖B‖_F = √8/3 < 1,  ‖B‖_2 = √8/3 < 1.

The last one is computed from the eigenvalues of B*B, which are 8/9 and 0.
The spectral radius of B is (see Exercise 1)

    ρ(B) = max{|λ| : λ is an eigenvalue of B} = 2/3.

That is, the norm of B in any induced norm is at least 2/3.
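These values can be recomputed mechanically. In the sketch below (ours, for illustration), the first three norms are direct formulas, and the spectral norm is approximated by power iteration on BᵗB, a standard device not covered in this section:

```python
import math

B = [[2/3, 2/3],
     [0.0, 0.0]]

row_sum = max(sum(abs(e) for e in row) for row in B)                       # ||B||_inf
col_sum = max(sum(abs(B[i][j]) for i in range(2)) for j in range(2))       # ||B||_1
frob = math.sqrt(sum(e * e for row in B for e in row))                     # ||B||_F

# Spectral norm via power iteration on B^t B (its largest eigenvalue is 8/9).
x = [1.0, 1.0]
for _ in range(50):
    y = [B[0][0] * x[0] + B[0][1] * x[1], B[1][0] * x[0] + B[1][1] * x[1]]  # B x
    x = [B[0][0] * y[0] + B[1][0] * y[1], B[0][1] * y[0] + B[1][1] * y[1]]  # B^t y
    n = math.hypot(x[0], x[1])
    x = [x[0] / n, x[1] / n]
spectral = math.hypot(B[0][0] * x[0] + B[0][1] * x[1], B[1][0] * x[0] + B[1][1] * x[1])

assert abs(row_sum - 4/3) < 1e-12 and abs(col_sum - 2/3) < 1e-12
assert abs(frob - math.sqrt(8)/3) < 1e-12
assert abs(spectral - math.sqrt(8)/3) < 1e-9
```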

Example 7.8 Consider the linear system Ax = b, where A = [a_{ij}] ∈ C^{n×n} and b ∈
C^{n×1}. Let D = diag(a_{11}, ..., a_{nn}), which has the diagonal of A as its diagonal
and all other entries 0. Suppose that no diagonal entry of A is 0, and that in some
induced norm, or in the Frobenius norm, ‖I − D^{−1}A‖ < 1.

For instance, suppose A is a strict diagonally dominant matrix, that is, the entries
a_{ij} of A satisfy

    |a_{ii}| > Σ_{j=1, j≠i}^{n} |a_{ij}|  for i = 1, ..., n.

This says that in any row, the absolute value of the diagonal entry is greater than the
sum of the absolute values of all other entries in that row. Since the ith row of
I − D^{−1}A has the entries −a_{ij}/a_{ii} for j ≠ i and 0 on the diagonal,

    ‖I − D^{−1}A‖_∞ = max{ Σ_{j=1, j≠i}^{n} |a_{ij}|/|a_{ii}| : 1 ≤ i ≤ n } < 1.

It follows that the iteration

    x^0,  x^{m+1} = x^m + D^{−1}(b − Ax^m)  for m ≥ 0

converges to the unique solution of Ax = b. This fixed-point iteration is called Jacobi
iteration.
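A minimal Jacobi sketch follows; the 3 × 3 system is illustrative data of ours, chosen to be strict diagonally dominant so that convergence is guaranteed:

```python
def jacobi(A, b, x0, steps):
    """x^{m+1} = x^m + D^{-1}(b - A x^m), with D the diagonal part of A."""
    n = len(A)
    x = list(x0)
    for _ in range(steps):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + r[i] / A[i][i] for i in range(n)]
    return x

# Strict diagonally dominant, so ||I - D^{-1}A||_inf = 0.6 < 1 (our own data).
A = [[4.0, 1.0, 1.0],
     [1.0, 5.0, 2.0],
     [1.0, 2.0, 6.0]]
b = [9.0, 17.0, 23.0]          # chosen so that the exact solution is (1, 2, 3)

x = jacobi(A, b, [0.0, 0.0, 0.0], 60)
assert all(abs(x[i] - [1.0, 2.0, 3.0][i]) < 1e-8 for i in range(3))
```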

In Jacobi iteration, we essentially take D as an approximation to A. In that case,
we may view the lower triangular part of A as a still better approximation to A.
We try this heuristic in the next example.

Example 7.9 For A = [a_{ij}] ∈ C^{n×n} and b ∈ C^{n×1}, let L = [ℓ_{ij}], where

    ℓ_{ij} = a_{ij} for i ≥ j,  and  ℓ_{ij} = 0 for i < j.

Assume that A is invertible and that no diagonal entry of A is 0, so that L is also
invertible. Suppose that in some induced norm or in the Frobenius norm, ‖I − L^{−1}A‖ <
1. For instance, if A is strict diagonally dominant, then ‖I − L^{−1}A‖_∞ < 1, as earlier.
Then, the fixed-point iteration

    x^0,  x^{m+1} = x^m + L^{−1}(b − Ax^m)  for m ≥ 0

converges to the solution of Ax = b.

This fixed-point iteration is called Gauss–Seidel iteration. It can be shown that it
converges for all positive definite matrices also.
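A minimal Gauss–Seidel sketch on an illustrative strict diagonally dominant system (our own data): note that updating the components of x in place while sweeping i = 1, ..., n is exactly the forward substitution hidden in applying L^{−1}.

```python
def gauss_seidel(A, b, x0, steps):
    """x^{m+1} = x^m + L^{-1}(b - A x^m): the in-place component update below
    is forward substitution with the lower triangular part L of A."""
    n = len(A)
    x = list(x0)
    for _ in range(steps):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

# Illustrative strict diagonally dominant system (ours, not from the text).
A = [[4.0, 1.0, 1.0], [1.0, 5.0, 2.0], [1.0, 2.0, 6.0]]
b = [9.0, 17.0, 23.0]          # chosen so that the exact solution is (1, 2, 3)
x = gauss_seidel(A, b, [0.0, 0.0, 0.0], 30)
assert all(abs(x[i] - [1.0, 2.0, 3.0][i]) < 1e-10 for i in range(3))
```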

Exercises for Sect. 7.4

1. For any matrix A ∈ Fn×n , the non-negative real number

ρ (A) = max{|λ| : λ is an eigenvalue of A}

is called the spectral radius of A. Show that if  ·  is any induced norm, then
ρ (A) ≤ A.
2. Solve the linear system 3x − 6y + 2z = 15, − 4x + y − z = −2, x − 3y +
7z = 22 by using
(a) Jacobi iteration with initial guess as (0, 0, 0).
(b) Gauss–Seidel iteration with initial guess as (2, 2, −1).

7.5 Condition Number

In applications, when we arrive at a linear system Ax = b for A ∈ F^{n×n} and b ∈ F^{n×1},
it is quite possible that there are errors in the matrix A and the vector b. The errors
might have arisen due to modelling or due to computation of other parameters.
Neglecting small parameters may lead to such errors. Or, you could have entered the
data incorrectly for various reasons, such as the constants being irrational numbers
while only floating point numbers can be fed in, or by sheer mistake. It is also
possible that the entries in A and b have been computed from the numerical solution of
another problem. In all such cases, we would like to determine the effect of small
changes in the data (that is, in A and b) on the solution of the system and its accuracy.
In other words, we assume that we are actually solving a perturbed problem, and we
would like to estimate the error committed due to the perturbation. Even with negligible
perturbation, the nature of the perturbed system can differ very much from the original
one.
     
Example 7.10 Consider

    A = [ 1  1 ],   A′ = [ 1+δ  1+δ ],   Ã = [ 1+δ  1 ],   b = [ 2 ],
        [ 2  2 ]         [  2    2  ]        [  2   2 ]        [ 4 ]

for some δ ≠ 0. With a small value of δ, say δ = 10^{−10}, the matrices A′ and Ã may
be considered as small perturbations of the matrix A.

We find that the system Ax = b has infinitely many solutions, given by

    Sol(A, b) = { [α, 2 − α]^t : α ∈ F }.

The system A′x = b has no solutions, since its two equations demand x_1 + x_2 = 2/(1+δ)
and x_1 + x_2 = 2 simultaneously. Whereas the system Ãx = b has the unique
solution x_1 = 0, x_2 = 2.

In analysing the effect of a perturbation on a solution, we require the so-called
condition number of a matrix. Let A ∈ F^{n×n} be an invertible matrix. Let ‖·‖ be an
induced matrix norm or the Frobenius norm. Then, the condition number of A,
denoted by κ(A), is defined as

    κ(A) = ‖A^{−1}‖ ‖A‖.

If A is not invertible, then we define κ(A) = ∞.


Naturally, the condition number depends on the chosen matrix norm. We take
the condition number of a non-invertible matrix as infinity due to the complete
unpredictability of the nature of a solution of the corresponding linear system under
perturbation.
Example 7.11 The inverse of

    A = [  1   1/2  1/4 ]                 [ 14  −6  −2 ]
        [ 1/2   1   1/4 ]  is  A^{−1} = (2/21) [ −7  15  −2 ].
        [ 1/4  1/2   1  ]                 [  0  −6  12 ]

If we use the maximum row sum norm, then

    ‖A‖_∞ = 7/4,  ‖A^{−1}‖_∞ = (2/21) × 24 = 16/7.

Thus, κ(A) = (7/4) × (16/7) = 4.

If we use the maximum column sum norm, then

    ‖A‖_1 = 2,  ‖A^{−1}‖_1 = (2/21) × 27 = 18/7.

Consequently, κ(A) = 2 × (18/7) = 36/7.
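The arithmetic of this example can be double-checked in exact rational arithmetic. The sketch below (ours) verifies the stated inverse and evaluates both condition numbers; note that the maximum column sum of A comes from its middle column:

```python
from fractions import Fraction as F

# The matrix of Example 7.11 and its stated inverse, in exact arithmetic.
A = [[F(1), F(1, 2), F(1, 4)],
     [F(1, 2), F(1), F(1, 4)],
     [F(1, 4), F(1, 2), F(1)]]
Ainv = [[F(2, 21) * e for e in row]
        for row in [[14, -6, -2], [-7, 15, -2], [0, -6, 12]]]

# Verify A * Ainv = I exactly.
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)]
assert prod == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

row_norm = lambda M: max(sum(abs(e) for e in r) for r in M)                      # ||.||_inf
col_norm = lambda M: max(sum(abs(M[i][j]) for i in range(3)) for j in range(3))  # ||.||_1

assert row_norm(A) * row_norm(Ainv) == 4          # condition number, row sum norm
assert col_norm(A) * col_norm(Ainv) == F(36, 7)   # condition number, column sum norm
```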

Observe that the condition number of any matrix with respect to any induced norm
is at least 1. For,

    1 = ‖I‖ = ‖A^{−1}A‖ ≤ ‖A^{−1}‖ ‖A‖ = κ(A).

Therefore, we informally say that a matrix is well-conditioned if its
condition number is close to 1, and ill-conditioned if its condition number
is large compared to 1.
We wish to see how an error in the data leads to small or large errors in the solution
of a linear system. The data in a linear system Ax = b are the matrix A and the vector
b. We consider these two cases separately.
First, suppose that A is invertible and there is an error in b. We estimate its effect
on the solution, especially, on the relative error.
Theorem 7.6 Let A ∈ F^{n×n} be invertible. Let b, b′ ∈ F^{n×1}. Let x, x′ ∈ F^{n×1} satisfy
x ≠ 0, Ax = b, and Ax′ = b′. Then,

    ‖x′ − x‖/‖x‖ ≤ κ(A) ‖b′ − b‖/‖b‖.

Proof Ax = b and Ax′ = b′ imply that A(x′ − x) = b′ − b. That is, x′ − x =
A^{−1}(b′ − b). Then,

    ‖x′ − x‖ ≤ ‖A^{−1}‖ ‖b′ − b‖.

Next, Ax = b implies that ‖b‖ ≤ ‖A‖ ‖x‖. Multiplying this with the previous
inequality, we obtain

    ‖x′ − x‖ ‖b‖ ≤ ‖A^{−1}‖ ‖b′ − b‖ ‖A‖ ‖x‖ = κ(A) ‖b′ − b‖ ‖x‖.

The required estimate follows from this.


Next, consider the case where b is fixed but A is perturbed. We estimate the relative
error in the solution using the relative error in A.

Theorem 7.7 Let A ∈ F^{n×n} be invertible, A′ ∈ F^{n×n}, and let b ∈ F^{n×1}. Let x, x′ ∈
F^{n×1} satisfy Ax = b, x′ ≠ 0, and A′x′ = b. Then,

    ‖x′ − x‖/‖x′‖ ≤ κ(A) ‖A′ − A‖/‖A‖.

Proof Ax = b and A′x′ = b imply that A(x′ − x) + (A′ − A)x′ = 0. It gives

    x′ − x = −A^{−1}(A′ − A)x′.

Taking norms on both sides, we have

    ‖x′ − x‖ ≤ ‖A^{−1}‖ ‖A′ − A‖ ‖x′‖ = (κ(A)/‖A‖) ‖A′ − A‖ ‖x′‖.

The required inequality follows from this.

The two estimates in Theorems 7.6 and 7.7 need to be interpreted correctly. Roughly,
they say that when the condition number is large, the relative error will also be large,
in general. They do not say that largeness of the condition number ensures a large
relative error.

In Theorem 7.6, the estimate is valid for any fixed b. It means that for some
b ∈ F^{n×1} the relative error will be large when the condition number of A is large.
Also, there may be some b ∈ F^{n×1} for which the relative error is small even if κ(A)
is large. A similar comment applies to Theorem 7.7.

When we compute the solution of a linear system, due to round-off and other
similar factors, the computed solution may not be an exact solution. That is, it may
not satisfy the linear system. Suppose that x is the exact solution of the linear system
Ax = b and our computation has produced x̂. Substituting in the equation, we find
that Ax̂ ≠ b.

The vector b − Ax̂ is called the residual of the computed solution. The residual
can always be computed a posteriori. The relative residual is the quantity ‖b − Ax̂‖/‖b‖.
We wish to estimate the relative error in the computed solution using the relative
residual. Again, the condition number comes to our help.

Theorem 7.8 Let A ∈ F^{n×n} be invertible, and let b ∈ F^{n×1}, b ≠ 0. Let x ∈ F^{n×1}
satisfy x ≠ 0 and Ax = b. Let x̂ ∈ F^{n×1}. Then,

    (1/κ(A)) ‖b − Ax̂‖/‖b‖ ≤ ‖x − x̂‖/‖x‖ ≤ κ(A) ‖b − Ax̂‖/‖b‖.

Proof A(x − x̂) = Ax − Ax̂ = b − Ax̂ implies ‖b − Ax̂‖ ≤ ‖A‖ ‖x − x̂‖. Also,
x − x̂ = A^{−1}(b − Ax̂). Therefore,

    ‖b − Ax̂‖/‖A‖ ≤ ‖x − x̂‖ = ‖A^{−1}(b − Ax̂)‖ ≤ ‖A^{−1}‖ ‖b − Ax̂‖.

Dividing by ‖x‖, we have

    (‖b‖/(‖A‖ ‖x‖)) · ‖b − Ax̂‖/‖b‖ ≤ ‖x − x̂‖/‖x‖ ≤ (‖A^{−1}‖ ‖b‖/‖x‖) · ‖b − Ax̂‖/‖b‖.    (7.5)

Since Ax = b, we get ‖b‖ ≤ ‖A‖ ‖x‖; and x = A^{−1}b implies ‖x‖ ≤ ‖A^{−1}‖ ‖b‖. It
follows that

    ‖A^{−1}‖ ‖b‖/‖x‖ ≤ ‖A^{−1}‖ ‖A‖,  and  1/(‖A^{−1}‖ ‖A‖) ≤ ‖b‖/(‖A‖ ‖x‖).    (7.6)

Then, from (7.5) and (7.6), we obtain

    (1/(‖A^{−1}‖ ‖A‖)) · ‖b − Ax̂‖/‖b‖ ≤ ‖x − x̂‖/‖x‖ ≤ ‖A^{−1}‖ ‖A‖ · ‖b − Ax̂‖/‖b‖.

Now, the required estimate follows.

Observe that (7.5) can be used to obtain a lower bound as well as an upper bound
on the relative residual.

The estimate in Theorem 7.8 says that the relative error in the approximate solution
of Ax = b can be as small as ‖b‖/(‖A‖ ‖x‖) times the relative residual, or it can be as
large as ‖A^{−1}‖ ‖b‖/‖x‖ times the relative residual. It also says that when the condition
number of A is close to 1, the relative error and the relative residual are close to each
other, while the larger the condition number of A is, the less information the relative
residual provides about the relative error.
   
Example 7.12 The inverse of

    A = [ 1.01  0.99 ]   is   A^{−1} = [  25.25  −24.75 ].
        [ 0.99  1.01 ]                 [ −24.75   25.25 ]

Using the maximum row sum norm, we see that

    ‖A‖_∞ = 2,  ‖A^{−1}‖_∞ = 50,  κ(A) = 100.

The linear system Ax = [2, 2]^t has the exact solution x = [1, 1]^t. We take an
approximate solution x̂ = [1.01, 1.01]^t. The relative error and the relative residual
are given by

    ‖x − x̂‖_∞/‖x‖_∞ = ‖[−0.01, −0.01]^t‖_∞ / ‖[1, 1]^t‖_∞ = 0.01,
    ‖[2, 2]^t − Ax̂‖_∞ / ‖[2, 2]^t‖_∞ = ‖[−0.02, −0.02]^t‖_∞ / 2 = 0.01.

It shows that even if the condition number is large, the relative error and the
relative residual can be of the same order.

We change the right hand side to get the linear system Ax = [2, −2]^t. It has
the exact solution x = [100, −100]^t. We consider an approximate solution x̂ =
[101, −99]^t. The relative error and the relative residual are

    ‖x − x̂‖_∞/‖x‖_∞ = ‖[−1, −1]^t‖_∞ / ‖[100, −100]^t‖_∞ = 0.01,
    ‖[2, −2]^t − Ax̂‖_∞ / ‖[2, −2]^t‖_∞ = ‖[−2, −2]^t‖_∞ / 2 = 1.

That is, the relative residual is 100 times the relative error.
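The sketch below replays the second part of this example in exact rational arithmetic and confirms the two-sided estimate of Theorem 7.8 (the lower bound is attained here):

```python
from fractions import Fraction as F

A = [[F("1.01"), F("0.99")],
     [F("0.99"), F("1.01")]]
Ainv = [[F("25.25"), F("-24.75")],
        [F("-24.75"), F("25.25")]]

vec_inf = lambda v: max(abs(e) for e in v)
mat_inf = lambda M: max(sum(abs(e) for e in row) for row in M)
mv = lambda M, v: [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

kappa = mat_inf(A) * mat_inf(Ainv)          # 2 * 50 = 100

b = [F(2), F(-2)]
x = [F(100), F(-100)]                       # exact solution of Ax = b
xh = [F(101), F(-99)]                       # computed (approximate) solution

Axh = mv(A, xh)
rel_err = vec_inf([x[i] - xh[i] for i in range(2)]) / vec_inf(x)
rel_res = vec_inf([b[i] - Axh[i] for i in range(2)]) / vec_inf(b)

assert kappa == 100
assert rel_err == F(1, 100) and rel_res == 1
# Theorem 7.8: (1/kappa) * rel_res <= rel_err <= kappa * rel_res.
assert rel_res / kappa <= rel_err <= kappa * rel_res
```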

In general, when κ(A) is large, the system Ax = b is ill-conditioned for at least one
choice of b. There might still be choices of b for which the system is well-conditioned.

Example 7.13 Let

    A = [ 1  10⁵ ].   Its inverse is   A^{−1} = [ 1  −10⁵ ].
        [ 0   1  ]                              [ 0    1  ]

Using the maximum row sum norm, we have κ(A) = ‖A‖_∞ ‖A^{−1}‖_∞ = (10⁵ + 1)².

The system Ax = [1, 1]^t has the solution x = [1 − 10⁵, 1]^t.

Changing the vector b to b′ = [1.001, 1.001]^t, the system Ax′ = b′ has the solution
x′ = [1.001 − 10⁵ − 10², 1.001]^t = 1.001 [1 − 10⁵, 1]^t. We find that the relative
change in the solution and the relative change in b are as follows:

    ‖x′ − x‖_∞/‖x‖_∞ = ‖0.001 [1 − 10⁵, 1]^t‖_∞ / ‖[1 − 10⁵, 1]^t‖_∞ = 0.001,
    ‖b′ − b‖_∞/‖b‖_∞ = ‖[0.001, 0.001]^t‖_∞ / ‖[1, 1]^t‖_∞ = 0.001.

It shows that even if the condition number is large, the system can be well-conditioned.

Exercises for Sect. 7.5

1. Using the spectral norm, show that an orthogonal matrix has the condition number
1.
2. Using the norm ‖·‖_2 for matrices, compute the condition number of the matrix
   [ 3  −5 ]
   [ 6   1 ]
3. Let A, B ∈ Fn×n . Show that κ(AB) ≤ κ(A)κ(B).
4. Using the norm  · 2 for n × n matrices, show that κ(A) ≤ κ(A∗ A).

7.6 Matrix Exponential

The exponential of a complex number is usually defined by a series, which is an
infinite sum. Since addition is defined only for two numbers, and hence by induction
for finitely many of them, an infinite sum requires a new definition. Instead of
saying that an infinite sum such as a series has a certain value, we say that it
converges to a certain number. Similarly, we may define when an infinite sum
of vectors or matrices converges.
Let {A_i}_{i=1}^{∞} be a sequence of m × n matrices with complex entries. We say that
the series Σ_{i=1}^{∞} A_i converges to an m × n matrix A iff for each ε > 0, there exists
a natural number n_0 such that if k > n_0, then ‖Σ_{i=1}^{k} A_i − A‖ < ε.

Notice that for a square matrix A, any sub-multiplicative matrix norm satisfies
the property that ‖A^k‖ ≤ ‖A‖^k for all natural numbers k. Thus, when each A_j is
an n × n matrix and ‖·‖ is sub-multiplicative, it can be shown (by using Cauchy
sequences) that the series of matrices Σ_{j=1}^{∞} A_j converges to some matrix if the series
of non-negative real numbers Σ_{j=1}^{∞} ‖A_j‖ converges to some real number. Using this,
the exponential of a square matrix A, which we denote by e^A, is defined as follows:
    e^A = Σ_{i=0}^{∞} A^i/i! = I + A + (1/2!) A² + (1/3!) A³ + ⋯

The computation of e^A is simple when A is a diagonal matrix. For,

    D = diag(d_1, ..., d_n)  implies  e^D = diag(e^{d_1}, ..., e^{d_n}).

If A is similar to a diagonal matrix, then we may write A = P D P^{−1} for some
invertible matrix P. Then, A^i = P D^i P^{−1}; consequently, e^A = P e^D P^{−1}. This way,
the exponential of any diagonalizable matrix can be computed.
exponential of any diagonalizable matrix can be computed.
In general, we may use the Jordan form of A to obtain a closed-form expression
for e^A. Suppose that A = P J P^{−1}, where J is a matrix in Jordan form. Here,

    J = diag(J_1, ..., J_k),

where the J_i are Jordan blocks of the form

    J_i = [ λ  1             ]
          [    λ  1          ]
          [       ⋱  ⋱       ]
          [          λ  1    ]
          [             λ    ]

for an eigenvalue λ of A. Then,

    e^A = P e^J P^{−1} = P diag(e^{J_1}, ..., e^{J_k}) P^{−1}.

If J_i in the above form is of order ℓ, then e^{J_i} has the form

    e^{J_i} = e^λ [ 1  1  1/2!  1/3!  ⋯  1/(ℓ−1)! ]
                  [    1   1    1/2!  ⋯  1/(ℓ−2)! ]
                  [        1     1    ⋯  1/(ℓ−3)! ]
                  [               ⋱    ⋱     ⋮    ]
                  [                    1     1    ]
                  [                          1    ]

The exponential of a matrix often comes up in the context of solving ordinary
differential equations. Consider the initial value problem

    dx/dt = A x,  x(t_0) = x^0,

where x(t) = [x_1(t), ..., x_n(t)]^t, A ∈ R^{n×n}, and x^0 = [a_1, ..., a_n]^t ∈ R^{n×1}. The
solution of this initial value problem is given by

    x(t) = e^{(t−t_0)A} x^0 = e^{tA} e^{−t_0 A} x^0.

The matrix e^{tA} may be computed via the Jordan form of A as outlined above.
There is an alternative. We wish to find n linearly independent vectors
v_1, ..., v_n such that the series e^{tA} v_i can be summed exactly. If this can be done,
then taking these as columns of a matrix B, we see that

    B = e^{tA} [v_1 ⋯ v_n].

The matrix [v_1 ⋯ v_n] is invertible since its columns are linearly independent.
Then, using its inverse, we can compute e^{tA}. We now discuss how to execute this
plan.

For any scalar λ, the matrices t(A − λI) and tλI commute. Thus, we have

    e^{tA} = e^{t(A−λI)} e^{tλI} = e^{t(A−λI)} (e^{λt} I) = e^{λt} e^{t(A−λI)}.

Hence, if v ∈ R^{n×1} is such that (A − λI)^m v = 0, then

    e^{tA} v = e^{λt} e^{t(A−λI)} v = e^{λt} [ v + t(A − λI)v + ⋯ + (t^{m−1}/(m−1)!) (A − λI)^{m−1} v ].

That is, for a generalized eigenvector v, the infinite sum turns out to be a finite
sum. The question is whether we can choose n such generalized eigenvectors for A.
From the discussions about the Jordan form, we know that the answer is affirmative. In
fact, the following is true:

Let λ be an eigenvalue of A with algebraic multiplicity m. If the linear system (A − λI)^k x =
0 has r < m linearly independent solutions, then the system (A − λI)^{k+1} x = 0 has at least
r + 1 linearly independent solutions.

Our plan is to obtain m linearly independent generalized eigenvectors
associated with an eigenvalue λ of algebraic multiplicity m. Then, we take together
all such generalized eigenvectors to obtain the n linearly independent vectors v
so that e^{tA} v can be summed exactly. We proceed as follows.

1. Find all eigenvalues of A along with their algebraic multiplicities. For each eigenvalue
   λ, follow Steps 2–6:
2. Determine linearly independent vectors v satisfying (A − λI)v = 0.
3. If m such vectors are found, then write them as v_1, ..., v_m, and set
   w_1 := e^{λt} v_1, ..., w_m := e^{λt} v_m. Go to Step 7.
4. In Step 2, suppose that only k < m vectors v_i could be obtained. To
   find additional vectors, determine all vectors v such that (A − λI)² v = 0 but
   (A − λI)v ≠ 0. For each such vector v, set the corresponding w as

       w := e^{λt} [ v + t(A − λI)v ].

5. If m vectors w_j could not be found in Steps 3 and 4, then determine all
   vectors v such that (A − λI)³ v = 0 but (A − λI)² v ≠ 0. Corresponding to each
   such v, set

       w := e^{λt} [ v + t(A − λI)v + (t²/2!)(A − λI)² v ].

6. Continue to obtain more vectors w by considering vectors v that satisfy
   (A − λI)^{j+1} v = 0 but (A − λI)^j v ≠ 0, and by setting

       w := e^{λt} [ v + t(A − λI)v + (t²/2!)(A − λI)² v + ⋯ + (t^j/j!)(A − λI)^j v ].

7. Assume that for each eigenvalue λ of algebraic multiplicity m, we have obtained
   v_1, ..., v_m and w_1, ..., w_m. Together, we have obtained n linearly
   independent vectors v_i and n linearly independent vectors w_i. Then, set

       B = [w_1 ⋯ w_n],  C = [v_1 ⋯ v_n],  e^{tA} = B C^{−1}.

Further, the vectors w_1, ..., w_n are n linearly independent solutions of the system
of differential equations dx/dt = Ax.
of differential equations d x/dt = Ax.
Example 7.14 Let

    A = [ 1 0 0 ]
        [ 1 1 0 ].
        [ 0 0 3 ]

Its eigenvalues, with multiplicities, are 1, 1, 3.

For λ = 1, we determine all nonzero vectors v such that (A − λI)v = 0. If v =
[a, b, c]^t, then

    (A − 1I)v = [ 0 0 0 ] [ a ]   [  0 ]   [ 0 ]
                [ 1 0 0 ] [ b ] = [  a ] = [ 0 ].
                [ 0 0 2 ] [ c ]   [ 2c ]   [ 0 ]

It implies that a = 0, c = 0, and b is arbitrary. We choose

    v_1 = [0, 1, 0]^t,  w_1 = e^t [0, 1, 0]^t.

Notice that the eigenvalue 1 has algebraic multiplicity 2, but we got only one
linearly independent eigenvector. To compute an additional generalized eigenvector
for this eigenvalue, we find vectors v such that (A − 1I)² v = 0 but (A − 1I)v ≠ 0.
With v = [a, b, c]^t, it gives

    (A − 1I)² v = (A − 1I) [0, a, 2c]^t = [0, 0, 4c]^t = [0, 0, 0]^t.

It implies that a, b are arbitrary and c = 0. Moreover, v should also satisfy
(A − 1I)v ≠ 0; taking any such vector linearly independent of v_1 will be a valid
choice. Thus, we choose a = 1, b = 0, c = 0. That is,

    v_2 = [1, 0, 0]^t,  w_2 = e^t [ v_2 + t(A − 1I)v_2 ] = [e^t, te^t, 0]^t.

For λ = 3, (A − λI)[a, b, c]^t = 0 implies that −2a = 0 = a − 2b. That is,
a = 0, b = 0, and c is arbitrary. Thus, we choose

    v_3 = [0, 0, 1]^t,  w_3 = e^{3t} [0, 0, 1]^t.

We have got three linearly independent vectors v_1, v_2, v_3 and the corresponding
w_1, w_2, w_3. We put these vectors as columns of matrices B and C to obtain

    B = [w_1 w_2 w_3] = [ 0    e^t    0      ]        C = [v_1 v_2 v_3] = [ 0 1 0 ]
                        [ e^t  te^t   0      ],                           [ 1 0 0 ].
                        [ 0    0      e^{3t} ]                            [ 0 0 1 ]

Then, since C^{−1} = C here,

    e^{tA} = B C^{−1} = [ 0    e^t    0      ] [ 0 1 0 ]   [ e^t   0    0      ]          [ e  0  0  ]
                        [ e^t  te^t   0      ] [ 1 0 0 ] = [ te^t  e^t  0      ],  e^A = [ e  e  0  ].
                        [ 0    0      e^{3t} ] [ 0 0 1 ]   [ 0     0    e^{3t} ]          [ 0  0  e³ ]

Further, we observe that w_1, w_2 and w_3 are three linearly independent solutions
of the system of differential equations dx/dt = Ax.
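The closed form obtained above can be double-checked against a truncated exponential series; the helper below is a naive summation of ours, adequate for this small matrix:

```python
import math

def expm_series(A, terms=40):
    """Truncated series e^A = sum A^k / k!  (sketch; fine for small ||A||)."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]   # I
    power = [row[:] for row in total]
    fact = 1.0
    for k in range(1, terms):
        power = [[sum(power[i][l] * A[l][j] for l in range(n)) for j in range(n)]
                 for i in range(n)]
        fact *= k
        total = [[total[i][j] + power[i][j] / fact for j in range(n)]
                 for i in range(n)]
    return total

# The matrix of Example 7.14, scaled by an illustrative t.
t = 0.5
A = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 3.0]]
tA = [[t * e for e in row] for row in A]

E = expm_series(tA)
et, e3t = math.exp(t), math.exp(3 * t)
closed = [[et, 0.0, 0.0], [t * et, et, 0.0], [0.0, 0.0, e3t]]
assert all(abs(E[i][j] - closed[i][j]) < 1e-9 for i in range(3) for j in range(3))
```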

Exercises for Sect. 7.6


1. Let λ be an eigenvalue of a matrix A. Show that e^λ is an eigenvalue of the
   exponential matrix e^A.
2. Let D = diag(λ_1, ..., λ_n). Show that e^D = diag(e^{λ_1}, ..., e^{λ_n}).
3. Let A = [ λ 1 0 ; 0 λ 1 ; 0 0 λ ]. Show that

       e^{tA} = e^{λt} [ 1  t  t²/2 ]
                       [ 0  1  t    ].
                       [ 0  0  1    ]

4. Let A be a Jordan block of order n with diagonal entries λ. Let B = A − λI.
   Show that B^n = 0. Further, prove that

       e^{tA} = e^{λt} [ I + tB + (t²/2!) B² + ⋯ + (t^{n−1}/(n−1)!) B^{n−1} ].

5. Compute e^{tA} in the following cases, where A is given by

   (a) [  0 1 ]    (b) [ 1 1 0 0 ]    (c) [ 1 1 0 0 ]    (d) [ 1 1 0 0 ]
       [ −1 0 ]        [ 0 1 0 0 ]        [ 0 1 1 0 ]        [ 0 1 1 0 ]
                       [ 0 0 1 0 ]        [ 0 0 1 0 ]        [ 0 0 1 1 ]
                       [ 0 0 0 1 ]        [ 0 0 0 1 ]        [ 0 0 0 1 ]

6. Compute e^A, where A is given by

   (a) [  1  1 ]    (b) [  3  4 ]    (c) [  1  1  1 ]
       [ −1 −1 ]        [ −2 −3 ]        [ −1 −1 −1 ]
                                         [  1  1  1 ]

7. Let A = [ 0 0 ; 1 0 ], B = [ 1 0 ; 1 0 ]. Show that tr(e^{A+B}) = tr(e^A e^B).
8. Find e^A, where A ∈ R^{5×5} has each entry equal to 1.
9. Determine e^{tA} if A² = 7A.

7.7 Estimating Eigenvalues

As you have seen, manual computation of the eigenvalues of an arbitrary matrix via its
characteristic polynomial is rarely feasible. Many numerical methods have been devised
to compute a required number of eigenvalues approximately. This issue is addressed
in numerical linear algebra. Prior to such computations, it is often helpful to have
some information about the size or order of magnitude of the eigenvalues in terms
of the norms of some related vectors or matrices.

Theorem 7.9 Let ‖·‖ be an induced norm on F^{n×n}, and let λ be an eigenvalue of
a matrix A ∈ F^{n×n}. Then, |λ| ≤ ‖A‖.

Proof Let v be an eigenvector associated with the eigenvalue λ of the matrix A.
Since Av = λv, we have ‖Av‖ = |λ| ‖v‖. Then,

    |λ| = ‖Av‖/‖v‖ ≤ lub{ ‖Av‖/‖v‖ : v ∈ F^{n×1}, v ≠ 0 } = ‖A‖.

In particular, since the row sum norm and the column sum norm are induced norms,
we have the following computationally simple upper bounds on the eigenvalues of a
matrix:

    |λ| ≤ max_i Σ_{j=1}^{n} |a_{ij}|,  |λ| ≤ max_j Σ_{i=1}^{n} |a_{ij}|,

where λ is any eigenvalue of the matrix A = [a_{ij}] ∈ F^{n×n}.
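For a triangular matrix the eigenvalues are just the diagonal entries, which makes these bounds easy to test (illustrative data of ours):

```python
T = [[3.0, -1.0, 2.0],
     [0.0, -4.0, 0.5],
     [0.0, 0.0, 1.0]]

eigs = [T[i][i] for i in range(3)]        # triangular => eigenvalues on the diagonal
row_bound = max(sum(abs(e) for e in row) for row in T)                      # ||T||_inf
col_bound = max(sum(abs(T[i][j]) for i in range(3)) for j in range(3))      # ||T||_1

assert all(abs(l) <= row_bound for l in eigs)
assert all(abs(l) <= col_bound for l in eigs)
```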


In the complex plane, an inequality such as |λ| ≤ r describes a disc of radius
r centred at 0, in which λ lies. Thus, Theorem 7.9 says that all eigenvalues of A
lie inside the disc of radius ‖A‖ centred at the origin. To improve this result, we
introduce some notation and terminology.
Let A = [a_{ij}] ∈ F^{n×n}. Let r_i(A) denote the sum of the absolute values of all entries
in the ith row except the diagonal entry. That is,

    r_i(A) = |a_{i1}| + ⋯ + |a_{i(i−1)}| + |a_{i(i+1)}| + ⋯ + |a_{in}| = Σ_{j=1, j≠i}^{n} |a_{ij}|.

The ith Geršgorin disc D_i(A) of A is defined as

    D_i(A) = { z ∈ C : |z − a_{ii}| ≤ r_i(A) }.

Thus, there are n Geršgorin discs, one for each row of A.

Theorem 7.10 (Geršgorin Discs) All eigenvalues of a matrix lie inside the union of
its Geršgorin discs.

Proof Let λ be an eigenvalue of a matrix A = [a_{ij}] ∈ F^{n×n} with an associated
eigenvector v. Then, v ≠ 0 and (A − λI)v = 0. The vector v has n components. Suppose
the ith component has the largest absolute value. That is, if v = [b_1, ..., b_n]^t, then
|b_i| ≥ |b_j| for each j ≠ i. Write the equality (A − λI)v = 0 in detail, and consider
the ith equation in it. It looks like

    a_{i1}b_1 + ⋯ + a_{i(i−1)}b_{i−1} + (a_{ii} − λ)b_i + a_{i(i+1)}b_{i+1} + ⋯ + a_{in}b_n = 0.

Bringing the ith term to one side and taking absolute values, we have

    |a_{ii} − λ| |b_i| ≤ Σ_{j≠i} |a_{ij} b_j| = Σ_{j≠i} |a_{ij}| |b_j| ≤ |b_i| Σ_{j≠i} |a_{ij}| = |b_i| r_i(A).

Then, |λ − a_{ii}| ≤ r_i(A). That is, λ ∈ D_i(A). We see that corresponding to each
eigenvalue λ there exists a row i of A such that λ ∈ D_i(A). Therefore, each eigenvalue
of A lies in D_1(A) ∪ ⋯ ∪ D_n(A).

Recall that a matrix A = [a_{ij}] ∈ F^{n×n} is called strict diagonally dominant iff
|a_{ii}| > r_i(A) for each i = 1, ..., n. Look at the proof of Theorem 7.10. If A is
not invertible, then 0 is an eigenvalue of A. With λ = 0, we obtain the inequality
|a_{ii}| ≤ r_i(A) for some row index i, which rules out strict diagonal dominance.
Thus, each strict diagonally dominant matrix is invertible.
Example 7.15 Consider the matrix

    A = [  0  3  2  3  3 ]
        [ −1  7  2  1  1 ]
        [  2  1  0  1  1 ].
        [  0 −1  1  0  1 ]
        [  1 −1  2  1  0 ]

The Geršgorin discs are specified by complex numbers z satisfying

    |z| ≤ 11,  |z − 7| ≤ 5,  |z| ≤ 5,  |z| ≤ 3,  |z| ≤ 5.

The first disc contains all others except the second. Therefore, all eigenvalues lie
inside the union of the discs |z| ≤ 11 and |z − 7| ≤ 5.

Notice that A^t and A have the same eigenvalues. It amounts to taking the Geršgorin
discs corresponding to the columns of A. Here, they are specified as follows:

    |z| ≤ 4,  |z − 7| ≤ 6,  |z| ≤ 7,  |z| ≤ 6,  |z| ≤ 6.

As earlier, it follows that all eigenvalues of A lie inside the union of the discs
|z| ≤ 7 and |z − 7| ≤ 6.

Therefore, all eigenvalues of A lie inside the intersection of the two regions
obtained earlier as unions of Geršgorin discs. It turns out that this intersection is
the union of the discs |z| ≤ 7 and |z − 7| ≤ 5.
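The discs of this example are easy to tabulate programmatically; the helper below (ours) returns the centre a_{ii} and radius r_i(A) of each row disc, and is applied to A^t to get the column discs:

```python
def gershgorin(A):
    """Return (centre, radius) of each row Gershgorin disc D_i(A)."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i))
            for i in range(n)]

# The matrix of Example 7.15.
A = [[0, 3, 2, 3, 3],
     [-1, 7, 2, 1, 1],
     [2, 1, 0, 1, 1],
     [0, -1, 1, 0, 1],
     [1, -1, 2, 1, 0]]

row_discs = gershgorin(A)
col_discs = gershgorin([[A[j][i] for j in range(5)] for i in range(5)])  # discs of A^t

assert row_discs == [(0, 11), (7, 5), (0, 5), (0, 3), (0, 5)]
assert col_discs == [(0, 4), (7, 6), (0, 7), (0, 6), (0, 6)]
```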

Further sharpening of Geršgorin’s theorem is possible. One such useful result is


the following:
Let A ∈ Fn×n . Let k ∈ {1, . . . , n}. If k of the Geršgorin discs for A are disjoint from the
other n − k Geršgorin discs, then exactly k of the eigenvalues of A lie inside the union of
these k discs.

There are many improvements on Geršgorin’s theorem giving various kinds of


estimates on eigenvalues. You may see Varga [15].
Exercises for Sect. 7.7
1. Using Geršgorin discs, determine the regions where all eigenvalues of the following
   matrices lie:

   (a) [  5 2 4 ]    (b) [  0 −10  1 ]    (c) [ i     1−i   0    ]
       [ −2 0 2 ]        [  5   2  0 ]        [ i/2   0     2i   ]
       [  2 4 7 ]        [ −8  10 12 ]        [ 1+i   2+i   3+4i ]
2. Give an example of a 2 × 2 matrix, where at least one of its eigenvalues lies on
the boundary of the union of Geršgorin discs.

7.8 Problems

1. Let c and s be real numbers with c² + s² = 1. Show that ‖A‖_F = √n, where

       A = [ −1   c    c   ⋯   c          c         ]
           [  0  −s   cs   ⋯   cs         cs        ]
           [  0   0  −s²   ⋯   cs²        cs²       ]
           [  ⋮             ⋱                       ]  ∈ F^{n×n}.
           [  0   0   0    ⋯  −s^{n−2}    cs^{n−2}  ]
           [  0   0   0    ⋯   0         −s^{n−1}   ]

2. Let A ∈ F^{m×n}. Let U ∈ F^{m×m} be unitary. Show that ‖A‖_F = ‖UA‖_F.
3. Let A ∈ F^{m×n} have rank r. Use the previous problem and the SVD of A to deduce
   that ‖A‖²_F = Σ_{i=1}^{r} s_i², where s_i is the ith positive singular value of A.

4. Show that A + B2F = A2F + 2 tr(At B) + B2F for n × n matrices A and


B with real entries.
5. Let A be an invertible matrix. Show that A−1 2 is the reciprocal of the smallest
positive singular value of A.
6. Let A ∈ Fn×n be a hermitian matrix, and let v ∈ Fn×1 be a nonzero vector. Show
that the Rayleigh quotient ρA (v) ∈ R.
7. Let A be a hermitian n × n matrix with eigenvalues λ1 ≥ · · · ≥ λn , and let v
be any column vector of size n. Show that the Rayleigh quotient ρA (v) satisfies
λ1 ≥ ρA (v)
⎡ ≥ λn . ⎤ ⎡ ⎤
5 4 −4 1
8. Let A = ⎣ 4 5 4⎦ . Let v = ⎣ t ⎦ where t ∈ R.
−4 4 5 1
(a) The Rayleigh quotient ρA (v) depends on t; so call it f (t). Compute f (t).
(b) Compute the minimum and the maximum values of f (t) to estimate the
largest and the smallest eigenvalues of A.
(c) Does A have an eigenvector v whose first and third components are same?
9. Show that if the fixed-point iteration xi+1 = xi2 − 2 converges to 2, then there
exists a natural number n 0 such that for each n > n 0 , xn = 2.
10. Consider finding a root of the equation 4x 2 − e x = 0 by using the fixed-point
iteration xi+1 = 21 e xi /2 . Show the following:
(a) If 0 ≤ x0 ≤ 1, then the iteration converges to the root in that lies between 0
and 1.
(b) The iteration does not converge to the root that lies between 4 and 5 with any
x0 .
11. Let p(t) be a polynomial with real coefficients. Assume that p(t) has only real
zeros. Let z be the largest zero of p(t). Show that Newton’s method with an
initial guess x0 > z converges to z.
12. In which matrix norm, the condition number is the ratio of its largest singular
value to its smallest positive singular value?
13. Let A, B ∈ Fn×n such that AB = B A. Prove that e A+B = e A e B .
14. Give examples of 2 × 2 matrices A and B such that e A+B = e A e B .
15. Let λ be an eigenvalue of A with an associated eigenvector x. Show that x is
also an eigenvector of e A . What is the corresponding eigenvalue?
16. Show that e A is invertible for any A ∈ Fn×n .
17. Let A be a real symmetric positive definite matrix. Show that e A is also such a
matrix.
18. Let A ∈ Rn×n . Is e A always symmetric? Is e A always positive definite?
19. Let A ∈ Rn×n . Show that each column of e At is a solution of the differential
equation d x/dt = Ax.
20. Let A ∈ Rn×n . Let f i (t) be the solution of the initial value problem d x/dt =
Ax, x(0) = ei , where ei is the ith standard basis vector of Rn×1 . Show that
e At = [ f 1 (t) · · · f n (t)].
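Exercises 19 and 20 can be checked numerically. The sketch below (assuming NumPy; the 2 × 2 matrix A is an arbitrary example) compares a central-difference derivative of e^At with A e^At and verifies the initial condition e^{A·0} = I:

```python
import numpy as np

def expm_series(M, terms=40):
    # Naive truncated Taylor series for e^M; adequate for this small example.
    n = M.shape[0]
    E, P = np.eye(n), np.eye(n)
    for k in range(1, terms):
        P = P @ M / k
        E = E + P
    return E

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
t, h = 0.7, 1e-6

X = expm_series(A * t)   # columns of e^{At} are f_1(t), f_2(t)
dX = (expm_series(A * (t + h)) - expm_series(A * (t - h))) / (2 * h)

# Each column x of e^{At} satisfies dx/dt = Ax, with x(0) = e_i since e^{A 0} = I.
assert np.allclose(dX, A @ X, atol=1e-5)
assert np.allclose(expm_series(A * 0.0), np.eye(2))
```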
References
1. M. Braun, Differential Equations and Their Applications, 4th edn. (Springer, New York, 1993)
2. R.A. Brualdi, The Jordan canonical form: an old proof. Am. Math. Mon. 94(3), 257–267 (1987)
3. S.D. Conte, C. de Boor, Elementary Numerical Analysis: An Algorithmic Approach, Int.
Student edn. (McGraw-Hill, 1981)
4. J.W. Demmel, Numerical Linear Algebra (SIAM Pub, Philadelphia, 1996)
5. F.R. Gantmacher, Matrix Theory, vol. 1–2 (American Math. Soc., 2000)
6. G.H. Golub, C.F. Van Loan, Matrix Computations, Hindustan Book Agency, Texts and Readings
in Math. - 43, New Delhi (2007)
7. R. Horn, C. Johnson, Matrix Analysis (Cambridge University Press, New York, 1985)
8. P. Lancaster, M. Tismenetsky, The Theory of Matrices, 2nd edn. (Elsevier, 1985)
9. A.J. Laub, Matrix Analysis for Scientists and Engineers (SIAM, Philadelphia, 2004)
10. S.J. Leon, Linear Algebra with Applications, 9th edn. (Pearson, 2014)
11. D. Lewis, Matrix Theory (World Scientific, 1991)
12. C. Meyer, Matrix Analysis and Applied Linear Algebra (SIAM, Philadelphia, 2000)
13. R. Piziak, P.L. Odell, Matrix Theory: From Generalized Inverses to Jordan Form (Chapman
and Hall/CRC, 2007)
14. G. Strang, Linear Algebra and Its Applications, 4th edn. (Cengage Learning, 2006)
15. R.S. Varga, Geršgorin and His Circles, Springer Series in Computational Mathematics, vol.
36 (Springer, 2004)
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer
Nature Switzerland AG 2021
A. Singh, Introduction to Matrix Theory,
https://doi.org/10.1007/978-3-030-80481-7
Index
A
Addition, 7
Adjoint, 13
Adjugate, 24
Algebraic multiplicity, 126
Angle between vectors, 82
Annihilated by, 120
Annihilates, 120
Annihilating polynomial, 120
Anti-diagonal, 141
Augmented matrix, 26

B
Basic variable, 46
Basis, 57
Bessel inequality, 93
Best approximation, 94

C
Cartesian norm, 157, 160
Cauchy–Schwarz inequality, 82
Cauchy sequence, 168
Change of basis matrix, 73
Characteristic polynomial, 103
χ A(t) , 103
Cholesky factorization, 156
Closed subset, 168
Closed unit sphere, 159
Co-factor, 24
Column, 4
  index, 4, 5
  rank, 40, 63
  space, 63
  sum norm, 163
  vector, 4
Combination similarity, 130
Companion matrix, 111
Complex conjugate, 13
Complex eigenvalue, 103
Condition number, 175
Conjugate transpose, 13
Consistency, 44
Consistent system, 46
Continuous, 158
Contraction, 166
Contraction mapping principle, 168
Converges, 168
Converges to, 179
Coordinate matrix, 70
Coordinate vector, 66
Cramer's rule, 44

D
Determinant, 23
Diagonal
  entries, 5
  matrix, 6
  of a matrix, 5
Diagonalizable, 123
Diagonalized by, 123
Dilation similarity, 130
Dimension, 59

E
Eigenvalue, 101
Eigenvector, 101
Elementary matrix, 15
Elementary row operation, 15
Entry, 4
Equal matrices, 5
Equivalent matrices, 76
Equivalent norms, 159
Error estimates, 169
Euclidean norm, 158

F
Finite dimensional normed linear space, 157
Fixed point, 112, 166
  iteration, 167
  iteration for linear systems, 172
Fourier expansion, 91
Fredholm alternative, 99
Free variable, 46
Frobenius norm, 160
Full rank factorization, 77, 147
Full rank matrix, 77
Functional, 158
Fundamental subspaces, 98
Fundamental theorem of algebra, 103

G
Gaussian elimination, 50
Gauss-Jordan elimination, 49
Gauss-Seidel iteration, 174
Generalized eigenvector, 140
Generalized inverse, 155
Geometric multiplicity, 126
Gram matrix, 96
Gram-Schmidt
  orthogonalization, 85
  orthonormalization, 87

H
Hermitian matrix, 106
Homogeneous system, 44

I
Idempotent, 29, 111
Identity matrix, 6
Ill-conditioned matrix, 176
Induced norm, 161
∞-norm, 157
Inner product, 81
Invertible, 11
Isometry, 108
Iteration function, 168
ith row, 5

J
Jacobi iteration, 173
Jordan
  basis, 140
  block, 129
  canonical form, 140
  form, 129
  string, 140
jth column, 5

L
Least squares, 96
Left singular vector, 145
Limit, 168
Linear
  combination, 31
  dependence, 32
  independence, 32
  map, 62
  property, 67
  system, 43
  transformation, 62
Linearly independent set, 38
Lower triangular, 7

M
Matrix, 4
Matrix multiplication, 8
Minimal polynomial, 121
Minor, 23
Monic polynomial, 104
Multiplication by scalar, 7
Multiplicity, 126

N
Newton's method, 170
Nilpotent, 111
Norm, 82, 157
Normal, 124
Normed linear space, 157
Nullity, 46, 63
Null space, 63

O
Off-diagonal entries, 5
1-norm, 158
Order, 5
Orthogonal
  basis, 88
  matrix, 107
  set, 83
  vectors, 82
Orthonormal basis, 88
Orthonormal set, 83

P
Parallelogram law, 83
Parseval identity, 91
Permutation matrix, 113
Permutation similarity, 130
Pivot, 17
Pivotal column, 17
Pivotal row, 19
p-norm, 158
Polar decomposition, 151
Positive definite, 155
Positive semidefinite, 150
Powers of matrices, 10
Principal submatrix, 156
Projection matrix, 93
Projection on a subspace, 93
Proper subspace, 55
Pythagoras law, 83

Q
QR-factorization, 89

R
Range space, 63
Rank, 19
  echelon matrix, 76
  theorem, 77
Rayleigh quotient, 165
Real skew symmetric, 106
Real symmetric, 106
Reduction to RREF, 19
Reflection, 107
Relative residual, 177
Residual, 177
Reverse triangle inequality, 158
Right singular vector, 145
Rotation, 107
Row, 4
  index, 4, 5
  rank, 40, 63
  reduced echelon form, 18, 19
  space, 63
  sum norm, 162
  vector, 4
RREF, 18, 19

S
Scalar matrix, 7
Scalars, 4
Schur triangularization, 115
Self-adjoint, 106
Similar matrices, 78
Singular value decomposition, 143
Singular values, 142
Size, 5
Skew hermitian, 106
Skew symmetric, 106
Sol(A, b), 48
Solution of a linear system, 43
Spanning subset, 56
Spans, 55, 56
Spectral
  mapping theorem, 118
  norm, 165
  radius, 174
Spectrum, 105
Square matrix, 5
Standard basis, 57
Standard basis vectors, 6
Strict diagonally dominant, 173
Sub-multiplicative norm, 162
Subspace, 54
Sum, 7
Super-diagonal, 5
SVD, 143
Symmetric, 106
System matrix, 43

T
Taxi-cab norm, 158, 160
Theorem
  Basis extension, 60
  Bessel inequality, 93
  Cayley–Hamilton, 120
  Contraction mapping, 168
  Fixed point iteration, 172
  Geršgorin discs, 185
  Gram-Schmidt orthogonalization, 85
  Jordan form, 131
  Polar decomposition, 151
  QR-factorization, 89
  Rank factorization, 76
  Rank nullity, 63
  Rank theorem, 77
  Schur triangularization, 115
  Spectral mapping, 118
  Spectral theorem, 123, 125
  SVD, 143
  Thin SVD, 147
Tight SVD, 147
Trace, 22
Transition matrix, 73
Transpose, 12
Transpose notation, 4
Triangle inequality, 82
Triangular, 7
Trivial solution, 46
2-norm, 158

U
Unit vector, 83
Unitary, 107
Upper triangular, 7

V
Value of unknown, 43
Vandermonde matrix, 29
Vector space, 53

W
Well-conditioned matrix, 176

Z
Zero matrix, 5