2017 Book LinearAlgebra
2017 Book LinearAlgebra
Belkacem Said-Houari
Linear
Algebra
Compact Textbooks in Mathematics
Compact Textbooks in Mathematics
Linear Algebra
Belkacem Said-Houari
Department of Mathematics, College of Sciences
University of Sharjah
Sharjah, United Arab Emirates
Mathematics Subject Classification (2010): 15A03, 15A04, 15A18, 15A42, 15A63, 15B10, 11C20,
11E16
Preface
Contents
2 Determinants .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Belkacem Said-Houari
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.2 Determinants by Cofactor Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
2.3 Properties of the Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.4 Evaluating Determinants by Row Reduction . . . . . . . . . . . . . . . . . . . . . . . . 79
2.4.1 Determinant Test for Invertibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
2.5 The Adjoint of a Square Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
2.5.1 Cramer’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Servicepart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
xiii
List of Figures
Fig. 1.1 The case where the system has a unique solution
.x0 ; y0 /: the solution is the intersection point
of two lines .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Fig. 1.2 The case where the system has no solution . . . . . . . . . . . . . . . . 2
Fig. 1.3 The two lines coincide and there are infinitely
many points of intersection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Fig. 1.4 The addition of two vectors in the xy-plane . . . . . . . . . . . . . . . . 8
Fig. 1.5 An easy way to perform the multiplication of two
matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Fig. 2.1 To evaluate the 3 3 determinant, we take the
products along the main diagonal and the lines
parallel to it with a .C/ sign, and the products
along the second diagonal and the lines parallel
to it wit a ./ sing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Belkacem Said-Houari
In order to introduce the main ideas of Linear Algebra, we first study matrix algebra.
So, the first thing we begin with is the following simple linear equation:
ax D b; (1.1)
where a and b are two real numbers. We know from elementary algebra that if a ¤ 0,
then Eq. (1.1) has in R the unique solution
b
xD D a1 b: (1.2)
a
Next, suppose that we want to solve the following system of two linear equations in R2 :
(
ax C by D p;
(1.3)
cx C dy D q;
where a; b; c; d; p and q are real numbers. There are at least two ways of looking for
the solutions of (1.3): geometrically and algebraically.
Geometrically
It is well known that both equations in (1.3) are equations of lines in the xy-plane. Thus,
the solutions of the system in the xy-plane are the points of intersection of the two lines.
Therefore, we may distinguish the following three cases:
Case 1. The two lines intersect in exactly one point .x0 ; y0 / as in ⊡ Fig. 1.1. This is
the case where the slopes of the two lines are not the same. That is,
a c a c
¤ ; or equivalently, ¤ :
b d b d
In this case the system (1.3) has a unique solution .x0 ; y0 /.
2 Chapter 1 • Matrices and Matrix Operations
1
y ax + by = q
y0 ax + by = p
x
x0
⊡ Fig. 1.1 The case where the system has a unique solution .x0 ; y0 /: the solution is the intersection point
of two lines
ax + by = p
y
ax + by = q
Case 2. The two lines may be parallel and distinct, which occurs when
a c p q
D and ¤ :
b d b d
In this case there is no intersection and the system (1.3) has no solution. See ⊡
Fig. 1.2.
Case 3. The two lines may coincide, which occurs when
a c p q
D and D :
b d b d
In this case there are infinitely many points of intersection and consequently, there
are infinitely many solutions to (1.3). See ⊡ Fig. 1.3.
Algebraically
Algebraically, we may solve system (1.3) by at least two methods: The method of
substitution and the method of elimination. For the substitution method, we express
1.1 Systems of Linear Equations
3 1
ax + by = p
y ax + by = q
⊡ Fig. 1.3 The two lines coincide and there are infinitely many points of intersection
one of the two variables from the first equation in (1.3) and substitute it in the second
equation as follows:
a p
yD xC ; (1.4)
b b
provided that b ¤ 0 and then plugging expression (1.4) into the second equation of
(1.3), we find
It is clear that if
ad bc ¤ 0;
then the system (1.3) has a uniques solution. On the other hand, if ad bc D 0 and
pd bq ¤ 0, then Eq. (1.5) shows that system (1.3) has no solution. Finally, if ad bc D
0 and pd bq D 0, then system (1.3) has infinitely many solutions.
Thus, as we have said earlier, Eq. (1.1) has a unique solution if and only if
a ¤ 0:
On the other hand, system (1.3) has a unique solution if and only if
ad bc ¤ 0:
a1 x1 C a2 x2 C C an xn D b; (1.6)
f .X/ D a1 x1 C a2 x2 C C an xn D b;
Also, in a linear equation as (1.6) all variables occur only to the first power and the
equation does not involve any products of variables and they do not appear as arguments
of trigonometric, logarithmic, exponential, or other functions.
Example 1.1
The following equations are not linear:
p
x1 x2 D 1; x1 x2 C x2 C x3 D 0; x1 C 2 cos x C 3x3 D 1; x1 x2 D 0:
Definition 1.1.2
We define a system of linear equations as a set of m equations with n unknowns
8
ˆ
ˆ a11 x1 C a12 x2 C C a1n xn D b1 ;
ˆ
ˆ
< a21 x1 C a22 x2 C C a2n xn D b2 ;
ˆ ::: :: :: :: (1.8)
ˆ
ˆ : : :
:̂
am1 x1 C am2 x2 C C amn xn D bm ;
ⓘ Remark 1.1.2 As we have seen for the system of two equations (1.3), we will show in
the sequel that system (1.8) has either
▬ no solution, or
▬ exactly one solution, or
▬ infinitely many solutions.
As in Remark 1.1.1, we may write the system (1.8) using a linear transform from Rn
to Rm
0 1 0 1
f1 .X/ b1
B : C B : C
f .X/ D @ :: A D @ :: C
B C B
A;
fm .X/ bm
Now, going back to the system (1.8), let us assume that n D 3. Then each equation
is the equation of a plane in three-dimensional space. So, the solution of the system is
represented by a point in the intersection of m planes in the xyz-space. It is quite hard to
find the intersections of those planes. In general, the geometric method becomes hard to
apply if n 3: So, we rely on the algebraic method to solve such system for n 3: The
core problem of linear algebra is how to solve the system (1.8).
We can easily see that every homogeneous system has the zero vector X D 0 as a
solution.
Now, we introduce the following definition which will help us to rewrite system
(1.8) in a more convenient form.
(Continued )
6 Chapter 1 • Matrices and Matrix Operations
1
Definition 1.1.4 (continued)
be two vectors in Rn . Then the dot product or the inner product of X and Y is the
real number X Y defined as
X Y D Y X D x1 y1 C x2 y2 C C xn yn :
Vi X D bi ; 1 i m;
We may also write the dot product as a row vector times a column vector (with no dot):
0 1
x1
B : C
.ai1 ; : : : ; ain / B C
@ :: A D ai1 x1 C ai2 x2 C C ain xn :
xn
or equivalently as
AX D b; (1.12)
where
2 3 0 1 0 1
a11 a12 a13 ::: a1n x1 b1
6 7 B C B C
6 a21 a22 a23 ::: a2n 7 B x2 C B b2 C
AD6
6 :: :: :: :: :: 7
7; XDB C
B :: C ; and b D B C
B :: C : (1.13)
4 : : : : : 5 @ : A @ : A
am1 am2 am3 ::: amn xn bm
1.1 Systems of Linear Equations
7 1
Definition 1.1.5
In the above formulas, the rectangular array of numbers A is called a matrix. The
numbers aij ; 1 i m; 1 j n, are called the entries or coefficients of the
matrix A.a
a
See Chap. 6 for the definition of matrices through linear transformations.
The matrix A consists of m rows (horizontal lines) and n columns (vertical lines).
Now, if we want to solve system (1.8), then it is natural to consider the system in its
matrix form (1.12), since it looks similar to Eq. (1.1). Therefore, our first intuition is to
write the solution as in (1.2), that is
X D A1 b: (1.14)
The answers of the above questions are the building blocks of matrix algebra in
particular, and linear algebra in general.
One of the interesting cases is when m D n. In this case, we say that A is a square
matrix and we have following definition.
Example 1.2
1. The matrix
" #
1 0 2
AD p
1 2 3
is a square matrix and the entries of the main diagonal are 3; 0; and 9.
In order to define the addition of matrices, let us first consider two vectors in R2 ,
! !
x1 x2
X1 D and X2 D :
y1 y2
Each vector can be seen as 2 1 matrix. In order to define X1 C X2 , we first need to think
geometrically and draw both vectors in the xy-plane.
It is clear from ⊡ Fig. 1.4 that the vector X1 C X1 is
!
x1 C x2
X1 C X2 D : (1.15)
y1 C y2
Guided by the case for the 2 1 matrices, we can go ahead and define analogously the
addition of two m n matrices.
(Continued )
x
x2 x1 x1 + x2
1.1 Systems of Linear Equations
9 1
A C B D .aij C bij /; 1 i m; 1 j n:
a
The size of a matrix is described in terms of the rows and columns it contains.
It is clear from above that the addition of matrices is commutative, that is,
A C B D B C A:
Example 1.3
Consider the matrices
" # " #
1 0 1 001
AD and BD :
2 0 3 310
Using the same intuition we multiply a vector by a scalar, then this means
geometrically that we multiply each of its components by the same scalar. In a similar
manner we define the multiplication of a matrix by a scalar.
A D .aij / D .aij /; 1 i m; 1 j n:
Example 1.4
Take
" #
2 1
AD and D 2:
3 0
10 Chapter 1 • Matrices and Matrix Operations
1
Then
" #
4 2
2A D :
6 0
Notation We denote the set of scalars by K (K will ordinarily be R or C) and the set of m n
matrices by Mmn .K/. If m D n, then we write Mmn .K/ simply as Mn .K/.
ⓘ Remark 1.1.3 It is clear that for A and B in Mmn .K/, A C B is in Mmn .K/. Thus, the
addition .C/ ( binary operation)a in the set Mmn .K/ satisfies the closure property.
Also, we have seen that for any in K and for any A in Mmn .K/, A is in Mmn .K/ .
a
This binary operation is defined from Mmn .K/ Mmn .K/ ! Mmn .K/ and takes .A; B/ from Mmn .K/
Mmn .K/ to A C B in Mmn .K/.
Example 1.5
The following matrices are zero matrices:
" # " # " #
00 000 0
; ; ; Œ0:
00 000 0
Definition 1.1.10
A group is a nonempty set G together with a binary operation
GG ! G
.a; b/ 7! a b;
.a b/ c D a .b c/I
a e D e a D a;
for all a in G;
G3. (existence of the inverse) for each a in G, there exists a0 in G such that
a a0 D a0 a D e:
abDba
Example 1.6
One of the simplest Abelian group is the group of integers .Z; C/ with the addition .C/ as
the binary operation. The structure of group in this set gives meaning to the negative integers
as the inverses of positive integers with respect to the addition law .C/ .
1
Named after the Norwegian mathematician Niels Henrik Abel.
12 Chapter 1 • Matrices and Matrix Operations
1
Theorem 1.1.5
The set .Mmn .K/; C/ is a commutative group.
Proof
We saw earlier that addition in Mmn .K/ is a binary operation from Mmn .K/
Mmn .K/ ! Mmn .K/ W .A; B/ 7! A C B: It is not hard to see from Definition 1.1.7
that if A; B; and C are matrices in Mmn .K/, then
.A C B/ C C D A C .B C C/;
A C 0 D 0 C A D A;
for each matrix A in Mmn .K/. Thus, the zero matrix is the neutral element in Mmn .K/
with respect to .C/.
Now, for each A in Mmn .K/, A is also a matrix in Mmn .K/ and satisfies
A C .A/ D .A/ C A D 0:
Therefore, the matrix A is the inverse of A in Mmn .K/ with respect to the binary
operation .C/.
Next, since
A C B D B C A;
for any two matrices A and B in Mmn .K/, the group Mmn .K/ is commutative. This
completes the proof of Theorem 1.1.5. t
u
The next step is to define the multiplication of matrices. We have seen in (1.11) the
multiplication of a matrix by a vector. Now, consider a matrix B in Mpr .K/. Then B
can be written as
B D ŒB1 ; B2 ; : : : ; Br ;
1.1 Systems of Linear Equations
13 1
where Bj ; 1 j r are vectors in Mp1 .K/, that is,
0 1
b1j
B : C
Bj D B C
@ :: A :
bpj
Now, using the same idea as in (1.11), and assuming that p D n, we may find, for
instance
2 3
V1
6 : 7
6 : 7 B1 D C1 ;
4 : 5
Vm
where C1 is a vector in Mm1 .K/ whose first component is the dot product V1 B1 and
generally whose ith component is the dot product Vi B1 . In order for these dot products
to be defined, B1 must have the same number of components as Vi ; 1 i m. That is
p D n. If we do this for all the vectors Bj ; 1 j r, then we obtain the matrix
C D ŒC1 ; C2 ; : : : ; Cr ;
(Continued )
14 Chapter 1 • Matrices and Matrix Operations
1
Definition 1.1.11 (continued)
with
X
n
cik D ai1 b1k C ai2 b2k C C ain bnk D aij bjk :
jD1
In ⊡ Fig. 1.5 we show an easy way of how to multiply two matrices by positioning
them as in the ⊡ Fig. 1.5. So, to get the entry c22 , for instance, we multiply the entries of
the second row of the matrix A with the entries of the second columns of the matrix B.
Example 1.7
Consider the two matrices
2 3
12 " #
6 7 43
A D 43 45 and BD ;
21
01
Then
2 3
8 5
6 7
AB D 4 20 13 5 :
2 1
Example 1.8
Consider the square matrices
" # " #
ab 01
AD and BD :
cd 00
Solution
We first compute
" #" # " #
ab 01 0a
AB D D : (1.16)
cd 00 0c
1.1 Systems of Linear Equations
15 1
B : n rows r columns
b 11 b 12 ... b 1r
b 21 b 22 ... b 2r
12
b
.. .. .. ..
.
21
. . .
a
22
b
22
b n1 b n2 ... b nr
a
..
.
2
bn
n
a2
a 11 a 12 ... a 1n c 11 c 12 ... c 1r
a 21 a 22 ... a 2n c 21 c 22 ... c 2r
.. .. .. .. .. .. .. ..
. . . . . . . .
a m1 a m2 ... a mn c m1 c m2 ... c mr
where a and b are any real numbers. Some examples of such matrices are
" # " # " #
10 21 3 7
; ;
01 02 0 3
ⓘ Remark 1.1.6 It is clear that even if we can do the multiplications AB and BA, as in the
case of square matrices, for instance, then in general
AB ¤ BA:
Then
" # " #
01 00
AB D ; but BA D :
00 00
The proof is straightforward. We can just compute the two sides in (1.18) and find that
they are equal. We omit the details.
Thus, we have
X
n
AB D .˛ik /; with ˛ik D aij bjk
jD1
and
X
p
BC D .ˇjl /; with ˇjl D bjk ckl :
kD1
Therefore,
A.BC/ D .il /;
with
X
n X
n X
p
il D aij ˇjl D aij bjk ckl
jD1 jD1 kD1
X
n X
p
X
p
X
n
D aij bjk ckl D aij bjk ckl
jD1 kD1 kD1 jD1
p n
X X X
p
D aij bjk ckl D ˛ik ckl
kD1 jD1 kD1
D .AB/C:
Example 1.9
Consider the matrices
" # " # " #
1 1 2 3 2 6
AD ; BD ; CD :
2 4 4 5 3 7
Then
" #" # " #
6 2 2 6 18 22
.AB/C D D
20 14 3 7 82 22
18 Chapter 1 • Matrices and Matrix Operations
1
and
" #" # " #
1 1 5 33 18 22
A.BC/ D D :
2 4 23 11 82 22
Example 1.10
Consider the matrices:
2 3 2 3
1 4 " # 8 6 6
6 7 20 0 6 7
A D 4 2 3 5 ; BD ; C D 4 6 1 1 5 :
0 1 1
1 2 4 0 0
Solution
Since A is a 3 2 matrix and B is a 2 3 matrix, K must be a 2 2 matrix, otherwise, the
product AKB is not defined. Thus, we put
" #
ab
KD :
cd
Since the product of matrices is associative (Theorem 1.1.8), we first compute the product
AK as
2 3 2 3
1 4 " # a C 4c b C 4d
6 7 ab 6 7
AK D 4 2 3 5 D 4 3c 2a 3d 2b 5 :
cd
1 2 a 2c b 2d
We have introduced square matrices in Definition 1.1.6. In this section, we show that this
class of matrices plays a central role in matrix algebra in particular, and linear algebra
in general.
The class of square matrices enjoys very important algebraic properties. One of these
properties is that the set Mn .K/ has the closure property under multiplication. That is,
for any two matrices A and B in Mn .K/, the product AB is an element in Mn .K/. (This
does not hold for matrices in Mmn .K/ if m ¤ n). In other words, we have the binary
operation
Another property is that the multiplication is distributive over addition from the right
and from the left. That is, for all matrices A; B; and C in Mn .K/, one can easily
verify that
A.B C C/ D AB C AC (1.21)
and
X
m
mŠ
.A C B/m D Cmk Ak Bmk ; Cmk D :
kD0
kŠ.m k/Š
In particular, since the identity matrix commute with all matrices in Mn .K/, then
we have
X
m
.I C A/m D Cmk Ak :
kD0
Definition 1.2.1
Let R be a nonempty set on which two binary operations called addition .C/ and
multiplication ./ are defined. Then, R is a ring (with respect to the given addition
and multiplication) if:
R1. .R; C/ is an Abelian group;
R2. multiplication is associative, that is
a .b c/ D .a b/ c;
a .b C c/ D a b C a c
and
.b C c/ a D b a C c a:
ab Dba
a e D e a D a;
Here we define the identity matrix precisely as the identity element in a ring R.
Definition 1.2.2
Let I be a square matrix in Mn .K/. Then I is an identity matrix if and only if (here
we omit “” in the multiplication)
AI D IA D A; (1.23)
We can easily check, by using the definition of the product of two matrices
(Definition 1.1.11), that (1.23) holds if and only if the matrix I has the form
2 3
1 0 0 0
60 1 0 07
6 7
6 7
ID6
6
0 0 1 07;
7 (1.24)
6 :: :: :: :: :: 7
4: : : : :5
0 0 0 1
i.e., means that all entries are zero except the entries aii on the main diagonal, which are
aii D 1.
Notation In the sequel, we will sometimes denote by In the identity matrix in Mn .K/.
Example 1.12
The following are examples of identity matrices
2 3
" # 100
10 6 7
I D Œ1; ID ; I D 40 1 05:
01
001
22 Chapter 1 • Matrices and Matrix Operations
1
Theorem 1.2.2
The set of square matrices .Mn .K/; C; / with the binary operations .C/ and ./
introduced in Definitions 1.1.7 and 1.1.11, respectively, is a unitary noncommutative
ring.
Proof
We know from Theorem 1.1.5 that .Mn .K/; C/ is an Abelian group. Since (1.19)–(1.22)
are also satisfied, .Mn .K/; C; / is a ring. It is clear that this ring is noncommutative since
the multiplication of matrices is noncommutative. This ring has also an identity element, the
identity matrix defined above. t
u
As we have seen before, the solution of the linear equation (1.1) is given by (1.2). The
constant a in (1.1) can be seen as a square matrix in M1 .K/ and a1 is the inverse matrix
of a in M1 .K/. So, the solution in (1.2) is defined only if a1 exists. Thus, the natural
question is now whether we can generalize this idea to any square matrix in Mn .K/,
with n 1? In other words, can we write a solution of system (1.12) in the case m D n
in the form
X D A1 b (1.25)
where A1 is the inverse matrix of A analogously as in (1.2)? To answer this question,
we need first to define A1 . For n D 1 and a ¤ 0 we have a1 D 1=a and satisfies
a ¤ 0: (1.27)
So, as we indicated above, if we think about the constants in (1.26) as square matrices
in M1 .K/ and 1 as the identity matrix in M1 .K/, then we can define the inverse of the
matrix a as a new matrix a1 in M1 .K/ satisfying (1.26). Now it is quite obvious how
to extend this definition to matrices in Mn .K/; n 1 and introduce the inverse of the
matrix A as follows.
AB D BA D I: (1.28)
1.2 Square Matrices
23 1
Notation The inverse of the matrix A is denoted by A1 . Thus, (1.28) reads
AX D b (1.30)
Therefore,
X D A1 b: (1.31)
We can immediately answer the first question in the negative. As a simple example,
for a matrix a in M1 .K/, a1 exists if and only if a ¤ 0. Also, the zero matrix 0 in
Mn .K/ has no inverse since for any matrix A in Mn .K/
0A D A0 D 0 ¤ I;
which violates the definition of the inverse A1 . So, we need a criteria to determine
which square matrices in Mn .K/ have inverses in Mn .K/. This will be investigated in
the coming sections.
For the second question, we have the following theorem.
AB D BA D I; and AC D CA D I: (1.32)
Now, since the multiplication of matrices is associative (Theorem 1.1.8), we have that
B D C;
Finally, to answer the third question is really a challenging problem especially if the
matrix A is of large size. To understand why this is the case, let us consider a 22 matrix
and compare the amount of work required with that in the case of an 1 1 matrix.
So, we consider the matrix A in M2 .K/ given by
" #
ab
AD
cd
and try to find the inverse A1 . Actually, there are at least two obvious methods of how
to proceed. First, we may just assume that A1 exists as a matrix in M2 .K/ and apply
(1.29) to find the entries of A1 . The second method is based on the strong connection
between the inverse of A and the solution of the linear system (1.30). That is, if we know
A1 , then we know the solution by formula (1.31). Conversely, if the solution of system
(1.30) exists, then it should be written in the form (1.31). Consequently, our strategy is
to solve the 2 2 system where A is the matrix of coefficients and then, once we found
the solution, we try to write it in the form (1.31) and thus obtain A1 .
We consider system (1.3), that is
(
ax C by D p;
(1.35)
cx C dy D q:
pd bq
xD :
ad bc
1.2 Square Matrices
25 1
Plugging this expression in the first equation of (1.35), we get
a p aq bc
yD xC D : (1.36)
b b ad bc
Therefore, the solution of (1.35) is
" #
1 pd bq
XD
ad bc aq bc
ad bc ¤ 0: (1.38)
Theorem 1.2.4
If ad bc ¤ 0, then the inverse of the square matrix
" #
ab
AD
cd
is given by
" #
1 1 d b
A D : (1.39)
ad bc c a
We can plainly see how the level of difficulty of finding the inverse changes from
the 1 1 matrix to the 2 2 matrix. For the 1 1, matrix, the inverse exists if and only if
(1.27) is satisfied and then a1 D 1=a. On the other hand, the inverse of the 2 2 matrix
exists if and only if (1.38) is verified, and then the inverse is given by (1.39). So, as we
have seen above, the difficulty of finding A1 is increasing. We will see in the coming
sections other methods for finding the inverse A1 of a matrix A in Mn .K/ with n 3.
Example 1.13
Find the inverses of the following matrices:
" # " #
10 12
AD ; BD :
23 12
26 Chapter 1 • Matrices and Matrix Operations
1
Solution
For the matrix A, since ab cd D 3 ¤ 0, A1 exists and applying (1.39), we get
" # " #
1 1 3 0 1 0
A D D :
3 2 1 2=3 1=3
For the matrix B and since ab cd D 0, then B1 does not exists. J
We have defined before the product of two matrices, so the question now is how is
the inverse related to the product. For matrices in M1 .K/, we have
But keep in mind that this is true since the product of matrices in M1 .K/ is commutative,
while we already know that the product of matrices in Mn .K/; n > 1 is not commutative
in general. So, only one of the above equalities is true for matrices, and as we will see,
it is the second one.
Proof
Since the multiplication of matrices is associative, we can write
Similarly,
D B1 IB D I:
Consequently, by (1.29), B1 A1 is an inverse of AB, and since the inverse of a matrix is
unique (Theorem 1.2.3), then B1 A1 is the only inverse of AB. u
t
ⓘ Remark 1.2.6 Using induction, we may easily generalize (1.40) for any finite number
of matrices as
Solution
In the first method, we compute the product AB and then we use Theorem 1.2.4 to find the
inverse .AB/1 . We have
" #
1 2
AB D ;
0 2
and so
" # " #
1 1 2 2 1 1
.AB/ D D :
2 0 1 0 1=2
In the second method, we use Theorem 1.2.5. Thus, we have, by using (1.39),
" # " #
1 1 20 1 0
A D D
2 01 0 1=2
and
" # " #
1 1 2 1 2
B D 1 D :
0 1 0 1
Therefore,
" #
1 1 1 1 1
.AB/ DB A D :
0 1=2
Theorem 1.2.7
Let A be an invertible matrix in Mn .K/ and let ¤ 0 be a scalar in K. Then .A1 /1
and .A/1 exist and we have
1 1
.A1 /1 D A; and .A/1 D A : (1.41)
28 Chapter 1 • Matrices and Matrix Operations
1
Proof
The first property in (1.41) is trivial. To prove the second one, we have
1 1
.A/ A1 D AA1 D I:
Similarly,
1
A1 .A/ D I:
Now we can collect the above properties of invertible matrices in Mn .K/ and give them
an algebraic structure.
Proof
To prove the theorem, we can simply verify the assumptions in Theorem 1.1.10. First, it is
clear that the GL.n; K/ is not empty, since the identity matrix lies in this set. In addition, it is
clear from Theorem 1.2.5 that if A and B are two elements of GL.n; K/, then AB is also an
element of GL.n; K/. Since the multiplication of matrices is associative in Mn .K/, it is also
associative in GL.n; K/.
Furthermore, it is clear that I is the identity element in GL.n; K/. Next, for any A in
GL.n; K/ there exists A1 in GL.n; K/ satisfying (1.29). Thus, .GL.n; K/; / is a group. It is
non-Abelian since, we know that multiplication is noncommutative: for example take
" # " #
10 11
AD ; BD :
02 03
t
u
In the next theorem we exhibit the relationship between invertible matrices and
homogeneous systems of linear equations.
1.2 Square Matrices
29 1
Theorem 1.2.9
Let A be a square matrix in Mn .K/. Then the following two properties are
equivalent:
1. the matrix A is invertible;
2. the homogeneous system associated to the matrix A has the trivial solution X D 0
as the unique solution.
Proof
We first need to show that (1) implies (2). So, assume that A is invertible. Then the
homogenous system associated to A has the solution
X D A1 b (1.42)
where b D 0 is the zero vector. Since the inverse of A is unique, the solution X is uniquely
defined by (1.42), and so X D 0 is the unique solution of the homogeneous system. We leave
it to the reader to show that (2) implies (1), which can be done in several ways. t
u
As we have stated before, since one of the main goals of matrix algebra is to provide
necessary conditions for the invertibility of a square matrix A and ways to calculate its
inverse A1 . Hence, we want to characterize at least some particular sets of square
matrices, where we can easily determine if matrices from these sets are invertible or not
and compute the inverses if possible. Among these sets are the set of diagonal matrices
and triangular matrices.
First, we exclude some classes of square matrices that have no inverse.
Theorem 1.2.10
A square matrix that has either a zero row or a zero column is not invertible.
Proof
Let A be a square matrix in Mn .K/ with a zero row. Then, for any matrix B in Mn .K/,
the corresponding row in the product AB is also zero. So, AB cannot be the identity matrix.
Similarly, if A has a zero column, then the product BA has a zero column, so, BA cannot be
the identity matrix. t
u
Example 1.15
The matrices
2 3
" # 101
00 6 7
; 40 0 35
32
502
Here we introduce the set of diagonal matrices that plays an important role in the theory
of matrices.
Example 1.16
The following matrices are diagonal:
2 3 2 3 2 3 2 3
" # 1 0 0 50 0 100 000
20 6 7 6 7 6 7 6 7
AD ; B D 4 0 3 0 5; C D 40 0 0 5; I D 40 1 05; 0 D 40 0 05:
01 p
0 0 00 2 001 000
The next, theorem provides an easy test, which tells us if a diagonal matrix is
invertible or not and gives us right away the inverse.
satisfies
DB D BD D I;
which means that B is an inverse of D and since the inverse is unique (Theorem 1.2.3),
B D D1 :
DK D KD D I: (1.44)
and
2 3
d1 k11 d2 k12 d3 k13 : : : dn k1n
6d k d k d k : : : dn k2n 7
6 1 21 2 22 3 23 7
KD D 6
6 :: :: :: :: :: 77:
4 : : : : : 5
d1 kn1 d2 kn2 d3 kn3 : : : dn knn
di kii D 1; 1 i n:
Solution
For the matrix A, since all the entries of its main diagonal are nonzero, A is invertible and
2 3
1=4 0 0
6 7
A1 D 4 0 1=3 0 5 :
0 0 1=2
On the other hand, since in the matrix B one entry of the main diagonal is zero, B is not
invertible i.e., B1 does not exist. J
A2 A3 Ak
eA D I C A C C C C C (1.45)
2Š 3Š kŠ
DkC1 D Dk D
Example 1.18
Consider the matrix
2 3
1 0 0
6 7
A D 4 0 2 0 5:
p
0 0 2
Find A6 .
Solution
Since A is diagonal, Theorem 1.2.12 shows that
2 3 2 3
.1/6 0 0 1 0 0
6 7 6 7
A6 D 4 0 .2/6 0 5 D 4 0 64 0 5 :
p 6
0 0 . 2/ 0 0 8
J
34 Chapter 1 • Matrices and Matrix Operations
1
Example 1.19
Consider the matrix
2 3
1 0 0
6 7
A D 4 0 1 0 5 :
0 0 3
Solution
Since A is diagonal, and the main diagonal does not contain zero, it follows (see Theo-
rem 1.2.11) that A1 exists and can be computed easily as
2 3
1 0 0
6 7
A1 D 4 0 1 0 5 :
0 0 1=3
Example 1.20
Find an invertible diagonal matrix A that satisfies
2 3
16 0 0
6 7
A2 D 4 0 9 05:
0 01
Solution
Take A in the form
2 3
d1 0 0
6 7
A D 4 0 d2 0 5 ;
0 0 d3
1.2 Square Matrices
35 1
with di ¤ 0; i D 1; 2; 3. The inverse of A is
2 3
1=d1 0 0
6 7
A1 D 4 0 1=d2 0 5 ;
0 0 1=d3
Therefore,
2 3 2 3
.1=d1 /2 0 0 16 0 0
6 7 6 7
A2 D .A1 /2 D 4 0 .1=d2 /2 0 5 D 4 0 9 05:
2
0 0 .1=d3 / 0 01
Whence
1 1 1
D 4; D 3; D 1:
d1 d2 d3
This yields
1 1
d1 D ; d2 D ; d3 D 1:
4 3
Therefore, the matrix A is given by
2 3
1=4 0 0
6 7
A D 4 0 1=3 0 5 :
0 0 1
We have introduced the diagonal matrices in Sect. 1.2.4 and showed that these matrices
have very important properties. In particular, we have shown that we can immediately
know if a diagonal matrix is invertible or not and if it is invertible, then we can easily
find its inverse. Now, since the diagonal matrices form a very narrow set in the class of
square matrices Mn .K/, it is quite natural to ask the following question: is there a larger
set of square matrices than the set of diagonal matrices which enjoy some properties of
the diagonal matrices? The answer is yes and this class of square matrices consists of
the so-called triangular matrices.
is lower triangular.
Obviously, every diagonal matrix is triangular.
Theorem 1.2.13
Let A be a triangular matrix in Mn .K/. Then, A is invertible if and only if all the entries
of the main diagonal are nonzero.
Proof
We prove the statement for upper triangular matrices. The proof for lower triangular matrices
is similar.
Let A be an upper triangular matrix in Mn .K/, then A has the form
2 3
a11 a12 a13 a1n
6 7
6 0 a22 a23 a2n 7
6 7
6 a3n 7
AD6 0 0 a33 7;
6 : :: :: :: :: 7
6 : : 7
4 : : : : 5
0 0 0 ann
1.2 Square Matrices
37 1
that is aij D 0 for all i > j. The linear homogeneous system associated to the matrix A is
It is clear that if ann ¤ 0, then the last equation in (1.49) has only one solution xn D 0.
Inserting this value into the equation just before the last one, we deduce that if a.n1/.n1/ ¤
0, then xn1 D 0 is the unique solution. If we apply the same procedure to all the equations
in (1.49), we deduce that if aii ¤ 0; 1 i n, then the only solution of (1.49) is the trivial
solution X D 0. Consequently, applying Theorem 1.2.9, we conclude that A is invertible if
and only if aii ¤ 0; 1 i n. t
u
1.2.6 Trace
As we have seen, for diagonal and triangular matrices, the entries of the main diagonal of
those matrices are very important and by examining those entries we can immediately
identify the invertible matrices. Now, since the entries of the main diagonal are also
important in a general square matrix, so can we gain something by doing the usual
algebraic operations on those entries? For example, for diagonal and triangular
matrices, if the product of all the entries of the main diagonal is not zero, then the
matrix is invertible. Now, what about the sum of the entries of the main diagonal in a
square matrix, does it give us anything? The answer is affirmative, and as we will see
later, it turns out to be very useful. We call this sum the trace of the square matrix.
X
n
tr.A/ D aii : (1.50)
iD1
Example 1.22
Consider the matrices
2 3
1 0 2 " #
6 7 b11 b12
A D 4 3 4 0 5; BD :
b21 b22
1 5 2
38 Chapter 1 • Matrices and Matrix Operations
1
Then we have
Theorem 1.2.14
Let A and B be two square matrices in Mn .K/ and k be a scalar. Then:
1. tr.A C B/ D tr.A/ C tr.B/.
2. tr.AT / D tr.A/.
3. tr.kA/ D k tr.A/.
4. tr.AB/ D tr.BA/.
In fact the last property holds for A in Mmn .K/ and B in Mnm .K/. Here AT denotes
the transpose of A (see Definition 1.4.1).
Proof
Properties (1)–(3) are trivial and follow directly from the definition. So, we only need to
show property (4). We have, by the definition of the multiplication of matrices,
X
n
AB D .cik / with cik D aij bjk ; 1 i m; 1 j n; 1 k m:
jD1
Hence,
X
m m X
X n
tr.AB/ D cii D aij bji
iD1 iD1 jD1
m X
X n
D bji aij
iD1 jD1
n X
X m
D bji aij
jD1 iD1
D tr.BA/:
t
u
1.3 Solving Linear Systems with Elementary Row Operations
39 1
Example 1.23
Use Theorem 1.2.14 to show that we cannot find two square matrices A and B in Mn .R/
such that
AB BA D I; (1.51)
Solution
We assume that (1.51) holds and show that this leads to a contradiction. Indeed, if (1.51)
holds, then by Theorem 1.2.14 we have
tr.AB/ D tr.BA/:
Whence
tr.I/ D n:
This is a contradiction. Hence, there are no matrices A and B such that (1.51) holds.
J
As we have seen above, in order to find the solution of a linear system (of n equations
and n unknowns) it is enough to compute the inverse of its associated n n matrix A.
Moreover, since it is very simple to find the inverse of a diagonal matrices, it is quite
simple to solve the systems associated to them. We know from elementary algebra that
if we add an equation in the system to another one and then replace the original equation
by the sum of the two, then the solution does not change. For example, in system (1.3),
if we replace the second equation, by the sum of the two equations, we obtain
(
ax C by D p;
(1.52)
.a C c/x C .b C d/y D p C q:
Thus, if we assume that ad bc ¤ 0, then the solution of (1.52) is the same solution of
(1.3). In matrix language, this operation is equivalent to replace the second row in the
matrix A
" #
ab
AD
cd
40 Chapter 1 • Matrices and Matrix Operations
1
to get
" #
a b
aCc bCd
by
" #
p
:
pCq
For simplicity, we may collect these operations in one matrix and transform the matrix
" #
ab p
BD (1.53)
cd q
The matrix B in (1.53) is called the augmented matrix associated to the system (1.3).
Similarly, the same thing is true if we replace a row r in the augmented matrix by the
product kr, where k is a scalar.
The following elementary row operations, will not change the solution of (1.8):
▬ Multiply a row through by a nonzero constant.
▬ Interchange two rows.
▬ Add a constant multiple of a row to another row.
1.3 Solving Linear Systems with Elementary Row Operations
41 1
1.3.1 The Gauss–Jordan Elimination Method
This method is simply based on some row operations that lead to the simplest diagonal
matrix (the identity matrix if possible) for which the inverse matrix can be easily
computed if it exists. To apply the method, and for simplicity, we use the augmented
matrix described in Definition 1.3.1. Essentially, the idea is to reduce the augmented
matrix ŒAjb, where A is a square matrix in Mn .K/ and b is a vector in Mn1 .K/, to
the form ŒDjc where D is a diagonal matrix in Mn .K/ and c is a vector in Mn1 .K/,
or simply to ŒIjd, where I is the identity matrix in Mn .K/ and d is in Mn1 .K/. In this
case the solution of the system
AX D b (1.55)
AX D b;
where
2 3 2 3 2 3
2 1 1 x1 5
6 7 6 7 6 7
A D 4 0 8 2 5 ; X D 4 x2 5 and b D 4 12 5 :
0 8 3 x3 14
So, we want to get zeros everywhere except on the main diagonal. Let us denote in each
step of the row operation the obtained first, second, and third rows by r1 ; r2 , and r3
respectively. So, first in the matrix B we replace r3 by r3 C r2 and obtain
2 3
2 1 1 5
6 7
4 0 8 2 12 5 : (1.57)
0 0 1 2
42 Chapter 1 • Matrices and Matrix Operations
1
Next, in (1.57) we replace r1 by 8r1 C r2 and obtain
2 3
16 0 6 28
6 7
4 0 8 2 12 5 : (1.58)
0 0 1 2
1
Finally, in (1.60) we replace r1 by r
16 1
and r2 by 18 r2 obtaining
2 3
100 1
6 7
40 1 0 15: (1.61)
001 2
Now, since the inverse of the identity matrix is itself, we deduce from (1.61) that
2 3
1
6 7
X D 415
2
i.e., all components are zero except the ith component which is 1. In this way, we get the
augmented matrices ŒIjbi and the corresponding solutions Xi D bi . For each vector ei
the steps are the same: apply the Gauss–Jordan method to the augmented matrix ŒAjei
to get the new augmented matrix ŒIjbi . Hence, we can do all the steps simultaneously
and transform the matrix
ŒAje1 ; e2 ; : : : ; en
ŒIjb1 ; b2 ; : : : ; bn :
Now, since e1 ; e2 ; : : : ; en are the column vectors of the identity matrix I, if we denote
by B the matrix which has b1 ; b2 ; : : : ; bn as column vectors then the above procedure is
equivalent to transform the matrix
ŒAjI
ŒIjB:
Solution
We apply the Gauss–Jordan method to find A1 . Consider the matrix
2 3
584 100
6 7
42 3 2 0 1 05:
121 001
We apply elementary row operations and in each step, we denote by r1 ; r2 , and r3 the rows of
the new matrix. First, we replace r2 by r2 2r3 and get
2 3
5 8 4 10 0
6 7
4 0 1 0 0 1 2 5 :
1 2 1 00 1
Consequently,
2 3
1 0 4
6 7
A1 D 4 0 1 2 5 :
1 2 1
Example 1.25
Consider the matrix
2 3
100
6 7
A D 45 4 05:
101
Show that A1 exists and use elementary row operations (Gauss–Jordan method) to find A1 .
Solution
Since A is a lower triangular matrix and the entries of its main diagonal are nonzero, the
inverse exists (Theorem 1.2.13).
To find A1 , use elementary row operation to transform the matrix
ŒAjI
ŒIjB:
Let r1 ; r2 , and r3 be the rows of all the matrices obtained by means of row operations. We
replace in (1.62) r2 by r2 5r1 to get
2 3
100 1 00
6 7
4 0 4 0 5 1 0 5 ; (1.63)
101 0 01
Consequently,
2 3
1 0 0
6 7
A1 D 4 5=4 1=4 0 5 :
1 0 1
Example 1.26
Find the inverse of the matrix
2 3
0 0 0 k1
6 7
60 0 k2 07
AD6 7
40 k3 0 05
k4 0 0 0
We may exchange the rows as follows: r1 and r4 , and then r3 and r2 , to obtain
2 3
k4 0 0 0 0 0 0 1
6 7
60 k3 0 0 0 0 1 07
6 7: (1.67)
40 0 k2 0 0 1 0 05
0 0 0 k1 1 0 0 0
1 1 1 1
Now, in (1.67) we replace r1 by r1 , r2 by r2 , r3 by r3 , and r4 by r4 , obtaining
k4 k3 k2 k1
2 3
1 0 0 0 0 0 0 1=k4
6 7
60 1 0 0 0 0 1=k3 0 7
6 7:
40 0 1 0 0 1=k2 0 0 5
0 0 0 1 1=k1 0 0 0
Example 1.27
Let k ¤ 0 be a real number. Consider the matrix
2 3
k10
6 7
A D 40 k 15:
00k
Show that A1 exists and use elementary row operations to find A1 .
Solution
Since A is an upper triangular matrix, the inverse exists if and only if all the entries of the
main diagonal are nonzero. So, since we took k ¤ 0, A1 exists.
48 Chapter 1 • Matrices and Matrix Operations
1
To find it, we use elementary row operations to transform the matrix
ŒAjI
ŒIjB:
As above, let r1 ; r2 and r3 be the rows of all the matrices obtained from row operations. In
(1.68) we replace r2 by kr2 r3 to get
2 3
k 1 0 10 0
6 2 7
4 0 k 0 0 k 1 5 : (1.69)
0 0 k 00 1
1 1
and then in (1.70), replace r1 by r ,r
k3 1 2
by r,
k2 2
and r3 by 1k r3 to find
2 3
1 0 0 1=k 1=k2 1=k3
6 7
4 0 1 0 0 1=k 1=k2 5 :
001 0 0 1=k
Consequently,
2 3
1=k 1=k2 1=k3
6 7
A1 D 4 0 1=k 1=k2 5 :
0 0 1=k
J
1.4 The Matrix Transpose and Symmetric Matrices
49 1
Example 1.28
Show that the matrix
2 3
1 6 4
6 7
A D 4 2 4 1 5
1 2 5
is not invertible.
Solution
To show that A is not invertible, it suffices to do some row operations and find one row which
has only zeros.
So, we consider the matrix
2 3
1 6 4 100
6 7
4 2 4 1 0 1 0 5 : (1.71)
1 2 5 0 0 1
Since the third row in left-hand side of (1.74) contains only zeros, A is not invertible.
J
In this section, we introduce two important notions: the transpose of a matrix and
symmetric matrices.
50 Chapter 1 • Matrices and Matrix Operations
1
1.4.1 Transpose of a Matrix
Using the first notation, we can write the system (1.8) in the matrix from (1.12), with
A the matrix given in (1.13). The question now is: can we write the system (1.8) in a
matrix form using the second notation for the vector X? To do this, we recast (1.8) as
2 3
a11 a21 a31 ::: am1
6 7
6 a12 a22 a32 ::: am2 7
.x1 ; : : : ; xn / 6
6 :: :: :: :: :: 7
7 D .b1 ; : : : ; bm /:
4 : : : : : 5
a1n a2n a3n ::: amn
Example 1.29
Let
" #
102
AD :
340
Then
2 3
1 3
6 7
AT D 4 0 45:
2 0
Proof
The first three properties are direct consequences of the definition of the transposed matrix.
We need to prove the last two. The proof of (4) can be done by a direct computation. So,
assume that A is a matrix in Mmn .K/ and B is a matrix in Mnr .K/. Then,
AB D C D .cik /; 1 i m; 1 k r;
with
X
n
cik D aij bjk :
jD1
Hence,
X
n
CT D .AB/T D .cki /1im D bkj aji 1im D BT AT :
1kr 1kr
jD1
t
u
We would also like to know how to find the inverse of the transpose AT if we know
the inverse of A? The answer is given in the following theorem.
Proof
We can establish the invertibility and obtain the formula at the same time, by showing that
and
.A1 /T AT D .AA1 /T D I T D I:
AT D A: (1.76)
Example 1.30
The following matrices are symmetric:
2 3 2 3
" # 1 4 5 d1 0 0
12 6 7 6 7
; 4 4 3 0 5 ; 4 0 d2 0 5 :
24
5 0 2 0 0 d3
Theorem 1.4.3
Let A be a matrix in Mn .K/. Then AAT ; AT A, and A C AT are symmetric matrices.
1.5 Exercises
53 1
Proof
First, for the matrix B D AAT , Theorem 1.4.1 shows that
Thus, B is symmetric.
Second, by the same method we have for C D AT A
Therefore, C is symmetric.
Finally, for D D A C AT , then, we have, again by Theorem 1.4.1,
DT D .A C AT /T D AT C .AT /T D AT C A D D;
1.5 Exercises
Exercise 1.1
We consider, for any real number x, the matrix
" #
cosh x sinh x
AD :
sinh x cosh x
Solution
1. We have
" #" #
cosh x sinh x cosh y sinh y
A.x/A. y/ D
sinh x cosh x sinh y cosh y
" #
cosh x cosh y C sinh x sinh y cosh x sinh y C sinh x cosh y
D
cosh x sinh y C sinh x cosh y cosh x cosh y C sinh x sinh y
" #
cosh.x C y/ sinh.x C y/
D
sinh.x C y/ cosh.x C y/
D A.x C y/; (1.77)
2. It is clear that A0 D I2 D A.0/: Now, let n > 0; then by (1) above we have
Therefore, the uniqueness of the inverse shows that ŒA.x/1 D A.x/, and by definition
Ap D ŒA1 p ; p > 0. Hence, we have for n D p < 0,
ŒA.x/n D A.nx/:
Solution
First, let us examine the simple case n D 2; then we will generalize it for all n. So, let
" # " #
d1 0 a11 a12
DD and AD :
0 d2 a21 a22
1.5 Exercises
55 1
We can easily verify that
" # " #
d1 a11 d1 a12 d1 a11 d2 a12
DA D and AD D :
d2 a21 d2 a22 d1 a21 d2 a22
So, we see that the multiplication of the matrix A from the left by D is effected by multiplying
the successive rows of A by the successive diagonal entries of D, and the multiplication of A
from the right by D is effected by multiplying the successive columns of A by the successive
diagonal entries of D.
Now, we want to show that this property holds for any n 2. So, let A D .ajk /; 1 j
n; 1 k n and D D dij ; 1 i n; 1 j n with dij D 0 for i ¤ j and dii D di . Using
Definition 1.1.11, we get
DA D .cik /; 1 i n; 1 k n;
X
n
cik D dij ajk D dii aik D di aik ; 1 k n:
jD1
Thus,
2 3
d1 a11 d1 a12 d1 a13 : : : d1 a1n
6d a d a d a : : : d2 a2n 7
6 2 21 2 22 2 23 7
DA D 6
6 :: :: :: :: :: 77:
4 : : : : : 5
dn an1 dn an2 dn an3 : : : dn ann
56 Chapter 1 • Matrices and Matrix Operations
1
Solution
1. Assume that A is nilpotent of order k. We want to show that .I C A/1 exists. In the case
of real 1 1 matrices, we have (under some assumptions) the Taylor series expansion
X1
1
.1 C a/1 D D 1 a C a2 a3 C D .1/n an :
aC1 nD0
X
k1
B D I A C A2 A3 C D .1/n An : (1.79)
nD0
The sum in the above equation will be finite since Ak D 0. It remains to verify that B is the
inverse of .I C A/, that is we have to prove that
.I C A/B D B.I C A/ D I:
X
k1
.I C A/B D .I C A/ .1/n An
nD0
X
k1 X
k1
D .1/n An C .1/n AnC1
nD0 nD0
X
k1 X
k2
D .1/n An C .1/n AnC1
nD0 nD0
X
k1 X
k1
D .1/n An C .1/n1 An
nD0 nD1
D I:
B D .I C A/1 :
Exercise 1.4
Let A be a matrix in M2 .K/ of the general form
" #
ab
AD :
cd
Show that
Solution
We compute first A2 :
" #
a2 C bc b.a C d/
A2 D AA D
c.a C d/ cb C d2
and then
" # " #
a.a C d/ b.a C d/ ad bc 0
.a C d/A D ; .ad bc/I2 D :
c.a C d/ d.a C d/ 0 ad bc
2
As we will see later, the number a C d is called the trace of A, the number ad bc is called the determinant
of A, and the polynomial p./ D 2 .a C d/ C .ad bc/ is called the characteristic polynomial of A. See
Definition 7.3.2.
58 Chapter 1 • Matrices and Matrix Operations
1
Exercise 1.5 (Idempotent Matrices)
Let A be a matrix in Mn .K/. Then A is said to be idempotent if A2 D A.
1. Show that if A is idempotent, then so is I A.
2. Show that if A is idempotent, then 2A I is invertible and is its own inverse.
3. Find all the idempotent matrices in M2 .R/.
4. Show that if A is idempotent, and if p is a positive integer, then Ap D A.
Solution
1. Since AI D IA D A, we can easily show that
.I A/2 D I 2A C A2
D I 2A C A
D I A;
where we have used the fact that A2 D A. This shows that I A is idempotent since
.I A/2 D I A:
2. We have
D A2 C .A I/2 C 2A.A I/
D A2 C .I A/2 C 2A2 2A
D I;
where we have used the fact that A and I A are idempotent matrices. Consequently,
.2A I/1 D 2A I:
Then, we have
" #" # " #
2 ab ab a2 C bc ab C bd
A D D :
cd cd ac C cd bc C d2
1.5 Exercises
59 1
Therefore, A2 D A leads to the system of equations
8 2
ˆ
ˆ a C bc D a;
ˆ
< ab C bd D b;
(1.80)
ˆ
ˆ ac C cd D c;
:̂
bc C d2 D d:
From the first and the third equations, we deduce that if b D 0 and a C d ¤ 1, then a D 0
or a D 1 and c D 0. If a C d D 1, then c can be any real number. Then from the fourth
equation, we deduce that a D 1 or d D 0. Then in this case, the idempotent matrices are
" # " # " # " #
00 10 00 10
; ; ; :
00 01 c1 c0
If b ¤ 0, then, from the second equation, we have a C d D 1 and from the first equation,
2 2
we have c D a ba D dd b . Thus, the idempotent matrices of M2 .R/ are the matrices of the
form
2 3
a b
4 a a2 5
1a
b
Ap D A; (1.81)
for any positive integer p. It is clear that (1.81) is satisfied for p D 1 and p D 2. Now, assume
that (1.81) holds for p and show that it is still holds for p C 1. We have
ApC1 D Ap A D AA D A2 D A:
60 Chapter 1 • Matrices and Matrix Operations
1
Solution
1. Since R./ is a matrix in M2 .R/, using (1.39) we deduce that
" #
1 1 cos sin
R ./ D
cos2 C sin2 sin cos
" #
cos./ sin./
D
sin./ cos./
D R./:
The above result means that rotating by 1 and then by 2 , is the same as rotating by 1 C 2 .
J
A2 D I:
is an involutory matrix.
2. Find all the involutory matrices in M2 .R/.
3. Show that a matrix A is involutory if and only if
.I A/.I C A/ D 0:
1.5 Exercises
61 1
Solution
1. We need to verify that A2 D I. By a simple computation,
" #" #
2 cos sin cos sin
A D AA D
sin cos sin cos
" #
cos2 C sin2 0
D
0 cos2 C sin2
" #
10
D D I:
01
If b D 0, then a D ˙1 and d D ˙1. Thus, the third equation in the above system gives: if
a D 1 and d D 1 or a D 1 and d D 1, then c D 0, in the other cases a D 1 and d D 1
or a D 1 and d D 1, then c can be any real number. Therefore, for b D 0, the involutory
matrices are
" # " # " # " #
10 1 0 1 0 1 0
; ; ; :
01 0 1 c 1 c 1
.I A/.I C A/ D I 2 A2 AI C IA D I I D 0:
1 1
B2 D BB D .I C A/ .I C A/
2 2
1
D .A2 C 2IA C I 2 /
4
1
D .I C 2A C I/
4
1
D .I C A/ D B;
2
Exercise 1.8
Let A and B be two matrices in Mn .K/ and I be the identity matrix in Mn .K/. Check that if
I C AB is invertible, then I C BA is invertible, and find its inverse.
Solution
Assume that I C AB is invertible, that is .I C AB/1 exists. Now, a matrix C in Mn .K/ is the
inverse of .I C BA/ if and only if
.I C BA/C D I;
leads to (since the of multiplication is associative and distributive over the addition)
C C B.AC/ D I:
Or, equivalently,
B.AC/ D I C: (1.84)
.AB/.AC/ D A AC;
1.5 Exercises
63 1
whence
AC C .AB/.AC/ D A:
That is
.I C AB/.AC/ D A:
AC D .I C AB/1 A: (1.85)
C.BA/ D I C: (1.86)
C.BA/B D B CB:
That is
.CB/.I C AB/ D B:
.CA/B D B.AC/ D I C:
Consequently,
I C D B.I C AB/1 A;
and so
J
64 Chapter 1 • Matrices and Matrix Operations
1
Exercise 1.9
Solve in Mn .K/ the equation
2A C 3AT D I: (1.88)
Solution
Using the properties of the transpose, we recast (1.88) as
.2A C 3AT /T D I T :
That is,
2AT C 3A D I: (1.89)
Multiplying Eq. (1.88) by 2 and Eq. (1.89) by 3 and adding the results, we obtain
5A D I:
Therefore, A D 15 I: J
Exercise 1.10
Let
" #
ab
AD
cd
A2 B D BA2 and a C d ¤ 0:
Show that
AB D BA:
Solution
We have seen in Exercise 1.4 that if A is a matrix in M2 .K/, then
1
AD ŒA2 C .ad bc/I2 :
.a C d/
1.5 Exercises
65 1
Since the two matrices A2 and I2 commute with B, we have
1
AB D ŒA2 C .ad bc/I2 B
.a C d/
1
D ŒA2 B C .ad bc/I2 B
.a C d/
1
D ŒBA2 C .ad bc/BI2
.a C d/
1
D BŒ ŒA2 C .ad bc/I2
.a C d/
D BA:
H D fM.a/ W where a is in Rg
66 Chapter 1 • Matrices and Matrix Operations
1
Solution
1. We compute
2 32 3 2 3
0 11 0 11 0 0 0
6 76 7 6 7
U 2 D UU D 4 1 0 0 5 4 1 0 0 5 D 4 0 1 1 5 :
1 0 0 1 0 0 0 1 1
Thus,
a2 2
M.a/ D I3 C aU C U :
2
Also,
U 4 D U 3 U D 0:
Consequently,
.a C b/2 2
M.a/M.b/ D I3 C .a C b/U C U
2
D M.a C b/:
1.5 Exercises
67 1
We may also prove the above identity by a direct computation, using the form of the matrices
M.a/ and M.b/.
3. It is clear that H is nonempty, since M.0/ D I3 is in H. Also, by (1.90), if M.a/ and
M.b/ are in H, then the product M.a/M.b/ is also in H. In addition, using (1.90) once again,
we have
This can be verified by induction. It is clear that (1.91) is true for k D 0, k D 1, and k D 2.
Next, assume that (1.91) holds for k and show that it also holds for k C 1. We have, by (1.90),
D M.ka/M.a/
D M.ka C a/
D M..k C 1/a/:
.M.a//k D M.ka/:
D M.k0 a/
D M.ka/:
.M.a//k D M.ka/:
J
68 Chapter 1 • Matrices and Matrix Operations
1
Exercise 1.12
Show that any matrix A in M2 .R/, with A ¤ I2 , satisfying A3 D I2 has trace equal to 1.
Solution
As we have seen in Exercise 1.4, if A is a matrix in M2 .K/, then we have
A3 tr.A/A2 C det.A/A D 0:
Consequently,
.tr.A//3 D 1:
tr.A/ D 1:
J
69 2
Determinants
Belkacem Said-Houari
2.1 Introduction
As, indicated before, one of the main goals in linear algebra is to be able to determine
whether a given square matrix is invertible or not, and if invertible, to find its inverse.
In this chapter, we give a general criterion for the invertibility of square matrices. So, let
us first recall the equation
ax D b; (2.1)
b
xD D a1 b: (2.2)
a
a ¤ 0: (2.3)
Now, for a system of two equations and two unknowns, we have seen that the system
(
ax C by D p;
(2.4)
cx C dy D q;
ad bc ¤ 0: (2.5)
AX D b; (2.6)
70 Chapter 2 • Determinants
where
2 " # " # " #
ab x p
AD ; XD ; bD :
cd y q
The number adbc constructed from the entries of the matrix A is called the determinant
of the 2 2 matrix A and is denoted by
det.A/ D ad bc:
In analogy with this, and if we regard the constant a in (1.1) as a square matrix in
M1 .K/, then the number a in (2.3) is the determinant of the matrix Œa. As (2.3) and
(2.5) show, the Eq. (2.1) and the system (2.4) have unique solutions, that is to say, the
associated matrices are invertible, if and only if their determinants are not zero. So, the
natural question is the following: can we extend this condition for any square matrix A
in Mn .K/? That is, can we show that the matrix A in Mn .K/ is invertible if and only if
its determinant is not zero? Before answering this question, we need to explain how to
find the determinant of a square matrix A in Mn .K/. A main goal in this chapter is to
give answers to the above two questions.
Since according to the above definitions, the determinant of the 1 1 matrix Œa is a, we
may write the determinant of the 2 2 matrix as
We leave it to the reader to check that the above system has a unique solution if and only
if
a11 .a22 a33 a23 a32 / a21 .a12 a33 a13 a32 / C a31 .a12a23 a13 a22 / ¤ 0: (2.10)
a11 .a22 a33 a23 a32 / a12 .a21a33 a13 a32 / C a31 .a12 a23 a13 a22 /
" # " # " #
a22 a23 a21 a23 a12 a22
D a11 det a12 det C a31 det :
a32 a33 a31 a33 a13 a23
We observe that M11 is obtained by removing the first row and the first column from
the matrix A and computing the determinant of the resulting 2 2 matrix. Similarly,
we can find M12 by removing the first row and the second column and computing the
determinant of the remaining matrix, and so on. These Mij are called the minors of the
matrix A.
72 Chapter 2 • Determinants
2 3
a11 a12 : : : a1.j1/ a1.jC1/ : : : a1n
6 : : : a2.j1/ : : : a2n 7
6 a21 a22 a2.jC1/ 7
6 :: :: :: :: :: :: :: 7
6 7
6 : : : : : : : 7
6 7
Mij D det 6
6 a.i1/1 a.i1/2 : : : a.i1/.j1/ a.i1/.jC1/ : : : a.i1/n 7
7: (2.12)
6 7
6 a.iC1/1 a.iC1/2 : : : a.iC1/.j1/ a.iC1/.jC1/ : : : a.iC1/n 7
6 7
6 :: :: :: :: :: :: :: 7
4 : : : : : : : 5
an1 an2 : : : an.j1/ an.jC1/ : : : ann
Example 2.1
Consider the matrix
2 3
103
6 7
A D 42 1 25:
051
Then,
" # " #
12 22
M11 D det D 1 1 2 5 D 9; M12 D det D 2;
51 01
" # " #
21 03
M13 D det D 10; M21 D det D 15;
05 51
" # " #
13 10
M22 D det D 1; M23 D det D 5;
01 05
" # " #
03 13
M31 D det D 3; M32 D det D 4;
12 22
" #
10
M33 D det D 1:
21
2.2 Determinants by Cofactor Expansion
73 2
In (2.11), we saw that the second term has a negative sign, while the first and the last
terms have a positive sign. So, to avoid the negative signs and to be able to write an easy
formula for the determinant, we define what we call the cofactor.
Example 2.2
In Example 2.1, we have, for instance,
and
We can also write the above determinant using the columns, as follows:
2 3
a11 a12 a13
6 7
det 4 a21 a22 a23 5 D a11 C11 C a21 C21 C a31 C31
a31 a32 a33
D a12 C12 C a22 C22 C a32 C32
D a13 C13 C a23 C23 C a33 C33 :
Now, the above formulas for the determinant can be generalized to any square matrix in
Mn .K/ as follows.
74 Chapter 2 • Determinants
for any fixed j. The above two formulas are called the cofactor expansion of the
determinant.
Example 2.3
Find the determinant of the matrix A given by
2 3
103
6 7
A D 42 1 25:
051
Solution
To calculate the determinant of A, we need first to choose one row or one column and make
use of Definition 2.2.3 accordingly. The smart choice is to take the row or the column that
contain the largest number of zeros. In this case, we may choose the first row and use it to do
the cofactor expansion. So, we use (2.15) with i D 1 and write
We have computed the minors of the above matrix A in Example 2.1, so we have
det.A/ D 9 C 3 10 D 21:
2.2 Determinants by Cofactor Expansion
75 2
⊡ Fig. 2.1 To evaluate the
3 3 determinant, we take the + + + − − −
products along the main
diagonal and the lines parallel to a11 a12 a13 a11 a12
it with a .C/ sign, and the
products along the second
diagonal and the lines parallel to a21 a22 a23 a21 a22
it wit a ./ sing
We can obtain the above result using the trick in ⊡ Fig. 2.1 as follows:
det.A/ D 1 1 1 C 0 2 0 C 3 2 5 3 1 0 1 2 5 0 2 1 D 31 10 D 21:
Example 2.4
Calculate the determinant by a cofactor expansion for
2 3
2 1 5
6 7
A D 4 1 4 3 5 :
4 2 0
Solution
We may calculate the determinant of A using the third row:
D 4C13 C 2C23 :
Now, we have
" #
1C3 1 5
C13 D .1/ M13 D det D 17
4 3
and
" #
2C3 21
C23 D .1/ M23 D det D 1:
53
76 Chapter 2 • Determinants
Consequently,
2
det.A/ D 4 17 C 2 .1/ D 66:
In this section, we give the determinant of some particular matrices and establish some
properties of the determinant.
It is clear from Definition 2.2.3 that if we use the cofactor expansion along one of
the rows or along one of the columns of the matrix A, then we obtain the same value
for the determinant. This implies that, A and AT have the same determinant and we state
this in the following theorem.
Proof
The proof of (2.18) can be done by induction. We first take n D 2 and let D2 be the matrix
" #
d11 0
D2 D :
0 d22
Clearly,
and let us show that (2.18) holds for n. Choosing i D n and applying formula (2.14), we get
D dnn det.Dn1 /
Example 2.5
Let
2 3
1 0 0
6 7
A D 4 0 3 05
0 05
Then
ⓘ Remark 2.3.3 We deduce immediately from Theorem 2.3.2 that if In is the identity
matrix in Mn .K/, then
det.In / D 1:
Proof
2 We prove this statement for upper triangular matrices; the same argument works for lower
triangular matrices. So, let
2 3
a11 a12 a13 a1n
6 7
6 0 a22 a23 a2n 7
6 7
6 a3n 7
AD6 0 0 a33 7
6 : :: :: 7
6 : : 7
4 : : 5
0 0 0 ann
and then
is equal to
D ann det.An1 /
Solution
Since A is upper triangular and B is lower triangular using (2.20) we get
det.A/ D 1 2 .4/ D 8
and
ⓘ Remark 2.3.5 We have seen in Theorems 1.2.11 and 1.2.13 that diagonal and
triangular matrices are invertible if and only if all the entries of the main diagonal are
not zero. That is, if the product of those entries is not zero, which is equivalent to the
fact that the determinant is not zero.
As we have seen above, it is easy to compute the determinant of diagonal and triangular
matrices. So, if we can apply the row operations to transform a square matrix into
a triangular matrix (which is easier than transforming it to a diagonal one), then the
determinant of the new matrix can be calculated by just taking the product of the entries
of the main diagonal. Therefore, the question is: how do the row operations affect the
determinant? In this section, we answer this question and compute some determinants
using the row reduction method.
We begin by a fundamental theorem that will lead us to an efficient procedure for
evaluating the determinant of square matrices.
Theorem 2.4.1
Let A D .aij /; 1 i; j n, be a square matrix in Mn .K/. If A has a row of zeros or a
column of zeros, then
det.A/ D 0:
80 Chapter 2 • Determinants
Proof
2 Suppose that there exists 1 i0 n, such that ai0 j D 0 for all 1 j n. Then, using (2.14)
for i D i0 , we deduce that
Similarly, if there exists 1 j0 n, such that aij0 D 0 for all 1 i n, then using (2.15)
for j D j0 , we get
Now let
" #
ab
AD
cd
be a matrix in M2 .K/ and let B1 be the matrix that results by interchanging the two rows
and B2 be the matrix that results by interchanging the two columns; that is,
" # " #
cd ba
B1 D and B2 D :
ab dc
Then,
Next, let B3 the matrix that results by multiplying one row (the first row for instance)
by a scalar k and B4 be the matrix that results from multiplying the first column by a
scalar k; that is
" # " #
ka kb ka b
B3 D and B4 D :
c d kc d
Then,
Finally for this case, let B5 be the matrix that results by adding a multiple of one row
of the matrix A to another row and B6 be the matrix that results by adding a multiple of
one column of A to another column, that is, for instance,
" # " #
a C kc b C kd a C kb b
B5 D and B6 D :
c d c C kd d
Then,
The ways the above row operations affect the value of the determinant remain valid
for any square matrix in Mn .K/; n 1.
Theorem 2.4.2
Let A be a matrix in Mn(K).
1. If B is the matrix obtained by interchanging two rows or two columns of A, then det(B) = −det(A).
2. If B is the matrix obtained by multiplying a single row or a single column of A by a scalar k, then det(B) = k det(A).
3. If B is the matrix obtained by adding a multiple of one row (respectively, one column) of A to another row (respectively, another column), then det(B) = det(A).
Proof
Let A D .aij / and B D .bij /; 1 i; j n.
1. Without loss of generality, we can consider for instance the case where B is the matrix
obtained from A by interchanging the first two rows. Let Mij and Mij0 ; 1 i; j n, denote
the minors of A and B and Cij and Cij0 denote the cofactors of A and B, respectively. Then, by
using the cofactor expansion through the first row of A, we have
0 0 0
det.B/ D b21 C21 C b21 C21 C C b2n C2n
0 0 0
D a11 C21 C a12 C22 C C a1n C2n ; since b2j D a1j ; 1 j n;
0 0 0
D a11 M21 C a12 M22 ˙ a1n M2n
0
D a11 M11 C a12 M12 ˙ a1n M1n ; since M2j D M1j ; 1 j n;
2
D det.A/:
and
Since
we have
Ci0 j D Ci00 j ; 1 j n:
D k det.A/:
Similarly, we can show .3/. We leave this as an exercise to the reader. Also, the same
argument can be applied if we use columns instead of rows. t
u
Elementary Matrices
If the matrix A in Theorem 2.4.2 is the identity matrix, then the matrix B is called an
elementary matrix.
Our goal now is to show that the row operations on a matrix A are equivalent of
multiplying the matrix A from the left by a finite sequence of elementary matrices, and
similarly the column operations on A are equivalent of multiplying the matrix A from
the right by a finite sequence of elementary matrices. To see this first on an example,
consider the matrix
A = [ 1  2  3 ]
    [ 4  6  0 ]
    [ 5  1  7 ]
and let B be the matrix obtained from A by interchanging the first and the second rows:
B = [ 4  6  0 ]
    [ 1  2  3 ].
    [ 5  1  7 ]

One can check that B is obtained by multiplying A from the left by the elementary matrix corresponding to this row interchange,

E1 = [ 0  1  0 ]
     [ 1  0  0 ],
     [ 0  0  1 ]

that is,

B = E1 A.
Similarly, if C denotes the matrix obtained from A by multiplying its second column by 2, then

C = A E2,

with

E2 = [ 1  0  0 ]
     [ 0  2  0 ].
     [ 0  0  1 ]
ⓘ Remark 2.4.3 It is not hard to see that every elementary matrix is invertible and its
inverse is also an elementary matrix.
Now, from the definition of elementary matrices, we can easily deduce the follow-
ing:
▬ If E1 is the elementary matrix obtained by interchanging two rows or two columns
of the identity matrix, then det(E1) = −1.
▬ If E2 is the elementary matrix obtained by multiplying a single row or a single
column of the identity matrix by a scalar k, then det(E2) = k.
▬ If E3 is the elementary matrix obtained by adding a multiple of one row (respectively,
one column) of the identity matrix to another row (respectively, another column),
then det(E3) = 1.
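These three values are easy to check numerically. In the following sketch (the size 3 and the scalar k = 5 are arbitrary illustrative choices), each elementary matrix is built from the identity and its determinant is computed with NumPy:

import numpy as np

I = np.eye(3)

E1 = I[[1, 0, 2], :]            # interchange the first two rows of I
E2 = I.copy(); E2[1, 1] = 5.0   # multiply the second row of I by k = 5
E3 = I.copy(); E3[2, 0] = 4.0   # add 4 times row 1 of I to row 3

for name, E in [("E1", E1), ("E2", E2), ("E3", E3)]:
    print(name, round(np.linalg.det(E), 6))
# E1 -1.0   (row interchange)
# E2  5.0   (row scaling by k)
# E3  1.0   (adding a multiple of one row to another)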
Theorem 2.4.4
Let A be a matrix in Mn .K/. If A contains two proportional rows or two proportional
columns, then
det.A/ D 0:
Proof
Let A D .aij /; 1 i; j n, be a matrix in Mn .K/. Assume that there exist 1 i0 ; i1 n
such that the row ri0 and the row ri1 satisfy ri1 D kri0 .
Let B be the matrix that obtained by adding ri1 to kri0 in A. Then, by using (2.23), we
have
det.B/ D det.A/:
But all the entries of the resulting row in B are zero. Thus, Theorem 2.4.1 implies that
det.B/ D 0 and therefore, det.A/ D 0. The same method can be applied if two columns
of A are proportional. t
u
Example 2.8
Consider the matrices
A = [ 1  3  1 ]             B = [ 1  3  9 ]
    [ 0  2  0 ]    and          [ 0  2  6 ].
    [ 2  6  2 ]                 [ 0  1  3 ]
Since the first row and the third row of A are proportional .r3 D 2r1 /, det.A/ D 0. Similarly,
since the second column and the third column of B are proportional, det.B/ D 0.
Example 2.9
Use the row reduction method to calculate the determinant of the matrix
A = [ 0  1  5 ]
    [ 3  6  9 ].
    [ 2  6  1 ]
2 6 1
Solution
Since the determinant of a triangular matrix is the product of the entries of the main diagonal,
we apply the necessary row operations in order to get a triangular matrix. First, let A1 be the matrix obtained by interchanging r1 (the first row) and r2 in A, that is,

A1 = [ 3  6  9 ]
     [ 0  1  5 ].
     [ 2  6  1 ]

By Theorem 2.4.2,

det(A1) = −det(A).
Next, let A2 be the matrix obtained by multiplying the first row in A1 by k D 1=3, i.e.,
A2 = [ 1  2  3 ]
     [ 0  1  5 ].
     [ 2  6  1 ]

By Theorem 2.4.2,

det(A2) = (1/3) det(A1).
det.A3 / D det.A2 /:
Then
Theorem 2.4.5
Let A be a matrix in Mn(K) and k be a scalar. Then, for B = kA, we have det(B) = k^n det(A).
Proof
Let A D .aij /; 1 i; j n. Then the matrix B is given by B D .kaij /; 1 i; j n. So, to
get B we need to do n row operations. Let A0 D A, An D B and Ai ; 1 i n be the matrix
obtained by multiplying the row ri of the matrix Ai1 by k. Then, applying Theorem 2.4.2,
we get
det.Ai / D k det.Ai1 /
and therefore det(B) = det(An) = k^n det(A0) = k^n det(A).
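A quick numerical check of the identity det(kA) = k^n det(A), with an arbitrary random matrix and an arbitrary scalar (both illustrative choices), can be done as follows:

import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 3.0
A = rng.integers(-5, 6, size=(n, n)).astype(float)

print(np.linalg.det(k * A))        # det(kA)
print(k**n * np.linalg.det(A))     # k^n det(A): the two values agree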
Proof
To show (2.25), we provide a counterexample. Thus, consider the two matrices

A = [ 1  2 ]    and    B = [ 3   0 ].
    [ 0  3 ]               [ 1  −2 ]

Then,

A + B = [ 4  2 ].
        [ 1  1 ]

We have det(A) = 3, det(B) = −6, and det(A + B) = 2, so

det(A + B) ≠ det(A) + det(B).
t
u
Proof
If A is a singular matrix, then det(A) and det(AB) are both zero (Theorem 2.4.8). Hence, (2.26) holds. So, we can assume that A is invertible. Then, A can be row reduced (using the Gauss–Jordan elimination method in Sect. 1.3.1) to the identity matrix. That is, we can find a finite sequence of elementary matrices E1, E2, …, Eℓ such that

E1 E2 ⋯ Eℓ A = In. (2.27)

Hence, A = Eℓ^{-1} ⋯ E2^{-1} E1^{-1}, and the proof of (2.26) is reduced to showing that for any elementary matrix E and any square matrix M in Mn(K), we have

det(EM) = det(E) det(M).

Consequently, applying this repeatedly to AB = Eℓ^{-1} ⋯ E1^{-1} B, we obtain

det(AB) = det(Eℓ^{-1}) ⋯ det(E1^{-1}) det(B) = det(A) det(B).
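Both the product rule and the failure of the corresponding rule for sums can be illustrated numerically; the 2 × 2 matrices below are those of the counterexample above:

import numpy as np

A = np.array([[1., 2.], [0., 3.]])
B = np.array([[3., 0.], [1., -2.]])

print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))   # True: det(AB) = det(A)det(B)
print(np.isclose(np.linalg.det(A + B),
                 np.linalg.det(A) + np.linalg.det(B)))   # False: 2 != 3 + (-6)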
Then,
" #
6 0
AB D :
2 3
Hence,
and
We have seen that Eq. (2.1) has a unique solution if and only if
a ¤ 0; or detŒa D a ¤ 0:
Similarly, we have shown that system (2.4) has unique solution if and only if
" #
ab
ad bc ¤ 0; or det D ad bc ¤ 0:
cd
This is equivalent to say that the inverse of the above matrix exists if and only its
determinant is not zero. In fact, this is the case for any matrix A in Mn .K/.
Theorem 2.4.8
Let A be a matrix in Mn .K/. Then, A is invertible if and only if
det.A/ ¤ 0:
Proof
2 First assume that A is invertible, and A1 is its inverse. Then we have
AA1 D A1 A D In :
E 1 E 2 E ` A D In ; (2.29)
and Ei1 ; 1 i `, is the elementary matrix corresponding to the ith row operation applied
by the Gauss–Jordan elimination algorithm.
Now, denoting
B D E1 E2 E` ;
we get
AB D BA D In :
det(A^{-1}) = 1/det(A). (2.30)
Proof
Since A is invertible, Theorem 2.4.8 implies that det(A) ≠ 0. Writing the invertibility relation

A A^{-1} = A^{-1} A = I

and taking determinants, the product rule (2.26) gives det(A) det(A^{-1}) = det(I) = 1. Hence,

det(A^{-1}) = 1/det(A),
as claimed. t
u
Example 2.11
Consider the matrix
" #
ab
AD :
cd
We have seen in Theorem 2.4.8 that the inverse of A exists if and only det.A/ ¤ 0. Since
our ultimate goal is to compute A1 , we may ask whether there is a way to compute
A1 by using the determinant? To answer this question, let us consider a matrix A in
M2 .K/,
" #
ab
AD :
cd
B = C^T, and hence

A^{-1} = (1/det(A)) C^T. (2.32)

The transpose C^T of the matrix of cofactors is called the adjoint of A:

adj(A) = C^T.
Example 2.12
Find the adjoint of the matrix
A = [  1  0  −2 ]
    [ −1  3   0 ].
    [  1  0   2 ]
Solution
We compute the cofactors of A as

C11 = det [ 3  0 ] = 6,    C12 = −det [ −1  0 ] = 2,
          [ 0  2 ]                    [  1  2 ]

and similarly for the remaining cofactors, and then

adj(A) = C^T = [  6  0  6 ]
               [  2  4  2 ].
               [ −3  0  3 ]
Now, as in the case of the inverse of a 2 2 matrix considered in (2.32), we have the
following theorem.
A^{-1} = (1/det(A)) adj(A). (2.33)
Proof
2 We need to show that the matrix B defined by
1
BD adj.A/
det.A/
satisfies
AB D BA D I: (2.34)
Then, B is an inverse of A and the uniqueness of the inverse (Theorem 1.2.3) leads to B D
A1 . To check (2.34), let A D .aij /; 1 i; j n, and adj.A/ D .dji /; 1 j; i n, with
dji D Cij . By Definition 1.1.11,
A adj.A/ D .bij /; 1 i; j n;
Now, if i D j, then the above formula is the cofactor expansion of the determinant of the
matrix A along the ith row.
On the other hand, if i ¤ j, then
The above equation is just the determinant of the matrix A, where we replace the ith row by
the jth row. Then, in this case the matrix contains two identical rows and so its determinant
is zero (Theorem 2.4.4).
Therefore, we obtain
2 3
det.A/ 0 : : : 0
6 0 det.A/ : : : 0 7
6 7
A adj.A/ D 6
6 :: :: :: :: 77 D det.A/I:
4 : : : : 5
0 0 : : : det.A/
By the same method, we can show that BA D I, and therefore, B D A1 . This completes the
proof of Theorem 2.5.1. t
u
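Formula (2.33) can be turned directly into a small program. The sketch below (the function name adjoint and the test matrix are illustrative choices, not taken from the text) builds the adjoint from the cofactors and compares the resulting inverse with NumPy's:

import numpy as np

def adjoint(A):
    """Adjoint (adjugate) of a square matrix: transpose of the cofactor matrix."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

A = np.array([[3., 1., 2.],
              [0., 2., 1.],
              [1., 0., 4.]])             # an illustrative invertible matrix
A_inv = adjoint(A) / np.linalg.det(A)    # formula (2.33)
print(np.allclose(A_inv, np.linalg.inv(A)))   # True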
Example 2.13
Use the adjoint formula to find the inverse of the matrix
A = [  1  0  −2 ]
    [ −1  3   0 ].
    [  1  0   2 ]
Solution
We have computed the cofactors of A in Example 2.12. Let us now find the determinant of A.
Using the cofactor expansion along the second column, we have

det(A) = 3 C22 = 3 · 4 = 12.

By formula (2.33),

A^{-1} = (1/det(A)) adj(A).

Therefore,

A^{-1} = (1/12) [  6  0  6 ]   =   [  1/2    0   1/2 ]
                [  2  4  2 ]       [  1/6  1/3   1/6 ].
                [ −3  0  3 ]       [ −1/4    0   1/4 ]
Example 2.14
Use formula (2.33) to find the inverse of the matrix
2 3
3 2 1
6 7
A D 4 2 0 1 5:
1 2 1
Solution
2 We need first to compute the cofactors of A as in Example 2.12. We find
C11 D 2; C12 D 3; C13 D 4; C21 D 4; C22 D 4; C23 D 4;
and so
2 3
2 4 2
6 7
adj.A/ D 4 3 4 1 5 :
4 4 4
Now, we use the cofactor expansion along the second row to find the determinant of A as
D 4:
Example 2.15
Use the adjoint matrix to find the inverse of the matrix
A = [ 2  −1   1 ]
    [ 0   1  −3 ].
    [ 0   0   2 ]
Solution
First, it is clear that, since A is a triangular matrix,

det(A) = 2 · 1 · 2 = 4 ≠ 0.

This means that A is invertible. Now, we need to find the adjoint of A. We compute first the
cofactor matrix C of A. We have
" #
1 3
C11 D det D 2; C12 D 0; C13 D 0; C21 D 2;
0 2
C22 D 4; C23 D 0; C31 D 2; C32 D 6; C33 D 2:
Consequently,
2 3 2 3
C11 C12 C13 2 0 0
6 7 6 7
C D 4 C21 C22 C23 5 D 4 2 4 0 5 :
C31 C32 C33 2 6 2
Thus,
2 3
2 2 2
6 7
adj.A/ D CT D 4 0 4 6 5 ;
0 0 2
and so
3 2
2 3
2 2 2 1=2 1=2 1=2
1 16 7 6 7
A1 D adj.A/ D 4 0 4 6 5 D 4 0 1 3=2 5 :
det.A/ 4
0 0 2 0 0 1=2
Example 2.16
1. Use the row reduction method to find the determinant of the matrix
2 3
2 4 6
6 7
A D 4 0 0 1 5 :
2 1 5
Solution
2 1. Denote by r1 ; r2 , and r3 the rows of A and of all the matrices obtained by means of row
operations. Our goal is to apply the row operation method to get a triangular matrix from A.
First, we exchange r2 and r3 and get
2 3
2 4 6
6 7
A1 D 4 2 1 5 5 ;
0 0 1
Consequently,
Consequently,
2 3 2 3
C11 C12 C13 1 2 0
6 7 6 7
C D 4 C21 C22 C23 5 D 4 26 1 10 5 ;
C31 C32 C33 4 2 0
and so
2 3
1 26 4
6 7
adj.A/ D CT D 4 2 2 2 5 :
0 10 0
This gives
2 3 2 3
1 26 4 1=10 13=5 2=5
1 1 6 7 6 7
A1 D adj.A/ D 4 2 2 2 5 D 4 1=5 1=5 1=5 5 :
det.A/ 10
0 10 0 0 1 0
2.5.1 Cramer's Rule

In this subsection, we will use the adjoint formula to find the solution of the system

AX = b, (2.36)

where A is an invertible matrix in Mn(K) and X and b are vectors in M_{n×1}(K). That is,
A = (aij), 1 ≤ i, j ≤ n,    X = (x1, x2, …, xn)^T,    and    b = (b1, b2, …, bn)^T. (2.37)
then we have
It is clear that the matrix A1 is obtained by replacing the first column of A by the vector
b and the matrix A2 is obtained by replacing the second column of A by the vector b.
This shows that the solution of (2.36) is given by
det.A1 / det.A2 /
x1 D and x2 D :
det.A/ det.A/
This method of finding x1 and x2 is called the Cramer rule and is generalized in the
following theorem.
xj = det(Aj)/det(A),    1 ≤ j ≤ n,

where Aj, 1 ≤ j ≤ n, is the matrix obtained by replacing the entries in the jth column of the matrix A by the entries of the column b = (b1, b2, …, bn)^T.
Proof
First method. It is clear that if det(A) ≠ 0, then A is invertible and the unique solution of (2.36) is given by X = A^{-1} b. Now using formula (2.33), we have

X = (1/det(A)) adj(A) b,

and the jth row of adj(A) = C^T is (C1j, C2j, …, Cnj). Whence

xj = (1/det(A)) (b1 C1j + b2 C2j + ⋯ + bn Cnj),    1 ≤ j ≤ n.

The sum b1 C1j + b2 C2j + ⋯ + bn Cnj is exactly the cofactor expansion of det(Aj) along its jth column. Hence

xj = det(Aj)/det(A).
Second method. We denote by Aj .b/ the matrix obtained by replacing the jth column of
A by the vector b. Let a1 ; a2 ; : : : ; an be the column vectors of A and let e1 ; e2 ; : : : ; en be the
column vectors of the identity matrix I. Then, we have, for 1 j n,
Ij .X/ D Œe1 ; : : : ; X; : : : ; en ;
D Œa1 ; : : : ; b; : : : ; an D Aj .b/:
Example 2.17
Use Cramer’s rule to find the solution of the linear system
x1 − x2 = 1,
x1 + 2x2 = 3. (2.39)
Solution
System (2.39) can be written in matrix form as
AX D b;
with
" # " # " #
1 1 x1 1
AD ; XD and bD :
1 2 x2 3
Since det.A/ D 3 ¤ 0, A is invertible and the components of the unique solution of (2.39)
are given by
x1 = det(A1)/det(A) = 5/3    and    x2 = det(A2)/det(A) = 2/3.
J
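Cramer's rule is also easy to implement. The following sketch (the helper name cramer is an illustrative choice) applies it to the data of Example 2.17 and compares the result with a direct solve:

import numpy as np

def cramer(A, b):
    """Solve AX = b by Cramer's rule (A assumed square with det(A) != 0)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(A.shape[0])
    for j in range(A.shape[0]):
        Aj = A.copy()
        Aj[:, j] = b                 # replace the j-th column of A by b
        x[j] = np.linalg.det(Aj) / d
    return x

A = np.array([[1., -1.],
              [1.,  2.]])
b = np.array([1., 3.])
print(cramer(A, b))              # [5/3, 2/3], as in Example 2.17
print(np.linalg.solve(A, b))     # same answer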
Example 2.18
Use Cramer’s rule to find the solution of the system
8
ˆ
< x1 C 2x2 C 3x3 D 1;
2x1 C 5x2 C 3x3 D 6; (2.40)
:̂
x1 C 8x3 D 6:
Solution
The system (2.40) can be written in the form
AX D b;
with
2 3 2 3 2 3
123 x1 1
6 7 6 7 6 7
A D 42 5 35; X D 4 x2 5 ; and b D 4 6 5:
108 x3 6
det.A/ D 1 ¤ 0:
J
2.6 Exercises
2
Exercise 2.1
1. Consider the matrix
2 3
c20
6 7
A D 41 c 25;
01c
where c is a real number. Find all values of c, if any, for which A is invertible.
2. Put c D 1 and use the adjoint matrix to find A1 .
Solution
1. The matrix A is invertible if and only if det.A/ ¤ 0. Now, using the cofactor expansion,
we have
D c.c2 2/ 2c
D c.c2 4/ D c.c 2/.c C 2/:
Thus,
2 3
1 2 4
6 7
adj.A/ D CT D 4 1 1 2 5 ;
1 1 1
and so
2 3 2 3
1 2 4 1=3 2=3 4=3
1 16 7 6 7
A1 D adj.A/ D 4 1 1 2 5 D 4 1=3 1=3 2=3 5 :
det.A/ 3
1 1 1 1=3 1=3 1=3
J
2.6 Exercises
105 2
Exercise 2.2
Let
" #
ab
AD :
cd
Solution
1. We have
" #" # " #
2 ab ab a2 C bc ab C db
A D AA D D :
cd cd ac C dc d2 C bc
tr.A2 / D a2 C bc C d2 C bc D a2 C d2 C 2bc:
2. We have
" #
tr.A/ 1
det D .tr.A//2 tr.A2 /
tr.A2 / tr.A/
D 2 det.A/;
Exercise 2.3
Let A and B be two invertible matrices in Mn .R/. Show that if
AB D BA;
then n is even.
106 Chapter 2 • Determinants
Solution
2 Since AB D BA, we have
det.AB/ D det.BA/:
Using the product rule (Theorem 2.4.7) and the fact that det.A/ ¤ 0 and det.B/ ¤ 0
(Theorem 2.4.8), we obtain .1/n D 1. This shows that n is even. J
Exercise 2.4
1. Find the determinant of the matrix A in M2 .C/ given by
" #
! !
AD
1 !
Solution
1. By direct computation,
det.A/ D ! 2 C !:
! 3 D cos.2/ C i sin.2/ D 1:
! 3 1 D .! 1/.! 2 C ! C 1/ D 0;
which gives
! 2 C ! C 1 D 0;
2.6 Exercises
107 2
since ! ¤ 1. Hence,
det.A/ D ! 2 C ! D 1:
Similarly, to find the determinant of B, we use the cofactor expansion along the first row, to
get
D ! 4 C 3! 2 2!:
a2 C a C 1 D 0 (2.41)
! 4 C ! 2 C 1 D 0:
Since the coefficients of the Eq. (2.41) are real, one necessarily has that !N is also a solution
to (2.41), therefore, ! 2 D !.
N Consequently,
det.B/ D 3.! 2 !/
D 3.!N !/
D 6i sin.2=3/:
2. To find the inverse of B, we need first to find the adjoint matrix adj.B/. We compute
the cofactors of B as follows:
" # " #
! !2 2 4 2 1 !2
C11 D det D ! ! D ! !; C12 D det D ! 2 !;
!2 ! 1 !
" # " #
1 ! 2 1 1
C13 D det D ! !; C21 D det D ! 2 !;
1 !2; !2 !
108 Chapter 2 • Determinants
" # " #
1 1 1 1
2 C22 D det D ! 1; C23 D det D 1 !2;
1! 1 !2
" # " #
1 1 2 1 1
C31 D det D ! !; C23 D det D 1 !2;
! !2 1 !2
" #
1 1
C33 D det D ! 1:
1!
and thus
2 3
!2 ! !2 ! !2 !
6 7
adj.B/ D CT D 4 ! 2 ! ! 1 1 ! 2 5 :
!2 ! 1 !2 ! 1
Solution
1. We find the determinant of Vandermonde by induction. First, for n D 1, we have V.a1 / D
1. Next, for n D 2, we have
" #
1 a1
V.a1 ; a2 / D det D a2 a1 :
1 a2
For n D 3, we have
2 3
1 a1 a21
6 7
V.a1 ; a2 ; a3 / D det 4 1 a2 a22 5 :
2
1 a3 a3
To find the above determinant, we use Theorem 2.4.2 and replace c2 in the above matrix by c2 − a1 c1 and c3 by c3 − a1 c2, where c1, c2, and c3 are the first, the second, and the third columns of the above matrix, to get

V(a1, a2, a3) = det [ 1     0          0         ]
                    [ 1  a2 − a1   a2 (a2 − a1)  ]
                    [ 1  a3 − a1   a3 (a3 − a1)  ]

              = det [ a2 − a1   a2 (a2 − a1) ]
                    [ a3 − a1   a3 (a3 − a1) ]

              = (a2 − a1) a3 (a3 − a1) − (a3 − a1) a2 (a2 − a1)

              = (a2 − a1)(a3 − a1)(a3 − a2)

              = ∏_{1 ≤ j < i ≤ 3} (ai − aj).
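The product formula for the Vandermonde determinant can be checked numerically for any choice of nodes; the values a1, …, an below are arbitrary illustrative numbers:

import numpy as np
from itertools import combinations

a = np.array([1.0, 2.0, 4.0, 7.0])          # illustrative nodes a1, ..., an
n = len(a)
V = np.vander(a, N=n, increasing=True)      # rows (1, ai, ai^2, ..., ai^(n-1))

prod = 1.0
for j, i in combinations(range(n), 2):      # pairs with j < i
    prod *= (a[i] - a[j])

print(np.linalg.det(V))    # determinant of the Vandermonde matrix
print(prod)                # product of (ai - aj) over j < i: the values agree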
where
X
n
sk D aki ; 0 k 2n 2;
iD1
for some a1 ; a2 ; : : : ; an in K.
1. Show that Hn D VnT Vn , where Vn is the Vandermonde matrix (2.42).
2. Find det.Hn /.
2.6 Exercises
111 2
Solution
We have
2 32 3
1 1 ::: 1 1 a1 a21 : : : an1
1
6 a2 : : : an 7 6 a2 a22 : : : an1 7
6 a1 761 2 7
VnT Vn D 6
6 :: :: :: 7 6
76 : :: :: 7
7
4 : : : 5 4 :: : : 5
an1
1 an1
2 : : : an1
n 1 an a2n : : : an1
n
To get siCj2 , we need to multiply row ri1 of VnT with column cj1 of Vn . For instance
s0 D s1C12 D r1 c1 ; i D j D 1;
s1 D s1C22 D s2C12 D r1 c2 D r2 c1 ; i D 1; j D 2; or i D 2; j D 1:
::
:
sk D ri cj ; with i C j D k C 2:
D Œdet.V/2
Y
D .ai aj /2 ;
1j<in
det.A/ D 0:
112 Chapter 2 • Determinants
Solution
2 1. Let A be a matrix in Mn .K/. Then,
.A C AT /T D AT C .AT /T D AT C A:
Consequently, A C AT is symmetric.
Similarly,
.A AT /T D AT .AT /T D AT A D .A AT /:
1 1
AD .A C AT / C .A AT /:
2 2
Since A C AT is symmetric, 12 .A C AT / is also symmetric; moreover, since A AT is skew
symmetric, so is 12 .A AT /.
3. Since A is skew symmetric, we have A D AT . This gives
by making use of (2.24). Now, since det.A/ D det.AT / (Theorem 2.3.1), then by the above
reasoning if n is odd, then
2 det.A/ D 0:
This yields
det.A/ D 0:
Solution
1. We use the row reduction method (Theorem 2.4.2) to compute the determinant of Cp . We
can apply two methods:
For the first method, we denote the columns of the matrix Cp and the columns of the
obtained matrices after any column operation by C1 ; C2 ; : : : ; Cn . Permuting the first and the
second columns, we get
2 3
1 0 0 ::: 0
6 : 7
6 7
6 0 0 1 :: 0 7
6 7
Cp;1 D6
6
:: :: :: :: 7
6 : : : : 77
6 7
4 0 0 0 ::: 1 5
a1 a0 a2 : : : an1
and det.Cp;2 / D det.Cp;1 / D .1/2 det.Cp /. We continue in this way and in each operation,
we permute Ci and CiC1 ; i D 1; : : : ; n 1, and finally, after .n 1/ operations, we get
2 3
1
0 ::: 0 0
6 : 7
6 7
6 0 1 :: 0 0 7
6 7
Cp;.n1/ D6 :
6 ::
:: :: 7
6 : : 0 77
6 7
4 0 0 : : : 1 0 5
a1 a2 : : : an1 a0
and det.Cp;.n1/ / D .1/n1 det.Cp /. Since Cp;.n1/ is a triangular matrix, its determinant is
the product of the entries of the main diagonal, that is
det.Cp;.n1/ / D a0 :
114 Chapter 2 • Determinants
Consequently,
2
det.Cp / D .1/n a0 :
with det.A/ D det.Cp /. Now, we use the cofactor expansion along the first column to
compute the determinant of A
where
2 3
1 0 ::: 0
6 7
6 0 1 ::: 07
6 7
D det 6 D 1:
:: 7
Mn1
6 :: : : 7
4: : :5
0 0 ::: 1
Thus,
2. We have
2 3
1 0 : : : 0
6 : 7
6 7
6 0 1 :: 0 7
6 7
6
I Cp D 6 :: :: :: :
:: 7:
: : : 7
6 7
6 7
4 0 0 0 ::: 1 5
a0 a1 a2 : : : C an1
Now, to compute the determinant of I Cp , we use the cofactor expansion along the first
column, obtaining
.n1/ .n1/
det.I Cp / D C11 C a0 Cn1 D M11 C a0 .1/nC1 Mn1 ;
2.6 Exercises
115 2
where
2 3
::
6 1 : 0 7
6 : :: :: 7
.n1/ 6 : : 7
M11 D det 6 : : 7
6 7
4 0 0 ::: 1 5
a1 a2 : : : C an1
and
2 3
1 0 : : : 0
6 7
6 1 ::: 0 7
.n1/ 6 7
D det 6 D .1/n1 ;
:: 7
Mn1 (2.46)
6 :: :: 7
4 : : : 5
0 0 : : : 1
Now, observe that the matrix in (2.46) is the same as the matrix I Cp , but of size
.n 1/ .n 1/. So, the same method as before yields
.n1/ .n2/
M11 D M11 C a1 ;
.n2/
where M11 is the determinant of the .n 2/ .n 2/ matrix
2 3
::
6 1 : 0 7
6 : :: :: 7
6 : : 7
6 : : 7:
6 7
4 0 0 ::: 1 5
a2 a3 : : : C an1
Consequently, we obtain
.n2/
det.I Cp / D 2 M11 C a1 C a0 :
.2/
det.I Cp / D n2 M11 C an3 n3 C C a1 C a0 ; (2.47)
where
" #
.2/ 1
M11 D det D 2 C an1 C an2 :
an2 C an1
116 Chapter 2 • Determinants
1. Show that
Solution
1. The cofactor expansion along the last column, yields
then
2.6 Exercises
119 2
Solution
1. Assume that A D .aij / is an r r matrix .r p/ and D is a p p matrix. We use induction
on r to show (2.50). If r D 1, then by expanding along the first row, we obtain
2 3
a11 0 : : : 0
6 7
6 7
6 7
6 0 7
det 6 7 D a11 det.D/ D det.A/ det.D/:
6 :: 7
6 7
4 : D 5
0
Now, we assume that (2.50) is true for r D p 1 and show that it remains true for r D p. We
have
" #
A 0pp
det.M/ D det ;
0pp D
and
" #
A.p1/.p1/ 0.p1/p
det.M/ D app det D app det.A.p1/.p1/ / det.D/
0p.p1/ D
D det.A/ det.D/;
Then,
2 " #
7 4
det.AD BC/ D det D 115:
6 13
That is,
since det Ip D 1. J
Euclidean Vector Spaces
Belkacem Said-Houari
3.1 Introduction
The main objectives in this chapter are to generalize the basic geometric ideas in R2 or
R3 to nontrivial higher-dimensional spaces Rn . Our approach is to start from geometric
concepts in R2 or R3 and then extend them to Rn in a purely algebraic manner.
In engineering and physics, many quantities, such as force and velocity, are defined
by their direction and magnitude. Mathematically, this is what we call a vector. Thus,
we can simply define a vector in 2- or 3-dimensional space as a line segment with a
definite direction, or graphically as an arrow connecting a point A (called initial point)
and a point B (called terminal point), as shown in ⊡ Fig. 3.1
In engineering and physics, some quantities like weight or height can be represented
by a scalar. On the other hand, a force, for example, can be described by its magnitude
and direction, therefore, a vector is characterized by its magnitude (length) and its
direction. Accordingly, two vectors are equivalent (or equal) if they have the same length
and the same direction. See ⊡ Fig. 3.2.
Consider the vector v with initial point A and terminal point B and let w be the vector
with initial point B and terminal point C. Then it is not hard to see that v C w is the
vector with initial point A and terminal point C, as in ⊡ Fig. 3.3.
Now, the question is that: how can we add v to w if the terminal point of v is not the
initial point of w? In this case, as we stated before, two vectors with the same length and
the same direction are equal. Thus, as in ⊡ Fig. 3.4, the dashed vector is equal to the
vector v and its initial point is in the same time the terminal point of v and thus we can
⊡ Fig. 3.1 The vector v = AB⃗, with initial point A and terminal point B
add the two vectors as above. Now, from this simple remark, we may deduce directly
that:
1. The sum of vectors is commutative. That is
v C w D w C v; (3.1)
2. The sum of vectors is associative. That is,

u + (v + w) = (u + v) + w, (3.2)

for any vectors u, v, and w.
Let v be a vector in 2-dimensional space with its initial point at the origin of
a rectangular coordinate system. Then the vector is completely determined by the
coordinates of its terminal point; see ⊡ Fig. 3.7.
#»
Thus, v D OP D .v1 ; v2 /, where v1 and v2 are called the components of v. Now,
in the coordinate system the vector is determined by its components rather than by its
direction and its length.
Now, if the initial point P1 is not the origin, then we just need to translate the
coordinate system in such a way that the point P1 will be the origin in the new coordinate
system as in ⊡ Fig. 3.8.
Let P.x1 ; y1 / be the initial point of the vector v and P2 .x2 ; y2 / be the terminal point
of v. Then in the x0 y0 -plane the components of the vector v are v D .x02 ; y02 /, since P1 is
the origin in the x0 y0 -plane. Since in the xy-plane we have x02 D x2 x1 and y02 D y2 y1 ,
the components of v in the xy-plane are v D .x2 x1 ; y2 y1 /.
Now, we are going to generalize the above notions to any n-dimensional space.
We have seen in (1.15) how to add two vectors in 2-dimensional space (R2 ) and
3-dimensional space (R3 ). To generalize (1.15), let us first define the space Rn .
Now, similarly to the geometric approach in (1.15), we may define the addition in
Rn . Thus, if v D .v1 ; v2 ; : : : ; vn / and w D .w1 ; w2 ; : : : ; wn / are two vectors in Rn , we
define the sum v C w to be the vector
v C w D .v1 C w1 ; v2 C w2 ; : : : ; vn C wn /: (3.3)
Example 3.1
Let v D .1; 3; 1/ and w D .1; 0; 2/ be two vectors of R3 . Then,
v C w D .0; 3; 1/:
As depicted in ⊡ Fig. 3.9, if v is the vector with the initial point A and terminal point B
and w is the vector with initial point B and terminal point C, then the sum v C w is the
vector with the initial point A and terminal point C.
The real numbers v1 ; v2 ; : : : ; vn are called the components of the vector v. Form
(3.3), we may easily show that
v C w D w C v:
Thus, the addition .C/ of vectors is commutative. Moreover, and as we have seen for
the 2-dimensional space, it is clear that
⊡ Fig. 3.9 The sum of two vectors v + w in a coordinate system
u C .v C w/ D .u C v/ C w;
for any three vectors u; v and w in Rn . Thus, the addition of vectors is also associative.
In particular, if w D v, then
We may also define the vector .1/v D v (⊡ Fig. 3.10) to be the vector with the same
length as v and its direction the opposite of that of v. Using the geometric representation
in R2 , for example, we find that if v D .v1 ; v2 /, then v D .v1 ; v2 /.
Analogously, if k is a scalar, then we define kv to be the vector
Also, we define the zero vector in Rn to be the vector whose components are all zero.
That is, 0 D 0Rn D .0; 0; : : : ; 0/, and this vector has the property
0Cv DvC0Dv
for any vector v in Rn . Thus, the zero vector is an identity element with respect to the
addition of vectors. Furthermore, for any vector v in Rn ,
v C .v/ D 0:
Consequently, we may collect all the above properties in the following theorem (see
Definition 1.1.10).
3
Theorem 3.2.1 (Group Structure of Rn )
The space .Rn ; C/ is an Abelian group with respect to the addition .C/.
This property represents the compatibility of scalar multiplication with the multiplica-
tion in R.
Next, (3.3) and (3.4) imply that
k.v C w/ D k.v1 C w1 ; v2 C w2 ; : : : ; vn C wn /
D .k.v1 C w1 /; k.v2 C w2 /; : : : ; k.vn C wn //
D .kv1 C kw1 ; kv2 C kw2 ; : : : ; kvn C kwn /
D k.v1 ; v2 ; : : : ; vn / C k.w1 ; w2 ; : : : ; wn /
D kv C kw:
The above four properties together with Theorem 3.2.1 endow Rn with an algebraic
structure called vector space.
Let us give a general definition of a vector space E over a field K.
Definition 3.2.2 (Vector Space)
Let E be a nonempty set and K be a field. Let two operations be given on E: one
internal denoted by .C/ and defined as:
EE!EW .v; w/ 7! v C w
Then, E is called a vector space over the field K if for all x; y in E and for all ; in
K the following properties hold:
1. .E; C/ is an Abelian group.
2. 1K x D x.
3. .x/ D ./x.
4. .x C y/ D x C y.
5. . C /x D x C x.
Example 3.2
As we have seen above, Rn is a vector space over R.
In this case and as we have seen in Chap. 1, the vector v can be seen as an n 1
matrix.
As shown in ⊡ Fig. 3.11, the vector v can be written in the coordinate system as
v D v1 e1 C v2 e2 ; (3.5)
where v1 and v2 are scalars. Relation (3.5) says that the vector v is a linear combination
of the two vectors e1 and e2 .
⊡ Fig. 3.11 The vector v written in the coordinate system as v = v1 e1 + v2 e2
v D k1 v1 C k2 v2 C C kp vp : (3.6)
Example 3.3
Let v D .v1 ; v2 ; v3 / be a vector in R3 ; then v can be written as
Thus, v is a linear combination of the vectors e1 D .1; 0; 0/; e2 D .0; 1; 0/, and e3 D .0; 0; 1/.
In this section, we define some numbers associated to vectors, called norm, dot product,
and distance.
As, we have said earlier, a vector is determined by its length (or magnitude or norm) and
its direction. Now, to define the norm of a vector in Rn , let us first consider the vector
v D .v1 ; v2 / in the coordinate system R2 as in ⊡ Fig. 3.11. Then, using Pythagoras’
theorem, we have
‖v‖² = v1² + v2²,    or    ‖v‖ = √(v1² + v2²),
where kvk denotes the length of v. This indicates how to generalize the above notion to
vectors in Rn for any n 2.
Definition 3.3.1 (Norm of a Vector)
Let v D .v1 ; v2 ; : : : ; vn / be a vector in Rn . Then the norm of v is denoted by kvk
and is defined as
‖v‖ = √(v1² + v2² + ⋯ + vn²). (3.7)
The norm in (3.7) is called the Euclidean norm since it is associated with the
Euclidean geometry.
Example 3.4
Consider the vector v = (1, 0, −2). Then,

‖v‖ = √(1² + 0² + (−2)²) = √5.
Proof
1. The first property is a direct consequence of (3.7).
2. Assume first that kvk D 0. Then by (3.7), we deduce that
D jkjkvk:
130 Chapter 3 • Euclidean Vector Spaces
ⓘ Remark 3.3.2
It is clear from above that the norm of a vector in Rn can be seen as a mapping N
defined on the spaces Rn and with values into RC as:
3
N W Rn ! RC ; v 7! N .v/ D kvk;
Unit Vectors
Example 3.5
The vectors e1 D .1; 0/ and e2 D .0; 1/ are unit vectors in R2 , since ke1 k D ke2 k D 1.
where we have applied the last property in Theorem 3.3.1, with k D 1=kvk. The above
process of obtaining the unit vector u is called the normalization of v. We can write v as
v D kvku:
Since kvk 0, we see that the vectors u and v have the same direction.
ⓘ Remark 3.3.3 It is clear that any vectors v D .v1 ; v2 ; : : : ; vn / can be written as a linear
combination of the standard unit vectors e1 ; e2 ; : : : ; en as
v D v1 e1 C v2 e2 C : : : vn en :
3.3.2 Distance in Rn
d(P1, P2) = ‖P1P2⃗‖ = √((y1 − x1)² + (y2 − x2)² + ⋯ + (yn − xn)²). (3.8)
Example 3.8
Let P1 .0; 1; 2/ and P2 .1; 3; 1/ be two points in R3 . Then
# » p p
d.P1 ; P2 / D kP1 P2 k D .1 0/2 C .3 1/2 C .1 2/2 D 6:
Proof
The first and the second properties are obvious. To prove the third property, we have
p
d.P1 ; P2 / D . y1 x1 /2 C . y2 x2 /2 C C . yn xn /2
p
D .x1 y1 /2 C .x2 y2 /2 C C .xn yn /2
D d.P2 ; P1 /:
For the fourth property, since the distance is positive, it is enough to show that
2 2
d.P1 ; P2 / d.P1 ; P2 / C d.P2 ; P3 / : (3.9)
132 Chapter 3 • Euclidean Vector Spaces
We have
2
d.P1 ; P2 / D .x1 y1 /2 C .x2 y2 /2 C C .xn yn /2
3
D ..x1 z1 / C .z1 y1 //2 C ..x2 z2 / C .z2 y2 //2 C : : :
C ..xn zn / C .zn yn //2
D .x1 z1 /2 C .x2 z2 /2 C C .xn zn /2
C .z1 y1 /2 C .z2 y2 /2 C C .zn yn /2
Clearly,
D d.P1 ; P2 /d.P2 ; P3 /
Since vectors in Rn can be seen as column matrices, we can not multiply two vectors
using matrix multiplication (Definition 1.1.11). Instead, we can define the dot product
of two vectors. To examine a simple case first consider two vectors u D .u1 ; u2 / and
v D .v1 ; v2 / in R2 and let be the acute angle between them, as in ⊡ Fig. 3.12. We
define the dot product of these two vectors as the product of their norms and the cosine
of the angle :
Formula (3.11) is based on 2-dimensional geometry and it cannot be applied for vectors
in Rn with n > 3. For this reason we will provide an algebraic formula to define the
⊡ Fig. 3.13 The cosine law
dot product of two vectors in Rn for any n 1. To do this, let us first review the cosine
formula for any triangle with A; B and C as its angles and a; b and c the lengths of its
sides, as shown in ⊡ Fig. 3.13.
We have

c² = a² + b² − 2ab cos C,
a² = b² + c² − 2bc cos A, (3.12)
b² = a² + c² − 2ac cos B.

Applying the cosine law to the triangle formed by u, v, and u − v, one obtains

u · v = ‖u‖ ‖v‖ cos θ = u1 v1 + u2 v2.
In this last formula, we got a rid of the cosine of the angle and so, by knowing the
components of the vectors u and v, we can easily compute the dot product between the
two vectors using the algebraic expression
3
u v D u1 v1 C u2 v2 :
This indicates how to generalize the definition of the dot product for any two vectors
u D .u1 ; u2 ; : : : ; un / and v D .v1 ; v2 ; : : : ; vn / in Rn , for all n 2.
u · v = u1 v1 + u2 v2 + ⋯ + un vn. (3.14)
Example 3.9
Consider the vectors u D .1; 2; 1/ and v D .3; 0; 1/ in R3 . Then
u v D 1 3 C 2 0 C .1/ .1/ D 2:
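Formula (3.14) corresponds to NumPy's dot product. The sketch below uses one choice of signs consistent with the printed result of Example 3.9 (an assumption, since only the value 2 is certain) and also checks the Cauchy–Schwarz inequality discussed later in this section:

import numpy as np

u = np.array([1., 2.,  1.])
v = np.array([3., 0., -1.])

dot = np.dot(u, v)                 # u1*v1 + u2*v2 + ... + un*vn
print(dot)                         # 2
print(abs(dot) <= np.linalg.norm(u) * np.linalg.norm(v))   # Cauchy-Schwarz: True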
It is clear that formula (3.14) allows us to deduce several properties of the dot
product. We list some of them.
Proof
1. It is clear that, since 0 D .0; 0; : : : ; 0/, then u 0 D 0.
2. By Definition 3.3.1,
u v D u1 v1 C u2 v2 C C un vn
D v1 u1 C v2 u2 C C vn un
D v u:
4. By (3.3),
v C w D .v1 C w1 ; v2 C w2 ; : : : ; vn C wn /:
Hence,
D .u1 v1 C u2 v2 C C un vn / C .u1 w1 C u2 w2 C C un wn /
D u v C u w:
t
u
It is clear from above that the dot product has very similar properties to those of the
usual multiplication in R. Here is a simple example.
Example 3.10
For any two vectors u and v in Rn , we have
of their norms. Its idea is very simple and for vectors in R2 or R3 it is a direct
consequence of the definition of the dot product (formula (3.11)) and the properties
of the cosine function. So, let u and v be two nonzero vectors in R2 or R3 . Then, from
(3.11), we have

cos θ = (u · v) / (‖u‖ ‖v‖),

and since

|cos θ| ≤ 1,

it follows that

|u · v| / (‖u‖ ‖v‖) ≤ 1.

This yields

|u · v| ≤ ‖u‖ ‖v‖. (3.15)

|u · v| ≤ ‖u‖ ‖v‖. (3.16)

That is,

|u1 v1 + u2 v2 + ⋯ + un vn| ≤ √(u1² + u2² + ⋯ + un²) · √(v1² + v2² + ⋯ + vn²).
Proof
In fact there are several proofs of the Cauchy–Schwarz inequality; here we present one that
uses the above properties of the dot product.
As, we have seen above, it is clear that for any two vectors u and v in Rn and for any
in R, we have
with
Since a > 0, the polynomial of in (3.17) is positive for any in R if and only if the
discriminant
D b2 ac < 0;
that is
p p
jbj a c;
or equivalently
ju vj D kkvk2 D kkvkkvk
D kukkvk:
Example 3.11
Consider the vectors u D .1; 0; 1/ and v D .1; 2; 0/. Thus, we have
p p
kuk D 2; kuk D 5; u v D 1;
It is clear that
p p
1< 2 5;
a2 b2
jabj C : (3.18)
2 2
p b
Now, let > 0 be any real number. Applying (3.18) for a0 D a and b0 D p , we obtain
a0 2 b0 2
ja0 b0 j C ;
2 2
or equivalently,
2 b2
jabj a C : (3.19)
2 2
b2
jabj a2 C : (3.20)
4
Example 3.13
Show that for any real numbers a1 ; a2 ; : : : ; an , we have the inequality
Solution
Consider in Rn the two vectors
ju vj kukkvk:
3.3 Norm, Dot Product, and Distance in Rn
139 3
That is,
p q 2
ja1 C a2 C C an j n a1 C a22 C C a2n ;
or equivalently
Now, having the Cauchy–Schwarz inequality, we can prove some important inequal-
ities and identities.
Proof
We know that
.u C v/ .u C v/ D ku C vk2 : (3.23)
.u C v/ .u C v/ D u u C v v C 2.u v/
Theorem 3.3.8 (Parallelogram Identity)
Let u and v be two vectors in Rn. Then,

‖u + v‖² + ‖u − v‖² = 2‖u‖² + 2‖v‖².
Proof
The parallelogram identity in R2 is illustrated in ⊡ Fig. 3.15. To prove it, we have, on one
hand
ku C vk2 D .u C v/ .u C v/ D .u u/ C .v v/ C 2.u v/
D kuk2 C kvk2 C 2.u v/: (3.27)
ku vk2 D .u v/ .u v/ D u u C v v 2.u v/
D kuk2 C kvk2 2.u v/: (3.28)
Above, we wrote the norm of a vector u in terms of the dot product. Now, we want
to do the opposite. So, can we write the dot product of two vectors using their norms?
The answer of this question is given in the following theorem.
u · v = (1/4)‖u + v‖² − (1/4)‖u − v‖². (3.29)
Proof
To prove (3.29), we subtract (3.28) from (3.27), obtaining
3.4 Orthogonality in Rn
From Euclidean geometry we know that two vectors u and v in R2 are orthogonal
or perpendicular if and only if the angle between them is θ = π/2. This is a
geometric definition. To define the orthogonality of vectors in Rn for any n 2, we
need to find instead an algebraic formulation. We have seen from formula (3.11) that if
we know the angle between two vectors, then we can define the algebraic quantity that
we called the dot product. Thus, if D =2, then we have from (3.11) that
u v D 0: (3.30)
We see immediately that the two vectors are orthogonal if and only if their dot product
is zero. Since (3.30) is an algebraic definition of the orthogonality, then we can take it
as a definition of the orthogonality of vectors in Rn for any n 2.
u v D 0: (3.31)
Example 3.14
1. The vectors e1 D .1; 0/, and e2 D .0; 1/ are orthogonal in R2 since e1 e2 D 0.
2. In R3 , the vectors e1 D .1; 0; 0/; e2 D .0; 1; 0/ and e3 D .0; 0; 1/ are pairwise orthogonal
since
e1 e2 D e2 e3 D e3 e1 D 0:
Example 3.15
1. Show that the vectors u D .a; b/ and v D .b; a/ are orthogonal in R2 .
2. Use the above result to find two vectors that are orthogonal to w D .2; 3/.
3. Find a unit vector that is orthogonal to y D .5; 12/.
142 Chapter 3 • Euclidean Vector Spaces
Solution
1. We have
3 u v D ab ba D 0;
1 1 1
uD zD p .12; 5/ D .12; 5/ D .12=13; 5=13/:
kzk 2
.12/ C .5/ 2 13
y u D 5.12=13/ C 12.5=13/ D 0:
#» #» v
kFv k D k F k cos
kvk
#»
where v=kvk is the unit vector in the direction of v. The vector Fv is called the
#»
orthogonal projection of F on v.
Before we define the projection of a vector u on a nonzero vector v, we prove the
following assertion.
Theorem 3.4.1
Let u and v be two vectors in Rn . Assume that v is a nonzero vector. Then, u can be
written in exactly one way in the form
u D w1 C w2 ; (3.32)
w1
Proof
Since w1 D kv, where k is a scalar, and since w2 is orthogonal to v, we have w2 v D 0.
Thus, we obtain
u v D .w1 C w2 / v
D w1 v C w2 v
D .kv/ v
D kkvk2 :
Consequently,
uv
kD ;
kvk2
whence
uv uv
w1 D v and w2 D u v:
kvk2 kvk2
t
u
proj_v u = ((u · v)/‖v‖²) v. (3.33)
uv v
w1 D ;
kvk kvk
uv
kvk
Example 3.16
Let u = (1, 1, 0) and v = (2, 0, 1). Then, we have

proj_v u = ((u · v)/‖v‖²) v = (2/5)(2, 0, 1) = (4/5, 0, 2/5).
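Formula (3.33) translates directly into code. In the sketch below (the helper name proj is an illustrative choice), the projection of Example 3.16 is computed and the remainder u − proj_v u is checked to be orthogonal to v:

import numpy as np

def proj(u, v):
    """Orthogonal projection of u on the nonzero vector v (formula (3.33))."""
    v = np.asarray(v, dtype=float)
    return (np.dot(u, v) / np.dot(v, v)) * v

u = np.array([1., 1., 0.])
v = np.array([2., 0., 1.])
w1 = proj(u, v)
w2 = u - w1
print(w1)                                # [0.8, 0. , 0.4] = (4/5, 0, 2/5)
print(np.isclose(np.dot(w2, v), 0.0))    # True: w2 is orthogonal to v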
Pythagoras’ Theorem in Rn
In Euclidean geometry, Pythagoras’ theorem says that in a right triangle as in ⊡
Fig. 3.17, we have
Since the angle between the vectors u and v is D =2, we have by making use of
(3.11) that u v D 0. Consequently, instead of the geometric property discussed above,
we formulated an algebraic property that the two vectors should satisfy in order for the
Pythagoras theorem to hold. That is to say, if the dot product of two vectors is zero, then
the identity (3.34) is satisfied. Thus, using this algebraic assumption, we can generalize
the Pythagoras theorem for any two orthogonal vectors in Rn and we have the following
statement.
ku C vk2 D .u C v/ .u C v/ D .u u/ C .v v/ C 2.u v/
D kuk2 C kvk2 ;
3.5 Exercises
Show that:
1. Each row of A is a unit vector and each column of A is a unit vector.
2. The row vectors of A are pairwise orthogonal.
3. The column vectors of A are pairwise orthogonal.
A matrix with the above properties is called an orthogonal matrix. See Sect. 8.1.
Solution
We denote by
1 1 1 1 1 1 1 1
r1 D p ; p ; 0 ; r2 D ; ; p ; r3 D ; ; p
2 2 2 2 2 2 2 2
Similarly, we have
r
1 1 1
3 kr2 k D kr3 k D kc1 k D kc2 k D C C D 1:
4 4 2
2. We have
1 1 1 1
r1 r2 D p p D 0:
2 2 2 2
Similarly,
r1 r3 D r2 r3 D 0:
c1 c2 D c2 c3 D c3 c1 D 0;
Exercise 3.2
Let u and v be two vectors in Rn .
1. Show that
|‖u‖ − ‖v‖| ≤ ‖u − v‖. (3.36)
ˇkuk kvkˇ ku vk: (3.36)
2. Prove that
Solution
1. To establish (3.36), we write u D v C .u v/. Then, by the triangle inequality (3.22),
kuk D kv C .u v/k
kvk C ku vk:
That is
kvk D ku C .v u/k
kuk C kv uk:
Therefore,
and
which is exactly (3.37). Of course equality holds if and only if u v D 0, which means
if and only if the vectors are orthogonal. Geometrically, this means that the product of the
lengths of the diagonals in a parallelogram as in ⊡ Fig. 3.15 is less than or equal to the sum
kuk2 C kvk2 , and we have equality if and only if the parallelogram is a square, that is, if and
only if the vectors u and v are orthogonal. J
Exercise 3.3
1. Suppose that u; v; w, and z are vectors in Rn such that
u C v C w C z D 0:
2. Prove that
1 1 1 1
16 .a C b C c C d/ C C C (3.43)
3 a b c d
Solution
1. We first compute kwk2 , obtaining
kwk2 D w w
We have
.u v/ C .u u/ D .u .u C v// and .v z/ C .u z/ D .u C v/ z:
Consequently,
.u v/ C .u u/ C .v z/ C .u z/ D u .u C v/ C .v C u/ z
D .u C v/ .u C z/:
since u u D kuk2 . Now, from this last formula, we see that if u C v and u C z are orthogonal,
then
that is
1 1 1 1
16 .a C b C c C d/ C C C ;
a b c d
which is exactly inequality (3.43). In fact, the above inequality can be generalized to any
positive numbers ai ; 1 i n, as follows:
! !
X
n Xn
1
2
n ai :
iD1
a
iD1 i
1
ku vk2 D 2kw uk2 C 2kw vk2 4kw .u C v/k2 : (3.46)
2
Solution
Recall the parallelogram identity:
wu wv
UD and VD ;
2 2
we get
1 1 1 1
kw .u C v/k2 C k .u v/k2 D 2k .w u/k2 C 2k .w v/k2 ;
2 2 2 2
150 Chapter 3 • Euclidean Vector Spaces
a d
3 b
or
1 1 1 1
kw .u C v/k2 C ku vk2 D kw uk2 C k.w v/k2 :
2 4 2 2
1 2
a2 C b2 D c C 2d2 :
2
J
Exercise 3.5
Prove that the following statements are equivalent:
1. ku vk D ku C vk:
2. ku C vk2 D kuk2 C kvk2 :
3. The vectors u and v are orthogonal.
Solution
In order to show that the above statements are equivalent, we need to show that .1/ ) .2/ )
.3/ ) .1/:
First, assume that ku vk D ku C vk: Then,
ku vk2 D ku C vk2 :
and
u v D u v;
where equality holds if and only if kuk D kvk or kuk C kvk D ku vk.
Solution
We have
2
u v u v u v
D
kuk kvk kuk kvk kuk kvk
1 1 2
D .u u/ C .v v/ .u v/
kuk2 kvk2 kukkvk
2
D 2 .u v/: (3.51)
kukkvk
Hence
2
u v 1
2 2
kuk kvk D kukkvk ku vk .kuk kvk/ :
152 Chapter 3 • Euclidean Vector Spaces
Multiplying both side in the last identity by .kuk C kvk/2 =4 and adding ku vk2 to both
sides, we get
2 2
3 u v kuk C kvk
ku vk2
kuk kvk 2
.kuk C kvk/2
D ku vk2 ku vk2 .kuk kvk/2
4kukkvk
.kuk kvk/2
D .kuk C kvk/2 ku vk2 : (3.52)
4kukkvk
Using the triangle inequality (3.22), we deduce that the right-hand side in (3.52) is positive.
Therefore, we obtain
2 2
u v
ku vk2 kuk C kvk 0;
kuk kvk 2
with
!1=p !1=q
X
n X
n
kukp D jui j
p
and kvkq D jvi j
q
iD1 iD1
Solution
Without loss of generality, we may assume that
where
ui vi
ui D and vi D ; i D 1; : : : ; n:
kukp kvkq
kukp D kvkq D 1:
jajp jbjq 1 1
jabj C ; C D 1;
p q p q
jui jp jvi jq
jui vi j C :
p q
X
n
D kukp kvkq jui vi j
iD1
!
Xn
jui jp X jvi jq
n
kukp kvkq C
iD1
p iD1
q
kukp kvkq
D kukp kvkq C D kukp kvkq ;
p q
since kukq D kvkq D 1 and 1=p C 1=q D 1. This yields the desired result. J
1
This Young inequality can be easily shown by using the concavity of the logarithm function.
154 Chapter 3 • Euclidean Vector Spaces
Solution
The inequality is trivial if xi D yi D 0 for all 1 i n.
Now, without loss of generality, assume that
3
X
n
.xi C yi /p ¤ 0:
iD1
We have
X
n X
n X
n
.xi C yi /p D xi .xi C yi /p1 C yi .xi C yi /p1 : (3.54)
iD1 iD1 iD1
Applying the Hölder inequality to the two terms in the right-hand side of (3.54), with
q D p=.p 1/, we get
!1=p !1=q
X
n X
n
p
X
n
xi .xi C yi / p1
xi .xi C yi /p
iD1 iD1 iD1
and
!1=p !1=q
X
n X
n
p
X
n
yi .xi C yi / p1
yi .xi C yi /p :
iD1 iD1 iD1
Exercise 3.9
Let u and v be two nonzero vectors in Rn . Prove that
1 1
u kukv D v kvku : (3.56)
kuk kvk
3.5 Exercises
155 3
Solution
We have
2
1 1 1
u kukv D u kukv u kukv
kuk kuk kuk
1
D .u u/ 2.u v/ C kuk2 .v v/
kuk2
D 1 2.u v/ C kuk2 kvk2 :
Taking the square root of both sides in the above identity, then we obtain (3.56). J
Exercise 3.10
Let u and v be two nonzero vectors in Rn .
1. Show that if kuk ¤ kvk, then for any
> 0, we have
kuk
C1 kvk
C1
k.ukuk
vkvk
/k ku vk: (3.57)
kuk kvk
Solution
1. To show (3.57), it is enough to prove that (since the right-hand side of (3.57) is positive
for
> 1)
k.ukuk
vkvk
/k2 .kuk kvk/2 .kuk
C1 kvk
C1 /2 ku vk2 :
156 Chapter 3 • Euclidean Vector Spaces
By expanding both sides in the above inequality, and using the fact that
k.ukuk
vkvk
/k2 D ukuk
vkvk
ukuk
vkvk
3
D kuk2
C2 C kvk2
C2 2kuk
kvk
.u v/
and
we obtain
h ih i
kuk2
C2 C kvk2
C2 2kuk
kvk
.u v/ kuk2 C kvk2 2kukkvk
h ih i
kuk2
C2 C kvk2
C2 2kuk
C1 kvk
C1 kuk2 C kvk2 2.u v/ :
After cancelling like terms and moving all terms to the right side, we get the equivalent
inequality
2kukkvk 2.u v/ 0:
.kuk
C2 kvk
C2 /.kuk
kvk
/
1
D k.ukuk
vkvk
/k :
kuk
C1 kvk
C1
Applying (3.57) to the last term on the right-hand side in the above identity, then (3.58)
holds. J
3.5 Exercises
157 3
Exercise 3.11
Let u and v be two nonzero vectors in Rn . Show that
1
uv ku C vk2 .kuk kvk/2 (3.60)
4
and
1
uv .kuk kvk/2 ku vk2 : (3.61)
4
Solution
To show (3.60), we have, by the polarization identity (identity (3.29))
1 1
uv D ku C vk2 ku vk2 : (3.62)
4 4
Now, using (3.38), we obtain
which gives
2kukkvk C 2.u v/ 0:
u × v = (u2 v3 − u3 v2, u3 v1 − u1 v3, u1 v2 − u2 v1).
158 Chapter 3 • Euclidean Vector Spaces
Show that
Solution
To show (3.64), we compute all the terms in this identity. So, we have
2.u2 v3 u3 v2 C u3 v1 u1 v3 C u1 v2 u2 v1 /: (3.65)
.u v/2 D .u1 v1 C u2 v2 C u3 v3 /2
D u21 v12 C u22 v22 C u23 v32 C 2.u1 v1 u2 v2 / C 2.u1 v1 u3 v3 / C 2.u2 v2 u3 v3 /: (3.66)
ku vk2 C .u v/2
D u22 v32 C u23 v22 C u23 v12 C u21 v32 C u21 v22 C u22 v12 C u21 v12 C u22 v22 C u23 v32
D kuk2 kvk2 :
J
General Vector Spaces
Belkacem Said-Houari
In Chap. 3 we studied the properties of the space Rn and extended the definition of its
algebraic structure to a general vector space (Definition 3.2.2). In this chapter, we study
some properties of general vector spaces. We first recall Definition 3.2.2.
EE!EW .v; w/ 7! v C w;
So, E is a vector space over the field K if .E; C/ is an Abelian group and for all x; y
in E and for all ; in K the following properties hold:
1. 1 x D x.b
2. .x/ D ./x.
3. .x C y/ D x C y.
4. . C /x D x C x.
a
A commutative field is a commutative unitary ring (Definition 1.2.1) such that e ¤ 0 and all nonzero
elements are invertible with respect to multiplication. For example, .R; C; / and .C; C; / are fields.
b
Here 1 is the identity element with respect to the multiplication in K.
Example 4.1
As we have seen in Example 3.2, Rn is a vector space over the field R.
Then, it is easily checked that .C; C; / is a vector space over R. In fact this vector space has
the same algebraic property as R2 . See Example 5.8 for more details.
EE!EW .f ; g/ 7! f C g
and
RE !E W .; f / 7! f D f
where
for all x in X. Then one can easily check that E D .F .X; R/; C; / is a vector space over the
field R.
We have seen in Example 3.2 that Rn is a vector space over the field R. In fact, this
is true for any field K.
Let us list some important algebraic properties of vector spaces. These properties will
be used later on in the book. We start with the following theorem.
where 0K is the zero of the field K (the identity element with respect to the addition in
K) and 0E is the zero of E (the identity element of the Abelian group .E; C/). If K D R,
we denote 0R simply by 0.
Proof
To prove the first property, we have by using axiom (4) in Definition 4.1.1 that
which holds for any and in K and for any u in E. Recall that if is in K, then is also
4 in K, since .K; C/ is an Abelian group with respect to addition in K and is the inverse of
with respect the addition in K. Since (4.1) holds for any and in K, setting D we
have
0K u D 0E :
Next, to prove the second property, we use axiom (3) in Definition 4.1.1 and the fact that
.E; C/ is an Abelian group (v is in E for any v in E), to get
.u v/ C v D ..u v/ C v/ D v
for any u and v in E and any in K. This gives, for the same reason as before,
0E D 0E :
Finally, for the fourth property, we deduce from the first and the second properties that
0K u D 0E D 0E :
u D 0E ;
implies either D 0K (the second property), or ¤ 0K and in this case is invertible (since
.K; C; / is a commutative field) and we have
where we have used axiom (1) in Definition 4.1.1 and the fact that u D 0E : Thus, the fourth
property holds and the proof of Theorem 4.2.1 is complete. t
u
4.3 Subspaces
Verifying all the axioms in Definition 3.2.2 seems rather tedious, but as we will see in
many situations, we do not need to verify all of them since in many cases, the vector
spaces under consideration are contained in a larger vector space. For example we can
easily show that the set of 2 2 matrices with real entries is a vector space over R with
the usual matrix operations of addition and multiplication by scalars. But this space is
contained in the larger vector space .Mmn .R/; C; /. So, in this section, we will discuss
how to recognize such vector spaces. We start with the following definition.
Now, we should gain something from the fact that F is a subset of the larger space
E. Indeed, to show that F is itself a vector space under the addition and the scalar
multiplication defined on E, it is not necessary to verify all the axioms in Definition 4.1.1
since many of them are “inherited” from E. For example, there is no need to prove that
u C v D v C u for all elements of F since this property is already satisfied for all the
elements of E (recall that .E; C/ is an Abelian group) and hence for all elements of F,
since F is a subset of E. However, other properties are not inheritable. For example, for
two elements u and v in F, u C v is of course in E, but it may not lie in the subset F;
similarly, u is in E for any in K, but it might not be in F.
Thus, we can easily prove the following theorem.
ⓘ Remark 4.3.2 The first property in Theorem 4.3.1 means that F is a subgroup of the
group .E; C/ (see Exercise 1.11). In addition, this property can be replaced by
u C v 2 F: (4.3)
Of course, this property alone does not imply that F is a subgroup of .E; C/.
164 Chapter 4 • General Vector Spaces
The proof of Theorem 4.3.1 is very simple and we omit it. We simply need to verify
that the axioms of Definition 4.1.1 are satisfied if and only if the two properties of
Theorem 4.3.1 hold.
Now, we may combine the two properties in Theorem 4.3.1 and state the following
result.
4
Theorem 4.3.3
Let E be a vector space over a field K and F be a nonempty subset of E. Then, F is
a subspace of E if and only if, for all u and v in F and all λ and μ in K, we have

λu + μv ∈ F. (4.4)
Proof
First, assume that F is a subspace of E. Then, by the second property in Theorem 4.3.1, u
and v belong to F; therefore, (4.3) yields (4.4). Conversely, assume that (4.4) is satisfied,
then we get for D D 1 the property (4.3). In addition, for D 0K , we get the second
property in Theorem 4.3.1. This completes the proof of Theorem 4.3.3. t
u
ⓘ Remark 4.3.4 From now on, we consider Theorem 4.3.1 or Theorem 4.3.3 as the
definition of a subspace.
Example 4.6
Let E be a vector space over a field K and F be a subspace of E. Then F always contains 0E .
In addition, the set f0E g is itself a subspace of E which is the smallest subspace of E, called
the zero subspace of E. Also, E is a subspace of itself and it is the largest subspace of E.
Example 4.7
Let E be a vector space over a field K and u be an element of E. Then, the set
F = {λu : λ ∈ K}
is a subspace of E.
Solution
Let v and w be two elements of F. Then there exist and in K such that
v D u and w D u:
Then
v C w D u C u D . C /u;
˛v D ˛.u/ D .˛/u:
Solution
We may define F to be the set of all vectors whose dot product with u is zero (Defini-
tion 3.4.1): if v 2 F, then u v D 0. It is clear that F is not empty, since u 0Rn D 0:
This implies that 0Rn 2 F. Now, let u and w be two elements of F and ˛ and ˇ be two real
numbers. Then we have, by using the properties of the dot product,
Solution
First of all, it is clear that the set F is not empty since it does contain 0E because
0E D 0K u1 C 0K u2 C C 0K un :
Second, let u and v be two elements of F and ˛ and ˇ be two elements of K. Then there exist
i ; 1 i n, and i ; 1 i n, in K such that
u D 1 u1 C 2 u2 C C n un and v D 1 u1 C 2 u2 C C n un :
Consequently,
˛u C ˇv D ˛.1 u1 C 2 u2 C C n un / C ˇ.1 u1 C 2 u2 C C n un /
D .˛1 /u1 C .˛2 /u2 C C .˛n /un C .ˇ1 /u1 C .ˇ2 /u2 C C .ˇn /un
Solution
We will show that F1 is a subspace of F .R; R/, leaving it to the reader to show that F2 is
also a subspace of F .R; R/.
First, it is clear that F1 is not empty since the zero function 0F .R;R/ , defined as
D ˛f .x/ C ˇg.x/
D .˛f C ˇg/.x/
for all x in R, This means that ˛f C ˇg is an even function and thus it is an element of F1 .
Therefore, F1 is a subspace of F .R; R/. J
Solution
First, it is clear that
" #
00
0M2 .R/ D
00
belongs to both F1 and F2 . Second, it is clear that the sum of two upper triangular matrices
is an upper triangular matrix and also the multiplication of an upper triangular matrix by a
scalar yields an upper triangular matrix. This means that F1 is a subspace of M2 .R/. We can
show similarly that F2 is a subspace of M2 .R/. J
where A is a matrix in Mmn .R/. Show that N is a subspace of Rn . This subspace is called
the null space of the matrix A.
Solution
First of all N is not empty, since A0Rn D 0Rm , which means that 0Rn belongs to N . Now,
let u and v be two elements in N and ˛ and ˇ be two elements of R. Then, we have
Hence, N is a subspace of Rn . J
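The null space N(A) can also be computed explicitly. The following sketch (an illustrative construction based on the singular value decomposition, not a method described in the text) produces an orthonormal set of vectors spanning N(A) and verifies that they are sent to zero by A:

import numpy as np

def null_space_basis(A, tol=1e-12):
    """Orthonormal basis of N(A) = {u : Au = 0}, computed from the SVD of A."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T               # columns span the null space

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])         # illustrative matrix with a 2-dimensional null space
N = null_space_basis(A)
print(N.shape)                       # (3, 2)
print(np.allclose(A @ N, 0.0))       # True: every column of N satisfies Au = 0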
In the following theorem we will show that if we have a family of subspaces of a vector
space E, then we can always construct other subspaces from this family.
F D F1 \ F2
(Continued )
1
We may also denote N .A/ D Ker.A/.
168 Chapter 4 • General Vector Spaces
is a subspace of E.
Proof
To simplify things, we prove the above theorem for two vector spaces F1 and F2 . The same
prove can be adapted for the general case.
It is clear that
F D F1 \ F2
is not empty, since 0E belongs to both subspaces F1 and F2 due to the fact that both are
subspaces of E.
Next, consider two elements u and v in F and two elements ˛ and ˇ in K. Then, u and
v are also elements of F1 and of F2 . Then, we deduce that ˛u C ˇv is an element of F1 and
F2 , again because F1 and of F2 are subspaces of E, and therefore, ˛u C ˇv is an element of
F. This shows that F is a subspace of E. t
u
ⓘ Remark 4.3.7 If F1 and F2 are two subspaces of a vector space E, then the union
F D F1 [ F2
As we have seen above, the union of two subspaces is not necessarily a subspace.
However, for any subspaces F1 and F2 of a vector space E, there is at least one subspace
containing F1 and F2 , called their sum and denoted F1 C F2 .
Proof
It is clear that 0E belongs to both F1 and F2 since both are subspaces of E. In addition, since
we can write
0E D 0E C 0E ;
u D u1 C u2 and v D v1 C v2 ;
˛u C ˇv D ˛.u1 C u2 / C ˇ.v1 C v2 /
D .˛u1 C ˇv1 / C .˛u2 C ˇv2 /:
Now, since F1 and F2 are subspaces of E, ˛u1 C ˇv1 2 F1 and ˛u2 C ˇv2 2 F2 , whence
˛u C ˇv 2 F. Consequently, F is a subspace of E.
Now, if U is another subspace of E containing F1 and F2 , then U contains all the elements
u1 of F1 as well as all the elements u2 of F2 , hence, it contains all the elements u D u1 C u2 ,
since U is a subspace. Thus, U contains all the elements u of F. Hence, F1 CF2 is the smallest
subspace containing F1 and F2 . t
u
ⓘ Remark 4.3.9 Let E be a vector space over a field K and F be a subspace of E. Define
˚
CE .F/ D u 2 E such that u … F :
Then we have
E D F C CE .F/;
We have seen in Theorem 3.4.1, that if we fix a vector v in Rn , then each vector u in Rn
can be written in exactly one way in the form
4 u D w1 C w2 ; (4.5)
We have seen in Example 4.7 that F1 is a subspace of Rn and in Example 4.8 that F2 is
also a subspace of Rn .
Form (4.5), we can say that
Rn D F1 C F2 ;
with the additional property, that each element of Rn can be written in a unique way as
the sum of an element of F1 and an element of F2 . In this case, we say that Rn is the
direct sum of F1 and F2 . More generally, we have
E D F1 ˚ F2 ;
u D v C w:
Theorem 4.3.10
Let F1 and F2 be two subspaces of a vector space E. Then E is the direct sum of F1
and F2 if and only if E D F1 C F2 and F1 \ F2 D f0E g.
Proof
First, assume that E D F1 ˚ F2 . Let u be an element in F1 \ F2 . Then we can write u as
u D u C 0E ;
with u 2 F1 and 0E 2 F2 , or
u D 0E C u;
with 0E 2 F1 and u 2 F2 . Since u can be written in exactly one way as the sum of an element
of F1 and an element of F2 , we deduce that u D 0E . Therefore, F1 \ F2 D f0E g.
Conversely, assume that E D F1 C F2 and F1 \ F2 D f0E g and let u be an element of E
such that
u D v1 C w1 D v2 C w2
v1 v2 D w1 w2 :
v1 v2 D w1 w2 D 0E :
Example 4.13
Consider the two subspaces F1 of even functions and F2 of odd functions defined in
Example 4.10. Show that
F .R; R/ D F1 ˚ F2 :
Solution
Let f be a function in F(R, R). Then for all x in R, f can be written as

f(x) = g(x) + h(x),

with

g(x) = (f(x) + f(−x))/2    and    h(x) = (f(x) − f(−x))/2.

The function g is even, since g(−x) = g(x), and h is odd, since h(−x) = −h(x).
Thus, we have shown that F .R; R/ D F1 C F2 : It remains to prove that F1 \ F2 D f0F .R;R/ g.
Let f be an element in F1 \ F2 ; then f is odd and even at the same time. That is, for all x in
R, we have
This gives f .x/ D 0 for all x in R. Hence f D 0F .R;R/ . Therefore, F1 \ F2 D f0F .R;R/ g and
consequently, Theorem 4.3.10 implies that F .R; R/ D F1 ˚ F2 . J
In this section, we investigate whether the elements in a given set of a vector space are
interrelated in the sense that one element can be written as a linear combination of the
others, as we have seen in Example 4.9. As we will see later, it is really important to
know if such a relation exists. Now, we start with the following definition.
1 u1 C 2 u2 C C n un D 0E : (4.6)
1 u1 C 2 u2 C C n un D 0E
implies that 1 D 2 D D n D 0K :
Example 4.14
In the vector space R2 , the two vectors e1 D .1; 0/ and e2 D .0; 1/ are linearly independent,
since the relation
1 e1 C 2 D 0R2 ;
implies that 1 D 2 D 0. But the vectors u1 D .1; 1/ and u2 D .2; 2/ are linearly dependent,
since
2u1 C u2 D 0R2 :
Example 4.15
Show that in the vector space F .R; R/, the two functions est and ert are linearly independent
if and only if s ¤ r.
Solution
First, assume that s ¤ r and let ˛ and ˇ be two real numbers such that
˛est C ˇert D 0:
˛sest C ˇrert D 0:
Multiplying the first equation by s and subtracting the result from the second equation, we
obtain ˇ.r s/et D 0. Since s ¤ r, we have ˇ D 0. Plugging this into the first equation, we
obtain ˛ D 0. Hence the two functions est and ert are linearly independent.
Conversely, assume that the functions est and ert are linearly independent and r D s.
Then, we have
est ert D 0;
which contradicts the linear independence of est and ert . Thus, necessarily r ¤ s. J
Theorem 4.4.1
Let E be a vector space over a field K and u1 ; u2 ; : : : ; un be elements of E. The elements
u1 ; u2 ; : : : ; un are linearly dependent if and only if (at least) one of the elements is a
linear combination of the rest.
174 Chapter 4 • General Vector Spaces
Proof
First, assume for instance that u1 is a linear combination of the other elements. Then, there
exist ˛2 ; ˛3 ; : : : ; ˛n 2 K such that
u1 D ˛2 u2 C ˛3 u3 C C ˛n un :
4
That is
u1 ˛2 u2 ˛3 u3 ˛n un D 0E :
1 u1 C 2 u2 C C p up C C n un D 0E :
This gives
1 2 n
up D u1 u2 un ;
p p p
Theorem 4.4.2
Let E be a vector space over a field K and u1 ; u2 ; : : : ; un be elements of E. If
u1 ; u1 ; : : : ; un are linearly independent, then ui ¤ 0E for all 1 i n. In particular,
if u is an element of E, then the family fug is linearly independent if and only if
u ¤ 0E .
Proof
The proof is straightforward since if there is 1 p n with up D 0E , then, we have for
some ¤ 0K ,
0K u1 C 0K u2 C C up C C 0K un D 0E ;
Theorem 4.4.3
Let E be a vector space over a field K and u1 ; u2 ; : : : ; un be linearly independent
elements of E. Let 1 ; 2 ; : : : ; n and 1 ; 2 ; : : : ; n be elements of K such that
1 u1 C 2 u2 C C n un D 1 u1 C 2 u2 C C n un : (4.7)
Then i D i ; i D 1; : : : ; n.
Proof
We can write (4.7) as
i D i ; i D 1; : : : ; n:
t
u
In this section, our main goal is to show how a small finite set of vectors or elements of a
vectors space (called a basis) can be used to describe all the other vectors in that vector
space. We introduce this fundamental property of basis in linear algebra. To explain the
idea, consider the vector space R2 and take the two vectors e1 D .1; 0/ and e2 D .0; 1/
of R2 . Then any vector u D .x; y/ of R2 can be written as a linear combination of the
vectors e1 and e2 as
u D xe1 C ye2 :
In this case, we say that the set fe1 ; e2 g spans the vector space R2 . In addition, since the
vectors e1 and e2 are linearly independent (Example 4.14), the above linear combination
is unique (Theorem 4.4.3). That is to say, there is one and only one way to write u
as a combination of the two vectors e1 and e2 . So, in order to be able to write u in a
unique way as a linear combination of the two vectors e1 and e2 , two properties should
be satisfied:
▬ The set fe1 ; e2 g spans R2 , and
▬ the two vectors e1 and e2 are linearly independent.
176 Chapter 4 • General Vector Spaces
Any set of two vectors in R2 satisfying the above two properties is called basis. Now,
we extend the above idea to general vector spaces.
u D 1 u1 C 2 u2 C C n un :
We may also say that the elements u1 ; u2 ; : : : ; un generate the vector space E and
write
E D span.u1 ; u2 ; : : : ; un /:
Example 4.16
It is trivial to see that in the vector space Rn , the set of vectors fe1 ; e2 ; : : : ; en g with
spans Rn .
Example 4.17
Let Pn be the set of all polynomials with real coefficients and of degree less than or equal
to n. One can easily show that this set is a vector space over the field R. Now, the family
f1; x; x2 ; : : : ; xn g spans this vector space, since any polynomial p.x/ in Pn can be written as
p.x/ D a0 C a1 x C C an xn ;
u D 1 u1 C 2 u2 C C n un
Example 4.19
Show that the family f1; x; x2 ; : : : ; xn g is a basis of the vector space Pn defined in
Example 4.17.
Solution
We have seen in Example 4.17 that the family f1; x; x2 ; : : : ; xn g spans Pn . It remains to show
that this family is linearly independent. So, let 0 ; 1 ; : : : ; n be elements in R such that for
all x in R, we have
0 C 1 x C C n xn D 0: (4.8)
Since (4.8) holds for all x in R, it also holds for x D 0. Thus, putting x D 0 in (4.8), we get
0 D 0. Now, by taking the derivative of (4.8) with respect to x, we obtain
0 D 1 D D n D 0:
Thus, the family f1; x; x2 ; : : : ; xn g is linearly independent and hence forms a basis in Pn . J
Example 4.20
We consider the space M2 .R/ of 2 2 square matrices over the field R.
We define in this space the elements
" # " # " # " #
10 01 00 00
M1 D ; M2 D ; M3 D ; M4 D :
00 00 10 01
Solution
First, we need to show that the family fM1 ; M2 ; M2 ; M4 g spans M2 .R/. So, consider a matrix
" #
ab
MD
cd
1 M1 C 2 M2 C 3 M3 C 4 M4 D 0M2 .R/ :
That is
4 " # " # " # " # " #
10 01 00 00 00
1 C 2 C 3 C 4 D ;
00 00 10 01 00
or equivalently,
" # " #
1 2 00
D :
3 4 00
In this section, we will associate a number to a vector space. Recall that the cardinality
of a set B is the number of elements of B.
Example 4.21
Let us list the dimensions of some spaces that we have encountered before.
▬ The dimension of Rn is n since fe1 ; e2 ; : : : ; en g is a basis of Rn (Example 4.18).
▬ The dimension of M2 .R/ is 4 since fM1 ; M2 ; M3 ; M4 g is a basis of M2 .R/ (Exam-
ple 4.20). In general, we can easily prove that dimK Mmn .K/ D m n:
▬ We have proved in Example 4.19 that the set f1; x; x2 ; : : : ; xn g is a basis of the space Pn .
Consequently, dimR Pn D n C 1:
4.6 Dimension of a Vector Space
179 4
Now, we state one of the main theorems in linear algebra.
The proof of this theorem is based on Zorn’s lemma and is beyond the scope of this
book.
Example 4.22
In R2 , the set S D fe1 ; e2 g D f.1; 0/; .0; 1/g is maximal linearly independent subset, since
for any vector u D .x; y/ in R2 , the set S [ fug D fe1 ; e2 ; ug is linearly independent because
u D xe1 C ye2 .
Theorem 4.6.2
Let E be a vector space over a field K and let S D fu1 ; u1 ; : : : ; un g be a maximal set of
linearly independent elements of E. Then, S is a basis of E.
Proof
We just need to show that S spans E. That is, each element of E can be expressed as a linear
combination of the elements of S. Let w be an element in E. Since S is a maximal linearly
independent set, the elements w; u1 ; u2 ; : : : ; un are linearly dependent. Hence there exists
0 ; 1 ; : : : ; n in K, not all 0K , such that
0 w C 1 u1 C C n un D 0E : (4.10)
1 2 n
wD u1 u2 C C un :
0 0 0
Theorem 4.6.3
Different bases of the same vector space E have the same number of elements (the
same cardinality).
Proof
To prove this lemma, we assume that v1 ; v2 ; : : : ; vm are linearly independent. Since B D
fu1 ; u1 ; : : : ; un g is a basis of E, each element from the set fv1 ; v2 ; : : : ; vm g can be written in a
unique way as a linear combination of the elements of B. For instance, we have
v1 D 1 u1 C 2 u2 C C n un
1 2 n
u1 D v1 u2 un :
1 1 1
G1 D fv1 ; u2 ; : : : ; un g
w D a1 u1 C a2 u2 C C an un
1 2 n
D a1 v1 u2 un C a2 u2 C C an un
1 1 1
a1 a1 2 a1 n
D v1 C a2 u2 C C an un
1 1 1
v2 D 1 v1 C 2 u2 C C n un :
1 1 3 n
u2 D v2 v1 u3 un :
2 2 2 2
G2 D fv1 ; v2 ; u3 ; : : : ; un g
also spans E. The idea now is to continue our procedure and to replace u1 ; u2 ; : : : by
v1 ; v2 ; : : : to conclude at the end (by induction) that
Gn D fv1 ; v2 ; : : : ; vn g
spans E, and since m > n, the elements vnC1 ; vnC2 ; : : : ; vm are linear combinations
of v1 ; v2 ; : : : ; vn , which contradicts our assumption on the linear independence of
v1 ; v2 ; : : : ; vm . This concludes our proof. t
u
Theorem 4.6.5
Let E be a vector space over a field K with dimK E D n and let u1 ; u2 ; : : : ; un be
linearly independent elements of E. Then the set fu1 ; u2 ; : : : ; un g constitutes a basis
of E.
Proof
According to Lemma 4.6.4, the set fu1 ; u2 ; : : : ; un g is a maximal set of linearly independent
elements of E, thus, Theorem 4.6.2, implies that this set is a basis of E. t
u
ⓘ Remark 4.6.6 Let E be a vector space over a field K with dimK E D n. Then we deduce
from above that:
▬ Any set of linearly independent elements of E has at most n elements.
▬ Any set that has at least n C 1 elements is linearly dependent.
182 Chapter 4 • General Vector Spaces
As we have said before, any subspace F of a vector space E is itself a vector space. So,
we need to find the dimension of this subspace and compare it with the dimension of E.
Thus, we have the following theorem.
4
Theorem 4.6.7 (Dimension of a Subspace)
Let E be a vector space over a field K with dimK E D n .n > 0, that is E ¤ f0E g). Let
F be a subspace of E with F ¤ f0E g. Then,
dimK F dimK E:
In particular, if
Proof
Suppose that dimK F > dimK E. Then there exists at least one basis in F with at least n C 12
elements, that is, there exists at least one linearly independent set of F with at least n C 1
elements. But each linearly independent set of F is also linearly independent in E. Hence,
we obtain a linearly independent set in E with at least n C 1 elements, which contradicts
Remark 4.6.6. Thus, dimK F dimK E.
Now, if dimK F D dimK E, then there exists a basis B D fu1 ; u2 ; : : : ; un g in F. Then,
B is also a basis of E (Theorem 4.6.5). Therefore, any v 2 E can be written as a linear
combination of elements of F of the form
v D 1 u1 C 2 u2 C C n un
2
This basis has at least one element u0 ¤ 0E since F ¤ f0E g.
4.7 Exercises
183 4
out to be positive:
fv1 ; v2 ; : : : ; vn g
is a basis of E.
Proof
Since dimK E D n and r < n, the set S D fv1 ; v2 ; : : : ; vr g cannot be a basis of E, and then by
Theorem 4.6.2, S cannot be a maximal set of linearly independent elements of E. Hence, by
Definition 4.6.2, there exists vrC1 2 E such that the set S [ fvrC1 g is linearly independent.
Now, if r C 1 D n, then according to Theorem 4.6.5, S is a basis of E. If r C 1 < n, then we
repeat the same procedure until we construct (by induction) a set of n linearly independent
elements fu1 ; u2 ; : : : ; un g of E and then, this should be a basis of E due to the same reason as
before (Theorem 4.6.5). t
u
Example 4.23
We know that dimR R3 D 3. Consider the two vectors u D .1; 0; 1/ and v D .1; 3; 2/. It is
clear that u and v are linearly independent. If we now take the vector w D .1; 2; 4/, then
we may easily show that u; v, and w are linearly independent and thus form a basis of R3 .
4.7 Exercises
Exercise 4.1
Consider the vector space F .R; R/ introduced in Example 4.4, defined over the field R. Show
that the three functions in this vector space
Solution
Using the sine and cosine laws, we have
and
Multiplying (4.11) by cos q and (4.12) by . sin p/ and adding the results, we obtain
4
cos q sin.x C p/ C sin p cos.x C q/ D cos q cos p sin x C sin q sin p sin x
D .cos q cos p C sin q sin p/ sin x
D cos.p q/ sin x:
but ˛; ˇ and are not all zero, for any real numbers p and q. Consequently, f ; g, and h are
linearly dependent. J
be vectors in R4 .
1. Show that the set B D fu1 ; u2 ; u3 ; u4 g is a basis in R4 .
2. Find the components of the vector
in this basis.
Solution
1. We have seen in Example 4.21 that dimR R4 D 4. Now, since B has also four elements,
then according to Theorem 4.6.5, in order to show that B is a basis, it is enough to prove that
B is a linearly independent set. So, let 1 ; 2 ; 3 and 4 be elements in R satisfying
1 u1 C 2 u2 C 3 u3 C 4 u4 D 0R4 ; (4.13)
that is
1 .1; 2; 1; 2/ C 2 .2; 3; 0; 1/ C 3 .1; 3; 1; 0/ C 4 .1; 2; 1; 4/ D .0; 0; 0; 0/:
4.7 Exercises
185 4
This gives (see Chap. 3) the system of equations
8
ˆ 1 C 22 C 3 C 4 D 0;
ˆ
ˆ
<
21 C 32 C 33 C 24 D 0;
ˆ
ˆ 1 3 C 4 D 0;
:̂
21 2 C 44 D 0:
Now, it is clear that the above system has .1 ; 2 ; 3 ; 4 / D .0; 0; 0; 0/ as a solution. This
solution is unique, since the matrix
2 3
1 2 1 1
6 2 3 3 27
6 7
AD6 7
4 1 0 1 15
2 1 0 4
det.A/ D 2 ¤ 0:
v D ˛1 u1 C ˛2 u2 C ˛3 u3 C ˛4 u4 : (4.14)
Hence, ˛1 ; ˛2 ; ˛3 and ˛4 are the components of v in the basis B. To find these components,
we proceed as before and obtain the system of equations
8
ˆ
ˆ ˛1 C 2˛2 C ˛3 C ˛4 D 7;
ˆ
<
2˛1 C 3˛2 C 3˛3 C 2˛4 D 14;
ˆ ˛1 ˛3 C ˛4 D 1;
ˆ
:̂
2˛1 ˛2 C 4˛4 D 2:
so (4.15) becomes
2 3 2 32 3 2 3
˛1 17=2 5 13=2 2 7 0
6 7 6 76 7 6 7
6 ˛2 7 6 3 2 3 1 7 6 14 7 6 2 7
6 7D6 76 7 D 6 7:
4 ˛3 5 4 5 3 3 1 5 4 1 5 4 2 5
˛4 7=2 2 5=2 1 2 1
Consequently, the components of the vector v in the basis B are .0; 2; 2; 1/. J
Solution
1. Denote
B1 D fw1 ; w2 ; : : : ; wr ; urC1 ; : : : ; un g
4.7 Exercises
187 4
is a basis of F1 and
B2 D fw1 ; w2 ; : : : ; wr ; vrC1 ; : : : ; vm g
is a basis of F2 .
Now,
u D u C 0E ;
u D 0E C u
z D z1 C z2 :
z1 D 1 w1 C 2 w2 C C r wr C rC1 urC1 C C n un ;
and
z2 D 1 w1 C 2 w2 C C r wr C rC1 vrC1 C C m vm ;
or equivalently
X
r X
n X
m
˛i wi C j uj C k vk D 0E : (4.18)
iD1 jDrC1 kDrC1
X
m X
r X
n
k vk D ˛i wi j uj ;
kDrC1 iD1 jDrC1
X
m
which shows that k vk belongs to F1 and hence to F1 \ F2 . Thus, since S D
kDrC1
fw1 ; w2 ; : : : ; wr g is a basis of F1 \ F2 , there exist ˇ1 ; ˇ2 ; : : : ; ˇr in K such that
X
m X
r
k vk D ˇi wi ;
kDrC1 iD1
or equivalently,
X
m X
r
k vk ˇi wi D 0E :
kDrC1 iD1
ˇ1 D ˇ2 D D ˇr D rC1 D D m D 0K : (4.19)
X
n X
r X
m
j uj D ˛i wi k vk :
jDrC1 iD1 kDrC1
X
n
This shows that j uj is an element in F1 \ F2 and as before, we can show, by using the
jDrC1
fact that B1 is a linearly independent set, that
rC1 D D m D 0K : (4.20)
X
r
˛i wi D 0E ;
iD1
which gives ˛1 D ˛2 D D ˛r D 0K . J
4.7 Exercises
189 4
Consequently, the set B is linearly independent and therefore, is a basis of F1 C F2 .
Hence,
2. If E D F1 ˚F2 , then F1 \F2 D f0E g (Theorem 4.3.10) and then dimK .F1 \F2 / D
0. Thus, by (4.16), the identity (4.17) holds.
Mn .R/ D S ˚ W: (4.21)
Solution
1. It is clear that the zero matrix 0 D 0Mn .R/ satisfies
0T D 0 D 0:
.A C B/T D AT C BT D A B D .A C B/
and
1 1
AD .A C AT / C .A AT /:
2 2
Mn .R/ D S C W:
190 Chapter 4 • General Vector Spaces
It remains to show that (see Theorem 4.3.10) S \ W D f0Mn .R/ g. So, let A be an element of
the intersection S \ W. Thus, A satisfies
A D AT and AT D A:
4 This means 2A D 0Mn .R/ , and so A D 0Mn .R/ . Hence, S \ W D f0Mn .R/ g and therefore
(4.21) holds. J
ad bc ¤ 0:
det.A/ ¤ 0;
3. Show that the vectors v1 D .1; 2; 3/; v2 D .1; 1; 0/, and v3 D .3; 4; 3/ are linearly
dependent in R3 .
Solution
1. Assume that u1 and u2 are linearly independent. Then the equation
1 u1 C 2 u2 D 0R2 ; (4.22)
with 1 and 2 in R, has the trivial solution 1 D 2 D 0 as the uniques solution. Equation
(4.22) is equivalent to the system
(
a1 C c2 D 0;
b1 C d2 D 0:
This system has the trivial solution 1 D 2 D 0 as the unique solution if and only if (see
Theorem 1.2.9) the matrix
" #
ac
AD
bd
4.7 Exercises
191 4
is invertible. That is, if and only if
det.A/ D ad bc ¤ 0:
1 u1 C 2 u2 C C n un D 0Rn
This system has the unique solution 1 D 2 D D n D 0 if and only if the matrix
2 3
u11 u21 u31 ::: un1
6u ::: un2 7
6 12 u22 u32 7
AD6
6 :: :: :: :: :: 7
7 D Œu1 ; u2 ; : : : ; un
4 : : : : : 5
u1n u2n u3n : : : unn
Exercise 4.6
Let E be a vector space over a field K. Let G; F1 , and F2 be three subspaces of E. Show that
if
E D G ˚ F1 D G ˚ F2 and F1 F2 ; (4.23)
then F1 D F2 .
192 Chapter 4 • General Vector Spaces
Solution
To show that F1 D F2 , it is enough to prove that F2 F1 . Let w2 be an element of F2 , then
w2 is an element of E. Hence, using (4.23), we write w2 as
w2 D v C w1 ;
4 where v 2 G and w1 2 F1 . Since F1 F2 , we deduce that w1 2 F2 . This implies that
v D w2 w1 2 F2
where .u1 ; v1 / and .u2 ; v2 / are elements of E F. Also, for .u; v/ in E F and for in K,
we define the multiplication by scalars as
and
E F D E1 ˚ F1 : (4.24)
4.7 Exercises
193 4
Solution
1. We leave it to the reader to verify that E F satisfies the axioms in Definition 4.1.1.
2. Denote dimK E D r and dimK F D s. Then, according to Theorem 4.6.1, there exist
elements u1 ; u2 ; : : : ; ur of E and v1 ; v2 ; : : : ; vs of F such that fu1 ; u2 ; : : : ; ur g is a basis of E
and fv1 ; v2 ; : : : ; vs g is a basis of F. Now define in E F the sets
and
So, we want to show that the set B D fB1 ; B2 g which consists of the elements of B1 [ B2 is a
basis of E F. First we prove that B spans E F. So, let .u; v/ be an element of E F, with
u 2 E and v 2 F. There exist ˛1 ; ˛2 ; : : : ; ˛r and ˇ1 ; ˇ2 ; : : : ; ˇs in K such that
u D ˛1 u1 C ˛2 u2 C C ˛s us and v D ˇ1 v1 C ˇ2 v2 C C ˇs vs :
Then
Thus, it is clear that the set B spans E F. Now, since B1 and B2 are linearly independent
sets, one can easily show that B is a linearly independent set (we leave this to the reader) and
thus conclude that B is a basis of E F. Hence
E F D E1 C F1 and F1 \ F2 D f0EF g:
194 Chapter 4 • General Vector Spaces
E D U ˚ W:
Solution
Since E has finite dimension, so has U, since U is a subspace of E (Theorem 4.6.7). Denote
First, if m D n, then F D E (Theorem 4.6.7) and thus in this case we can take W D f0E g.
Second, if m < n, then, according to Theorem 4.6.1, U has a basis, say, B1 D
fu1 ; u2 ; : : : ; um g. Thus, B is a linearly independent set in U and hence is a linearly
independent set in E. By Theorem 4.6.8, there exist elements wmC1 ; : : : ; wm of E such that
the set
B D fu1 ; u2 ; : : : ; um ; wmC1 ; : : : ; wn g
W D spanfwmC1 ; : : : ; wn g:
To prove the first equality, let v be an element of E. Since B is a basis in E, then it spans E
and thus there exist 1 ; 2 ; : : : ; n in K such that
v D 1 u1 C 2 u2 C C m um C mC1 wmC1 C C n wn :
4.7 Exercises
195 4
Put
v D ˛1 u1 C ˛2 u2 C C ˛m um
and
v D ˛mC1 wmC1 C C ˛n wn :
Consequently,
˛1 u1 C ˛2 u2 C C ˛m um ˛mC1 wmC1 ˛n wn D 0E :
˛i D 0; i D 1; : : : ; n:
Solution
First, assume that F1 [ F2 is a subspace of E and let w1 be an element of F1 and w2 be an
element of F2 . Then, both w1 and w2 are elements of F1 [ F2 and since F1 [ F2 is a subspace,
w D w1 C w2
w2 D w w1 2 F1 ;
w1 D w w2 2 F2 ;
.F C G/? D F ? \ G? ; (4.25)
? ? ?
.F \ G/ DF CG : (4.26)
Solution
1. It is clear that 0Rn is an element of F ? since for any vector u 2 Rn , we have u 0Rn D 0.
In particular u 0Rn D 0 for all vectors u 2 F. Now, let u and v be two vectors in F ? and
and be two real numbers. By Theorem 3.3.5, for any w in F, we have
u D w1 C w2
3
In fact since dimK Rn is finite, we even have F D .F? /? , but the proof of .F? /? F requires some
knowledge of topology. We omit it here.
4.7 Exercises
197 4
3. To prove (4.25), we need to show that
v u D 0: (4.27)
It is clear that (4.27) is also satisfied for the elements of F and for the elements of G, since
F F C G and G F C G. Thus, u 2 F ? and u 2 G? , i.e., u 2 F ? \ G? . This implies
that F C G F ? \ G? .
Next, let w be an element of F ? \ G? . Then
w u D 0; and w v D 0;
w .u C v/ D 0:
Rn D F C G and F \ G D f0Rn g:
F ? C G? D .F \ G/? D f0Rn g? D Rn ;
since f0Rn g is orthogonal to all the elements of Rn . On the other hand, making use of (4.25),
we get
since f0Rn g is the only vector which is orthogonal to all vectors of Rn . So, we have already
proved that
Rn D F ? C G? and F ? \ G? D f0Rn g:
4 Consequently, Rn D F ? ˚ G? . J
199 5
Linear Transformations
Belkacem Said-Houari
So, by doing the above matrix multiplication, we have transformed linearly the
elements of the space Rn to elements of the space Rm . This is what we call linear
transformation, and we can generalize this notion to any two vector spaces as follows.
200 Chapter 5 • Linear Transformations
and
5
f .u/ D f .u/: (5.2)
ⓘ Remark 5.1.1 The two properties (5.1) and (5.2) can be combined in a single property
and one writes: f is a linear transformation from the vector space E to the vector space F
if for all u; v 2 E and all ; 2 K, we have
0L .E;F/ .u/ D 0F :
Clearly, 0L .E;F/ is a linear transformation. Indeed, for all u; v 2 E and all ; 2 K, we have
IdE .u/ D u:
5.2 Fundamental Properties of Linear Transformations
201 5
Then, IdE is an endomorphism, because for all u; v 2 E and all ; 2 K, we have
f .u/ D Au (5.4)
for any column vector u in Rn . This transformation f is linear. Indeed, let u, v be two vectors
in Rn , and , be real numbers. Then, using the properties of matrices, we have
D f .u/ C f .v/:
df
D. f / D :
dx
d
D.f C g/ D .f C g/ D D. f / C D.g/:
dx
One can easily verify that L .E; F/ with the above addition and multiplication by scalars
is a vector space over K.
In addition to the above algebraic structure of the vector space L .E; F/, we exhibit
5 now more properties of linear transformations.
Proof
We have
gıf W E !G
u 7! .g ı f /.u/ D gŒ f .u/:
Theorem 5.2.2
Let E and F be two vector spaces over the same field K and f be a linear transformation
from E into F. Then we have:
1. f .0E / D 0F .
2. f .u/ D f .u/.
3. If H is a subspace of E, then f .H/ is a subspace of F.
4. If M is a subspace of F, then f 1 .M/ is a subspace of E.
Proof
1. To show that f .0E / D 0F , since f .0E / is an element of f .E/, we need to show that for all v
in f .E/,
this will imply that f .0E / D 0F . So, let v be an element of f .E/. Then there exists u 2 E such
that v D f .u/. We have
u C 0E D 0E C u D u:
Similarly,
u C .u/ D 0E ;
This gives f .u/ D f .u/, due to the uniqueness of the inverse in the Abelian group .F; C/.
3. If H is a subspace of E, then H contains 0E . Consequently, f .H/ contains f .0E / D 0F .
This implies that f .H/ is not empty. Now, let v1 , v2 be two elements of f .H/ and , be two
elements of K. Then there exist u1 and u2 in H such that
This shows that ˛u1 C ˇu2 is an element of f 1 .M/. Consequently, f 1 .M/ is a subspace
of E. t
u
We saw in Example 4.12 that for a fixed matrix A in Mmn .R/ the null space
where f is defined in (5.4). The space N is called the null space of the matrix A, or the
kernel of the linear transformation f . We can generalize this to any linear transformation
as follows.
As usual, when introducing a new set in algebra, a natural question is to check if this
set has an algebraic structure. Here we show that the kernel of a linear transformation is
a subspace.
5.2 Fundamental Properties of Linear Transformations
205 5
Proof
We may prove Theorem 5.2.3 directly, by showing that Ker. f / satisfies the properties of
a subspace. Or, we may easily see that since f0F g is a subspace of F and since Ker. f / D
f 1 f0F g, Theorem 5.2.2 gives that Ker. f / is a subspace of E. t
u
Example 5.5
Consider the linear transformation f W R2 ! R defined as
f .u; v/ D u 2v:
Then
Proof
First, we show that .1/ implies .2/. That is, we assume that Ker. f / D f0E g and show that f
is injective. So, let u, v be elements in E such that f .u/ D f .v/, That is
where we have used the linearity of f . The identity (5.6) implies that u v 2 Ker. f /, and
since Ker. f / D f0E g, we deduce that u v D 0E , that is u D v. This implies that f is
injective.
Conversely, assume that f is injective; we need to show that Ker. f / D f0E g. First, it is
clear that f0E g Ker. f / since Ker. f / is a subspace of E (or since f .0E / D 0F ). Now, let u
be an element of Ker. f /. Then, by definition, we have
5 f .u/ D 0F D f .0E /:
Since f is injective, it follows that u D 0E . Thus, we proved that Ker. f / f0E g and therefore
Ker. f / D f0E g. Hence, we have shown that .2/ implies .1/. This completes the proof of
Theorem 5.2.4. t
u
As mentioned earlier, one of the main properties of linear transformations is that they
allow to transfer some algebraic properties from one vector space to another. As we saw
in Chap. 4, it is very important to know the dimension of a vector space. In other
words, it is important to know, or to be able to construct a basis in a vector space. More
precisely, let E and F be two vector spaces over the same field K and let f be a linear
transformation from E into F. Assume that dimK E D n and let BE D fu1 ; u2 ; : : : ; un g
be a basis of E (which exists according to Theorem 4.6.1). A natural question is: under
what conditions on f we have dimK E D dimK F? And if dimK E D dimK F, then could
BF D f f .u1/; f .u2 /; : : : ; f .un/g be a basis of F? To answer these questions, we start
with the following statement.
Theorem 5.2.5
Let E and F be two vector spaces over the same field K and f be an injective linear
transformation from E into F. Let u1 ; u2 ; : : : ; un be linearly independent elements of E.
Then f .u1 /; f .u2 /; : : : ; f .un / are linearly independent elements of F.
Proof
Let 1 ; 2 ; : : : ; n be elements of K such that
f .1 u1 C 2 u2 C C n un / D 0F :
1 u1 C 2 u2 C C n un D 0E :
5.2 Fundamental Properties of Linear Transformations
207 5
Since u1 ; u2 ; : : : ; un are linearly independent elements of E, it follows
1 D 2 D D n D 0K : (5.8)
Thus, we have proved that (5.7) yields (5.8), which proves that f .u1 /; f .u2 /; : : : ; f .un / are
linearly independent. t
u
Theorem 5.2.6
Let E and F be two vector spaces over the same field K, with dimK F D n; and f
be an injective linear transformation from E into F. Let u1 ; u2 ; : : : ; un be linearly
independent elements of E. Then the set BF D f f .u1 /; f .u2 /; : : : ; f .un /g is a basis
of F.
Proof
By Theorem 5.2.5, BF is a linearly independent set of elements in F. Since the cardinality
of BF is equal to n, Lemma 4.6.4 implies that BF is a maximal linearly independent set of
elements in F. Hence, Theorem 4.6.2 implies that BF is a basis in F. t
u
ⓘ Remark 5.2.7 Theorem 5.2.6 can be used to construct a basis of a vector space as
follows: suppose that we have a vector space F of finite dimension n and we want to
construct a basis for F. Then it is enough to find an injective linear transformation f
from another space E to F and n linearly independent elements of E. The basis of F will
then be given by the images of the n linearly independent elements of E under f .
Example 5.6
In Example 5.3, the image of the linear transformation defined by f .u/ D Au is the set of all
vectors w in Rm that can be written as the product of the matrix A and a vector u in Rn .
Proof
5 First, it is clear that 0F is an element of F since 0F D f .0E / (Theorem 5.2.2). Second, let w1 ,
w2 be elements of Im. f / and , be elements of K. Then there exist u1 and u2 in E such that
Since f is linear,
Since E is a vector space, u1 C u2 2 E. Thus, in (5.9) we have expressed w1 C w2 as the
image of an element of E. Hence, w1 C w2 2 Im. f /. Consequently, Im. f / is a subspace
of F. t
u
The image of f can be used to determine if the linear transformation is surjective (or
onto), as in the following theorem.
Proof
We prove first that .1/ implies .2/. Let w be an element of F. Since Im. f / D F, then w 2
Im. f /. Thus, by definition there exists u 2 E such that w D f .u/. This means that f is
surjective. Conversely, assume that f is surjective and let z be an element of F. Then there
exists v in E such that z D f .v/. This implies that z 2 Im. f /, which shows that F Im. f /.
Since by definition Im. f / F, we deduce that Im. f / D F, and so .2/ implies .1/. t
u
Next, we introduce one of the fundamental theorems of linear algebra that relates
the dimension of the kernel and the dimension of the image of a linear transformation
to the dimension of the vector space on which the transformation is defined.
5.2 Fundamental Properties of Linear Transformations
209 5
The dimension of the image of f , dimK Im. f /, is also called the rank of f and is denoted
by rank. f /. Also, the dimension of the kernel of f dimK Ker. f / is called the nullity of
f and is denoted by null. f /. Thus, (5.10) can be also recast as
Proof
Denote
First, if Im. f / D f0F g, then f .u/ D 0F for any u in E. This means that u 2 Ker. f /. Hence,
E D Ker. f / and thus
which is exactly (5.10). Now, if Im. f / ¤ f0F g, then we have s > 0, and so by Theorem 4.6.1
the space Im. f / has a basis. Let fw1 ; w2 ; : : : ; ws g be a basis of Im. f /. Thus, by definition
there exist u1 ; : : : ; us 2 E such that
wi D f .ui /; i D 1; : : : ; s: (5.11)
˛1 u1 C ˛2 u2 C C ˛s us D 0E :
f .˛1 u1 C ˛2 u2 C C ˛s us / D f .0E / D 0F ;
or equivalently
˛1 w1 C ˛2 w2 C C ˛s ws D 0F :
210 Chapter 5 • Linear Transformations
˛1 D ˛2 D D ˛s D 0K :
Now, if Ker. f / D f0E g, then we can show that the set fu1 ; u2 ; : : : ; us g spans E. Indeed, let
v be an element of E. Then f .v/ 2 Im. f /. Since fw1 ; w2 ; : : : ; ws g spans Im. f /, there exist
1 ; 2 ; : : : ; s in K such that
5
f .v/ D 1 w1 C 2 w2 C C s ws
D 1 f .u1 / C 2 f .u2 / C C s f .us /
D f .1 u1 C 2 u2 C C s us /:
v D 1 u1 C 2 u2 C C s us :
so (5.10) holds.
Next, if Ker. f / ¤ f0E g, then q > 0, and hence there exists a basis fv1 ; v2 ; : : : ; vq g of
Ker. f /. Our goal is to show that the set
B D fu1 ; u2 ; : : : ; us ; v1 ; v2 ; : : : ; vq g
is a basis for E. This will suffice to prove (5.10). First, we show that B spans E. Let v be an
element in E. Then, as above, there exist 1 ; 2 ; : : : ; s in K such that
f .v/ D f .1 u1 C 2 u2 C C s us /:
f .v 1 u1 2 u2 s us / D 0F :
v 1 u1 2 u2 s us D ˇ1 v1 C ˇ2 v2 C C ˇq vq :
This gives
v D 1 u1 C 2 u2 C C s us C ˇ1 v1 C ˇ2 v2 C C ˇq vq ;
5.3 Isomorphism of Vector Spaces
211 5
which shows that B spans E. To prove that B is a linearly independent set, let
1 ; 2 : : : ; s ; ı1 ; ı2 : : : ; ıq be elements of K satisfying
X
s X
q
i ui C ıj vj D 0E : (5.12)
iD1 jD1
! 0 1
X
s Xq
f i ui Cf @ ıj vj A D f .0E / D 0F ;
iD1 jD1
whence
!
X
s X
s
f i ui D i wi D 0F ;
iD1 iD1
X
q
since ıj vj 2 Ker. f /. Therefore,
jD1
1 D 2 D D s D 0K ;
since fw1 ; w2 ; : : : ; ws g is a linearly independent set. Plugging this into (5.12) yields (for the
same reason)
ı1 D ı2 D D ıq D 0K :
Therefore, B is a linearly independent set and hence a basis of E. This completes the proof
of Theorem 5.2.10. t
u
(Continued )
212 Chapter 5 • Linear Transformations
Example 5.7
5 The identity IdE transformation defined in Example 5.2 is an automorphism.
f W R2 ! C
.x; y/ 7! x C iy:
We can easily show that f is an isomorphism. Thus, we deduce directly that dimR C D 2,
since dimR R2 D 2:
We may also easily prove that the vector space Cn over the field R is isomorphic
to R2n .
Proof
It is clear that if f is bijective, then f 1 is bijective. We need to show only that if f is linear,
then f 1 is also linear. So, let w1 , w2 be elements in F and , be elements of K. Then, since
f is bijective, then there exist a unique u1 and a unique u2 in E such that
and so
Now, we have
D f .u1 C u2 /;
5.3 Isomorphism of Vector Spaces
213 5
whence
In the following theorem we show that the composition of two isomorphisms is also an
isomorphism.
The proof of Theorem 5.3.2 is obvious, since the composition of two bijective
transformations is bijective and in Theorem 5.2.1 we have seen that the composition
of two linear transformations is a linear transformation.
We have observed above that in order to prove that a transformation is an
isomorphism, we need to show that this transformation is linear and bijective. This last
requirement can be relaxed under some algebraic assumptions as we will see in the
following theorem.
Theorem 5.3.3
Let E and F be two vector spaces over the same field K and f be a linear transformation
from E into F. Assume that
Proof
First, assume that f is injective. Then according to Theorem 5.2.4, Ker. f / D f0E g. Then
using (5.10) and (5.13), we deduce that
Since Im. f / is a subspace of F, Theorem 4.6.7 together with (5.14) implies that F D Im. f /.
Hence, by Theorem 5.2.9, f is surjective and since we assumed that f is injective, then f is
bijective and therefore, an isomorphism.
214 Chapter 5 • Linear Transformations
Therefore, (5.10) leads to dimK Ker. f / D 0. Then, Ker. f / D f0E g, and hence by
Theorem 5.2.4, f is injective and consequently f is bijective. u
t
5 ⓘ Remark 5.3.4 Theorem 5.3.3 implies that if f is an element in L .E/ (with dimK E
finite), then to show that f is an automorphism, we need just to show that Ker. f / D f0E g
or Im. f / D E.
As we have seen in Chap. 4, it is very useful to find a basis of a vector space, since
this could allow one to infer many properties of the space. Now suppose that we have
two vector spaces E and F over the same field K, f is an isomorphism from E to F, and
B is a basis of E. So the natural question is whether f .B/ is basis of F. This turns out to
be true as we will see in the following theorem.
Theorem 5.3.5
Let E and F be two vector spaces over the same field K and f be an isomorphism from
E into F. Let B be a basis of E. Then f .B/ is a basis of F.
Proof
Let B be a basis of E. Since f is injective and B is a linearly independent set in E, then
according to Theorem 5.2.5, f .B/ is a linearly independent set in F. Clearly, since B spans E,
then f .B/ spans f .E/. Since f is surjective, then we have (see Theorem 5.2.9) f .E/ D F and
thus f .B/ spans F. Consequently, f .B/ is a basis of F. t
u
ⓘ Corollary 5.3.6 Two finite-dimensional vector spaces over the same field are
isomorphic if and only if they have the same dimension.
Proof
First, assume that E Š F. Then, by Theorem 5.3.5, we deduce that
dimK E D dimK F:
dimK E D dimK F D n:
5.4 Exercises
215 5
Then, Theorem 4.6.1 implies that there exists BE D fu1 ; u2 ; : : : ; un g, a basis of E, and BF D
fw1 ; w2 ; : : : ; wn g, a basis of F. We define the transformation f from E to F as follows:
f WE!F
u D 1 u1 C 2 u2 C C n un 7! f .u/ D w D 1 w1 C 2 w2 C C n wn ;
for 1 ; 2 ; : : : ; n 2 K. It is clear that f is linear. Now, let u and v be two elements in E, such
that
f .u/ D f .v/:
u D 1 u1 C 2 u2 C C n un and v D 1 u1 C 2 u2 C C n un :
1 w1 C 2 w2 C C n wn D 1 w1 C 2 w2 C C n wn :
i D i ; i D 1; 2; : : : ; n:
5.4 Exercises
1
A linear transformation satisfying this property is called a projection.
216 Chapter 5 • Linear Transformations
Solution
1. We need to show that Ker. f / Ker. f 2 / and Ker. f 2 / Ker. f /. Let u be an element in
Ker. f /. i.e., f .u/ D 0E . This gives
f 2 .w/ D f .w/ D 0E ;
u D .u f .u// C f .u/:
Thus, u f .u/ 2 Ker. f /. Hence, we have proved the first assertion in (5.15).
Next, let v be an element in Ker. f / \ Im. f /. Then there exists u in E such that
This gives f 2 .u/ D 0E . Thus, u 2 Ker. f 2 /. Since Ker. f / D Ker. f 2 /, then u 2 Ker. f / and we
have v D f .u/ D 0E . Hence, Ker. f /\Im. f / D f0E g and consequently, E D Ker. f /˚Im. f /:
3. Assume that g ı f D f ı g and let w be an element in g.Ker. f //. Then there exists u in
Ker. f / such that w D g.u/. Now, since g ı f D f ı g, we have
because u 2 Ker. f / and g is a linear transformation. Thus, we have shown that w 2 Ker. f /.
Therefore g.Ker. f // Ker. f /.
Now, let z be an element of g.Im. f //. Then there exists w in Im. f / such that z D g.w/.
Since w 2 Im. f /, there exists u in E such that w D f .u/. Thus,
u D u1 C u2 ;
Now, we have
D g. f .u2 // (5.17)
and
Next, it is clear that g.u1 / 2 g.Ker. f //, and since g.Ker. f // Ker. f /, we deduce that
g.u1 / 2 Ker. f / and hence f .g.u1 // D 0: Also, since u2 2 Im. f /, we have g.u2 / 2
g.Im. f // Im. f /, and so there exists u4 in E such that
g.u2 / D f .u4 /:
Consequently, taking all these into account, we can rewrite (5.17) and (5.18) as
and respectively
Exercise 5.2
Consider the endomorphism f W Rn ! Rn defined as
Solution
First, we need to find the subspace Ker. f /. So, let u D .u1 ; u2 ; : : : ; un / be a vector in Rn .
Then u is a vector in Ker. f / if and only f .u/ D 0Rn . This implies that
8
ˆ u1 C un D 0;
ˆ
ˆ
ˆ
< u2 C un1 D 0;
ˆ :::
ˆ
ˆ
5 :̂
un C u1 D 0:
Thus, the set of vectors B D fa1 ; a2 ; : : : ; ap g such that aj ; 1 j p has all components zero
except for the jth component which is 1 and the n j C 1 component which is 1, spans
Ker. f /. We may easily show that B is a linearly independent set, so a basis of Ker. f /. Thus,
if n is even, then
n
dimR Ker. f / D p D :
2
n n
rank. f / D dimR Rn dimR Ker. f / D n D :
2 2
and then
u D u1 b1 C u2 b2 C C uq bq ;
5.4 Exercises
219 5
where bk ; 1 k q, is the vector with all components zero, except for the kth component
which is equal to 1 and the .n .k C 1/ C 2/-nd component, which is equal to 1. As above,
we can easily show that the set of vectors S D fb1 ; b2 ; : : : ; bq g is a basis of Ker. f / and we
have
n1
dimR Ker. f / D q D :
2
n1 nC1
rank. f / D n D :
2 2
J
Solution
First, it is clear that g ı f is an element of L .E; G/, (see Theorem 5.2.1).
1. Let w be an element in Im.g ı f /. Then there exists u in E such that
w D .g ı f /.u/ D g. f .u//:
g. f .u// D g.0F / D 0G
On the other hand, from .2/ we have, for the same reason,
5 Exercise 5.4
Let E be a vector space over a field K such that dimK E is finite and f be an element of L .E/.
Show that the following statements are equivalent:
1. There exist two projections P and Q in L .E/ such that
2. f 2 D 0L .E/ :
Solution
We have seen in Exercise 5.1 that P and Q satisfy P2 D P and Q2 D Q. Now, we need to
show that .1/ ) .2/ and .2/ ) .1/.
First, assume that .1/ is satisfied and let u be an element of F. Then, we have
Now, it is clear that P.u/ 2 Im.P/ D Im.Q/. Then, we have Q.P.u// D P.u/.2 By the same
argument P.Q.u// D Q.u/. Consequently, taking these into account, we have from above
that f . f .u// D 0E for all u in E. This means that f 2 D 0L .E/ . Thus, we have proved that
.1/ ) .2/.
Conversely, since dimK E is finite, then (see Exercise 4.8), Ker. f / has a complement in
E. Hence, there exists a projection P such that Im.P/ D Ker. f /.
We define Q as Q D P f so f D P Q. To show that Q is a projection, we need to
prove that Q2 D Q. We have
Q2 D .P f / ı .P f / D P2 P ı f f ı P C f 2 :
Q2 D P P ı f f ı P: (5.24)
2
Since Q is a projection, we have Q.y/ D y for all y in Im.Q/.
5.4 Exercises
221 5
Now, since Im.P/ Ker. f /, then we deduce that f ı P D 0L .E/ . Also, since f 2 D 0L .E/ ,
we have
This gives
whence P ı f D f . J
Q2 D P P ı f f ı P D P f D Q:
Thus, Q is a projection.
Now, we need to show that Im.P/ D Im.Q/. That is Im.P/ Im.Q/ and Im.Q/
Im.P/. So, first, let w be an element of Im.P/ D Ker. f /. Since w 2 Im.P/, then P.w/ D
w. We also have f .w/ D 0E . On the other hand, we have
whence
u D Q.u/ D P.u/:
This means that u 2 Im.P/. Thus Im.Q/ Im.P/, which concludes our proof.
Exercise 5.5
Let E be a vector space over a field K such that dimK E D n and f be an element of L .E/.
Show that the following statements are equivalent:
1. Ker. f / D Im. f /.
n
2. f 2 D 0L .E/ and rank. f / D .
2
222 Chapter 5 • Linear Transformations
Solution
As usual, we show that .1/ ) .2/ and .2/ ) .1/. First, assume that Ker. f / D Im. f / and
let u be an element of E. Then f .u/ 2 Im. f /, and since Ker. f / D Im. f /, then f .u/ 2 Ker. f /
and so
f . f .u// D f 2 .u/ D 0E :
This gives rank. f / D n=2. Thus, we have proved that .1/ ) .2/.
Conversely, assume that .2/ holds; we need to show that Ker. f / D Im. f /. Let w be an
element in Im. f /, then there exists u in E such that w D f .u/. Now, we have
f k D f ı f ı ı f D 0L .E/ :
„ƒ‚…
k times
Show that if f is nilpotent of index n, then for any u in E satisfying f n1 .u/ ¤ 0E , the set
B D fu; f .u/; f 2 .u/; : : : ; f n1 .u/g is a basis of E.
Solution
Since the cardinality of B equals n, Theorem 4.6.5 shows that it is enough to show that B is
a linearly independent set. First, it is clear that since f n1 .u/ ¤ 0E , the linearity of f shows
that for any 0 k .n 1/, we have f k .u/ ¤ 0E . Now, let 0 ; 1 ; : : : ; n1 be elements in
K satisfying
3
See Exercise 1.2 for the definition of a nilpotent matrix.
5.4 Exercises
223 5
Applying f n1 to this identity, using the linearity of f n1 and the fact that f n .u/ D 0E , we get
f n1 .0 u C 1 f .u/ C 2 f 2 .u/ C C n1 f n1 .u// D f n1 .0E / D 0E : (5.25)
0 f n1 .u/ D 0E :
Since f n1 .u/ ¤ 0E , then 0 D 0K . Next, arguing in the same way, we apply f n2 to (5.25)
and using the fact that 0 D 0K , we obtain 1 D 0K . By continuing the process and applying
each time f n` ; 1 ` n to (5.25), we can show that `1 D 0K : Hence,
0 D 1 D D n1 D 0K :
Deduce that
(5.27)
Solution
1. First, it is clear that f C g 2 L .E; F/ since L .E; F/ is a vector space. It is also obvious
that Im. f /, Im.g/ and Im. f C g/ are subspaces of F (see Theorem 5.2.8). Now, let w be an
element of Im. f C g/. Then there exists u in E such that
Since f .u/ 2 Im. f / and g.u/ 2 Im.g/, we see that w 2 Im. f C g/. This shows that Im. f C g/
Im. f / C Im.g/.
2. Applying formula (4.16) for F1 D Im. f / and F2 D Im.g/, we get
On the other hand, since Im. f C g/ Im. f / C Im.g/, Theorem 4.6.7 shows that
5
rank. f C g/ D dimK Im. f C g/ dimK .Im. f / C Im.g// rank. f / C rank.g/:
Equivalently,
Combining the above two relations, then we obtain the desired result.
4. First, we need to prove (5.26). Since
which is true since Im. f2 ı f1 / Im. f2 / (see the first question in Exercise 5.3). J
227 6
Belkacem Said-Houari
The goal of this chapter is to make a connection between matrices and linear transfor-
mations. So, let E and F be two finite-dimensional vector spaces over the same field K
such that dimK E D n and dimK F D m. Then, according to Theorem 4.6.1, both spaces
have bases. So, let BE D fu1 ; u2 ; : : : ; un g be a basis of E and BF D fw1 ; w2 ; : : : ; wm g
be a basis of F. Let f be an element of L .E; F/. Since BF is a basis of F, for any
uj ; 1 j n in BE , f .uj / is uniquely written as a linear combination of the elements
of BF :
where
aij ; 1 i m; 1jn
are elements of K. It is clear that knowledge of the aij , completely determines the linear
transformation f and allows us to define the matrix associate to f as follows.
.i; j/ 7! aij ;
(Continued )
228 Chapter 6 • Linear Transformations and Matrices
It is clear that the entries of the jth column (1 j n) of this matrix M. f / are the
components of f .uj / in the basis BF . We call M. f / the matrix of f in the bases BE
and BF .
6
Example 6.1
Let f be the linear transformation defined as
f W R2 ! R3 ;
Solution
The standard bases of R2 and R3 are respectively
BR2 D f.1; 0/; .0; 1/g; and BR3 D f.1; 0; 0/; .0; 1; 0/; .0; 0; 1/g:
To find the matrix M. f / associated to the linear transformation f , we need to find the
components of f .1; 0/ and f .0; 1/ in the basis BR3 ; then f .1; 0/ will be the first column of
M. f / and f .1; 0/ will be the second column of M. f /. We have
Thus,
2 3
1
6 7
f .1; 0/ D 4 1 5 :
0
Similarly,
It is clear that
2 3
" # 1
1 6 7
f .1; 0/ D M. f / D 415:
0
0
and
2 3
" # 1
0 6 7
f .1; 0/ D M. f / D 4 2 5:
1
1
Now, a natural question is: how to define matrices associated to the sum and composition
6 of two linear transformations? To answer this question, we may easily show some
properties of these matrices:
Theorem 6.1.1
Assume that E and F are as above and let G be another vector space defined over the
same field K with dimK G D r. Let BG D fv1 ; v2 ; : : : ; vr g be a basis of G. Let f and g
be elements of L .E; F/, h be an element of L .F; G/, and be an element of K. Then,
it holds that
1. M. f C g/ D M. f / C M.g/.
2. M.f / D M. f /:
3. M.h ı f / D M.h/M. f /.
4. If E D F and f is bijective, then M. f / is invertible and M. f 1 / D .M. f //1 .
Proof
1. As above, assume that
and
. f C g/.uj / D f .uj / C g.uj / D .aij C d1j /w1 C .a2j C d2j /w2 C C .amj C dmj /wm :
Consequently,
2 3
a11 C d11 a12 C d12 a13 C d13 ::: a1n C d1n
6 7
6 a21 C d21 a22 C d22 a23 C d23 ::: a2n C d2n 7
M. f C g/ D 6
6 :: :: :: :: :: 7
7
4 : : : : : 5
am1 C dm1 am2 C dm2 am3 C dm3 ::: amn C dmn
D M. f / C M.g/:
6.1 Definition and Examples
231 6
2. In the same way, we can show that M.f / D M. f /.
3. Assume that
X
m
D aij .b1i v1 C b2i v2 C C bri vr /
iD1
! ! !
X
m X
m X
m
D b1i aij v1 C b2i aij v2 C C bri aij vr : (6.2)
iD1 iD1 iD1
Comparing (6.1) and (6.2) and using Theorem 4.4.3, we find that
X
m
ckj D bki aij ; 1 k r;
iD1
which are exactly the entries of the matrix product M.h/M. f / as introduced in Defini-
tion 1.1.11 (with some changes in the indices).
4. If f is an automorphism, then
f ı f 1 D f 1 ı f D IdL .E/ ;
M. f ı f 1 / D M. f /M. f 1 / D M. f 1 /M. f / D I:
The uniqueness of the inverse (Theorem 1.2.3) shows that M. f 1 / D .M. f //1 : t
u
232 Chapter 6 • Linear Transformations and Matrices
Example 6.4
Consider the linear transformation f defined in Example 6.1 and the linear transformation
h W R3 ! R2 defined by
h.w; y; z/ D .w C y z; y C z/:
Solution
We need first to find the matrix M.h/. We have
6
h.1; 0; 0/ D .1; 0/ D 1.1; 0/ C 0.0; 1/; h.0; 1; 0/ D .1; 1/ D 1.1; 0/ C 1.0; 1/
and
so
" #
1 1 1
M.h/ D :
01 1
h ı f W R2 ! R2
Thus, we have
.hıf /.1; 0/ D .2; 1/ D 2.1; 0/C1.0; 1/ and .hıf /.0; 1/ D .0; 3/ D 0.1; 0/C3.0; 1/:
Consequently,
" #
20
M.h ı f / D :
13
J
6.1 Definition and Examples
233 6
f .uj / D M. f /uj :
Example 6.5
Assume, for example, that we know the matrix M. f / defined in Example 6.1. Then
2 3
" # 1
1 6 7
f .1; 0/ D M. f / D 415:
0
0
and
2 3
" # 1
0 6 7
f .0; 1/ D M. f / D 4 2 5:
1
1
So, we have seen above, that for any linear transformation f in L .E; F/ we associate
a unique matrix M. f / in Mmn .K/ and for any matrix in Mmn .K/, we have a
unique linear transformation associated to it. So, clearly, the transformation T defined
as follows:
is a bijection from L .E; F/ to Mmn .K/. In addition, as we have seen in Theorem 6.1.1,
this transformation is linear. Thus, it defines an isomorphism between L .E; F/ and
Mmn .K/. Thus, we summarize these in the following theorem.
234 Chapter 6 • Linear Transformations and Matrices
ⓘ Remark 6.1.4 Since dimK Mmn .K/ D m n, Theorem 6.1.3 shows that,
We have seen that a finite-dimensional vector space may has more than one basis. So,
one question is: what happens to the matrix associated to a linear transformation if
we change bases? Also, we have seen in Chap. 1 that it is much easier to deal with
diagonal matrices, especially when dealing with differential equations or powers of a
matrix or eigenvalues, as we will see in the coming chapters, since diagonal matrices
enjoy nice properties. So, suppose that we have a linear operator f in L .E/. Can we
choose two bases in E in which M. f / is a diagonal matrix? Thus, it is so important to
investigate the effect of a change of bases on the matrix M. f /.
Let E be a vector space over a field K. Denote dimK E D n and let
pWE!E
uj 7! p.uj / D vj ; 1 j n:
1 D . p; B2 ; B1 / D . p1 ; B1 ; B2 ; /: (6.3)
Similarly, if Œuj Bk are the components of uj with respect to the basis Bk ; k D 1; 2, then
Example 6.6
Consider the vector space R3 and the two bases
and
Solution
First, one can easily check that B1 and B2 are bases of R3 . Now, we need to find the
components of vj ; j D 1; 2; 3; in the basis B1 . We easily see that
3 1
v1 D u1 C u2 C 0u3 ;
2 2
v2 D u1 2u2 C 2u3 ;
7 5
v3 D u1 u2 C 3u3 :
2 2
236 Chapter 6 • Linear Transformations and Matrices
Thus, we obtain
2 3
3=2 1 7=2
6 7
D 4 1=2 2 5=2 5 :
0 2 3
Since
2 3 2 3 2 3
1 0 0
6 7 6 7 6 7
Œv1 B2 D 405; Œv2 B2 D 415; Œv3 B2 D 405;
0 0 1
we have
2 3
2=7 8=7 9=7
6 7
1 D 4 3=7 9=7 4=7 5 :
2=7 6=7 5=7
Example 6.7
Consider P2 to be the set of all polynomials with real coefficients and of degree less than or
equal to 2. We have seen in Example 4.19, that
B1 D f1; x; x2 g
B2 D f1; 1 C x; 1 C x C x2 g:
6.2 Change of Basis and Similarity
237 6
Solution
1. By Theorem 4.6.5, it is enough to show that B2 is a linearly independent set. So, let ˛; ˇ
and be real numbers satisfying, for all x 2 R,
˛ C ˇ.1 C x/ C .1 C x C x2 / D 0:
Taking some particular values of x, such as .x D 0; x D 1; x D 1/; we get the system of
equations
8
ˆ
< ˛ C ˇ C D 0;
˛ C D 0;
:̂
˛ C 2ˇ C 3 D 0:
It is clear that this system has the unique solution ˛ D ˇ D D 0: Thus, B2 is a linearly
independent set and hence a basis of P2 .
2. To find the transition matrix from B1 to B2 , we need to find the components of the
elements of B2 with respect to B1 . We have
1 D 1 C 0x C 0x2 ;
1 C x D 1 C x C 0x2 ;
1 C x C x2 D 1 C 1x C 1x2 :
Thus,
2 3
111
6 7
D 40 1 15:
001
Now, let E and F be two finite-dimensional vector spaces over the same field K.
Denote dimK E D n and dimK F D m. Let B be a basis of E and let S1 and S2 be
6 two bases of F. Let f be a linear transformation from E to F and let M. f ; B; S1 / be the
corresponding matrix of f with respect to the bases B and S1 . Then, the question is: how
to find M. f ; B; S2 /?
f p
(E, B) (F, S1) (F, S2)
p◦f
Let p be the linear operator that transform the elements of S1 into the element of S2 as
shown above and . p; S1 ; S2 / be its corresponding matrix. Then, Theorem 6.1.1, yields
M. f ; B; S2 / D M. p ı f ; B; S2 / D . p; S1 ; S2 /M. f ; B; S1 /: (6.4)
Example 6.8
Consider the linear transformation f defined as
f W R2 ! .R3 ; B1 /
.x; y/ 7! .x y; x C y; y/;
where in R2 we consider the standard basis. Let p be the linear operator defined from R3 to
R3 as
p.uj / D vj ; j D 1; 2; 3;
where uj , vj and B1 are given in Example 6.6. Find M. pıf /, the matrix associated to p ıf .
Solution
First, we need to find M. f /. So, we need to find the components of f .1; 0/ and f .0; 1/ with
respect to the basis B1 . We have
and
M. p ı f / D M. f /
2 32 3 2 3
3=2 1 7=2 1 1 3=2 3
6 76 7 6 7
D 4 1=2 2 5=2 5 4 0 2 5 D 4 1=2 1 5 :
0 2 3 0 1 0 1
Now, let E and F be two finite-dimensional vector spaces over the same field K,
with dimK E D n and dimK F D m. Let B1 and B2 be two bases of E and S1 and S2
be two bases of F. Let f be a linear transformation from E into F and M. f ; B1 ; S1 /
be the corresponding matrix to f with respect to the bases B1 and S1 . We want now
to find M. f ; B2 ; S2 /, the corresponding matrix of f with respect to B2 and S2 . So, let
. p1 ; B1 ; B2 / be the transition matrix from B1 to B2 and . p2 ; S1 ; S2 / be the transition
matrix from S1 to S2 , as shown in the following diagram:
p−1
1 f p2
(E, B2) (E, B1 ) (F, S1 ) (F, S 2 )
p2 ◦ f ◦ p−1
1
M. f ; B2 ; S2 / D M. p2 ı f ı p1
1 ; B2 ; S2 /
D . p2 ; S1 ; S2 // M. f ; B1 ; S1 / . p1
1 ; B1 ; B2 /; (6.5)
1
D . p2 ; S1 ; S2 // M. f ; B1 ; S1 / . p1 ; B1 ; B2 /;
M. f ; B1 ; S1 / D 1 . p2 ; S1 ; S2 // M. f ; B2 ; S2 / . p1 ; B1 ; B2 /:
Two matrices satisfying (6.5) are called equivalent and we may give a general definition
of two equivalent matrices as follows.
240 Chapter 6 • Linear Transformations and Matrices
B D RAS:
M. f ; B1 ; B1 / D 1 . p1 ; B1 ; B2 // M. f ; B2 ; B2 / . p1 ; B1 ; B2 /:
It is clear that two similar matrices are equivalent. As we have seen above, similar
matrices represent the same endomorphism with respect to two different bases. The
matrix P is also called a change of bases matrix.
As indicated in the beginning of Sect. 6.2, one of the main goals of the change of
bases is to transform matrices to diagonal form. Formula (6.6) is a very important tool to
obtain diagonal matrices from some square matrices called diagonalizable matrices; the
process is known as diagonalization of matrices. We will see in the coming chapters that
in this case A and B have something in common. For instance, similar matrices share the
same eigenvalues. We will come back to this in details in the next chapter, but just to
clarify the ideas, we give the following example.
Example 6.9
Consider in M3 .R/ the matrix
2 3
200
6 7
A D 40 3 45:
049
6.3 Rank of Matrices
241 6
Take two bases in R3 :
B1 Dfe1 ; e2 ; e3 g; and B2 Dfu1 ; u2 ; u3 g; u1 D.0; 2; 1/; u2 D.1; 0; 0/; u3 D.0; 1; 2/:
Find the matrix B D P1 AP, where P is the transition matrix from B1 to B2 .
Solution
It is clear that since B1 is the standard basis of R3 ,
2 3
0 10
6 7
P D Œu1 ; u2 ; u3 D 4 2 0 1 5 :
1 02
Now, we may easily check, by using the methods in Chap. 1 (for instance), that
2 3
0 2=5 1=5
6 7
P1 D 4 1 2=5 4=5 5 :
0 1=5 2=5
It is clear that B is a diagonal matrix. This method of obtaining B from the matrix A is a very
important topic in linear algebra and linear programming. J
We have defined in Sect. 5.2.1 the rank of a linear transformation to be the dimension
of its image. Also, we have seen in Chap. 6 that any linear transformation between
two finite-dimensional vector spaces can be represented by a matrix. So, now, in order
to define the rank of a matrix, we have to define first a subspace corresponding to the
image of a linear transformation. This subspace is known as the column space of the
matrix.
So, let E and F be two finite-dimensional vector spaces over the same field K
and let f be a linear transformation from E into F, as in Definition 6.1.1. Then,
once bases are chosen in E and F, one can associate a unique matrix to this linear
transformation. Conversely, we have also seen in Remark 6.1.2 that we can always
associate a unique linear transformation to a given matrix. So, let M. f / be the matrix
given in Definition 6.1.1. We define the set
As we have seen in Theorem 5.2.8, this set is a subspace of F. Thus, the rank of M. f / is
the rank of f and it is the dimension of R .M. f //, and we have the following definition.
ⓘ Remark 6.3.1 From above and since R .A/ is a subspace of F, Theorem 4.6.7 shows
that rank.A/ dimK F D m:
In addition, recall from Example 4.12 that the null space N .A/ of the matrix A is
defined as
One of the main goals in Chaps. 1 and 2 was to determine whether a matrix is
invertible or not. One of the requirements for A to be invertible was det.A/ ¤ 0, (see
Theorem 2.4.8). Here we can easily deduce an equivalent invertibility criterion as given
in the following theorem.
Next, we discuss some properties of the rank of a matrix. To clarify the ideas, we assume
that E D Kn , F D Km and, for simplicity, take K D R, but the method works for any
field K. In Rn and Rm we use the standard bases, that is
Let A be a matrix in Mmn .K/ and f be the linear transformation associated to it. It is
clear that the components of f .ej /; 1 j n, with respect to the basis BRm form the jth
column of A. Moreover, the rank of f is the number of the linearly independent columns
of A and we have the following definition.
Definition 6.3.2
Let A be a matrix in Mmn .K/. Then, the rank of A is equal to the largest number of
columns of A which are linear independent.
So, now the question is: how to find the linearly independent columns of a matrix?
If n D m, then we have seen that a square matrix A in Mn .K/ is invertible if and
only if one of the following conditions is satisfied:
1. det.A/ ¤ 0,
2. rank.A/ D n.
So, necessarily these two conditions should be equivalent and thus, we have the
following theorem.
Theorem 6.3.3
Let A be a square matrix in Mn .K/. Then,
rank.A/ D n
Proof
We have seen in Exercise 4.5 that the columns of A are linearly independent if and only if
det.A/ ¤ 0. This means that rank.A/ D n if and only if det.A/ ¤ 0. u
t
244 Chapter 6 • Linear Transformations and Matrices
Theorem 6.3.3 shows that in the case where det.A/ ¤ 0, the problem of finding the
rank of A is completely solved: it is equal to the size of the square matrix A.
Combining Theorem 6.3.3 and Theorem 2.4.8, we easily deduce another invertibility
criterion.
Theorem 6.3.4
Let A be a square matrix in Mn .K/. Then, A is invertible if and only if
rank.A/ D n:
6
We have seen in Theorem 2.3.1 that det.A/ D det.AT /; in particular, det.A/ ¤ 0 if
and only if det.AT / ¤ 0. In this case, and according to Theorem 6.3.3,
Since the columns of AT are the rows of A, then we can say that the rank of A is also the
maximal number of its linearly independent rows. In fact (6.8) is true for any matrix in
Mmn .K/ and the fact that the rank of a matrix A is equal to the rank of its transpose AT
is one of the important theorems in linear algebra. The space R .AT / is also called the
row space of A. We have the following assertion.
Theorem 6.3.5
Let A be a matrix in Mmn .K/. Then we have
rank.A/ D rank.AT /:
Proof
Actually, several proofs of the above theorem are available. Here we present a simple proof
from [16], assuming that K D R (the same proof also works for K D C). First, it is clear that
if Y is a vector in Rn , then
Y T Y D kYk2 D 0; (6.9)
if and only if Y D 0Rn , (Theorem 3.3.1). Hence, using (6.9), we deduce, by taking Y D AX,
that
AX D 0Rm
if and only if AT AX D 0Rn for any vector X in Rn . Using this last property, we see
that the vectors AX1 ; AX2 ; : : : ; AXk are linearly independent if and only if the vectors
AT AX1 ; AT AX2 ; : : : ; AT AXk are linearly independent, where X1 ; X2 ; : : : ; Xk are vectors in Rn .
6.3 Rank of Matrices
245 6
Consequently, we deduce that
Now, we have
It is natural to expect that equivalent matrices share some algebraic properties. So,
do equivalent matrices have the same rank? In fact this turns out to be true.
rank.A/ D rank.B/:
Proof
As we have seen in (6.5), if two matrices are equivalent, then they represent the same linear
transformation, with respect to different bases and so they have the same rank. t
u
ⓘ Corollary 6.3.7 (Similarity and Rank) Let A and B be two similar matrices in
Mn .K/. Then
rank.B/ D rank.A/:
ⓘ Remark 6.3.8 The opposite of Corollary 6.3.7 is not true: The fact that two matrices
have the same rank does not imply that they are similar. For example, the matrices
2 3 2 3
0 0 0 0 0 1 0 0
6 7 6 7
60 0 1 07 60 0 0 07
AD6 7 and BD6 7
40 0 0 15 40 0 0 15
0 0 0 0 0 0 0 0
246 Chapter 6 • Linear Transformations and Matrices
have the same rank, equal to 2, but they are not similar. Indeed, we can easily see that A² ≠ 0_{M₄(ℝ)}, whereas B² = 0_{M₄(ℝ)}. So, if there existed an invertible matrix P such that B = P⁻¹AP, then we would have B² = P⁻¹A²P ≠ 0_{M₄(ℝ)}, a contradiction.
6.4 Methods for Finding the Rank of a Matrix
Here we introduce a method for finding the rank of a matrix A in M_{m×n}(K). Given A,
we can change bases in Kn in such a way that there exists 1 r min.m; n/ such that
the matrix A can be written (in the new basis) as
" #
Ir 0r.nr/
AD ; (6.11)
0.mr/r 0.mr/.nr/
where 0pq is the p q matrix with all its entries equal to zero and Ir is the r r identity
matrix.
We have seen in Sect. 6.2 that the change of basis is equivalent to the multiplica-
tion of the original matrix by an invertible matrix. Since the form (6.11) can be obtained
by a series of changes of bases, the procedure is equivalent to the multiplication of the
original matrix by a series of invertible matrices. That is, for any matrix A in Mmn .K/,
we can show that there exists a finite sequence of invertible matrices E1 ; E2 ; : : : ; EkCs
such that the matrix
has the form (6.11). It is clear that the matrices on the left-hand side of (6.12) belong to
Mm .K/ and the matrices in the right-hand side belong to Mn .K/. Of course according
to Theorem 6.3.6, this operation does not change the rank of the matrix, since all the
matrices that we obtain by means of such a multiplication will be equivalent.
Now, once the form (6.11) is obtained, then it is clear that rank.A/ D r, and we can
also show that each matrix of rank r is equivalent to the matrix in (6.11).
So, the question now is: how to choose the matrices in (6.12) so as to reach the form
(6.11)?
To clarify this, consider the example
A = [ 1  3  −1 ; 0  1  7 ].
Adding the second row to the first (that is, multiplying A on the left by the invertible matrix E₁ = [ 1  1 ; 0  1 ]), we get
A₁ = E₁A = [ 1  4  6 ; 0  1  7 ].
Continuing with further row and column operations (multiplying on the left and on the right by suitable invertible matrices E₂ and E₃), we get
A₃ = A₂E₃ = [ 1  0  0 ; 0  1  7 ].
Finally, replacing the third column c₃ of A₃ by c₃ − 7c₂, we get
" #
100
A4 D : (6.13)
010
This last operation is also equivalent to multiplying A₃ on the right by the matrix
E₄ = [ 1  0  0 ; 0  1  −7 ; 0  0  1 ].
Hence,
A₄ = E₂E₁AE₃E₄.
It is clear that all the above matrices Ei ; 1 i 4 are invertible. Now, if we put
R D E2 E1 and S D E3 E4 , then we obtain
A4 D RAS:
This means that the matrices A4 and A are equivalent, so they have the same rank. It is
clear that A4 is in the form (6.11), with r D 2. Consequently,
rank.A/ D rank.A4 / D 2:
What we have done above is a series of changes of bases in the vector spaces R2
and R3 to find the appropriate bases in these spaces in which A has the final form (6.13).
Also, as we have seen above, to find these appropriate bases, we need to perform some
row and column operations on the matrix. Basically, these row and column operations
are:
▬ Multiply a row (or a column) through by a nonzero constant.
▬ Interchange two rows (or two columns).
▬ Add a constant times one row to another row (or a constant times one column to
another column).
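To make the procedure concrete, here is a small Python sketch (an illustration written for this discussion, not part of the original text) that applies exactly these three kinds of operations to drive a matrix toward the form (6.11) and returns the number of pivots, i.e., the rank; the test matrix is the 2 × 3 example treated above.

```python
import numpy as np

def rank_by_elimination(A, tol=1e-12):
    """Apply elementary row/column operations until the form (6.11) appears."""
    M = np.array(A, dtype=float)
    m, n = M.shape
    r = 0
    while r < min(m, n):
        sub = np.abs(M[r:, r:])
        if sub.max() <= tol:                    # nothing left to eliminate
            break
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        M[[r, r + i]] = M[[r + i, r]]           # interchange two rows
        M[:, [r, r + j]] = M[:, [r + j, r]]     # interchange two columns
        M[r] /= M[r, r]                         # scale the pivot row
        for k in range(m):
            if k != r:
                M[k] -= M[k, r] * M[r]          # clear the pivot column
        for k in range(n):
            if k != r:
                M[:, k] -= M[r, k] * M[:, r]    # clear the pivot row
        r += 1
    return r    # M now has the form (6.11), with I_r in the upper-left corner

A = np.array([[1.0, 3.0, -1.0],
              [0.0, 1.0,  7.0]])
print(rank_by_elimination(A))    # 2, as found above
```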
Example 6.10
Find the rank of the matrix
2 3
0 1 2 1
6 7
A D 4 2 0 0 6 5 :
4 2 4 10
Solution
Denote the columns of A and the matrices obtained by means of elementary operations on A
by cj ; 1 j 4, and their rows by ri ; 1 i 3. First, keeping in mind (6.7), we deduce
that rank.A/ 3: Now, our goal is to perform some row or column elementary operations so
as to obtain a matrix of the form (6.11). Interchanging r1 and r2 , we get
2 3
2 0 0 6
6 7
A1 D 4 0 1 2 1 5 ;
4 2 4 10
rank.A/ D rank.A7 / D 2:
ⓘ Remark 6.4.1 As we have seen in Example 6.10, it can be a long process to reach the form (6.11), especially if the size of the matrix is large. But if along the way we find the rank of the matrix, then we can stop even before reaching the final form (6.11), since our main goal is to find the rank, not to write the matrix as in (6.11). For instance, in Example 6.10, we can easily determine the rank from A₂, since in A₂ the rows r₂ and r₃ are linearly dependent while r₁ and r₂ are linearly independent. This gives rank(A₂) = 2 = rank(A). So, in the process of finding the form (6.11) by applying elementary row (or column) operations, it is very helpful to check, after each step, for linearly independent columns and linearly independent rows, since this, together with the above theorems, may help us find the rank of a matrix quickly without even reaching the final form (6.11). We illustrate this in the following example.
Example 6.11
Find the rank of the matrix
2 3
2 3 4
6 7
6 3 1 5 7
AD6 7:
4 1 0 1 5
0 2 4
Solution
First, using (6.7), we deduce that rank.A/ 3. Our goal now is to perform some elementary
row and column operations so as to obtain a matrix in the form (6.11). We interchange the
second row r2 and the fourth row r4 , and obtain
2 3
2 3 4
6 0 2 4 7
6 7
A1 D 6 7:
4 1 0 1 5
3 1 5
We see immediately that c₃ = 2c₂. Thus, we deduce that rank(A₃) ≤ 2. On the other hand, we see that c₁ and c₂ are linearly independent, so rank(A₃) ≥ 2 (since the rank is the maximal number of linearly independent columns). Thus, we deduce that rank(A) = 2. J
Example 6.12
Find the rank of the matrix
2 3
2 4 1
6 7
A D 4 1 2 0 5 :
0 5 3
Solution
Since A is a square matrix, we may first calculate its determinant:
det(A) = 29 ≠ 0.
Hence, Theorem 6.3.3 shows that rank(A) = 3. J
In Example 6.12, we have seen that when det.A/ ¤ 0, the problem of finding the
rank of A is easily solved. This approach is more convenient than the elementary row
and column operations in Example 6.10. But up until now, it seems to work only for
invertible square matrices. So, it is natural to look for a possible extension of this
approach to matrices that are not necessarily square or invertible. In fact, it turns out that
this method can be applied to an arbitrary matrix, and we have the following theorem.
Theorem 6.4.2
Let A be a matrix in M_{m×n}(K). Then rank(A) = r if and only if A has a nonzero minor of order r and all minors of A of order greater than r are zero.
Proof
Let A be the matrix written in the standard bases of Kn and Km as
2 3
a11 a12 a13 ::: a1n
6a ::: a2n 7
6 21 a22 a23 7
AD6
6 :: :: :: :: :: 7
7:
4 : : : : : 5
am1 am2 am3 : : : amn
Assume first that rank.A/ D r. Then, there exist r linearly independent column vectors
6 v1 ; v2 ; : : : ; vr (without loss of generality, we assume that these vectors are the first r columns
of A). Let B1 D fe1 ; e2 ; : : : ; em g be the standard basis of Km .
First, if r D m n, then B2 D fv1 ; v2 ; : : : ; vm g constitutes a basis of Km and we have
X
m
vj D aij ei ; 1 j mI
iD1
It is clear that this transition matrix is invertible, so its determinant is nonzero. Also, this determinant is a minor of A, since the corresponding matrix can be obtained from A by removing the last n − m columns.
Second, if r < m, since the elements of B2 are linearly independent, Theorem 4.6.8 shows
that there exist vrC1 ; : : : ; vm such that (we are allowed to choose vj D ej ; r C 1 j m)
S D fv1 ; v2 ; : : : ; vr ; erC1 ; : : : ; em g
is a basis of Km . In this case, the transition matrix from B to S takes the form
2 3
a11 a12 : : : a1r 0 ::: 0
6 : : : a2r 0 ::: 07
6 a21 a22 7
6 :: :: :: :: :: :: :: 7
6 7
6 : : : : : : :7
6 7
D6
6 ar1 an2 : : : arr 0 ::: 07 7:
6 7
6 a.rC1/1 a.rC1/2 : : : a.rC1/r 1 ::: 07
6 7
6 :: :: :: :: :: :: :: 7
4 : : : : : : :5
am1 am2 : : : amr 0 ::: 1
Since S is a basis of Km , is invertible and det./ ¤ 0: This determinant can be computed
by using the last columns (m r times) as
2 3
a11 a12 ::: a1r
6a ::: a2r 7
6 21 a22 7
det./ D det 6
6 :: :: :: :: 7
7:
4 : : : : 5
ar1 an2 ::: arr
is not zero (det.B/ ¤ 0). Now, to show that v1 ; v2 ; : : : ; vr are linearly independent, we take
1 ; 2 ; : : : ; r in K such that
1 v1 C 2 v2 C C r vr D 0Km ; (6.14)
X
m
vj D aij ei ; 1 j r: (6.15)
iD1
Taking into account (6.15), then (6.14) can be expressed as a linear system of the form
8
ˆ a11 1 C a12 2 C C a1r r D 0;
ˆ
ˆ
ˆ
< a21 1 C a22 2 C C a2r r D 0;
ˆ :::
ˆ
ˆ
:̂
am1 1 C am2 2 C C amr r D 0:
Example 6.13
Use Theorem 6.4.2 to find the rank of the matrix
2 3
2 3 4
6 7
6 3 1 5 7
AD6 7:
4 1 0 1 5
6 0 2 4
Solution
First, according to (6.7), it is clear that rank.A/ 3. We may easily check that the
determinants of all the 3 3 submatrices are zero. On the other hand, we have
" #
3 1
det D 1 ¤ 0:
1 0
Thus, rank.A/ D 2. J
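The criterion of Theorem 6.4.2 can be checked mechanically. The sketch below is an independent illustration (the example matrix is my own, chosen so that its third column is the sum of the first two, making its rank 2); it searches for the largest order of a nonzero minor and compares with NumPy's rank.

```python
import numpy as np
from itertools import combinations

def rank_by_minors(A, tol=1e-12):
    """Largest r such that some r x r minor of A is nonzero (Theorem 6.4.2)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    for r in range(min(m, n), 0, -1):
        for rows in combinations(range(m), r):
            for cols in combinations(range(n), r):
                if abs(np.linalg.det(A[np.ix_(rows, cols)])) > tol:
                    return r
    return 0

# illustrative 4 x 3 matrix: third column = first column + second column
A = np.array([[2., 3., 5.],
              [3., 1., 4.],
              [1., 0., 1.],
              [0., 2., 2.]])
print(rank_by_minors(A), np.linalg.matrix_rank(A))   # both print 2
```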
Example 6.14
Consider the matrix
2 3
a 2 1 b
6 7
A D 4 3 0 1 4 5 ;
5 4 1 2
Solution
1. Using (6.7), we deduce that rank.A/ 3: To show that rank.A/ 2 and according to
Theorem 6.4.2, it suffices to find a nonzero minor of order 2. Indeed, we have
" #
30
det D 12 ¤ 0:
54
6.5 Exercises
Exercise 6.1
Consider in M3 .R/ the matrix
2 3
a0b
6 7
A D 4b a 05:
0ba
Solution
First, if a = b = 0, then rank(A) = 0. Also, by (6.7), rank(A) ≤ 3. On the other hand, we have
det(A) = a³ + b³.
Hence, if a ≠ −b, then det(A) ≠ 0, and Theorem 6.3.3 shows that rank(A) = 3.
Now, if a = −b ≠ 0, then A becomes
A₁ = [ a  0  −a ; −a  a  0 ; 0  −a  a ].
Clearly, the columns of A₁ are linearly dependent (for instance, c₃ = −c₁ − c₂). Hence, rank(A₁) ≤ 2. On the other hand, since a ≠ 0, the first and second columns are linearly independent. Thus, rank(A₁) ≥ 2. Consequently, if a = −b ≠ 0, then rank(A) = rank(A₁) = 2. J
6 Exercise 6.2 (A Property of a Matrix of Rank 1)
1. Let A and B be two matrices in Mmn .K/. Show that if rank.B/ D 1, then
2. Let A be a square matrix in Mn .K/ and B be a matrix in Mnm .K/. Show that if A is
invertible, then
rank.AB/ D rank.B/:
Solution
1. Since any matrix can be represented by a linear transformation and its rank is the rank
of this transformation, then all the properties for rank and nullity of linear transformations
obtained in Chap. 5 remain true for matrices. So, it is clear from Exercise 5.7 that
2. Deduce that for any two matrices A in Mmr .K/ and K in Mrn .K/
Solution
1. Inequality (6.20) is a direct consequence of (5.27).
2. Applying the Frobenius inequality for p D r, C D K and B D Ir , we obtain
null.A/ D n rank.A/:
rank.A/ D tr.A/:b
3. Deduce that if A is an idempotent matrix in Mn .K/ with rank.A/ D n, then A D In .
4. Show that if A is idempotent with rank.A/ D r, then
rank.In A/ D n r:
a
Idempotent matrices are the matrices associated to projections.
b
In fact if A2 D kA, then we have tr.A/ D k rank.A/.
Solution
1. Since rank(A) = r, there exist r linearly independent column vectors B₁, B₂, …, B_r of A. These vectors form a basis of the column space R(A). We introduce the matrix B
whose columns are the vectors B1 ; B2 ; : : : ; Br as
B D ŒB1 ; B2 ; : : : ; Br :
It is clear that B is in Mnr .K/. Since B1 ; B2 ; : : : ; Br form a basis, they are linearly
independent. Hence, rank.B/ D r.
Now, it is clear that any column of A, say the ith column Ai , may be expressed as
Ai D BCi
where Ci is the vector of the coefficients of the linear combination of B1 ; B2 ; : : : ; Br that gives
Ai . Denoting
C D ŒC1 ; C2 ; : : : ; Cn
and
A D ŒA1 ; A2 ; : : : ; An ;
then we have
A D BC:
Hence, rank.C/ D r.
2. Now, since A is idempotent, we have
rank.CB/ rank.A/ D r:
rank.CB/ D r;
so CB is invertible (Theorem 6.3.2). Hence, multiplying (6.23) from the left by .CB/1 C and
from the right by B.CB/1 , we obtain
CB D Ir :
5. We have
.A C B/2 D A2 C B2 C AB C BA D A2 C B2 D A C B;
6. We may easily check that A2 D A and thus A is idempotent. Then applying (1), we get
rank.A/ D tr.A/ D 2 C 3 3 D 2:
Exercise 6.5
Let f be an endomorphism of R3 whose matrix with respect to the standard basis B D
fe1 ; e2 ; e3 g is
2 3
0 1 0
6 7
M. f / D 4 0 0 1 5 :
1 3 3
3. Find the transition matrix from the basis B to the basis S and find 1 .
Solution
First, we have rank. f / D dimR Im. f / D rank.M. f //: Now, we have by a simple computation
det.M. f // D 1 ¤ 0:
Im. f / D R3 :
Now Theorem 5.2.9 shows that f is surjective and hence, applying Theorem 5.3.3, we deduce
that f is bijective and therefore an automorphism.
Applying Theorem 6.1.1, we have
M. f 1 / D .M. f //1 :
So, we can easily, by using the methods in Chap. 1 or Chap. 2, find that
2 3
3 3 1
6 7
.M. f //1 D 41 0 05:
0 1 0
2. It is clear that
2 3 2 3
u v
6 7 6 7
f .u; v; w/ D M. f / 4 v 5 D 4 w 5;
w u 3v C 3w
so
So, s2 D .u2 ; u2 C 1; 2 C u2 /, and we can choose s2 D .0; 1; 2/. By the same method and
since f .s3 / D f .u3 ; v3 ; w3 / D s2 C s3 , we obtain the system of equations
6 8
ˆ
< v3 D u3 ;
w2 D 1 C v3 ;
:̂
u3 3v3 C 3w3 D 2 C w3 :
It is clear that this matrix is invertible. To find 1 , we simply need to find the components
of ej ; j D 1; 2; 3 with respect to the basis S. We have
8
ˆ
< s1 D e1 C e2 C e3 ;
s2 D e2 C 2e3 ;
:̂
s3 D e3 :
Consequently, we obtain
2 3
1 0 0
6 7
1 D 4 1 1 0 5 :
1 2 1
J
Exercise 6.6
Let A be a matrix in Mmn .K/ and B be a matrix in Mnm .K/. Show that if m > n, then
det.AB/ D 0:
Solution
First, using (5.21) together with Theorem 6.1.1, we deduce that
rank(AB) ≤ min(rank(A), rank(B)).   (6.24)
Also, keeping in mind (6.7), we have rank(A) ≤ min(n, m). Since m > n, we deduce that rank(A) ≤ n < m. Consequently, applying (6.24), we obtain rank(AB) ≤ n < m. Since AB is a square matrix in M_m(K), Theorem 6.3.3 and the fact that rank(AB) ≠ m imply that det(AB) = 0. J
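A quick numerical sanity check of this fact with randomly generated matrices (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3                      # m > n
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))

AB = A @ B                       # an m x m matrix of rank at most n < m
print(np.linalg.matrix_rank(AB)) # 3
print(np.linalg.det(AB))         # ~0, up to rounding error
```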
AX D b; (6.25)
where A is a matrix in Mmn .K/, X is a vector in Mn1 .K/, and b is a vector in Mm1 .K/.
The system (6.25) is said to be consistent (or solvable) if it possesses at least one solution.
1. Show that system (6.25) is consistent if and only if
h i
rank A b D rank.A/:
is inconsistent.
Solution
1. First, if the system is consistent, then it has at least one solution. In this case b is a linear combination of the columns of A, that is,
b = x₁A₁ + x₂A₂ + ⋯ + x_nA_n,
where A₁, A₂, …, A_n are the columns of A. Therefore, the rank of [A | b] equals the maximal number of linearly independent vectors among A₁, …, A_n, b, and since b is a combination of the A_i, this is exactly (by definition) the rank of A.
Conversely, assume that rank([A | b]) = rank(A); then b is a linear combination of the columns of A, and the coefficients of this combination provide a solution to (6.25). This means that (6.25) is consistent.
2. We have
2 3
1 2 0
6 7
det 4 2 3 7 5 D 0;
1 4 2
and
" #
1 2
det D 7 ¤ 0:
2 3
we have
2 3
1 2 11
6 7
det 4 2 3 2 5 D 174 ¤ 0:
1 4 7
h i
Consequently, rank A b D 3. Hence, the system (6.26) is inconsistent. J
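The rank criterion of part 1 is easy to apply in code. The following sketch uses a small hypothetical system (not the one of the exercise, whose entries are only partially reproduced above) and compares rank(A) with rank([A | b]).

```python
import numpy as np

def is_consistent(A, b):
    """AX = b has a solution iff rank([A | b]) = rank(A)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    return np.linalg.matrix_rank(np.hstack([A, b])) == np.linalg.matrix_rank(A)

A = np.array([[1., 2., 0.],
              [2., 4., 0.],
              [0., 0., 1.]])
print(is_consistent(A, np.array([1., 2., 3.])))   # True:  b lies in the column space
print(is_consistent(A, np.array([1., 3., 0.])))   # False: the system is inconsistent
```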
is invertible.
Solution
To show that A is invertible it is equivalent to prove that the null space
is f0Cn g, since in this case rank.A/ D n and Theorem 6.3.2 shows that A is invertible. Assume
that A is not invertible. Then, there exists a vector X in N .A/ with
2
3
x1
6x 7
6 27
XD6 7
6 :: 7 ¤ 0Cn :
4 : 5
xn
Since X ≠ 0_{ℂⁿ}, we may choose an index i₀ such that |x_{i₀}| = max_{1≤j≤n} |x_j| > 0. On the other hand, since X is in N(A), we have AX = 0_{ℂⁿ}. This implies that for all i = 1, …, n, we have
∑_{j=1}^{n} a_{ij} x_j = 0.
In particular, for i = i₀,
∑_{j=1}^{n} a_{i₀j} x_j = 0,
which yields
|a_{i₀i₀}| |x_{i₀}| = |∑_{j≠i₀} a_{i₀j} x_j| ≤ ∑_{j≠i₀} |a_{i₀j}| |x_j| ≤ |x_{i₀}| ∑_{j≠i₀} |a_{i₀j}|.
Consequently, (6.27) is not satisfied for i D i0 . Thus, we deduce that if (6.27) holds, then A
has to be invertible.
For the given matrix A, we have
|3| > |1| + |0|,   |−3| > |1| + |1|,   |4| > |−1| + |2|.
Consequently, A is invertible. The converse of the above result is not true. For instance, the matrix
matrix
2 3
3 2 0
6 7
4 1 4 4 5
1 2 4
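The strict diagonal dominance condition (6.27) is straightforward to test numerically. The sketch below checks it for a matrix consistent with the three inequalities displayed above (the exact entries of the exercise's matrix are an assumption here) and confirms that its determinant is nonzero.

```python
import numpy as np

def is_strictly_diagonally_dominant(A):
    """|a_ii| > sum_{j != i} |a_ij| for every row i."""
    A = np.abs(np.asarray(A, dtype=float))
    diag = np.diag(A)
    off = A.sum(axis=1) - diag
    return bool(np.all(diag > off))

A = np.array([[ 3.,  1., 0.],
              [ 1., -3., 1.],
              [-1.,  2., 4.]])
print(is_strictly_diagonally_dominant(A))   # True
print(np.linalg.det(A))                     # nonzero, so A is invertible
```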
1. Show that there exists two elements and in K such that the two matrices P and Q
defined as:
Find ˛ and ˇ such that A satisfies (6.28) and deduce Ak for any integer k.
Solution
1. We need to find and such that P2 D P and Q2 D Q. Indeed, we have
1
where we have used (6.28). Thus, P2 D P if and only if D . By the same argument,
˛ˇ
1
we can show that Q2 D Q if and only if D . Now, it is clear that
˛ˇ
1 1
PCQD .A ˛In / C .A ˇIn / D In :
˛ˇ ˛ˇ
D ˇP C ˛Q:
Ak D ˇ k P C ˛ k Q:
˛Cˇ 1
A1 D In A:
˛ˇ ˛ˇ
In terms of P and Q, the above formula can be rewritten as
1 1
A1 D P C Q:
ˇ ˛
whence
2 32 3 2 3
˛ m m2 ˇ m m2 000
6 76 7 6 7
4 1=m ˛ m 5 4 1=m ˇ m 5 D 4 0 0 0 5 :
1=m2 1=m ˛ 1=m2 1=m ˇ 000
That is,
2 3
˛ˇ C 2 m.˛ C ˇ C 1/ m2 .˛ C ˇ C 1/ 2 3
6 ˛CˇC1 7 000
6 ˛ˇ C 2 m.˛ C ˇ C 1/ 7 6 7
6 6
4 m 7 D 40 0 05:
5
˛CˇC1 ˛CˇC1 000
˛ˇ C 2
m2 m
˛ˇ C 2 D 0 and ˛ C ˇ C 1:
That is, ˛ D 2 and ˇ D 1. Hence, the matrices P and Q now becomes
2 3
2=3 m=3 m2 =3
1 6 7
P D .A 2I3 / D 4 1=.3m/ 2=3 m=3 5
3 2
1=.3m / 1=.3m/ 2=3
and
2 3
1=3 m=3 m2 =3
1 6 7
Q D .A C I3 / D 4 1=.3m/ 1=3 m=3 5 :
3
1=.3m2 / 1=.3m/ 1=3
Belkacem Said-Houari
7.1 Definitions
In the previous chapters, we have defined some numbers associated to a matrix, such as
the determinant, trace, and rank. In this chapter, we focus on scalars and vectors known
as eigenvalues and eigenvectors. The eigenvalues and eigenvectors have many important
applications, in particular, in the study of differential equations.
Let E be a vector space over a field K with dimK E D n, and f be an endomorphism
in L .E/. We have seen in Example 6.9 that it might be possible to find a basis of E
in which the matrix M. f / associated to f is diagonal. So, the question is: Does such
basis always exist, and if so, how can we find it? In addition, is the matrix M. f / always
diagonal in this basis? One of the main goals here is to answer these questions.
Proof
Assume that there exists another eigenvalue μ associated to u and satisfying (7.1). Then
λu = μu.
This gives (λ − μ)u = 0_E. Since u ≠ 0_E, Theorem 4.2.1 yields λ − μ = 0_K, which shows that μ = λ. This proves the uniqueness of the eigenvalue. ⊓⊔
We have seen above that the eigenvalue associated to the eigenvector u is unique. On the other hand, the eigenvector u is not unique. In fact, if u is an eigenvector associated to λ, then for any α ≠ 0_K, αu is also an eigenvector, since
f(αu) = αf(u) = α(λu) = λ(αu).
Moreover, if u and v are two eigenvectors associated to the same eigenvalue λ, then u + v (if nonzero) is also an eigenvector associated to λ, since
f(u + v) = f(u) + f(v) = λu + λv = λ(u + v).
Definition 7.1.2
Let be an eigenvalue of f as in Definition 7.1.1. Then the eigenspace V./ is
defined to be the set of all u in E such that f .u/ D u:
Theorem 7.1.2
Let f be an endomorphism as in Definition 7.1.1, and let λ₁ and λ₂ be two eigenvalues of f such that λ₁ ≠ λ₂. Then we have
V(λ₁) ∩ V(λ₂) = {0_E}.
Proof
Let u be an element of V(λ₁) ∩ V(λ₂). Then we have
f(u) = λ₁u = λ₂u,
so (λ₁ − λ₂)u = 0_E, and since λ₁ ≠ λ₂, it follows that u = 0_E. ⊓⊔
Example 7.1
Consider the endomorphism f defined as follows:
f W R3 ! R3 ;
We have
Example 7.2
Consider the endomorphism f defined as follows:
f: ℝ³ → ℝ³,
(x, y, z) ↦ (x − y, −x + y + z, 3z).
Find the eigenvalues and the associated eigenvectors of f.
Solution
Let λ be an eigenvalue of f with eigenvector (x, y, z) ≠ (0, 0, 0), i.e., f(x, y, z) = λ(x, y, z). Solving this system, we find
λ₁ = 0, with x = y and z = 0;
λ₂ = 2, with y = −x and z = 0;
λ₃ = 3, with y = −2x and z = −3x.
Hence,
V(λ₁) = {(x, x, 0)},   V(λ₂) = {(x, −x, 0)},   V(λ₃) = {(x, −2x, −3x)},
with x ∈ ℝ. Clearly, V(λ₁) is spanned by the vector X₁ = (1, 1, 0), V(λ₂) by the vector X₂ = (1, −1, 0), and V(λ₃) by the vector X₃ = (1, −2, −3). Thus, X₁, X₂, and X₃ are eigenvectors associated to the eigenvalues λ₁, λ₂, and λ₃, respectively. J
Solution
We defined in Exercise 5.1 a projection as an element f in L(E) satisfying f ∘ f = f. So, let λ be an eigenvalue of f, i.e.,
f(u) = λu,   with u ≠ 0_E.
Applying f to both sides and using f ∘ f = f gives λu = f(u) = f(f(u)) = λf(u) = λ²u, hence λ(λ − 1)u = 0_E, and therefore λ = 0_K or λ = 1_K. J
This is also equivalent to say that Ker. f IdE / contains u and since u ¤ 0E , then
we deduce that is an eigenvalue of f if and only if Ker. f IdE / ¤ f0E g. Hence,
Theorem 5.2.4 shows that f IdE is not injective, and therefore is not bijective. Hence,
f IdE is not invertible.
So, we have already proved the following statement.
B D fu1 ; u2 ; : : : ; un g
is a basis of E.
Proof
Let ˛1 ; ˛2 ; : : : ; ˛n be n elements of K satisfying
˛1 u1 C ˛2 u2 C C ˛n un D 0E : (7.3)
Now, let
gi D f i IdE ; i D 1; 2; : : : ; n:
Applying g1 to (7.3) and using the fact that g1 .u1 / D 0E (since 1 is an eigenvalue of f ) and
we obtain
X
n
.j 1 /˛j uj D 0E : (7.4)
jD2
X
n
we obtain
This gives ˛n D 0K , since all the eigenvalues are distinct. Since the ordering of the
eigenvalues and eigenvectors is arbitrary, we can easily verify that ˛1 D ˛2 D D
˛n1 D 0K . This shows that the set B D fu1 ; u2 ; : : : ; un g is linearly independent, and since
dimK E D n, Theorem 4.6.5 implies that B is a basis of E. t
u
ⓘ Corollary 7.2.3 Let E be a vector space over a field K such that dimK E D n and f be
an endomorphism in L .E/. Then f has at most n distinct eigenvalues.
Proof
Suppose that 1 ; 2 ; : : : ; m are distinct eigenvalues of f . Let u1 ; u2 ; : : : ; um be their
corresponding eigenvectors. Thus, Theorem 7.2.2 shows that u1 ; u2 ; : : : ; um are linearly
independent. Hence, Lemma 4.6.4 shows that m n. t
u
f W R3 ! R3 ;
.x; y; z/ 7! .2x C 4y C 3z; 4x 6y 3z; 3x C 3y C z/: (7.7)
1 D 1; 2 D 3 D 2:
One of the eigenvalues has a multiplicity 2. We may also show that the eigenspace V.1 /
is spanned by the vector X1 D .1; 1; 1/ and V.2 / D V.3 / is spanned by the vector
X2 D .1; 1; 0/. By Theorem 4.6.3, the set fX1 ; X2 g is not a basis of R3 .
So, we have seen in Theorem 7.2.2 that if E is a vector space over a field K with dim_K E = n and f is an endomorphism of E which has n distinct eigenvalues (all of multiplicity one), then the associated eigenvectors form a basis of E. On the other hand, we have shown that for the endomorphism defined in (7.7), the eigenvectors associated to the eigenvalues of f do not form a basis of ℝ³, since not all the eigenvalues have multiplicity one. Thus, the question now is: when do the eigenvectors associated to eigenvalues with multiplicities not necessarily equal to one form a basis of E? To answer this question, we define what we call the algebraic multiplicity and the geometric multiplicity of an eigenvalue.
For example for the endomorphism defined in (7.7), the eigenvalue 2 D 2 has
algebraic multiplicity 2 and geometric multiplicity 1.
Example 7.4
Show that all the eigenvalues of the endomorphism
f W R3 ! R3 ;
are complete.
Solution
First, we can easily show that f has two eigenvalues,
1 D 0; 2 D 3 D 3:
It is clear that 2 has algebraic multiplicity 2 (that is ` D 2). Now, in order for 2 to be
complete, we need to find two independent eigenvectors associated to 2 D 3. That is, we
need to show that the geometric multiplicity is also equal to 2. So, let X D .x; y; z/ be an
eigenvector associated to 2 D 3, i.e.,
f .X/ D 3X:
7 Equivalently,
8
ˆ
< x C y C z D 0;
x C y C z D 0;
:̂
x C y C z D 0:
x + y + z = 0,
or
z = −x − y.
Consequently, for instance,
u₁ = (1, 0, −1)   and   u₂ = (0, 1, −1)
are two linearly independent eigenvectors associated to λ₂ = 3. Therefore, the geometric multiplicity is equal to 2, so indeed λ₂ = 3 is a complete eigenvalue. J
Example 7.5
Show that the endomorphism
f W R2 ! R2 ;
Theorem 7.2.4
Let E be a vector space over a field K such that dimK E D n and f be an endomorphism
in L .E/ such that all its eigenvalues are complete. Then, the set of eigenvectors
associated to the complete eigenvalues form a basis of E.
Proof
Let 1 ; 2 ; : : : ; ` be the set of complete eigenvalues of f and let ki ; i D 1; 2; : : : ; `; be the
algebraic multiplicities of the i . Then,
k1 C k2 C C k` D n:
dimK V.i / D ki ; i D 1; 2; : : : ; `;
Hence, the union of the bases of V.i /; i D 1; 2; : : : ; ` which consists of all the eigenvectors
of f forms a basis of E. This completes the proof of Theorem 7.2.4. t
u
In this section, we can restate for matrices all the results concerning the eigenvalues of
an endomorphism. As we saw in Chap. 6, to each linear transformation f , one can
associate a unique matrix A D M. f / and vice versa. The eigenvalues, eigenvectors, and
eigenspaces of A D M. f / are then, by definition, the eigenvalues, eigenvectors, and
eigenspaces of f .
AX D X: (7.10)
7 Example 7.6
Consider the matrix
2 3
1 b1 c1
6 7
A D 4 2 b2 c2 5 :
3 b3 c3
Find the entries bi ; ci ; i D 1; 2; 3 such that A will have the following eigenvectors:
2 3 2 3 2 3
1 1 0
6 7 6 7 6 7
X1 D 4 0 5 ; X2 D 4 1 5 ; X3 D 4 1 5 :
1 0 1
Solution
By Definition 7.3.1, X1 is an eigenvector of A if
AX1 D 1 X1 ;
Consequently, we obtain
b3 D 3; b1 D c1 ; b1 C b2 D 3; b2 C b3 D c2 C c3 : (7.12)
b1 D 5; b2 D 2; b3 D 3; c1 D 5; c2 D 2; c3 D 3:
Now, from the above systems, we can easily see that 1 D 6; 2 D 4 and 3 D 0, which
are the eigenvalues of A. J
In Examples 7.1 and 7.6, we have found the eigenvalues and eigenvectors simul-
taneously. The third statement in Theorem 7.3.1 separates completely the problem of
finding the eigenvalues of a matrix from that of finding the associated eigenvectors. So,
we can easily find an eigenvalue of a matrix, without needing to know a corresponding
eigenvector.
Example 7.7
Find the eigenvalues and the corresponding eigenvectors for the matrices:
2 3
" # 5 01
1 6 6 7
AD and B D 4 1 1 05:
0 5
7 1 0
Solution
1. To find the eigenvalues of A, we use Theorem 7.3.1 and we compute det.A I2 /. We get
" #
7 1 6
det.A I2 / D det D .1 /.5 /:
0 5
1 D 1; 2 D 5:
Now, if
" #
x1
X1 D
x2
Similarly, if
" #
x1
X2 D
x2
is an eigenvector associated to 2 D 5, then we have
" # " #
x1 0
.A 2 I2 / D :
x2 0
Thus,
" #
1
X2 D :
1
Example 7.8
Find the eigenvalues and the corresponding eigenvectors of the matrix
2 3
200
6 7
A D 40 3 45:
049
Solution
As above, we compute
1 D 1; 2 D 2; 3 D 11:
Now, if
2 3
x1
6 7
X1 D 4 x2 5
x3
Similarly, if
2 3
x1
6 7
X2 D 4 x2 5
x3
Thus,
2 3
1
6 7
X2 D 4 0 5 :
0
Example 7.9
Consider the matrix A in M3 .C/ given by
2 3
cos sin 0
6 7
A D 4 sin cos 0 5 ; 0 < < 2:
0 0 1
Solution
We have
det(A − λI₃) = (1 − λ)(λ² − 2λ cos θ + 1).
we have
J
Definition 7.3.2 (Characteristic Polynomial)
Let A be a matrix in M_n(K). Then, for any λ in K, the polynomial
p(λ) = det(A − λI_n)
is called the characteristic polynomial of A.
So, it is clear from Theorem 7.3.1 that λ is an eigenvalue of A if and only if p(λ) = 0, that is, if and only if λ is a zero of p(λ).
Example 7.10
The characteristic polynomial associated to the matrix A in Example 7.8 is
Or equivalently,
In fact this formula can be generalized to any matrix A in Mn .K/ and we have the
following theorem.
Theorem 7.3.2
Let A be a matrix in Mn .K/. Then, its characteristic polynomial has the form
p(λ) = (−1)ⁿλⁿ + (−1)ⁿ⁻¹ tr(A) λⁿ⁻¹ + a_{n−2}λⁿ⁻² + a_{n−3}λⁿ⁻³ + ⋯ + a₁λ + det(A),
(7.14)
Proof
Let A D .aij /; 1 i; j n. Then, we have
2 3
a11 a12 ::: a1n
6 7
6 a21 a22 ::: a2n 7
p./ D det.A In / D det 6
6 :: :: :: 7:
7
4 : : : 5
an1 an2 : : : ann
Computing this determinant using the cofactor expansion along the first row, we get
2 3
a22 a23 ::: a2n
6 a a33 ::: a3n 7
6 32 7
det.A In / D .a11 / det 6
6 :: :: :: 7 C Qn2 ./;
7
4 : : : 5
an2 an3 : : : ann
where Qn2 is a polynomial of degree n2 with coefficients in K. The determinant appearing
here is the characteristic polynomial of an .n 1/ .n 1/ matrix. So, by induction, we find
We compute
(a₁₁ − λ)(a₂₂ − λ)⋯(aₙₙ − λ) = (−λ)ⁿ + (−λ)ⁿ⁻¹(a₁₁ + a₂₂ + ⋯ + aₙₙ) + Q̃_{n−2}(λ),   (7.16)
where the sum a₁₁ + a₂₂ + ⋯ + aₙₙ is by definition tr(A), and Q̃_{n−2} is a polynomial of degree at most n − 2. Thus, combining (7.15) and (7.16), we get
The last coefficient in (7.17) can be obtained easily. Indeed, from (7.14), we have
p.0/ D det.A/:
p.0/ D a0 :
tr(A) = λ₁ + λ₂ + ⋯ + λₙ   and   det(A) = λ₁λ₂⋯λₙ.
Proof
The eigenvalues of A are the roots of its characteristic polynomial p./. Hence, from
elementary algebra, the polynomial p./ can be factored as
p(λ) = (−1)ⁿ (λ − λ₁)(λ − λ₂)⋯(λ − λₙ).
Expanding this last formula, we find that the coefficient of λⁿ⁻¹ is (−1)ⁿ⁻¹(λ₁ + ⋯ + λₙ) and the constant term is λ₁λ₂⋯λₙ. Comparing this with (7.14), the result follows. ⊓⊔
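These two identities are easy to confirm numerically; the following check uses a random matrix and NumPy's eigenvalue routine (illustrative only).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
eig = np.linalg.eigvals(A)        # the eigenvalues, possibly complex

print(np.allclose(eig.sum(),  np.trace(A)))       # True: tr(A)  = sum of eigenvalues
print(np.allclose(eig.prod(), np.linalg.det(A)))  # True: det(A) = product of eigenvalues
```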
σ(Aᵀ) = σ(A).
Proof
This is clear, since det(Aᵀ − λIₙ) = det((A − λIₙ)ᵀ) = det(A − λIₙ), so Aᵀ and A have the same characteristic polynomial. ⊓⊔
Proof
We prove the statement for diagonal matrices; the same argument applies to triangular
matrices. So, let
2 3
d1 0 0 0
6 7
6 0 d2 0 0 7
6 7
6 0 0 d3 0 7
ADDD6 7;
6 :: :: 7
6 : 7
4 : 5
0 0 0 dn
Then,
2 3
d1 0 0 0
6 7
6 0 d2 0 0 7
6 7
6 7
D In D 6 0 0 d3 0 7:
6 : :: 7
6 : : 7
4 : 5
0 0 0 dn
i D di ; i D 1; : : : ; n:
Example 7.11
Consider the two matrices
2 3 2 3
1 0 0 1 2 5
6 7 6 7
A D 4 0 3 0 5 and B D 4 0 0 9 5:
0 0 4 0 0 5
We have seen in Theorem 6.3.6 that similar matrices have the same rank. So, one
can ask: do similar matrices share the same spectrum? The answer turns out to be
7 affirmative:
Proof
To show (7.18), we need to prove that .A/ .B/ and .B/ .A/. Since A and B are
similar matrices, (see Definition 6.2.3), there exists an invertible matrix P such that
Let be an eigenvalue of A, i.e., there exists X in Kn with X ¤ 0Kn such that AX D X. This
implies that PBP⁻¹X = λX. This gives
B(P⁻¹X) = λ(P⁻¹X).
Now, since X ≠ 0_{Kⁿ} and P⁻¹ is invertible, Y = P⁻¹X ≠ 0_{Kⁿ} and we have BY = λY.
Hence is an eigenvalue of B and Y is its associated eigenvector. This shows that .A/
.B/. Conversely, let be an eigenvalue of B, then there exists Y in Kn such that Y ¤ 0Kn
and BY D Y. This gives P1 APY D Y: This yields
A.PY/ D .PY/:
where we have used the fact that In D PIn P1 , (2.26), and (2.30). t
u
ⓘ Remark 7.3.7 The converse of Theorem 7.3.6 is not true. For example, the two
matrices
" # " #
10 12
A D I2 D and BD
01 01
have the same eigenvalues, but they are not similar. Indeed, assuming that there exists
an invertible matrix P such that
B D P1 AP;
we would have
B D P1 AP D P1 I2 P D I2
AX D X;
this gives
A2 X D A.AX/ D .AX/ D 2 X;
Example 7.12
The matrix
" #
2 1
AD
1 2
Then
" #
2 5 4
A D
4 5
7.4 Diagonalization
We have seen before the importance of diagonal matrices. In this section, we look for
necessary conditions for a square matrix A to be diagonalizable (similar to a diagonal
matrix). We start with the following definition: a square matrix A in Mₙ(K) is said to be diagonalizable if there exists an invertible matrix P such that the matrix
B = P⁻¹AP
is diagonal.
Example 7.13
The matrix A in Example 7.8 is diagonalizable, since the matrix
2 3 2 32 32 3
10 0 0 2=5 1=5 200 0 10
6 7 6 76 76 7
B D 40 2 0 5 D 41 0 0 5 4 0 3 4 5 4 2 0 1 5 D P1 AP
0 0 11 0 1=5 2=5 049 1 02
with
2 3 2 3
0 2=5 1=5 0 10
6 7 6 7
P1 D 41 0 0 5; P D 4 2 0 1 5 ;
0 1=5 2=5 1 02
is a diagonal matrix.
The diagonalization of a matrix has many important applications. We give here three
of them.
Powers of Diagonalizable Matrices
Suppose we have a square matrix A in Mₙ(K) and we want to compute A^k for any k ≥ 0. If A is diagonalizable, i.e., there exist a diagonal matrix B and an invertible matrix P such that
A = PBP⁻¹,
then we have
A^k = AA⋯A (k times) = (PBP⁻¹)(PBP⁻¹)⋯(PBP⁻¹) = PB^kP⁻¹,
and B^k is immediate to compute, since B is diagonal.
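A small numerical sketch of this observation (the 2 × 2 matrix below is an arbitrary diagonalizable example):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])
eigvals, P = np.linalg.eig(A)          # A = P diag(eigvals) P^{-1}
k = 5

Ak_direct = np.linalg.matrix_power(A, k)
Ak_diag   = P @ np.diag(eigvals**k) @ np.linalg.inv(P)
print(np.allclose(Ak_direct, Ak_diag))  # True
```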
The Solution of a System of Linear Differential Equations
Consider the system of differential equations
dX(t)/dt = AX(t).   (7.19)
If A is diagonalizable, A = PBP⁻¹, then setting Y(t) = P⁻¹X(t), we obtain
dY.t/ dX.t/
D P1
dt dt
D P1 AX.t/
D P1 APY.t/
D BY.t/;
where B D .i /; 1 i n, is the diagonal matrix which has on its main diagonal the
eigenvalues of A. The last system can be rewritten as
dy₁(t)/dt = λ₁ y₁(t),
  ⋮
dyₙ(t)/dt = λₙ yₙ(t).
This system is decoupled and each of its equations can be solved separately.
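As a sketch of how the decoupling is used in practice, the code below solves dX/dt = AX for a diagonalizable A: in the variables Y = P⁻¹X each equation yᵢ' = λᵢyᵢ is solved separately as yᵢ(t) = e^{λᵢt}yᵢ(0). The matrix and the initial data are illustrative.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])
X0 = np.array([1., 0.])

lam, P = np.linalg.eig(A)          # A = P diag(lam) P^{-1}
Y0 = np.linalg.solve(P, X0)        # Y(0) = P^{-1} X(0)

def X(t):
    Y_t = np.exp(lam * t) * Y0     # each decoupled equation solved separately
    return P @ Y_t                 # back to the original variables

print(X(0.0))    # [1. 0.]
print(X(1.0))    # the solution at t = 1
```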
The Solution of a System of Recurrence Sequences
In numerical analysis, we sometimes end up with a system of recurrence sequences, and
if the matrix representing this system is diagonalizable, then it is easy to write each term
of the resulting sequences as a function of n, as we show in the following example.
Example 7.14
Consider the two sequences .un / and .vn / defined for all n in N by the relations
(
unC1 D 2un C vn ;
(7.20)
vnC1 D un C 2vn
Solution
We can write system (7.20) as
" # " # " #
21 un unC1
XnC1 D AXn ; with A D ; Xn D ; XnC1 D :
12 vn vnC1
Consequently,
Xₙ = AⁿX₀ = [ 1/2 + 3ⁿ/2   −1/2 + 3ⁿ/2 ; −1/2 + 3ⁿ/2   1/2 + 3ⁿ/2 ] [ 1 ; 0 ] = [ 1/2 + 3ⁿ/2 ; −1/2 + 3ⁿ/2 ],
that is,
uₙ = 1/2 + 3ⁿ/2,   vₙ = −1/2 + 3ⁿ/2. J
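A short numerical check of this computation (assuming, as the closed formulas above indicate, the initial values u₀ = 1 and v₀ = 0):

```python
import numpy as np

A = np.array([[2, 1],
              [1, 2]])
X = np.array([1, 0])               # (u_0, v_0), assumed initial values

for n in range(1, 6):
    X = A @ X
    u_closed = (3**n + 1) // 2
    v_closed = (3**n - 1) // 2
    print(n, X, (u_closed, v_closed))   # the two computations agree
```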
Proof
First, assume that A has n linearly independent eigenvectors u1 ; u2 ; : : : ; un . Then the matrix
P whose columns are these eigenvectors, that is
P = [u₁, u₂, …, uₙ]   (7.21)
is invertible, and we have
P⁻¹AP = P⁻¹[Au₁, Au₂, …, Auₙ] = P⁻¹[λ₁u₁, λ₂u₂, …, λₙuₙ] = P⁻¹[u₁, u₂, …, uₙ]B = P⁻¹PB = B,
where
2 3
1 0 0 0
6 7
6 0 2 0 0 7
6 7
6 0 7
BD6 0 0 3 7:
6 : :: :: 7
6 : : 7
4 : : 5
0 0 0 n
P D Œv1 ; v2 ; : : : ; vn :
Avi D bi vi ; i D 1; 2; : : : ; n (7.23)
ⓘ Corollary 7.4.2 Let A be a matrix in Mn .K/, all of whose eigenvalues are complete.
Then A is diagonalizable.
Example 7.15
Show that the matrix
2 3
1 2 0 0
6 7
60 1 2 07
AD6 7
40 0 1 25
0 0 0 1
is defective.
Solution
The matrix A has one eigenvalue λ = 1 with algebraic multiplicity equal to 4; this is easily seen because A is a triangular matrix. By Definition 7.2.2, the geometric multiplicity of λ = 1 is equal to 1, which is not equal to the algebraic multiplicity. Or, more directly, we can use Corollary 7.4.2 to show that A is not diagonalizable, and thus defective. Indeed, since λ = 1 is the only eigenvalue of A, if A were diagonalizable there would exist an invertible matrix P such that
A = PI₄P⁻¹ = I₄,
which is clearly false. Hence A is defective. J
ⓘ Remark 7.4.3 The transition matrix P defined in (7.21) is also called the eigenvector
matrix and it is not unique, since the eigenvectors are not unique. For instance, in
Example 7.13, if we multiply the first column by 2, then we get
2 3
0 10
6 7
Q D 4 4 0 1 5 ;
2 02
B D Q1 AQ:
Proof
First, assume that A and B are diagonalizable, with the same eigenvector matrix P. Then there
exist two diagonal matrices, D1 and D2 , such that
Hence,
AX D X (7.24)
for some in K (we assume that ¤ 0K , since for D 0K the result is trivial). We need to
show that X is also an eigenvector of B. Indeed, we have
7 From (7.24) and (7.25) we deduce that both X and BX are eigenvectors of A sharing the
same eigenvalue (unless BX D 0Kn , which is not the case since ¤ 0K and X ¤ 0Kn ).
If is a simple eigenvalue, then the eigenspace V./ is a one-dimensional vector space, so
necessarily BX D X, for some in K with ¤ 0K (since X and BX must be linearly
dependent). Therefore, X is an eigenvector of B with corresponding eigenvalue . We leave
the case where the eigenvalues are not simple to the reader, since it requires some additional
work.
t
u
Proof
By Definition 1.4.2, the matrix A is symmetric if and only if A = Aᵀ. Now, let λ be an eigenvalue of A. To show that λ is real, we just need to prove that λ̄ = λ, where λ̄ is the complex conjugate of λ. Since λ is an eigenvalue of A, there exists u in ℂⁿ with u ≠ 0_{ℂⁿ} such that Au = λu. Taking complex conjugates, and using the fact that the entries of A are real (so Ā = A), we get
Aū = λ̄ū.
Now, taking the transpose of this equation and using Theorem 1.4.1, we get
ūᵀAᵀ = ūᵀA = λ̄ūᵀ.
Next, we multiply both sides of the above equation on the right by u to get
ūᵀAu = λ̄ūᵀu.
On the other hand,
ūᵀAu = ūᵀ(λu) = λūᵀu.
This yields λūᵀu = λ̄ūᵀu, and since ūᵀu = ‖u‖² ≠ 0, we conclude that λ̄ = λ, i.e., λ is real. ⊓⊔
The proof of Theorem 7.4.6 is beyond the scope of this book and can be found in
advanced linear algebra textbooks.
Example 7.16
Consider the matrix
2 3
13 4 2
6 7
4 4 13 2 5 :
2 2 10
Solution
It is not hard to check that A has the eigenvalues 1 D 9; 2 D 9, and 3 D 18. Consequently,
1 has algebraic multiplicity 2. Now, since A is symmetric, then according to Theorem 7.4.6,
the geometric multiplicity of 1 equal to 2. This can be easily seen by computing the
The two vectors u1 and u2 are linearly independent, showing that the geometric multiplicity
of 1 is equal to 2. Hence, A is diagonalizable. J
We have seen above that if a matrix A is complete (that is, all its eigenvalues are
complete), then it is similar to a diagonal matrix (diagonalizable), with the eigenvector
matrix providing the transition matrix. In this case, the matrix satisfies some nice
properties that are inherited from the properties of diagonal matrices. Now, if A is not
diagonalizable (as in the defective case), is it possible for A to be similar to a triangular
matrix, for instance? That is to say, is there an invertible matrix M such that the matrix
B D M 1 AM is triangular? If such a matrix M exists, then we say that the matrix A is
triangularizable. Of course in this case we do not expect to have all the nice properties
of diagonalizable matrices, but at least we can keep some of them. So, in this section, we
will be looking for the necessary assumptions that a defective matrix A should satisfy in
order to be triangularizable. We start with the following definition.
Proof
First assume that the characteristic polynomial pA ./ of A has n zeros in K. We prove by
induction that A is similar to a triangular matrix.
For n D 1, it is clear that a matrix of order 1 is triangular, and thus triangularizable.
Now, assume that all matrices of order n whose characteristic polynomials have n zeros
in K are triangularizable. Let A be a matrix of order n C 1 for which pA has n C 1 zeros in
K. Then there exist at least an eigenvalue in K and an eigenvector Y in KnC1 associated
to it, i.e., Y ¤ 0KnC1 and AY D Y. Now, according to Theorem 4.6.8, there exist vectors
X1 ; X2 ; : : : ; Xn in KnC1 such that the set fY; X1 ; : : : ; Xn g forms a basis in KnC1 . Consider the
matrix P1 with Y; X1 ; : : : ; Xn as its column vectors. That is,
P1 D ŒY; X1 ; : : : ; Xn :
This matrix P1 is the transition matrix from the standard basis of KnC1 to the basis
fY; X1 ; : : : ; Xn g. It is clear that P1 is invertible and we have
2 3
:::
6 7
6 7 " #
6 7 VT
6 7
A1 D P1 AP1 D 6 0
1
7D ;
6: 7 0n1 B
6: 7
4: B 5
0
We have
" #" #" # " #
1 0Tn1 VT 1 0Tn1 VTQ
T D P1
2 A1 P2 D D :
0n1 Q1 0n1 B 0n1 Q 0n1 A2
7
This last matrix is triangular and we have
T D M 1 AM; with M D P1 P2 :
pK ./ D . 1 /. 2 / . n /:
Hence, since pK ./ D pA ./ (Theorem 7.3.6), we deduce that pA ./ has n zeros in K. This
finishes the proof of Theorem 7.5.1. t
u
Example 7.17
Consider the matrix
2 3
2 1 2
6 7
A D 4 15 6 11 5 :
14 6 11
Thus, A has one eigenvalue D 1 with algebraic multiplicity 3. We have only one
eigenvector (up to a constant multiple) associated to , namely
2 3
1
6 7
X D 415:
2
Thus, the geometric multiplicity of A is equal to 1. Hence, A is defective and therefore, not
diagonalizable. But since pA has three zeros (one zero with algebraic multiplicity equal to 3)
in R, then A is triangularizable. One can also verify that
3 2 2 3 2 3
100 11 0 1 0 0
6 7 6 7 6 7
A D M 1 BM; with M D 4 1 3 2 5 ; B D 4 0 1 1 5 ; M 1 D 4 3 1 2 5 :
221 00 1 4 2 3
There are several methods for finding the matrix M. We will not discuss those
methods now. J
According to the theorem of d’Alembert, that says that each polynomial of order n
with coefficients in C has n zeros in C, we deduce from Theorem 7.5.1, the following
corollary.
e^A = I + A + A²/2! + A³/3! + ⋯ + A^k/k! + ⋯   (7.26)
One of the challenging problems for instance, in the study of differential equations is to
compute (7.26). For a diagonal matrix D D diag.d1 ; d2 ; : : : ; dn /, this is simple and we
obtain
2 3
ed1 0 0 0
6 0 ed2 0 0 7
6 7
6 7
e D6
D
6
0 0 ed3 0 7 D diag.ed1 ; ed2 ; : : : ; edn /:
7
6 :: :: :: 7
4 : : : 5
0 0 0 edn
More generally, if A is diagonalizable, i.e.,
A = PDP⁻¹
with D diagonal, then
e^A = Pe^DP⁻¹.
Another case is when we can write the matrix A as the sum of two matrices,
A D D C N;
where D is a diagonal matrix and N (called a nilpotent matrix, see Exercise 1.2) satisfies
N^{k₀} = 0
(here zero is the n × n matrix with all its entries equal to zero) for some positive integer k₀ > 0, and where D and N commute, DN = ND. The above decomposition is also known as the Dunford decomposition. Here the simplicity of the method depends on the smallness of k₀. Thus, applying the formula e^{D+N} = e^De^N = e^Ne^D,¹ we obtain
e^A = e^Ne^D = (I + N + N²/2! + ⋯ + N^{k₀−1}/(k₀ − 1)!) diag(e^{d₁}, e^{d₂}, …, e^{dₙ}).   (7.27)
Example 7.18
Find eA for
" #
ab
AD ;
0a
Solution
The matrix A can be written as A D D C N with
" # " #
a0 0b
DD and ND :
0a 00
1
This formula holds only for commuting matrices.
It is clear that N² = 0. Thus, applying formula (7.27), we get
e^A = e^a(I + N) = e^a [ 1  b ; 0  1 ] = [ e^a  be^a ; 0  e^a ]. J
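The closed form just obtained can be verified against a truncated power series for e^A; the values of a and b below are illustrative.

```python
import numpy as np

a, b = 0.5, 2.0
A = np.array([[a, b],
              [0.0, a]])

# truncated series I + A + A^2/2! + ... (enough terms for double precision here)
E = np.zeros_like(A)
term = np.eye(2)
for k in range(1, 30):
    E += term
    term = term @ A / k

closed_form = np.exp(a) * np.array([[1.0, b],
                                    [0.0, 1.0]])
print(np.allclose(E, closed_form))   # True
```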
Example 7.19
Find the Dunford decomposition of the matrix
2 3
110
6 7
A D 40 1 15:
001
Solution
We can simply write A as
2 3 2 3
100 010
6 7 6 7
A D 4 0 1 0 5 C 4 0 0 1 5 D I3 C N:
001 000
Clearly,
2 3
001
6 7
N2 D 4 0 0 0 5
000
and N 3 D 0. Thus, N is nilpotent and NI3 D I3 N. Hence, the above decomposition is the
Dunford decomposition of the matrix A. J
The problem now is to find eA when A is not diagonalizable. It turns out that there are
several methods of how to do this (at least 19 of them, see [19]). One of these methods
is to reduce the matrix to the so-called Jordan canonical form.
We can easily see that a matrix of the form
2 3
1 0 0
6 7
6 0 1 ::: 07
6 7
6 : 7
J./ D 6
6 0 0 :: 07
7 (7.28)
6: 7
6: :: 7
4: : 15
0 0 0
can be decomposed as
J(λ) = λI + N,   (7.29)
where N is the matrix with 1's just above the main diagonal and 0's elsewhere. Clearly, the matrix N is nilpotent, and thus we can compute the exponential of the matrix J(λ) as we have shown above. The matrix J(λ) is called a Jordan block.
Thus, we know how to compute the exponential of a Jordan block using the Dunford
decomposition. Thus if we can show that there exists an invertible matrix P such that
A D PJP1 ; (7.30)
.A I/k X D 0Kn
for some positive integer k, and also requires some knowledge of the generalized
eigenspace of ,
˚
V./ D Vj .A I/k X D 0Kn ; for some k :
So now, the question is: when does the matrix J exist? Here is a theorem that answers this question.
The proof of Theorem 7.5.3 is very technical, and we omit it here. In fact several
proofs are available; for a discussion of these proofs we refer the reader to the paper
[31] and references therein.
Next, we formulate one of the important theorems in linear algebra.
Theorem 7.5.4 (Cayley–Hamilton)
Let A be a matrix in Mₙ(K) with characteristic polynomial
p_A(λ) = a₀ + a₁λ + ⋯ + aₙλⁿ.
Then
p_A(A) = a₀Iₙ + a₁A + ⋯ + aₙAⁿ = 0.
Proof
First, if A is a diagonal matrix, that is A D diag.1 ; 2 ; : : : ; n /, then
Hence,
D 0:
7 Second, if A is diagonalizable, i.e., there exists a diagonal matrix B such that A D PBP1 ,
then
since pB .B/ D 0:
Third, in general, if A is not diagonalizable, then, according to Theorem 7.5.3, A is
similar to a matrix J of Jordan form. In this case, we have J D PAP1 and the characteristic
polynomial of J can be written as
Since, J is a block diagonal matrix, it suffices to show that pJ .Ji / D 0. Indeed, we have
pJ .Ji / D 0:
Hence, we obtain as before pA .A/ D 0: This completes the proof of Theorem 7.5.4. t
u
Example 7.20
Consider the matrix
" #
12
AD :
34
Then, we have
p_A(λ) = λ² − 5λ − 2.
Moreover,
A² = [ 7  10 ; 15  22 ],
and one can check directly that
A² − 5A − 2I₂ = 0.
7.6 Exercises
Solution
We need to show that
.AB/X D X:
We have
Solution
1. Since rank.A/ D k, then it is clear from Theorem 6.4.2, that all minors of A of order strictly
bigger than k are zero. Hence, in this case, the characteristic polynomial of A reads as
Solution
1. First, if all the eigenvalues of N are equal to 0_K, then its characteristic polynomial is p_N(λ) = (−1)ⁿλⁿ, so the Cayley–Hamilton theorem gives
p_N(N) = (−1)ⁿNⁿ = 0,
and N is nilpotent. Conversely, assume that N is nilpotent, say N^k = 0, and let λ be an eigenvalue of N with eigenvector X ≠ 0, i.e.,
NX = λX.
Hence,
N^kX = λ^kX.
This means that λ^k is an eigenvalue of N^k, and since N^k is the zero matrix, λ^k = 0_K. This yields λ = 0_K.
2. Since N is nilpotent, all its eigenvalues are equal to zero. Then, using (7.3.3), we have
X
n
tr.N/ D i D 0:
iD1
3. Since N is nilpotent, its characteristic polynomial has one root equal to 0, with
algebraic multiplicity n. Then according to Theorem 7.5.1, N is triangularizable. That is,
there exist an invertible matrix P and a triangular matrix T such that N D P1 TP. Since
N and T are similar matrices, they share the same eigenvalues, (Theorem 7.3.6). Since, the
eigenvalues of a triangular matrix are on its main diagonal, the entries of the main diagonal
of T are all equal to zero (because the eigenvalues of N are all equal to zero). Thus, the matrix
T must be strictly upper or lower triangular.
4. Let B1 be a basis of Kn and N be a nilpotent matrix with respect to the basis B1 . Let
B2 be another basis of Kn and P be the transition matrix from B1 to B2 . Then, there exists a
matrix B such that
B D P1 NP:
Bk D P1 N k P D 0;
so B is nilpotent. J
det.A C N/ D det.A/:
Study first the case where A is invertible.
Solution
1. Assume that A is invertible. Then A⁻¹N is nilpotent: if N^{k₀} = 0 for some k₀, then, since A⁻¹ also commutes with N, (A⁻¹N)^{k₀} = A^{−k₀}N^{k₀} = 0. Now write
A + N = A(Iₙ + A⁻¹N),
so that det(A + N) = det(A) det(Iₙ + A⁻¹N).
We claim that
det.In C A1 N/ D 1:
Indeed, we proved in Exercise 7.3 that a nilpotent matrix is similar to a triangular matrix in
which all the entries of the main diagonal are equal to zero. We denote this triangular matrix
by T. Thus, there exists an invertible matrix P such that A1 N D PTP1 . Then, we have
Consequently, the matrix Iₙ + A⁻¹N is similar to the matrix Iₙ + T, and thus det(Iₙ + A⁻¹N) = det(Iₙ + T) (similar matrices have the same determinant). Since Iₙ + T is a triangular matrix whose diagonal entries are all equal to 1, we have det(Iₙ + T) = 1. This proves the claim.
If A is not invertible, then det.A/ D 0 (see Theorem 2.4.8). If det.A C N/ ¤ 0, then
A C N is invertible. In addition, .A C N/ commute with N, which is a nilpotent matrix. So,
applying what we have proved above, we find that
Exercise 7.5
Let E D Pn be the vector space of all polynomials with real coefficients and of degree less
or equals to n. Consider the endomorphism f in L .E/ defined for any p in E as
Solution
Let p be an element of E such that p ¤ 0E . Then, for in R,
f . p/ D p;
implies that
or equivalently,
(
b D .2 C /a;
.2 C 2 3/a D 0:
Exercise 7.6
Let A and B be two matrices in Mn .R/. Assume that AB BA D A.
1. Show that for any k in N, we have
Ak B BAk D kAk :
Solution
1. It is clear that the above identity is true for k = 0. Now, for any k in ℕ \ {0}, we have
A^kB − BA^k = (A^kB − A^{k−1}BA) + (A^{k−1}BA − A^{k−2}BA²) + ⋯ + (ABA^{k−1} − BA^k)
            = ∑_{i=0}^{k−1} A^{k−i−1}(AB − BA)Aⁱ
            = ∑_{i=0}^{k−1} A^{k−i−1}AAⁱ
            = kA^k.
f W Mn .R/ ! Mn .R/
K 7! KB BK:
f .Ak / D kAk :
6. Show that if the characteristic polynomial of A factors over K as
p_A(λ) = (λ − λ₁)^{a₁}(λ − λ₂)^{a₂}⋯(λ − λ_ℓ)^{a_ℓ},
with λ₁, …, λ_ℓ distinct, then the minimal polynomial of A has the form
m_A(λ) = (λ − λ₁)^{m₁}(λ − λ₂)^{m₂}⋯(λ − λ_ℓ)^{m_ℓ},   (7.33)
where 1 ≤ mᵢ ≤ aᵢ.
7. Use the result in (6) to find the minimal polynomial of the matrix
2 3
3 1 1
6 7
A D 41 0 15:
1 1 2
8. Show that if A and B are similar matrices in Mn .K/, then they have the same
characteristic polynomial.
Solution
1. We have seen that dim_K M_n(K) = n². Consequently, any set of n² + 1 elements of M_n(K) is linearly dependent (Lemma 4.6.4). So, consider the set {I, A, A², …, A^{n²}}. This set contains n² + 1 elements, so it is linearly dependent. Therefore, there exist elements a₀, a₁, a₂, …, a_{n²} in K, not all zero, such that
a₀I + a₁A + a₂A² + ⋯ + a_{n²}A^{n²} = 0.   (7.34)
This means that the nonzero polynomial
p(x) = a₀ + a₁x + a₂x² + ⋯ + a_{n²}x^{n²}
satisfies p(A) = 0.
This contradicts the minimality of mA unless r is the zero polynomial. Thus, mA divides p.
5. First, assume that λ is an eigenvalue of A; then AX = λX for some X ≠ 0_{Kⁿ}. Thus, as we know, A^kX = λ^kX for any integer k ≥ 0. Hence, for any polynomial p, we have p(A)X = p(λ)X. In particular, m_A(A)X = m_A(λ)X. But since m_A(A) = 0 and X ≠ 0_{Kⁿ}, we deduce that m_A(λ) = 0_K.
Conversely, if λ is a root of m_A, then according to (4), λ is also a root of p_A, since m_A divides p_A. Thus, λ is an eigenvalue of A.
6. Since pA .A/ D 0, from (4), we deduce that mA divides pA . Therefore, the roots of pA
should be the roots of mA with different multiplicity. Indeed, since pA D qmA , if is a root
of mA , then it is clear that is also a root of pA . Conversely, if is a root of pA , then is an
eigenvalue of A, hence, according to (5), is a root of mA . Thus, the only possibility for mA
is to have the form (7.33).
7. The characteristic polynomial of A is
p_A(λ) = (2 − λ)²(1 − λ),
so the possible minimal polynomials are (λ − 2)(λ − 1) and (λ − 2)²(λ − 1). First, we compute (A − 2I)(A − I). If this matrix is the zero matrix, then the minimal polynomial is m_A(λ) = (λ − 2)(λ − 1). Otherwise, m_A(λ) = (λ − 2)²(λ − 1). We may easily check that (A − 2I)(A − I) ≠ 0, so indeed m_A(λ) = (λ − 2)²(λ − 1).
8. Since A and B are similar matrices, there exists an invertible matrix S such that B D
S1 AS. Then
D 0:
So, if there is a minimal polynomial of B of a smaller degree, say mB , then we have, by the
same argument, mB .A/ D 0 which contradicts the minimality of mA . Thus, we conclude that
mA is a minimal polynomial for B, and since the minimal polynomial is unique, we deduce
that mA D mB . J
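The trial procedure used in part 7 (raise the exponent of each factor λ − λᵢ until the product annihilates A) can be automated with exact arithmetic. The sketch below is an independent helper written with SymPy; it assumes the eigenvalues of the input matrix lie in the base field, and the example matrix is hypothetical.

```python
import sympy as sp
from itertools import product

def minimal_polynomial_of(A):
    """Minimal polynomial of a square SymPy matrix whose eigenvalues lie in the base field."""
    x = sp.symbols('x')
    n = A.shape[0]
    roots = A.eigenvals()                      # {eigenvalue: algebraic multiplicity}
    lams = list(roots)
    mults = [roots[lam] for lam in lams]
    # try exponent patterns (m_1, ..., m_l) with 1 <= m_i <= a_i, smallest total degree first
    for exps in sorted(product(*[range(1, a + 1) for a in mults]), key=sum):
        M = sp.eye(n)
        for lam, e in zip(lams, exps):
            M = M * (A - lam * sp.eye(n))**e
        if M == sp.zeros(n, n):
            return sp.expand(sp.Mul(*[(x - lam)**e for lam, e in zip(lams, exps)]))

# hypothetical test matrix: one Jordan block of size 2 for the eigenvalue 2, plus the eigenvalue 1
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 1]])
print(minimal_polynomial_of(A))   # x**3 - 5*x**2 + 8*x - 4, i.e. (x - 2)**2 * (x - 1)
```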
Ỳ
mA ./ D . i /mi ; (7.36)
iD1
where mi ; 1 i `, is the order (the size) of the largest Jordan block J.i / in the Jordan
canonical form of A.
Solution
1. It is clear from the Dunford decomposition (7.29), that N D J.0 / 0 Im0 is a nilpotent
matrix of order m0 . Thus, we have
and N m0 1 ¤ 0. Hence, we have shown that mJ.0 / is the polynomial of the smallest degree
which satisfies mJ.0 / .J.0 // D 0. Thus, mJ.0 / is the minimal polynomial of J.0 /.
2. Let J be the block diagonal matrix
2 3
J.1 / 0 0
6 0 J. / 0 7
6 2 7
7 JD6
6 :: :: :: :: 7 7 (7.37)
4 : : : : 5
0 0 J.` /
A D PJP1 ;
is the minimal polynomial of J and since A and J are similar matrices, they have the same
minimal polynomial (Exercise 7.7). J
Y
n
mA ./ D . i /:
iD1
Is A diagonalizable?
a
As this exercise shows, the minimal polynomial provides another criterion for diagonalizability.
Solution
1. In this case, according to Exercise 7.8, each Jordan block is of order 1. Hence, the matrix
J defined in (7.37) is diagonal, and it is similar to A. Hence, A is diagonalizable.
2. We compute the characteristic polynomial of A and find that
mA ./ D . 1/m0 ; m0 3:
Solution
1. Suppose that there exists one eigenvalue k of A such that to k there correspond two
Jordan blocks. Let the order of the first block be n1 and that of the second block be n2 , with
n1 n2 . Then the characteristic polynomial of A is
Let mCp be the minimal polynomial of Cp . By the preceding results, mCp divides p. If mCp is
a polynomial of degree r < n, then
7 We have
mCp .Cp / D 0
Moreover,
CpT ei D eiC1 ; 1 i n 1;
where ei is the vector with all components equal to 0K , except for the ith component, which
is 1K . Since Cp and CpT have the same characteristic and minimal polynomials, we have for
e1 D .1K ; 0K ; : : : ; 0K /T that
Exercise 7.11
Consider the matrix
2 3
1 0 0 0
6 7
6 a1 1 0 07
AD6 7;
4 a2 b1 2 05
a3 b2 c1 2
According to the first question in Exercise 7.9, A is diagonalizable if and only if its minimal
polynomial has the form
.A I4 /.A 2I4 / D 0;
or equivalently,
2 3 2 3
0 0 0 0 0000
6 a1 0 0 07 6 7
6 7 60 0 0 07
6 7D6 7:
4 a1 b1 0 0 05 40 0 0 05
a1 b2 C a2 c1 b1 c1 c1 0 0000
We see that circulant matrices have constant values on the main diagonal.
1. Find the eigenvalues and the corresponding eigenvectors of the matrix C.
2. Show that if !j ; j D 1; 2; 3 are the cubic roots of the unity, then the eigenvalues of C are
j D q.!j /; j D 1; 2; 3;
where
q.t/ D a C bt C ct2 ;
p.t/ D t2 C ˛t C ˇ:
Show that there exists a 2 2 circulant matrix such that p.t/ is its characteristic
polynomial and find a polynomial q.t/ such that the eigenvalues 1 and 2 are
7 a
The goal of this exercise is to exhibit the beautiful unity of the solutions of the quadratic and cubic equations,
in a form that is easy to remember, which is based on the circulant matrices. This exercise is based on a result
in [12].
Solution
1. By a simple computation, the eigenvalues of C are
1 D a C b C c; 2 D a C b! C c! 2 ; 3 D a C b!N C c!N 2 ;
p
3
where ! D 12 C 2
i with ! 2 D !;
N ! 3 D 1. The corresponding eigenvectors are
2 3 2
3 2 3
1 1 1
6 7 6 7 6 7
V1 D 4 1 5 ; V2 D 4 ! 5 ; V3 D 4 !N 5 :
1 !2 !N 2
We see that ! is the cubic root of the unity and satisfies ! D e2i=3 .
2. The cubic roots of the unity are 1; !, and !.
N Hence, we have
Now, we can construct the polynomial q.t/ whose coefficients are the entries of the first row
of C, as:
q(t) = −α/2 + t√(α²/4 − β).
Now, we have
q(1) = −α/2 + √(α²/4 − β)   and   q(−1) = −α/2 − √(α²/4 − β),
Belkacem Said-Houari
ⓘ Remark 8.1.1 We have a similar situation if K = ℂ: the matrix will then be called a unitary matrix, and it enjoys properties similar to those of an orthogonal matrix, but with respect to the inner product in ℂⁿ defined by (u₁, u₂, …, uₙ)·(v₁, v₂, …, vₙ) = u₁v̄₁ + ⋯ + uₙv̄ₙ. We will not discuss this here, since all the results on orthogonal matrices can be easily adapted to the case of unitary matrices.
Example 8.1
The matrices
2 3
" # " # 3=7 2=7 6=7
10 cos sin 6 7
; ; 4 6=7 3=7 2=7 5
01 sin cos
2=7 6=7 3=7
are orthogonal.
It is clear that a permutation matrix is orthogonal since all its row vectors and column vectors
are orthonormal vectors.
A⁻¹ = Aᵀ,   (8.1)
or equivalently, if
AAᵀ = AᵀA = Iₙ.   (8.2)
Proof
To prove the theorem it is enough to show that (8.2) holds. Then, the uniqueness of the inverse
(Theorem 1.2.3) gives (8.1). Let v1 ; v2 ; : : : ; vn be the row vectors of the matrix A. Thus
23
v1
6v 7
6 27
AD6 7
6 :: 7 I
4 : 5
vn
hence, the columns of Aᵀ are v₁ᵀ, v₂ᵀ, …, vₙᵀ. Then, we have
23
v1
6 7
6 v2 7
AAT D 6 7 Œv1T ; v2T ; : : : ; vnT D Œe1 ; e2 ; : : : ; en D In ;
4:::5
vn
where ei ; 1 i n, are the standard unit vectors in Rn . Here we used the fact that the
row vectors of A are orthonormal, that is vi vjT D 0 for i ¤ j and vi viT D 1. By the same
argument, we may show that AT A D In . Thus, (8.2) holds and the proof of Theorem 8.1.2 is
complete. t
u
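The characterization (8.2) is trivial to test numerically; the check below uses the 2 × 2 rotation matrix (with θ = π/6 chosen for illustration).

```python
import numpy as np

theta = np.pi / 6
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(A @ A.T, np.eye(2)))       # True: A A^T = I
print(np.allclose(np.linalg.inv(A), A.T))    # True: A^{-1} = A^T
```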
det.A/ D ˙1:
Proof
Since A is orthogonal, according to Theorem 8.1.2,
AAᵀ = AᵀA = Iₙ.
Hence, 1 = det(Iₙ) = det(AAᵀ) = det(A) det(Aᵀ) = det(A)², so det(A) = ±1, as claimed. ⊓⊔
In the following theorem, we show an important property of the orthogonal matrices in M₂(ℝ).
Proof
The first matrix in (8.3) is called the counterclockwise rotation matrix and if D =2, the
second one is called the reflection matrix. First, it is clear that the matrices defined in (8.3)
are orthogonal. Now, let
" #
ab
AD
cd
be a matrix in M2 .R/. By definition, A is orthogonal if and only if its columns and row
vectors of A are orthonormal vectors, i.e., if and only if the following holds:
8 2
ˆ
ˆ a C b2 D 1;
ˆ
< 2
a C c2 D 1;
ˆ 2 2
ˆ c C d D 1;
8 :̂ 2
b C d2 D 1
and
(
ac C bd D 0;
ab C cd D 0:
From the first system, we deduce that there exists an angle θ such that
or
Now, we can easily show that the inverse of any orthogonal matrix is orthogonal.
Indeed, if A is orthogonal, then
AAT D AT A D In ;
Note that the set of orthogonal matrices is not empty, since it contains at least the matrix
In . Consequently, we have here the algebraic structure of a subgroup (see Exercise 1.11)
and thus we have already proved the following theorem.
We have proved in Theorem 7.4.5, that the eigenvalues of a real symmetric matrix are
all real. Now, we introduce a very important property of the eigenvectors of a symmetric
matrix.
Proof
Let 1 and 2 be two eigenvalues of A with 1 ¤ 2 and let X1 and X2 be corresponding
associated eigenvectors, i.e.,
D .AX2 /T X1
D 2 X2T X1 ;
.1 2 /X2T X1 D 0:
Now, the task is to choose u2 such that it is orthogonal to u1 and has norm equal to 1.
We proceed exactly as we did in Theorem 3.4.1. If we choose
w₂ = v₂ − ((v₂·u₁)/‖u₁‖²) u₁ = v₂ − proj_{u₁}v₂,
and then
u₂ = w₂/‖w₂‖,
since u2 is required to be a unit vector. We see that u1 has the same direction as v1 , but
for u2 we subtracted from v2 the component in the direction of u1 (which is the direction
of v1 ). Now, at this point, the vectors u1 and u2 are set. Now, we need to choose u3 such
that it will not lie in the plane of u1 and u2 , which is exactly the plane of v1 and v2 . So,
we simply need to subtract from v3 any component of u3 in the plane of u1 and u2 . Thus,
we take w3 to be the vector
w₃ = v₃ − proj_{u₁}v₃ − proj_{u₂}v₃.
It is clear that w3 is orthogonal to u1 and u2 . Since u3 is required to be a unit vector, we
choose
w3
u3 D :
kw3 k
This process of choosing u₁, u₂, and u₃ is called the Gram–Schmidt process, and we
may apply the same ideas to any finite number of vectors. So, suppose now that we
have the set S0 D fv1 ; v2 ; : : : ; vk g; k n, of linearly independent vectors. Hence, to
construct the set S1 D fu1 ; u2 ; : : : ; uk g described above, we use the following algorithm:
w₁ = v₁,   u₁ = w₁/‖w₁‖,
w₂ = v₂ − proj_{u₁}v₂,   u₂ = w₂/‖w₂‖,
w₃ = v₃ − proj_{u₁}v₃ − proj_{u₂}v₃,   u₃ = w₃/‖w₃‖,   (8.5)
  ⋮
w_k = v_k − ∑_{j=1}^{k−1} proj_{u_j}v_k,   u_k = w_k/‖w_k‖.
We may easily show that the vectors u1 ; u2 ; : : : ; uk are orthonormal. We see that at the
step k, we subtracted from vk its components in the directions that are already settled.
Now, if S0 contains n vectors, then according to Theorem 4.6.2, it forms a basis of
Rn . Therefore, the set S1 is also a basis of Rn , but it is already an orthonormal basis.
Thus, we have already proved the following theorem.
Example 8.3
Apply the Gram–Schmidt process to the following vectors in R4 :
2 3 2 3
1 1
627 627
6 7 6 7
v1 D 6 7 and v2 D 6 7 :
435 405
0 0
Solution
We follow the Gram–Schmidt process described in (8.5) and define
2 3
p1
14
6 q 7
v1 1 6 27
6 7
u1 D D p v1 D 6 7 7 :
kv1 k 14 6 p3 7
4 14 5
0
v2 u1
proju1 v2 D u1 D .v2 u1 /u1 :
ku1 k2
p
8 We have v2 u1 D 5= 14: Thus,
2 5
3
14
6 5 7
5 6 7
proju1 v2 D u1 D 6
6
7 7;
7
14 4 15
5
14
0
and then
2 3
9
6 14 7
6 9 7
6 7 7
w2 D v2 proju1 v2 D 6 7:
6 15 7
4 14 5
0
Finally,
2 p5 3
27 14
6 14 7
6 27 p 5 7
w2 6 14 7
u2 D D6
6
7
q 7:
7
kw2 k 6 45 5 7
4 14 14 5
0
To be convinced, one can easily verify that u1 and u2 are orthonormal vectors. J
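A direct transcription of the process as a small Python helper (illustrative; it assumes the input vectors are linearly independent), applied to the two vectors of Example 8.3 with the entries as printed:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors, following (8.5)."""
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for u in basis:
            w -= (w @ u) * u          # subtract the projection on each settled direction
        basis.append(w / np.linalg.norm(w))
    return basis

v1 = np.array([1., 2., 3., 0.])
v2 = np.array([1., 2., 0., 0.])
u1, u2 = gram_schmidt([v1, v2])
print(np.isclose(u1 @ u2, 0.0), np.isclose(u1 @ u1, 1.0), np.isclose(u2 @ u2, 1.0))
```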
One of the important ideas in linear algebra is to write a real square matrix as the product
of two matrices, one orthogonal and the other one upper triangular. This process is called
the QR factorization or QR decomposition. So, if A is a square matrix in Mn .R/, we will
show that one can write
A D QR;
where Q is an orthogonal matrix and R is an upper triangular matrix. We will also show
that if A is invertible, then the above decomposition is unique. Actually, several methods
of finding the QR decomposition are available; here we discuss the method based on the
Gram–Schmidt process. We state the following result.
A D QR; (8.6)
Proof
First, let us prove the uniqueness. So, assume that A is invertible and assume that there exists
two orthogonal matrices Q1 and Q2 and two upper triangular matrices R1 and R2 , such that
A D Q1 R1 D Q2 R2 :
Q D Q1 1
2 Q1 D R2 R1 D R:
QT D Q1 D R1 :
Since the inverse of a triangular matrix is triangular of the same type, we deduce that Q and
QT are both upper triangular matrices. Hence, Q is an upper as well as a lower triangular
matrix, and so it is a diagonal matrix. In addition its diagonal entries are strictly positive,
then we have
Q2 D QQT D In :
This means that Q D In and therefore the uniqueness of the inverse gives Q1 D Q2 and
R1 D R2 .
332 Chapter 8 • Orthogonal Matrices and Quadratic Forms
The existence of the decomposition (8.6) follows from the Gram–Schmidt process.
Indeed, let v1 ; v2 ; : : : ; vn be the column vectors of the matrix A. Then, according to the Gram–
Schmidt process, we can form a set of orthogonal vectors u1 ; u2 ; : : : ; un as described in (8.5).
Thus, the matrix Q defined as
Q D Œu1 ; u2 ; : : : ; un
is an orthogonal matrix. Thus, we need to find the matrix R D Œr1 ; r2 ; : : : ; rn with the column
vectors r1 ; r2 ; : : : ; rn such that
It is clear that R is an upper triangular matrix. This completes the proof of Theorem 8.1.8. u
t
Example 8.4
Find the QR decomposition of the matrix
2 3
110
6 7
A D 41 0 15:
011
Solution
Let v1 ; v2 , and v3 be the column vectors of A. Now, we need to find u1 ; u2 , and u3 using the
Gram–Schmidt process. Indeed, we have
2 1
3
p
2
v1 6 7
u1 D D6
4
1
p 7:
5
kv1 k 2
0
Thus,
2 3
1
2
6 7
w2 D v2 proju1 v2 D 4 12 5 :
1
Hence,
2 1
3
p
w2 6 16 7
u2 D D 6 p 7:
kw2 k 4 6 5
p2
6
We have
2 3
1
627
proju1 v3 D .v3 u1 /u1 D 4 12 5 ;
0
and similarly,
2 3
1
6
6 7
proju2 v3 D .v3 u2 /u2 D 4 16 5 :
2
6
Hence,
2 3
23
6 2 7
w3 D v3 proju1 v3 proju2 v3 D 4 3 5:
2
3
Then,
2 3
p1
6 3 7
w3
u3 D D6
4
1
p 7:
5
kw3 k 3
1
p
3
334 Chapter 8 • Orthogonal Matrices and Quadratic Forms
D D S1 AS
is diagonal. Moreover, the matrix S is the eigenvector matrix (the matrix whose column
vectors are the eigenvectors of A). So, in the light of Theorem 8.1.6, we may ask
whether one can choose these eigenvectors to be orthonormal? In other words, is there
an orthogonal matrix S such that the identity
S1 AS D ST AS
holds true? The answer to this question is affirmative and in this case the matrix A is
called orthogonally diagonalizable .
D D S1 AS D ST AS (8.7)
Proof
First, if (8.7) is satisfied, then, A can be written as
A D SDS1 D SDST :
8.1 Orthogonal Matrices
335 8
Hence,
T D ST AS (8.8)
.ST AS/T D ST AT S D ST AS D T T :
ⓘ Lemma 8.1.10 (Schur’s Lemma) Let A be a matrix in Mn .R/ with real eigenvalues.a
Then, there exists an orthogonal matrix S such that the matrix
T D ST AS (8.9)
is upper triangular.
a
If A is a matrix in Mn .C/; then the assumption of real eigenvalues is not needed and we need to use a unitary
matrix instead of the orthogonal one.
Proof
We proceed by induction on the size of the matrix n. For n D 1, A D Œa, therefore, we can
take S D Œ1. Now suppose that the lemma holds true for any .n 1/ .n 1/ (with n 2)
matrix. Let A be an n n matrix. Let 1 be an eigenvalue of A and v1 be an eigenvector
associated to 1 . By dividing v1 by kv1 k, one can assume that v1 is a unit vector. Thus,
according to Theorem 4.6.8, we can extend the set fv1 g to a basis fv1 ; u2 ; : : : ; un g of Rn .
Using the Gram–Schmidt process, we transform this basis into an orthonormal basis B D
fv1 ; v2 ; : : : ; vn g of Rn . Consider the matrix Q whose columns are v1 ; v2 ; : : : ; vn ,
Q D Œv1 ; v2 ; : : : ; vn :
we have
S D QW1 :
Then it is clear that S is orthogonal (since the product of two orthogonal matrices is
orthogonal) and we have
In many applications where linear systems arise, one needs to solve the equation
AX D b, where A is an invertible matrix in Mn .K/1 and X and b are vectors in Kn . The
best way to solve this equation (system) is to replace the coefficient matrix A (through
some row operations) with another matrix that is triangular. This procedure is known as
the Gauss elimination method and is basically equivalent of writing the matrix A in the
form
A D LU;
where L is a lower triangular matrix and U is an upper triangular matrix. The question
now is the following: does such decomposition always exist, and if so, is it unique? To
provide an answer, we start with the following definitions.
1
K here is not necessary R or C.
8.1 Orthogonal Matrices
337 8
Definition 8.1.2 (Principal Minor)
Let A be a matrix in Mn .K/. The principal submatrix of order k (1 k n) is the
submatrix formed by deleting from A n k rows and the n k columns with the
same indices (for example, delete rows 1; 2 and 5 and columns 1; 2 and 5). The
determinant of this submatrix is called the principal minor of order k of A.
Example 8.5
Consider the matrix
2 3
1 20
6 7
A D 4 1 3 4 5 :
2 25
We will see in Sect. 8.2 that the leading principal minors can be used as a test for
the definiteness of symmetric matrices.
A D LU; (8.10)
where L is a lower triangular matrix with 1’s on the diagonal and U is an upper
triangular matrix, if and only if all its leading principal minors are nonzero.
338 Chapter 8 • Orthogonal Matrices and Quadratic Forms
Proof
We first prove the uniqueness. Assume that there exist L1 ; L2 ; U1 , and U2 satisfying the
assumptions in Theorem 8.1.11, such that
A D L1 U1 D L2 U2 :
L1 1
2 L1 U1 U2 D In :
where Ln1 and Un1 are .n 1/ .n 1/ matrices with Ln1 a lower triangular matrix
whose diagonal elements are equal to 1 and Un1 an upper triangular matrix.
Now, consider the two n n matrices
" # " #
Ln1 0.n1/1 Un1 c
LD and UD (8.11)
dT 1 01.n1/ ann dT c
where c and d are vectors in Kn to be determined. Now, the formula A D LU, gives
" # " #
Ln1 Un1 Ln1 c An1 a1
LU D D :
dT Un1 ann aT2 ann
This yields
c D L1
n1 a1 ; and dT D aT2 Un1
T
:
Thus, we have shown that A D LU, with L and U unique and given as in (8.11).
The converse is clear, since if A has the factorization (8.11), then, we may easily show
that if A.k/ is the leading principal minor of A of order k, then A.k/ D L.k/ U .k/ ; where L.k/ and
U .k/ are the principal leading minors of order k of L and U, respectively. Consequently,
Y
A.k/ D ujj ;
1jp
which is nonzero, since U is an invertible matrix. This finishes the proof of Theorem 8.1.11.
t
u
Example 8.6
Find the LU decomposition of the matrix
2 3
12 4
6 7
A D 4 3 8 14 5 :
2 6 13
Solution
First, we compute the leading principal minors of A, obtaining
" #
.1/ .2/ 12
A D detŒ1 D 1; A D det D 2; A.3/ D det.A/ D 6:
38
J
340 Chapter 8 • Orthogonal Matrices and Quadratic Forms
The LU Algorithm
Computers usually solve square linear systems using the LU decomposition, since it
is simpler and less costly (see Remark 8.2.9 below). When such a decomposition is
available, then solving the system
AX D b (8.13)
LY D b (8.14)
for Y. Of course here we get a unique solution Y, since L is invertible. Also, since L is a
lower triangular matrix, then system (8.14) should be solved in the “forward” direction.
That is, if y1 ; y2 ; : : : ; yn are the components of the vector Y, then these components are
found successively in the same order. Once Y is obtained, one solves for X the system
8
UX D Y: (8.15)
Once again the triangular form of the matrix U makes the computations of X from (8.15)
easy. In this case, we solve the system (8.15) in the “backward” direction. That is, the
components of X are found in the order xn ; xn1 ; : : : ; x2 ; x1 . Consequently, (8.14) and
(8.15) yield
AX D L.UX/ D LY D b:
Example 8.7
Solve the system of equations
8
ˆ
< x1 C 2x2 C 4x3 D 1;
3x1 C 8x2 C 14x3 D 0; (8.16)
:̂
2x1 C 6x2 C 13x3 D 1:
Solution
The system (8.16) can be written as
AX D b
with
2 3 2 3 2 3
12 4 x1 1
6 7 6 7 6 7
A D 4 3 8 14 5 ; X D 4 x2 5 and b D 405:
2 6 13 x3 1
8.2 Positive Definite Matrices
341 8
We have seen in Example 8.6 that the matrix A can be written as A D LU, with L and U as in
(8.12). Next, we solve the system LY D b, where Y is the vector in R3 with the components
y1 ; y2 , and y3 . This gives
8
ˆ
< y1 D 1;
3y1 C y2 D 0;
:̂
2y1 C y2 C y3 D 1:
This yields x3 D 2=3; x2 D 13=6 and x1 D 8=3. Hence, the solution of (8.16) is
2 3
8
6 3 7
X D 4 13
6 5
:
2
3
We have seen in Theorem 7.4.5, that real symmetric matrices have only real eigenvalues;
these, of course can be positive or negative. Thus, the question now is what happen
if all the eigenvalues of a symmetric matrix are positive? Does the matrix enjoy
some particular properties? The signs of the eigenvalues of a matrix are important
in applications, for instance in the stability theory of differential equations. So, it is
quite important to determine which matrices have positive eigenvalues. These matrices
are called positive definite matrices. Symmetric positive definite matrices have rather
nice properties; for example, as we will see later on, every positive definite matrix is
invertible. Also, by studying these matrices we will bring together many things that we
have learned about determinants, eigenvalues,: : : We restrict our discussion here to the
case K D R, but all the properties can be easily extended to the case K D C.
Now, we start with the definition.
Example 8.8
The matrix
" #
12
AD
25
is positive definite, since it is symmetric and its eigenvalues 1 and 2 satisfy (see
Corollary 7.3.3)
Notation Here we introduce a new notation which is very useful in this chapter. If X and Y
are two vectors in Rn , we denote
8
hX; Yi D X T Y:
Proof
First assume that A is positive definite. Then, by Theorem 8.1.9, the eigenvectors
v1 ; v2 ; : : : ; vn of A form an orthonormal basis of Rn . Thus, if i is an eigenvalue of A
and vi is an associated eigenvector, then we have
Example 8.9
Show that the matrix
2 3
2 1 0
6 7
A D 4 1 2 1 5
0 1 2
is positive definite.
Solution
To do so, we apply Theorem 8.2.1. So, let
2 3
x1
6 7
X D 4 x2 5 (8.18)
x3
Since
1 1
x2 x1 .x21 C x22 / and x2 x3 .x22 C x23 /;
2 2
it is clear that
if x1 and x3 are not both zero. Otherwise, if x1 D x3 D 0, then x2 ¤ 0 and hence we have
Proof
If A is positive definite, then all its eigenvalues 1 ; 2 ; : : : ; n are necessarily positive. Then,
by Corollary 7.3.3, we have
Y
n
det.A/ D i ¤ 0;
iD1
8
and Theorem 2.4.8 shows that A is invertible.
Now, A1 is symmetric since .A1 /T D .AT /1 D A1 . Also, if is a positive
eigenvalue of A, then 1 is a positive eigenvalue of A1 . Thus, A1 is positive definite. t
u
ⓘ Remark 8.2.3 It is clear that if A is positive definite, then det.A/ > 0. However, if
det.A/ > 0, then A is not necessarily positive definite. For, example, the determinant of
the matrix
" #
1 1
AD
0 1
To prove Theorem 8.2.4, we need the following lemma, known as the Rayleigh–Ritz
theorem, which gives the relation between the eigenvalues of the matrix A and those
of its principal submatrices. We do not prove this lemma; we need it in order to show
Theorem 8.2.4. The reader is refereed to [13, Theorem 8.5.1].
i i iCnk ; i D 1; : : : ; k:
8.2 Positive Definite Matrices
345 8
Proof of Theorem 8.2.4
First assume that A is positive definite. Then we can prove that all its principal minors
are positive. Let Ak be the principal submatrix of order k, and let 1 ; 2 ; : : : ; k be the
eigenvalues of Ak . Since Ak is symmetric, its eigenvalues are real (Theorem 7.4.5). Hence,
1 ; 2 ; : : : ; k can be ordered as 1 2 k , for instance. Applying Lemma 8.2.5,
we deduce that
0 < 1 1 2 k :
(Recall that 1 > 0, since A is positive definite). Hence, all the principal minors of A are
positive. Consequently, in particular its leading principal minors are positive.
Conversely, assume that all the leading principal minors of A are positive. We denote
by Ak the principal leading submatrix of order k. Then, we prove by induction on k.k D
1; 2; : : : ; n/ that the matrix A D An is positive definite. For k D 1, A1 D Œa is positive
definite, since in this case a D det.A/ > 0 (by assumption) and at the same time a is the
eigenvalue of A1 . Now, for k 2 we assume that Ak1 is positive definite and show that Ak
is positive definite. Let
˛1 1 ˛2 ˛k1 k ˛k :
The above formula together with (8.19) show the positivity of ˛2 ; : : : ; ˛k . Since all the
leading principal minors of A are positive, det.Ak / is positive and we have
!
Y
det.Ak / D ˛1 ˛i > 0:
2ik
Thus, ˛1 > 0. Hence, all the eigenvalues of Ak are positive, therefore Ak is positive definite.
We conclude that A is positive definite. t
u
Example 8.10
Use Theorem 8.2.4 to show that the matrix
2 3
3 0 3
6 7
A D 4 0 1 2 5
3 2 8
is positive definite.
346 Chapter 8 • Orthogonal Matrices and Quadratic Forms
Solution
We just need to verify that the principal minors of A are positive. Indeed, we have
" #
.1/ .2/ 30
A D 3; A D det D 3; A.3/ D det A D 3:
01
ⓘ Lemma 8.2.7 Let A and C be two invertible matrices. Then for any matrix B, we have
rank.ABC/ D rank.B/;
provided that the size of the above matrices are chosen such that the product ABC
makes sense.
Proof
We define the matrix K D ABC. Now, our goal is to show that rank.K/ D rank.B/. Applying
(6.24), we deduce that
On the other hand, B D A1 KC1 . Hence, applying (6.24) again, we get
Example 8.11
Show that the matrices
2 3
" # 111
00 6 7
AD and B D 41 1 15
01
111
Solution
First, the matrix A is positive semi-definite since its eigenvalues are 1 D 0 and 2 D 1.
Second, for the matrix B, we have
Example 8.12
Show that for any rectangular matrix A in Mmn .R/, the matrices AT A and AAT are positive
semi-definite.
Solution
First, it is clear that AT A is a square symmetric matrix in Mn .R/. For any nonzero vector X
in Rn ,
Second, it is also clear that AAT is a symmetric matrix in Mm .R/. Thus, for any nonzero
vector Y in Rm , we have
We have seen in Theorem 8.1.8 that if A is a square matrix in Mn .R/, then we can write
A as
A D QR;
8 where Q is an orthogonal matrix and R is an upper triangular matrix.
Also, we have seen in Theorem 8.1.11 that if A is an invertible matrix and if all its
leading principal minors are nonzero, then we can write A as
A D LU;
where L and U are as before. Now, if we assume that the matrix A is positive definite,
then U has to be equal to LT and we write A as the product of a lower triangular
matrix L and its transpose LT . This product is known as the Cholesky decomposition
(or factorization) and it is very useful in numerical analysis. See Remark 8.2.9 below.
A D LLT : (8.21)
a
We obtain uniqueness only if we assume that the diagonal entries of L are positive.
Proof
Let us first prove the uniqueness. Assume that there exist two lower triangular matrices L1
and L2 satisfying (8.21). Then,
L1 LT1 D L2 LT2 :
L1 D LT : (8.22)
Since L is a lower triangular matrix (as the product of two lower triangular matrices), then
L1 is a lower triangular matrix and LT is an upper triangular matrix. Then (8.22) shows that
L is a diagonal matrix and satisfies L2 D In . Since its diagonal entries are positive (keep in
mind that the diagonal entries of L1 and L2 are positive). Then we obtain L D In , and thus
L1 D L2 . This shows the uniqueness of the decomposition (8.21).
To establish the existence of the decomposition (8.21), we proceed by induction on the
size of the matrix A. The statement is trivial for n D 1, since if A D Œa, with a > 0, we can
p
take L D LT D Œ a.
Now, assume that the decomposition (8.21) exists for any .n 1/ .n 1/ matrix. The
matrix A can be written as
" #
An1 b
AD ;
bT ann
where An1 is a leading principal submatrix of A which is positive definite (Theorem 8.2.4),
b is a vector in Rn1 , and ann is a real positive number. By the induction hypothesis, An1
satisfies (8.21). Thus, there exists a unique lower triangular matrix Ln1 with strictly positive
diagonal entries, such that
where c a vector in Rn1 and ˛ > 0 are to be determined. Now, the desired identity
An1 D LLT
leads to
" # " #" #
An1 b Ln1 0.n1/1 LTn1 c
D :
bT ann cT ˛ 01.n1/ ˛
Example 8.13
Find the Cholesky decomposition of the matrix
2 3
25 15 5
6 7
A D 4 15 18 0 5 :
5 0 11
Solution
First, it is clear that A is symmetric. To show that A is positive definite, we need to compute
hX; AXi for any nonzero vector
2 3
x1
6 7
8 X D 4 x2 5
x3
of R3 . We have
1 1
x2 x1 .x21 C x22 / and x1 x3 .x21 C x23 /;
2 2
Now, according to Theorem 8.2.8, there exists a unique lower triangular matrix
2 3
l11 0 0
6 7
L D 4 l21 l22 0 5
l31 l32 l33
Consequently,
2 3
5 00
6 7
L D 4 3 3 05:
1 1 3
ⓘ Remark 8.2.9 The Cholesky decomposition can be used to solve linear systems of the
form
AX D b;
f W Rn ! R
X 7! X T AX D hX; AXi:
This function is called the quadratic form associated to the symmetric matrix A. In
8 Example 8.9, we have
f .X/ D f .x1 ; x2 ; x3 / D 2 x21 x2 x1 C x22 C x23 x2 x3 ; (8.23)
It is clear that the function in (8.24) is a polynomial of degree two and it can be
written as
f .X/ D a11 x21 C a22 x22 C C ann x2n C 2a12 x1 x2 C 2a13x1 x3 C C 2an1;nxn1 xn ;
(8.25)
where x1 ; x2 ; : : : ; xn , are the components of the vector X and aij ; 1 i; j n are the
entries of the matrix A. The terms involving the products xi xj are called the mixed
products, and the matrix A is called the coefficient matrix of the quadratic form f .X/.
We have seen above that the symmetric matrix A is positive definite if the function
f .X/ defined in (8.25) is positive for each nonzero vector X in Rn . One way to see this is
to write the quadratic form (8.25) as the sum of squares. This can be accomplished by
using the spectral theorem (Theorem 8.1.9), as shown in the following theorem.
8.3 Quadratic Forms
353 8
Proof
Putting X D SY, we have
D Y T .ST AS/Y
D Y T DY
where D is the diagonal matrix defined in (8.7). This process is also called the diagonalization
of the quadratic form f .X/. t
u
Example 8.14
Consider the quadratic form
Solution
The quadratic form can be written as f .X/ D X T AX, with
2 3 2 3
x1 4 1 0
6 7 6 7
X D 4 x2 5 and A D 4 1 4 0 5 :
x3 0 0 1
We see that the diagonal entries of A are the coefficients of the squared terms in (8.26) and
the off-diagonal entries are half the coefficient of the mixed product.
The eigenvalues of A are 1 D 1; 2 D 5, and 3 D 3. Thus, f can be written in the
standard form as
J
354 Chapter 8 • Orthogonal Matrices and Quadratic Forms
We have seen in the principal axes theorem (Theorem 8.3.1) that if f .X/ is the quadratic
form associated to a symmetric matrix A, then we can write f .X/ as
In.A/ D . p;
; q/:
8.3 Quadratic Forms
355 8
Example 8.15
Find the inertia of the matrix
2 3
20 6 8
6 7
A D 4 6 3 05:
8 08
Solution
We compute the eigenvalues of A, finding
1 p 1 p
1 D 31 C 385 ; 2 D 31 385 ; 3 D 0:
2 2
Hence, we have p D 2;
D 0, and q D 1: In.A/ D .2; 0; 1/. J
A D ST BS: (8.27)
Theorem 8.3.2
Let A be a symmetric matrix in Mn .R/. Then, A is congruent to the matrix
2 3
Ip 0 0
6 7
D0 D 4 0 I
0 5 (8.28)
0 0 0qq
with
p C
C q D n;
D r p; and q D n r;
In terms of the quadratic form, this means that if there exists an orthonormal basis
fe1 ; e2 ; : : : ; en g of Rn such that the quadratic form associated to the symmetric matrix A
has the form
X
n
f .X/ D ai x2i ;
iD1
p
then there exists an orthogonal basis fv1 ; v2 ; : : : ; vn g with vi D jai jei ; i D 1; 2; : : : ; n,
in which the quadratic form can be written as
X
p
X
f .X/ D x2i x2j :
iD1 jD1
A D SDST ; (8.29)
D D D1 D0 D1 ;
where D0 is the matrix defined in (8.28). Now, by substituting into (8.29), we obtain
A D SD1 D0 D1 ST D QD0 QT ;
with Q D SD1 , which is orthogonal. This completes the proof of Theorem 8.3.2. t
u
A D SIn ST :
f .X/ D X T AX
D X T SIn ST X
D Y T In Y
X
n
D y2i > 0;
iD1
Some properties are preserved under the congruence transformation, among them,
the rank.
rank.A/ D rank.B/:
Proof
Since A and B are congruent, there exists an invertible matrix S such that A D ST BS. Applying
the result of Exercise 5.3 for two matrices C and D, we have
Im.B/ D Im.BS/:
358 Chapter 8 • Orthogonal Matrices and Quadratic Forms
Hence,
rank.B/ D rank.BS/:
D n null.ST BS/
D n null.BS/
8 D n null.B/
D rank.B/:
Finding a matrix S satisfying (8.27) can be quite difficult in practice. So, we need
a test that allows us to quickly determine whether two matrices are congruent or not
without computing the matrix S. This test is given by Sylvester’s law of inertia. This
law asserts that two congruent matrices have the same number of positive eigenvalues,
the same number of negative eigenvalues, and the same number of zero eigenvalues.
Moreover, this law is very useful in the study of the stability of solutions of differential
equations, which usually requires the knowledge of the signs of the eigenvalues of some
symmetric matrices.
First, we start with the following definition.
r D p C
:
s D p
:
8.3 Quadratic Forms
359 8
Hence,
s D 2p r:
Proof
First, assume that A and B are congruent. Then Theorem 8.3.4 implies that rank.A/ D
rank.B/. Now, let p1 be the number of positive eigenvalues of A and p2 be the number of
positive eigenvalues of B. Then to show that In.A/ D In.B/ it is enough to prove that p1 D p2 ,
since the two matrices have the same rank. Now, since A and B are congruent, there exists an
invertible matrix S such that
A D SBST : (8.32)
.1/ .2/
Now, using Theorem 8.3.2, we deduce that there exist two matrices D0 and D0 such that
2 3
Ip1 0 0
.1/ 6 7
D0 D 4 0 Irp1 0 5 D PAP
T
0 0 0.nr/.nr/
and
2 3
Ip2 0 0
.2/ 6 7
D0 D 4 0 Irp2 0 5
0 0 0.nr/.nr/
.2/
with B D QD0 QT , where P and Q are invertible matrices. Plugging these into (8.32), we get
with R D PSQ. Now, assume that p2 ¤ p1 , for instance p2 < p1 . Then, we have to reach a
contradiction. Let X be the nonzero vector in Rn , with its first p1 components are not all zero,
but with its last n p1 components all equal to zero. That is
2
3 2 3
X1 x1
6 0 7 6x 7
6 7 6 27
XD6 7
6 :: 7 ; with X1 D 6
6 :: 7
7
4 : 5 4 : 5
0 xp1
360 Chapter 8 • Orthogonal Matrices and Quadratic Forms
.1/
and X1 ¤ 0Rp1 . Hence, the quadratic form associated to D0 reads
.1/
X
p1
X T D0 X D x2i > 0: (8.34)
iD1
.1/ .2/
X
rp2
X T D0 X D .RT X/T D0 .RT X/ D y2i 0:
jD1
.1/ .2/
This contradicts (8.34). Similarly, interchanging the roles of D0 and D0 , we can prove that
it is impossible to have p1 < p2 . Consequently, p1 D p2 .
Conversely, if A and B have the same inertia, In.A/ D In.B/ D . p;
; q/, then both
matrices are congruent to the matrix
2 3
Ip 0 0
6 7
D0 D 4 0 I
0 5 ;
0 0 0qq
and then they are congruent to each other (since congruence is an equivalence relation). t
u
Theorem 8.3.5, is very interesting since it tells us that we can determine whether two
symmetric matrices are congruent by just computing their eigenvalues as shown in the
following example.
Example 8.16
Show that the matrices
2 3 2 3
202 1 1 1
6 7 6 7
A D 40 6 25 and B D 4 1 5 1 5
224 1 1 5
are congruent.
8.4 Exercises
361 8
Solution
The eigenvalues of A are
p p
1 D 2.2 C 3/; 2 D 4; 3 D 2.2 3/:
1 p 1 p
1 D .7 C 33/; 2 D 4; 3 D .7 33/:
2 2
Thus, In.B/ D .3; 0; 0/. Since In.A/ D In.B/, A and B are congruent. J
8.4 Exercises
Solution
1. We give the proof for the case of a positive definite matrix (the positive semi-definite
case can be proved by the same method). So, assume that A is a positive definite matrix.
Since A is symmetric, then according to the spectral theorem (Theorem 8.1.9), there
exists an orthogonal matrix S such that A D SDST , where D is the diagonal matrix
D D diag.1 ; 2 ; : : : ; n /. Now, since A is positive definite, all its eigenvalues are positive.
Introduce the matrix
p p p
D0 D diag. 1 ; 2 ; : : : ; n /:
A0 D SD0 ST ; (8.35)
362 Chapter 8 • Orthogonal Matrices and Quadratic Forms
we have
Consequently, A0 is the square root of A. In addition, it is clear from above that the eigenval-
p p p
ues of A0 are 1 ; 2 ; : : : ; n , which are positive. Hence, A0 is positive definite.
Conversely, if A0 is positive definite, then its eigenvalues are positive. Since, the eigen-
values of A are the squares of those of A0 , they are also positive, hence, A is positive definite.
2. If A is positive definite, then according to Theorem 8.2.2, A is invertible. Also,
according to (1) A0 is positive definite, hence A0 is also invertible. Consequently,
Theorem 6.3.2 yields
rank.A/ D rank.A0 / D n:
A1 D QD0 QT ;
SD20 ST D QD20 QT ;
whence
This yields, upon multiplying from the left by S and from the right by QT , A1 D A0 .
4. The eigenvalues of A are 1 D 9 and 2 D 1. Hence, A is positive definite, and A has
a positive definite square root matrix A0 satisfying A20 D A. To find A0 , we see first that A
can be written as
A D SDST ;
8.4 Exercises
363 8
where D is the diagonal matrix
" #
90
DD
01
It is clear that A0 is also positive definite, since its eigenvalues are 1 D 3 and 2 D 1. J
That is,
2 3
u1 u1 u1 u2 u1 u`
6u u u u u2 u` 7
6 2 1 2 2 7
GD6
6 :: :: :: :: 7 7:
4 : : : : 5
u` u1 u` u2 u` u`
Solution
1. Let A be the n ` matrix whose columns are the vectors u1 ; u2 ; : : : ; u` . Then we can easily
check that G D AT A.
364 Chapter 8 • Orthogonal Matrices and Quadratic Forms
2. It is clear that G is symmetric, since ui uj D uj ui for all 1 i; j `. Now, to show that
G is positive semi-definite, we can use two approaches. First, using (1) and Example 8.12,
we deduce immediately that G is positive semi-definite.
Second, we can use a direct approach. Indeed, let X be a nonzero vector in R` . Then, by
using the properties of the dot product in R` we have
X̀
X T GX D .ui uj /xi xj
i;jD1
X̀
D ..xi ui / .xj uj //
i;jD1
! 0 1
X̀ X̀
D xi ui @ xj uj A
8 iD1 jD1
2
X̀
D xi ui 0; (8.36)
iD1
X̀
xi ui D 0Rn :
iD1
This is not the case if X is a nonzero vector and the vectors fu1 ; u2 ; : : : ; u` g are linearly
independent. Hence, in this case (8.36) is a strict inequality and therefore G is positive
definite.
Conversely, if G is positive definite, then X T GX > 0 whenever X ¤ 0R` . Hence,
X̀
xi ui ¤ 0R` :
iD1
Since fu1 ; u2 ; : : : ; u` g are the column vectors of A, we have (see Definition 6.3.1)
F.X/ D X T AX 2X T b C c: (8.38)
We say that f has a global minimum point X0 if f .X0 / f .X/ for all X in Rn .
1. Show that if A is positive definite, then the quadratic function F.X/ has a unique global
minimum X0 , which is the solution of the linear system
AX D b; namely X0 D A1 b:
3. Show that if A is positive definite, then the function F is strictly convex, that is
for all X1 and X2 in Rn with X1 ¤ X2 and for any in Œ0; 1. If we have “” instead of
“<” in (8.39), then we say that F is convex.
4. We define the function F by
F W R2 ! R;
a
The quadratic optimization problems appear frequently in applications. For instance, many problems in physics
and engineering can be stated as the minimization of some energy functions.
Solution
1. First, since A is positive definite, Theorem 8.2.2 implies that A is invertible. Hence the
linear system AX D b has a unique solution X0 D A1 b. Now, by replacing b by AX0 , we
write the function F.X/ as
F.X/ D X T AX 2X T b C c
D X T AX 2X T AX0 C c
D .X X0 /T A.X X0 / C .c X0T AX0 /: (8.40)
366 Chapter 8 • Orthogonal Matrices and Quadratic Forms
It is clear that if X ¤ X0 , then the first term on the right-hand side of (8.40) is strictly positive,
since the matrix A is positive definite. Also, this term is zero for X D X0 . Since, the second
term in the right-hand side of (8.40) does not depend on X, the minimum of F.X/ occurs at
X D X0 D A1 b.
2. We have from the above that
F.X0 / D F.A1 b/
D .A1 b/T A.A1 b/ 2.A1 b/T b C c
D c bT A1 b
D c bT X0
D c X0T AX0 ;
H.X/ D 2A:
Since A is positive definite, H.X/ is positive definite and consequently, the function f is
strictly convex.
4. The function F can we written in the form (8.38), with
" # " # " #
x1 4 1 32
XD ; AD ; bD ; c D 1:
x2 1 3 1
1 p 1 p
1 D 7C 5 ; 2 D 7 5 :
2 2
Hence, applying the above results, we deduce that F has a global minimum, which is
" #
7
1 22
X0 D A bD 5
:
22
8.4 Exercises
367 8
Now, the value of F at X0 is
13
F.X0 / D c X0T AX0 D :
44
13
Hence, we deduce that for any X in R2 we have F.X/ > 44
. J
Exercise 8.4
Let A and B be two matrices in Mn .R/.
1. Show that the following two statements are equivalent:
X T AY D X T BY: (8.41)
(ii) A D B.
2. Prove that if A and B are symmetric, then the following statements are equivalent:
X T AX D X T BX: (8.42)
(ii) A D B.
Solution
1. First, it is clear that (ii) implies (i). Now, let us assume that (i) holds and show that A D B.
So, (8.41) implies that for any vector X in Rn ,
X T .AY BY/ D 0:
In particular, this equality also holds for the vector AY BY, which is also a vector in Rn .
Hence, we have
AY BY D 0Rn :
Since the last equality is true for all vectors Y of Rn , then necessarily A B D 0Mn .R/ .
2. As above, it is trivial to see that (ii) implies (i). Now, if A and B are symmetric and
(8.42) holds, then, applying (8.42) for the vectors X; Y, and X C Y in Rn , we get
This implies
Y T AX C X T AY D Y T BX C X T BY:
Hence, we obtain X T AY D X T BY, for any X and Y in Rn . Thus, applying (1), we obtain
A D B. J
3. Prove that if H.X/ is positive definite (respectively, semi-definite) for all X in Rn , then f
is strictly convex (respectively, convex) in Rn .
4. Deduce that the function
is strictly convex on R3 .
8.4 Exercises
369 8
Solution
2 2
1. Since @x@i @xf j D @x@j @xf i , we deduce that H.X/ is symmetric.
2. First, assume that (8.43) is satisfied and let X and Y be two vectors in Rn . Let be in
Œ0; 1. We put Z D Y C .1 /X. Then,
and
Hence, multiplying (8.44) by and (8.45) by 1 and adding the results, we obtain
Hence, we get
D f .Y/ f .X/:
Recall that
f .X C d/ f .X/
rf .X/T d D lim ;
!0C
Therefore, from question .2/, we deduce that f is strictly convex. By the same argument, if
H is positive semi-definite, then f is convex.
4. To show that f is strictly convex, then it is enough to show that its Hessian matrix is
positive definite. Indeed, we have
2 3
@2 f 2 2
.X/ @x@1 @xf 2 .X/ @x@1 @xf 3 .X/ 2 3
6 @x21 7 222
8 6
H.X/ D 6
@2 f
@x2 @x1
.X/ @x@2 f
2 .X/ @x
@2 f
@x
.X/ 7
7
6 7
D 42 4 05:
4 2 2 3
5
@2 f @2 f @2 f 206
@x3 @x1 .X/ @x3 @x2 .X/ @x23 .X/
Hence, A is positive definite (see Theorem 8.2.4). Therefore, according to question .3/, we
deduce that f is strictly convex. J
Exercise 8.6
1. Let v1 ; v2 ; : : : ; vn be a basis of a vector space E over a field K. Let W be a k-dimensional
subspace of E. Show that if m < k, then there exists a nonzero vector in W which is a
linear combination of the vectors vmC1 ; : : : ; vn .
2. Let A be a symmetric matrix in Mn .R/. Show that if Y T AY > 0, for all nonzero vectors Y
in a k-dimensional subspace W of Rn , then A has at least k positive eigenvalues (counting
the multiplicity).a
a
There are several proofs of this result, here we adapt the one in [7].
Solution
1. Consider the subspace F defined as
F D spanfvmC1 ; : : : ; vn g:
Y D cmC1 vmC1 C C cn vn :
This is a contradiction, since we assumed that Y T AY > 0, for all nonzero vectors Y in
W. Consequently, m k.
X
n X
n
kAk2F D jaij j2 D tr.AT A/ Frobenius’ norm.b
jD1 iD1
2. Show that
X
n X
n
kAk1 D max jaij j and kAk1 D max jaij j satisfy (8.46):
1in 1jn
jD1 iD1
372 Chapter 8 • Orthogonal Matrices and Quadratic Forms
kAXk
kAk2 D sup D max kAXk; (8.47)
X¤0Rn kXk kXkD1
where k k is the Euclidean norm in Rn introduced in Definition 3.3.1. Show that k k2 is well
defined and satisfies the property (8.46), and kIk2 D 1.
6. Show that for any matrix A in Mn .R/,
p
kAk2 kAkF nkAk2 : (8.48)
8
a
All the results here remain true if we replace R by C. See Remark 8.1.1.
b
Sometimes referred to as the Hilbert–Schmidt norm and defined as the usual Euclidean norm of the matrix A
2
when it is regarded as a vector in Rn .
Solution
1. We have
2 32 3 2 3
1 0 6 1 2 3 37 10 21
6 76 7 6 7
AT A D 4 2 5 2 5 4 0 5 1 5 D 4 10 33 19 5 :
3 1 4 6 2 4 21 19 26
p
Hence, tr.AT A/ D 96, so kAkF D 96.
2. We put A D .aij /; B D .bij /, and C D AB D .cij /. Hence, we have (see
Definition (1.1.11))
X
n
cij D aik bkj :
kD1
Therefore, we obtain
ˇXn ˇ2 X
n 2
ˇ ˇ
jcij j2 D ˇ aik bkj ˇ jaik jjbkj j :
kD1 kD1
X
n 2 X
n X
n
jaik jbkj j jaik j2 jbkj j2 :
kD1 kD1 kD1
8.4 Exercises
373 8
This yields
X X X
kABk2F D jcij j2 jaik j2 jblj j2 D kAk2F kBk2F ;
i;j i;k l;j
which gives the desired result. As a side note, property (8.46) can be seen as a generalization
of the Cauchy–Schwarz inequality.
p
It is obvious that kIk2F D tr.I T I/ D tr.I/ D n. This yields kIkF D n.
3. First, we need to show that kAk1 has the submultiplicativity property (8.46). Indeed,
we have from above
X
n
jcij j jaik jjbkj j:
kD1
Hence,
X
n X n
X X
n
jcij j jaik jjbkj j D jaik j jbkj j :
jD1 j;k kD1 jD1
Since
X
n
jbkj j kBk1 ;
jD1
we obtain
X
n X
n
jcij j jaik j kBk1 kAk1 kBk1 :
jD1 kD1
Thus,
By the same reasoning, we can show that the norm k k1 has the submultiplicativity
property.
4. Since Q is orthogonal, we have (see (8.2)) QT Q D QQT D I. Hence, we get
Similarly,
where we have used the cyclic property of the trace, that is, tr.ABC/ D tr.CAB/. (The reader
should be careful, tr.ABC/ ¤ tr.ACB/ in general).
374 Chapter 8 • Orthogonal Matrices and Quadratic Forms
n ˇX
X n ˇ2
ˇ ˇ
kAXk2 D ˇ aij xj ˇ :
iD1 jD1
whence
and therefore
kAXk
kAkF ; for any X ¤ 0Rn :
kXk
n o
kAXk
This implies that the set of real numbers kXk
; X ¤ 0Rn is bounded and therefore is has a
supremum and we have
kAk2 kAkF :
Now, we need to show that k k2 satisfies (8.46). Indeed, we have for any vector X ¤ 0Rn .
Hence, we obtain,
k.AB/Xk
kAk2 kBk2 ;
kXk
and so
kAXk kXk
D D 1; X ¤ 0Rn :
kXk kXk
Hence, kIk2 D 1.
6. We have already proved in (5) that kAk2 kAkF . So, we just need to show that kAkF
p
nkAk2 . This inequality can be easily obtained due to the fact that kAkF D tr.AT A/
n
.AT A/ D kAk2 (see Exercise 8.8). J
1. Show that
.A/ kAk2 , where k k2 is defined in (8.47) (with C instead of R).a
2. Prove that
3. Show that
1=m
.A/ D lim kAm k2 spectral radius formula:b (8.49)
m!1
a
In fact, this holds for any matrix norm.
b
This formula yields a technique for estimating the top eigenvalue of A.
Solution
1. Let i be an eigenvalue of A, i.e.,
AXi D i Xi ; Xi ¤ 0Cn :
AX D X and Am X D m X:
1=m
.A/ kAm k2 ; for all m:
8 Now, to prove (8.49), we need to show that for any > 0, there exists a positive integer
N D N./ such that for any m N,
1=m
kAm k2
.A/ C :
1
A D A:
.A/ C
D 0:
lim Am
m!1
Consequently, there exists a positive integer l./ such that for m l./, we have
1
kD
kAm kAm k < 1:
.
.A/ C /m
Now, it is enough to choose N./ D l./. This finishes the proof of (8.49). J
377
Servicepart
References – 379
Index – 381
References
1. H. Anton, C. Rorres, Elementary Linear Algebra: with Supplemental Applications, 11th edn.
(Wiley, Hoboken, 2011)
2. M. Artin, Algebra, 2nd edn. (Pearson, Boston, 2011)
3. S. Axler, Linear Algebra Done Right. Undergraduate Texts in Mathematics, 2nd edn. (Springer,
New York, 1997)
4. E.F. Beckenbach, R. Bellman, Inequalities, vol. 30 (Springer, New York, 1965)
5. F. Boschet, B. Calvo, A. Calvo, J. Doyen, Exercices d’algèbre, 1er cycle scientifique, 1er année
(Librairie Armand Colin, Paris, 1971)
6. L. Brand, Eigenvalues of a matrix of rank k. Am. Math. Mon. 77(1), 62 (1970)
7. G.T. Gilbert, Positive definite matrices and Sylvester’s criterion. Am. Math. Mon. 98(1), 44–46
(1991)
8. R. Godement, Algebra (Houghton Mifflin Co., Boston, MA, 1968)
9. J. Grifone, Algèbre linéaire, 4th edn. (Cépaduès–éditions, Toulouse, 2011)
10. G.N. Hile, Entire solutions of linear elliptic equations with Laplacian principal part. Pac. J. Math
62, 127–140 (1976)
11. R.A. Horn, C.R. Johnson, Matrix Analysis, 2nd edn. (Cambridge University Press, Cambridge,
2013)
12. D. Kalman, J.E. White, Polynomial equations and circulant matrices. Am. Math. Mon. 108(9),
821–840 (2001)
13. P. Lancaster, M. Tismenetsky, The Theory of Matrices, 2nd edn. (Academic Press, Orlando, FL,
1985)
14. S. Lang, Linear Algebra. Undergraduate Texts in Mathematics, 3rd edn. (Springer, New York,
1987)
15. L. Lesieur, R. Temam, J. Lefebvre, Compléments d’algèbre linéaire (Librairie Armand Colin,
Paris, 1978)
16. H. Liebeck, A proof of the equality of column and row rank of a matrix. Am. Math. Mon. 73(10),
1114 (1966)
17. C.D. Meyer, Matrix Analysis and Applied Linear Algebra (SIAM, Philadelphia, PA, 2000)
18. D.S. Mitrinović, J.E. Pečarić, A.M. Fink, Classical and New Inequalities in Analysis. Mathematics
and Its Applications (East European Series), vol. 61 (Kluwer Academic, Dordrecht, 1993)
19. C. Moler, C. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, twenty-five
years later. SIAM Rev. 45(1), 3–49 (2003)
20. J.M. Monier, Algèbre et géométrie, PC-PST-PT, 5th edn. (Dunod, Paris, 2007)
21. P.J. Olver, Lecture notes on numerical analysis, http://www.math.umn.edu/~olver/num.html.
Accessed Sept 2016
22. F. Pécastaings, Chemins vers l’algèbre, Tome 2 (Vuibert, Paris, 1986)
23. M. Queysanne, Algebre, 13th edn. (Librairie Armand Colin, Paris, 1964)
24. J. Rivaud, Algèbre linéaire, Tome 1, 2nd edn. (Vuibert, Paris, 1982)
25. S. Roman, Advanced Linear Algebra. Graduate Texts in Mathematics, vol. 135 (Springer, New
York, 2008)
26. H. Roudier, Algèbre linéaire: cours et exercices, 3rd edn. (Vuibert, Paris, 2008)
27. B. Said-Houari, Differential Equations: Methods and Applications. Compact Textbook in
Mathematics (Springer, Cham, 2015)
28. D. Serre, Matrices. Theory and Applications. Graduate Texts in Mathematics, vol. 216, 2nd edn.
(Springer, New York, 2010)
29. G. Strang, Linear Algebra and Its Applications, 3rd edn. (Harcourt Brace Jovanovich, San Diego,
1988)
30. V. Sundarapandian, Numerical Linear Algebra (PHI Learning Pvt. Ltd., New Delhi, 2008)
31. H. Valiaho, An elementary approach to the Jordan form of a matrix. Am. Math. Mon. 93(9),
711–714 (1986)
381
Index
Elementary Identity
– matrix, 82 – operator, 200
Elementary row operation, 246 Image
Endomorphism, 200 – of a linear transformation, 207
Equation Inconsistent
– linear, 4 – system of linear equations, 263
Equivalent inconsistent, 264
– matrices, 240 Index
Euclidean – of a matrix, 358
– norm, 129 Inertia
– vector space, 126 – of a matrix, 354, 359
Euler Injective
– formula, 283 – linear transformation, 205
Exponential Inverse
– of a matrix, 32 – of an isomorphism, 212
Isomorphic
Factorization – vector spaces, 212
– LU, 336 Isomorphism, 200
– QR, 330 – of vector spaces, 211
Fibonacci
– matrix, 116 Jordan
Frobenius inequality, 257 – block, 304
– for rank, 223 – canonical form, 303
Hölder’s Matrix
– inequality, 152 – associated to a linear transformation, 227
Hankel – augmented, 40
– matrix, 110 – circulant, 319
Hessian – companion, 112, 317
– matrix, 368 – diagonal, 30, 166
Homogeneous – idempotent, 58, 60
– system, 5 – identity, 21
– inverse, 22
Idempotent – involutory , 60
– matrix, 258, 266 – nilpotent, 55, 302
383
Index
Similar Taylor’s
– matrices, 240, 288 – theorem, 369
Singular Trace, 57
– matrix, 279 – of a matrix, 37
Spectral Transition matrix, 235, 252
– radius, 375 Triangle inequality, 139
– theorem, 334 – for rank, 223
Spectrum, 278, 296 Triangularization
Square root – of a matrix, 298
– of a matrix, 361 – of an endomorphism, 298
Submatrix
– leading principal, 337
– principal, 337 Unit
Subspace, 163 – vector, 130
Surjective
– linear transformation, 208
Vector space, 127, 159
Sylvester’s law
– of nullity, 257
Sylvester’s law of inertia, 359 Young’s inequality, 138, 153