
Linear Algebra

V. Ravichandran
Department of Mathematics
University of Delhi
Chapter 1
Matrices

1.1. DEFINITIONS AND EXAMPLES

A rectangular array of mn elements arranged in m rows and n columns, enclosed in brackets, is called a matrix of order m × n. Matrices are denoted by uppercase letters or by writing the general entry in brackets. Thus an m × n matrix A is written as

A = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix}.

The ijth entry (the entry at the intersection of the ith row and jth column) a_{ij} of the matrix A is sometimes denoted by (A)_{ij}.
Example 1.1.1
Following are some examples of matrices.
(1) The matrices
A_1 = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix} and A_2 = \begin{bmatrix} 0 \\ 2 \\ -3 \\ 33 \end{bmatrix}
have only one column. Such matrices are called column matrices or column vectors.
(2) A_3 = [3 4 5 6] is a matrix with one row; matrices with one row are called row matrices or row vectors.
(3) The matrix
A_4 = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
has an equal number of rows and columns. Such matrices of order n × n are called square matrices of order n. In this example, the entries 1, 5, 9 are on the (main) diagonal of A_4.

Given a square matrix A = [a_{ij}] of order n, the entries a_{11}, a_{22}, ..., a_{nn} are the diagonal entries of the matrix A.
Definition 1.1.1
A matrix all of whose entries are zero is called a zero matrix, and a square matrix of order n whose diagonal entries are 1 and whose other entries are zero is called an identity matrix of order n. It is denoted by I_n or simply I.

Example 1.1.2
The matrices
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
are both zero matrices, while
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
are identity matrices of order 2 and 3 respectively.

1.1.1. Operations on Matrices


Definition 1.1.2
Two matrices A = [ai j ] and B = [bi j ] are equal if they are of the
same order and ai j = bi j for all i and j.
Example 1.1.3
Let the matrices A, B and C be given by
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, B = \begin{bmatrix} x & 2 & 3 \\ 4 & y & z \end{bmatrix}, C = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}.
The matrices A and B are equal if x = 1, y = 5 and z = 6. The matrices A and C are not equal as they have different orders.

Definition 1.1.3
Let A = [a_{ij}] and B = [b_{ij}] be two matrices of the same order. Then their sum A + B is defined by A + B = [a_{ij} + b_{ij}]. Similarly, subtraction and scalar multiplication are defined by
A − B = [a_{ij} − b_{ij}]
and
kA = [k a_{ij}],
where k is a scalar. If A_1, A_2, ..., A_n are matrices of the same order and c_1, c_2, ..., c_n are scalars, then the expression
c_1 A_1 + c_2 A_2 + \dots + c_n A_n
is called a linear combination of A_1, A_2, ..., A_n with coefficients c_1, c_2, ..., c_n.

Example 1.1.4
Let the matrices A, B and C be given by
A = \begin{bmatrix} 1 & -2 & 3 \\ -1 & 2 & 3 \end{bmatrix}, B = \begin{bmatrix} 0 & -1 & 2 \\ -3 & 2 & 1 \end{bmatrix}, C = \begin{bmatrix} 3 & 2 & -1 \\ 2 & 0 & -1 \end{bmatrix}.
Then we have
A + B = \begin{bmatrix} 1+0 & -2-1 & 3+2 \\ -1-3 & 2+2 & 3+1 \end{bmatrix} = \begin{bmatrix} 1 & -3 & 5 \\ -4 & 4 & 4 \end{bmatrix}.
Similarly
A − B = \begin{bmatrix} 1-0 & -2-(-1) & 3-2 \\ -1-(-3) & 2-2 & 3-1 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 1 \\ 2 & 0 & 2 \end{bmatrix}.
Also we have
3C = 3 \begin{bmatrix} 3 & 2 & -1 \\ 2 & 0 & -1 \end{bmatrix} = \begin{bmatrix} 9 & 6 & -3 \\ 6 & 0 & -3 \end{bmatrix}.
Also the linear combination of A, B and C with coefficients 2, 1, -1 is given by
2A + B − C = \begin{bmatrix} 2 & -4 & 6 \\ -2 & 4 & 6 \end{bmatrix} + \begin{bmatrix} 0 & -1 & 2 \\ -3 & 2 & 1 \end{bmatrix} + \begin{bmatrix} -3 & -2 & 1 \\ -2 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & -7 & 9 \\ -7 & 6 & 8 \end{bmatrix}.
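These entrywise operations are easy to check numerically. The short Python (NumPy) sketch below is an illustrative aid only, not part of the original text; it reproduces the computations of Example 1.1.4.

import numpy as np

A = np.array([[1, -2, 3], [-1, 2, 3]])
B = np.array([[0, -1, 2], [-3, 2, 1]])
C = np.array([[3, 2, -1], [2, 0, -1]])

print(A + B)        # [[ 1 -3  5] [-4  4  4]]
print(A - B)        # [[ 1 -1  1] [ 2  0  2]]
print(3 * C)        # [[ 9  6 -3] [ 6  0 -3]]
print(2*A + B - C)  # linear combination: [[-1 -7  9] [-7  6  8]]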

Definition 1.1.4
A matrix A is conformable with the matrix B if the number of columns of A equals the number of rows of B. That is, if A is an m × p matrix and B is a q × n matrix, then A is conformable with B if p = q.
If the matrix A is conformable with the matrix B, then the product AB is defined as follows.
1.1. Definitions and examples 5

Definition 1.1.5
The product or multiplication of two matrices A = [a_{ij}] and B = [b_{jk}] of order m × n and n × p respectively is another matrix of order m × p given by
AB = \left[ \sum_{j=1}^{n} a_{ij} b_{jk} \right].

Notice that the ikth entry of the product is given by
(AB)_{ik} = a_{i1} b_{1k} + a_{i2} b_{2k} + \dots + a_{in} b_{nk},
which is the sum of the products of the ith-row entries of A with the corresponding kth-column entries of B.
Example 1.1.5
Let the matrices A and B be given by
A = \begin{bmatrix} 1 & -2 & 3 \\ 4 & 5 & 2 \end{bmatrix}, B = \begin{bmatrix} 1 & -1 \\ -1 & 2 \\ -2 & -1 \end{bmatrix}.
Then we have
AB = \begin{bmatrix} 1(1) + (-2)(-1) + 3(-2) & 1(-1) + (-2)(2) + 3(-1) \\ 4(1) + 5(-1) + 2(-2) & 4(-1) + 5(2) + 2(-1) \end{bmatrix} = \begin{bmatrix} 1+2-6 & -1-4-3 \\ 4-5-4 & -4+10-2 \end{bmatrix} = \begin{bmatrix} -3 & -8 \\ -5 & 4 \end{bmatrix}.
Note that BA is also defined here (B is 3 × 2 and A is 2 × 3, so BA is a 3 × 3 matrix); in particular AB and BA cannot be equal, since they have different orders.
Note that AB is defined only when A is conformable with B, and BA is defined only when B is conformable with A. Even when both products are defined, it is not necessary that AB = BA. However, for a square matrix A, AA is defined and we denote it by A^2. Similarly all positive powers of A can be defined; for example, A^3 = AA^2 = AAA.
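As a quick numerical illustration (a NumPy sketch in the same spirit as above, not part of the original text), products and powers use the @ operator:

import numpy as np

A = np.array([[1, -2, 3], [4, 5, 2]])        # 2 x 3
B = np.array([[1, -1], [-1, 2], [-2, -1]])   # 3 x 2

print(A @ B)   # the 2 x 2 product of Example 1.1.5: [[-3 -8] [-5  4]]
print(B @ A)   # BA is also defined here, but it is 3 x 3, so AB != BA

M = np.array([[1, 2], [-2, 1]])
print(np.linalg.matrix_power(M, 2))  # A^2 of Example 1.1.6: [[-3  4] [-4 -3]]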

Example 1.1.6
Let
A = \begin{bmatrix} 1 & 2 \\ -2 & 1 \end{bmatrix}.
Then
A^2 = \begin{bmatrix} 1 & 2 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ -2 & 1 \end{bmatrix} = \begin{bmatrix} -3 & 4 \\ -4 & -3 \end{bmatrix}.

Example 1.1.7
Let
I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
Then we have
I_3^2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I_3
and therefore I_3^3 = I_3^2 I_3 = I_3 I_3 = I_3. In general, for any positive integer m, I_n^m = I_n for the identity matrix I_n.

Definition 1.1.6
The transpose of the matrix A = [a_{ij}], denoted by A^T, is the matrix obtained by interchanging the rows and columns of A:
(A^T)_{ji} = (A)_{ij}.

Example 1.1.8
Let the matrices A and B be given by
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 2 \end{bmatrix}, B = \begin{bmatrix} 1 & -1 \\ 1 & -2 \\ 2 & -1 \end{bmatrix}.
Then the transposes of A and B are given by
A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 2 \end{bmatrix}, B^T = \begin{bmatrix} 1 & 1 & 2 \\ -1 & -2 & -1 \end{bmatrix}.
The transpose of a row vector
v = [a_1 a_2 a_3 \dots a_n]
is the column vector
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \end{bmatrix}.

Definition 1.1.7
The trace of a square matrix A = [a_{ij}] of order n, denoted by tr A, is the sum of the diagonal entries of A:
tr A = a_{11} + a_{22} + \dots + a_{nn} = \sum_{i=1}^{n} a_{ii}.

Example 1.1.9
The trace of the matrix
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 2 \\ 6 & 9 & -2 \end{bmatrix}
is given by
tr A = 1 + 5 + (−2) = 4.
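Transpose and trace are one-liners to verify numerically; this NumPy sketch (illustrative only) checks Example 1.1.9:

import numpy as np

A = np.array([[1, 2, 3], [4, 5, 2], [6, 9, -2]])
print(A.T)           # transpose: rows and columns interchanged
print(np.trace(A))   # 1 + 5 + (-2) = 4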

1.2. PROPERTIES OF MATRIX OPERATIONS

Many of the basic rules of arithmetic remain valid for matrices, but, as we shall see, some do not.
Theorem 1.2.1
Let A, B and C be matrices and α and β be scalars. Then the following rules of matrix operations are valid:
(1) A + B = B + A (Commutative law for addition)
(2) A + (B + C) = (A + B) + C (Associative law for addition)
(3) A(BC) = (AB)C (Associative law for multiplication)
(4) A(B + C) = AB + AC (Left distributive law)
(5) (A + B)C = AC + BC (Right distributive law)
(6) α(A + B) = αA + αB
(7) (α + β)A = αA + βA
(8) (αβ)A = α(βA)
(9) α(AB) = (αA)B = A(αB)
provided that the orders of the matrices are such that the indicated operations can be performed.

PROOF. We prove (1)–(4). The proofs of (5)–(9) are left as exercises.
(1) To prove A + B = B + A, let A = [a_{ij}] and B = [b_{ij}]. Then we have
A + B = [a_{ij}] + [b_{ij}] = [a_{ij} + b_{ij}] = [b_{ij} + a_{ij}] = B + A.
(2) To prove A + (B + C) = (A + B) + C, let A = [a_{ij}], B = [b_{ij}] and C = [c_{ij}]. Since A + B = [a_{ij} + b_{ij}], we have
(A + B) + C = [a_{ij} + b_{ij}] + [c_{ij}] = [(a_{ij} + b_{ij}) + c_{ij}] = [a_{ij} + (b_{ij} + c_{ij})] = [a_{ij}] + [b_{ij} + c_{ij}] = A + (B + C).
(3) To prove A(BC) = (AB)C, let A = [a_{ij}], B = [b_{jk}] and C = [c_{kl}] be matrices of order m × n, n × p, p × q respectively. Note that
AB = \left[\sum_{j=1}^{n} a_{ij} b_{jk}\right], BC = \left[\sum_{k=1}^{p} b_{jk} c_{kl}\right].
Therefore
(AB)C = \left[\sum_{k=1}^{p}\left(\sum_{j=1}^{n} a_{ij} b_{jk}\right) c_{kl}\right] = \left[\sum_{k=1}^{p}\sum_{j=1}^{n} a_{ij} b_{jk} c_{kl}\right] = \left[\sum_{j=1}^{n}\sum_{k=1}^{p} a_{ij} b_{jk} c_{kl}\right] = \left[\sum_{j=1}^{n} a_{ij}\left(\sum_{k=1}^{p} b_{jk} c_{kl}\right)\right] = A(BC).
(4) To prove A(B + C) = AB + AC, let A = [a_{ij}], B = [b_{jk}] and C = [c_{jk}] be matrices of order m × n, n × p, n × p respectively. Since
B + C = [b_{jk} + c_{jk}], AB = \left[\sum_{j=1}^{n} a_{ij} b_{jk}\right], AC = \left[\sum_{j=1}^{n} a_{ij} c_{jk}\right],
we have
A(B + C) = \left[\sum_{j=1}^{n} a_{ij}(b_{jk} + c_{jk})\right] = \left[\sum_{j=1}^{n} (a_{ij} b_{jk} + a_{ij} c_{jk})\right] = \left[\sum_{j=1}^{n} a_{ij} b_{jk}\right] + \left[\sum_{j=1}^{n} a_{ij} c_{jk}\right] = AB + AC. □

Example 1.2.1
Let A, B and C be matrices given by
A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}, B = \begin{bmatrix} 1 & -1 \\ 3 & 2 \end{bmatrix}, C = \begin{bmatrix} 2 & 3 & 1 \\ 3 & -2 & 0 \end{bmatrix}.
Let us verify the associative law for multiplication. We have
AB = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 3 & 2 \end{bmatrix} = \begin{bmatrix} 7 & 3 \\ 11 & 4 \end{bmatrix}
and
(AB)C = \begin{bmatrix} 7 & 3 \\ 11 & 4 \end{bmatrix} \begin{bmatrix} 2 & 3 & 1 \\ 3 & -2 & 0 \end{bmatrix} = \begin{bmatrix} 23 & 15 & 7 \\ 34 & 25 & 11 \end{bmatrix}.
Similarly we have
BC = \begin{bmatrix} 1 & -1 \\ 3 & 2 \end{bmatrix} \begin{bmatrix} 2 & 3 & 1 \\ 3 & -2 & 0 \end{bmatrix} = \begin{bmatrix} -1 & 5 & 1 \\ 12 & 5 & 3 \end{bmatrix}
and
A(BC) = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} -1 & 5 & 1 \\ 12 & 5 & 3 \end{bmatrix} = \begin{bmatrix} 23 & 15 & 7 \\ 34 & 25 & 11 \end{bmatrix}.
Thus we see that (AB)C = A(BC).

The commutative law does not hold for matrix multiplication, as shown by the following example.
Example 1.2.2
Let A and B be two matrices given by
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
Then
AB = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix}
and
BA = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 4 \\ 1 & 2 \end{bmatrix} ≠ AB.
This example shows that matrix multiplication is not commutative.

Example 1.2.3
Let A and B be two matrices given by
A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.
Then
AB = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = 0.
This example shows that AB = 0 need not imply A = 0 or B = 0.
If
C = \begin{bmatrix} 0 & 0 \\ 0 & 2 \end{bmatrix},
then AC = 0 and therefore AB = AC but B ≠ C. The cancellation law AB = AC ⇒ B = C does not always hold.

Theorem 1.2.2
If A and B are two square matrices that commute (AB = BA), then
(AB)^2 = A^2 B^2.

PROOF. Since AB = BA, it follows that
(AB)^2 = (AB)(AB) = A(B(AB)) = A((BA)B) = A((AB)B) = A(A(BB)) = A(AB^2) = (AA)B^2 = A^2 B^2. □

Theorem 1.2.3
If A and B are two square matrices that commute, then for any positive integer n, we have
AB^n = B^n A.

PROOF. We prove this result by mathematical induction. For n = 1, this is the commutativity of A and B. Assume that the result is valid for n = k: AB^k = B^k A. Now, for n = k + 1, we have
AB^{k+1} = A(BB^k) = (AB)B^k = (BA)B^k = B(AB^k) = B(B^k A) = (BB^k)A = B^{k+1} A.
Thus the result is valid for n = k + 1, and the result follows by the principle of mathematical induction. □
The above theorem holds more generally, as in the following theorem.

Theorem 1.2.4
If A and B are two square matrices that commute, then for any positive integer n, we have
(AB)^n = A^n B^n.

PROOF. We prove this by mathematical induction. For n = 1, the result is trivial. Assume that the result is valid for n = k: (AB)^k = A^k B^k. Now, for n = k + 1, we have, using Theorem 1.2.3,
(AB)^{k+1} = (AB)^k (AB) = (A^k B^k)(AB) = A^k (B^k (AB)) = A^k ((B^k A)B) = A^k ((AB^k)B) = A^k (A(B^k B)) = (A^k A)B^{k+1} = A^{k+1} B^{k+1}.
This shows that the result is valid for n = k + 1. □
Our next example shows again that the cancellation law does not hold for matrix multiplication.
Example 1.2.4
Let A, B and C be given by
A = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}, B = \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}, C = \begin{bmatrix} 2 & 3 \\ 5 & 6 \end{bmatrix}.
Then
AB = \begin{bmatrix} 2 & 3 \\ 2 & 3 \end{bmatrix} = AC
but B ≠ C. This shows that the cancellation law is not satisfied.
Notice also that the product of two nonzero matrices can be the zero matrix:
\begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.

Theorem 1.2.5 Properties of Zero Matrices
Let A be any matrix and 0 the zero matrix. Let the orders of these matrices be such that the indicated operations can be performed. Then we have
(a) A + 0 = 0 + A = A
(b) 0 − A = −A
(c) A − A = 0
(d) A0 = 0A = 0.

Theorem 1.2.6 Properties of Identity Matrices
Let I be the identity matrix of order n. For a matrix A of order p × n, we have AI = A, and for a matrix A of order n × q, we have IA = A.

PROOF. Let
A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & \dots & a_{pn} \end{bmatrix}.
Then
AI = \begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{p1} & \dots & a_{pn} \end{bmatrix} \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \dots & 1 \end{bmatrix} = \begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{p1} & \dots & a_{pn} \end{bmatrix} = A.
This proves the first part. The second part is similar. □

1.3. INVERTIBLE MATRICES

Definition 1.3.1
If A is a square matrix of order n, a matrix B of order n is called the inverse of A if
AB = BA = I_n.
If A has an inverse, it is called invertible or non-singular. If there is no inverse, then the matrix is called singular.
Note that the inverse is defined only for square matrices, and if B is the inverse of A, then A is the inverse of B.
Example 1.3.1
Consider the matrices
A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} and B = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{bmatrix}.
We see that
AB = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2
and
BA = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2.
This shows that B is the inverse of A.

Example 1.3.2
Consider the matrices
A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} and B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.
We see that
AB = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a+c & b+d \\ 0 & 0 \end{bmatrix} ≠ I_2
for any a, b, c, d. Therefore A has no inverse.

Theorem 1.3.1
The inverse of a square matrix, if it exists, is unique. Equivalently, if B and C are inverses of a square matrix A, then B = C.

PROOF. Since B and C are inverses of A, we have
AB = BA = I, AC = CA = I.
Using this we see that
B = BI = B(AC) = (BA)C = IC = C. □

Since the inverse of a square matrix is unique, we denote it by A^{-1}. Thus A^{-1} is the matrix satisfying
AA^{-1} = A^{-1}A = I.

Example 1.3.3
Let ad − bc ≠ 0. Then the inverse of the matrix
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
is given by
A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.

PROOF. By a direct multiplication, we see that
AA^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \frac{1}{ad - bc} \begin{bmatrix} ad - bc & 0 \\ 0 & ad - bc \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I
and similarly A^{-1}A = I. □
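The 2 × 2 formula is simple enough to code directly. The following NumPy sketch is illustrative only (the helper name inverse_2x2 is invented here); it implements the formula and checks it against Example 1.3.1.

import numpy as np

def inverse_2x2(A):
    """Inverse of a 2x2 matrix via the adjugate formula (requires ad - bc != 0)."""
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[1.0, 1.0], [1.0, -1.0]])
print(inverse_2x2(A))      # [[ 0.5  0.5] [ 0.5 -0.5]], the matrix B of Example 1.3.1
print(A @ inverse_2x2(A))  # the identity matrix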
The next theorem shows that the inverse of the product of invertible matrices of the same order is the product of the inverses in the reverse order.
Theorem 1.3.2
If A and B are invertible matrices of order n, then their product AB is also invertible and
(AB)^{-1} = B^{-1}A^{-1}.

PROOF. Since
(AB)(B^{-1}A^{-1}) = A(B(B^{-1}A^{-1})) = A((BB^{-1})A^{-1}) = A(IA^{-1}) = AA^{-1} = I
and
(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}(AB)) = B^{-1}((A^{-1}A)B) = B^{-1}(IB) = B^{-1}B = I,
we see that AB is invertible and (AB)^{-1} = B^{-1}A^{-1}. □

Corollary 1.3.1
If A_1, A_2, ..., A_k are invertible matrices of the same order, then their product A_1 A_2 ⋯ A_k is invertible and
(A_1 A_2 ⋯ A_k)^{-1} = A_k^{-1} A_{k-1}^{-1} ⋯ A_1^{-1}.

PROOF. The proof is by the principle of mathematical induction on the number k of matrices. For k = 1, the result reduces to A_1^{-1} = A_1^{-1}. Assume the result for k matrices:
(A_1 A_2 ⋯ A_k)^{-1} = A_k^{-1} A_{k-1}^{-1} ⋯ A_1^{-1}.
Now, with k + 1 matrices, we have
(A_1 A_2 ⋯ A_{k+1})^{-1} = ((A_1 A_2 ⋯ A_k) A_{k+1})^{-1} = A_{k+1}^{-1} (A_1 A_2 ⋯ A_k)^{-1} = A_{k+1}^{-1} A_k^{-1} ⋯ A_1^{-1}.
The second equality follows from Theorem 1.3.2 and the third from the induction assumption. Thus the result is true for k + 1 matrices. □

Theorem 1.3.3
Let A be invertible. Then we have the following:
(1) A^{-1} is invertible and (A^{-1})^{-1} = A.
(2) A^n is invertible and (A^n)^{-1} = (A^{-1})^n for any positive integer n.
(3) kA is invertible and (kA)^{-1} = (1/k)A^{-1} for any scalar k ≠ 0.

PROOF.
(1) Since A is invertible, we have AA^{-1} = A^{-1}A = I, or equivalently A^{-1}A = AA^{-1} = I. This shows that the inverse of A^{-1} is A.
(2) Since A and A^{-1} commute, we have A^n (A^{-1})^n = (AA^{-1})^n = I^n = I, and similarly (A^{-1})^n A^n = I.
(3) Since k ≠ 0, we have (kA)((1/k)A^{-1}) = (k · (1/k))AA^{-1} = I, and similarly ((1/k)A^{-1})(kA) = I. □

Theorem 1.3.4 Cancellation law
If A is invertible and AB = AC, then B = C.

PROOF. Since A is invertible, we have AA^{-1} = A^{-1}A = I, and therefore, multiplying AB = AC on the left by A^{-1}, we get
A^{-1}(AB) = A^{-1}(AC).
By the associative law, this gives
(A^{-1}A)B = (A^{-1}A)C, that is, IB = IC, or B = C. □

Theorem 1.3.5
Let A and B be matrices and k a scalar. Then the following properties of the transpose are valid:
(1) (A^T)^T = A
(2) (A + B)^T = A^T + B^T
(3) (A − B)^T = A^T − B^T
(4) (kA)^T = kA^T
(5) (AB)^T = B^T A^T
provided that the orders of the matrices are such that the indicated operations can be performed.

PROOF. We prove only (1) and (5).
(1) Let A = [a_{ij}]. Then (A^T)_{ji} = a_{ij} and therefore ((A^T)^T)_{ij} = a_{ij} = (A)_{ij}. Thus (A^T)^T = A.
(5) Let A = [a_{ij}] and B = [b_{jk}] be two matrices of order m × n and n × p respectively. Then we have
AB = [c_{ik}] = \left[\sum_{j=1}^{n} a_{ij} b_{jk}\right]
and therefore
((AB)^T)_{ki} = c_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}.
Since (A^T)_{ji} = a_{ij} and (B^T)_{kj} = b_{jk}, we have
(B^T A^T)_{ki} = \sum_{j=1}^{n} (B^T)_{kj} (A^T)_{ji} = \sum_{j=1}^{n} b_{jk} a_{ij} = \sum_{j=1}^{n} a_{ij} b_{jk} = ((AB)^T)_{ki}.
This proves that (AB)^T = B^T A^T. □

Theorem 1.3.6
If A is invertible, then A^T is also invertible and
(A^T)^{-1} = (A^{-1})^T.

PROOF. Since A is invertible, we have AA^{-1} = A^{-1}A = I. By taking transposes we get
(AA^{-1})^T = (A^{-1}A)^T = I^T
and, using (AB)^T = B^T A^T and I^T = I, we get
(A^{-1})^T A^T = A^T (A^{-1})^T = I.
This shows that the inverse of A^T is (A^{-1})^T. □

1.4. DIAGONAL, SYMMETRIC MATRICES

Definition 1.4.1
A square matrix A = [a_{ij}] such that a_{ij} = 0 for i ≠ j is a diagonal matrix. A diagonal matrix whose diagonal entries are d_1, d_2, ..., d_n is denoted by diag(d_1, d_2, ..., d_n). If a_{ij} = 0 for i > j, the matrix is upper triangular; if a_{ij} = 0 for i < j, it is lower triangular. A diagonal matrix in which all the diagonal entries are equal is called a scalar matrix. A scalar matrix with all diagonal entries equal to one is the unit or identity matrix.

Example 1.4.1
The matrices
\begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
are examples of diagonal matrices. The matrices
\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}, \begin{bmatrix} 1 & 2 & 3 \\ 0 & 3 & -1 \\ 0 & 0 & 4 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 2 \\ 0 & 0 & 4 \end{bmatrix}
are examples of upper triangular matrices, and the matrices
\begin{bmatrix} 1 & 0 \\ -2 & 3 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ -3 & 1 & 4 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 0 \\ 2 & 2 & 0 \\ 4 & 2 & -1 \end{bmatrix}
are examples of lower triangular matrices.

Example 1.4.2
The product of two diagonal matrices is again a diagonal matrix:
diag(d_1, d_2, ..., d_n) diag(e_1, e_2, ..., e_n) = diag(d_1 e_1, d_2 e_2, ..., d_n e_n).
Also it is clear that the inverse of the diagonal matrix
D = diag(d_1, d_2, ..., d_n)
is the matrix
D^{-1} = diag(1/d_1, 1/d_2, ..., 1/d_n),
provided the diagonal entries are all nonzero.
Also it is easy to see that, for any positive integer k,
D^k = diag(d_1^k, d_2^k, ..., d_n^k).
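A NumPy sketch (illustrative only, with arbitrarily chosen diagonal entries) of these diagonal-matrix facts:

import numpy as np

D = np.diag([2.0, 3.0, 5.0])
E = np.diag([1.0, 4.0, 2.0])

print(D @ E)                         # diag(2*1, 3*4, 5*2)
print(np.linalg.inv(D))              # diag(1/2, 1/3, 1/5)
print(np.linalg.matrix_power(D, 3))  # diag(2^3, 3^3, 5^3)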

Definition 1.4.2
The matrix A is symmetric if
A^T = A
and skew-symmetric if
A^T = −A.

Example 1.4.3
The matrices
\begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & -1 \\ 3 & -1 & 4 \end{bmatrix}, \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \end{bmatrix}
are both examples of symmetric matrices. The matrix
A = \begin{bmatrix} 0 & 2 & 3 \\ -2 & 0 & -1 \\ -3 & 1 & 0 \end{bmatrix}
satisfies
A^T = \begin{bmatrix} 0 & -2 & -3 \\ 2 & 0 & 1 \\ 3 & -1 & 0 \end{bmatrix} = −\begin{bmatrix} 0 & 2 & 3 \\ -2 & 0 & -1 \\ -3 & 1 & 0 \end{bmatrix} = −A
and is therefore an example of a skew-symmetric matrix. Similarly the matrix
\begin{bmatrix} 0 & 2 & -3 \\ -2 & 0 & -4 \\ 3 & 4 & 0 \end{bmatrix}
is skew-symmetric. Note that all the diagonal entries of a skew-symmetric matrix are zero.

Theorem 1.4.1
If A and B are symmetric matrices of the same order and k is a scalar, then the matrices A + B, A − B, A^T and kA are all symmetric.

PROOF. We prove only that if A is symmetric, then A^T is symmetric. Since A is symmetric, we have A^T = A. Now
(A^T)^T = A = A^T
and therefore A^T is symmetric. □

Theorem 1.4.2
If A and B are symmetric matrices of order n, then the matrix AB is symmetric if and only if AB = BA.

PROOF. Since A and B are symmetric, we have A^T = A and B^T = B. We know that
(AB)^T = B^T A^T = BA
and therefore (AB)^T = AB if and only if AB = BA. □

Theorem 1.4.3
If A is an invertible symmetric matrix, then the matrix A^{-1} is also symmetric.

PROOF. Since A is symmetric, we have A^T = A. Since (A^{-1})^T = (A^T)^{-1}, we have
(A^{-1})^T = (A^T)^{-1} = A^{-1}
and therefore the matrix A^{-1} is symmetric. □

Example 1.4.4
For any matrix A, the matrix AA^T is symmetric. To see this, let B = AA^T. Since the transpose of a product of two matrices is the product of their transposes in the reverse order, and the transpose of the transpose of A is A itself, we have
B^T = (AA^T)^T = (A^T)^T A^T = AA^T = B.
This shows that AA^T is symmetric. Similarly A^T A is also symmetric.

Example 1.4.5
Let A be given by
A = \begin{bmatrix} 1 & -1 & 2 \\ 0 & 2 & -1 \end{bmatrix}.
Then
AA^T = \begin{bmatrix} 1 & -1 & 2 \\ 0 & 2 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -1 & 2 \\ 2 & -1 \end{bmatrix} = \begin{bmatrix} 6 & -4 \\ -4 & 5 \end{bmatrix}
is a symmetric matrix.

Theorem 1.4.4
If A is an invertible matrix, then AA^T and A^T A are invertible symmetric matrices.

PROOF. Since A is invertible, A^T is also invertible, and therefore the products AA^T and A^T A are invertible. By Example 1.4.4, these two matrices are symmetric. □

Theorem 1.4.5
Any square matrix is the sum of a symmetric and a skew-symmetric matrix.

PROOF. Let A be any square matrix and define the matrices B and C by
B = (1/2)(A + A^T), C = (1/2)(A − A^T).
Then we have
B^T = (1/2)(A + A^T)^T = (1/2)(A^T + A) = B
and
C^T = (1/2)(A − A^T)^T = (1/2)(A^T − A) = −(1/2)(A − A^T) = −C.
Thus B and C are symmetric and skew-symmetric respectively. Also
B + C = (1/2)(A + A^T) + (1/2)(A − A^T) = A.
This completes the proof. □

Example 1.4.6
Let
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.
Then
B = (1/2)(A + A^T) = (1/2)\left( \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} + \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix} \right) = \begin{bmatrix} 1 & 3 & 5 \\ 3 & 5 & 7 \\ 5 & 7 & 9 \end{bmatrix}
and
C = (1/2)(A − A^T) = (1/2)\left( \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} − \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix} \right) = \begin{bmatrix} 0 & -1 & -2 \\ 1 & 0 & -1 \\ 2 & 1 & 0 \end{bmatrix}.
It is clear that B is symmetric and C is skew-symmetric. Also A = B + C.
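The decomposition A = B + C is mechanical to compute; here is a minimal NumPy sketch (illustrative only) reproducing Example 1.4.6:

import numpy as np

A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
B = (A + A.T) / 2   # symmetric part
C = (A - A.T) / 2   # skew-symmetric part

print(B)                      # [[1 3 5] [3 5 7] [5 7 9]]
print(C)                      # [[0 -1 -2] [1 0 -1] [2 1 0]]
print(np.allclose(A, B + C))  # True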

Theorem 1.4.6
Let A and B be square matrices of order n and k a scalar. Then we have
(1) tr(A + B) = tr A + tr B
(2) tr(A − B) = tr A − tr B
(3) tr(AB) = tr(BA)
(4) tr(kA) = k tr A
(5) tr(A^T) = tr A.

PROOF. We prove only that tr(A + B) = tr A + tr B and tr(AB) = tr(BA). To prove the first, let A = [a_{ij}] and B = [b_{ij}]. Then A + B = [a_{ij} + b_{ij}] and therefore
tr(A + B) = \sum_{i=1}^{n} (a_{ii} + b_{ii}) = \sum_{i=1}^{n} a_{ii} + \sum_{i=1}^{n} b_{ii} = tr A + tr B.
To prove tr(AB) = tr(BA), let A = [a_{ij}] and B = [b_{jk}]. Then
AB = \left[\sum_{j=1}^{n} a_{ij} b_{jk}\right], BA = \left[\sum_{j=1}^{n} b_{ij} a_{jk}\right].
Thus we have
tr(AB) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} b_{ji}
and, relabelling the summation indices,
tr(BA) = \sum_{i=1}^{n} \sum_{j=1}^{n} b_{ij} a_{ji} = \sum_{i,j=1}^{n} a_{ij} b_{ji} = tr(AB). □
Chapter 2
Determinants

2.1. MINOR AND COFACTOR OF A SQUARE MATRIX

Let A = [a_{ij}] be a square matrix of order n. The determinant of a 1 × 1 matrix A = [a_{11}], denoted by det(A) or |A|, is just the number a_{11}. For a 2 × 2 matrix
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
the determinant is defined by
det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11} a_{22} − a_{12} a_{21}.
In general, the determinant of a square matrix of order n will be defined using determinants of matrices of order n − 1. The smaller determinants associated with a given matrix in this way are called minors, and the signed minors are called cofactors.
The minor of the entry a_{ij} is the determinant of the matrix obtained by deleting the ith row and jth column of A. It is denoted by M_{ij}, and the cofactor C_{ij} of a_{ij} is defined by
C_{ij} = (−1)^{i+j} M_{ij}.

Example 2.1.1
Consider the matrix
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.
The minors of a_{11}, a_{12}, a_{13} and a_{32}, for example, are given respectively by
M_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = a_{22}a_{33} − a_{32}a_{23},
M_{12} = \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} = a_{21}a_{33} − a_{31}a_{23},
M_{13} = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} = a_{21}a_{32} − a_{31}a_{22},
M_{32} = \begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix} = a_{11}a_{23} − a_{21}a_{13}.
Their cofactors are respectively given by
C_{11} = (−1)^{1+1} M_{11} = +(a_{22}a_{33} − a_{32}a_{23}),
C_{12} = (−1)^{1+2} M_{12} = −(a_{21}a_{33} − a_{31}a_{23}),
C_{13} = (−1)^{1+3} M_{13} = +(a_{21}a_{32} − a_{31}a_{22}),
C_{32} = (−1)^{3+2} M_{32} = −(a_{11}a_{23} − a_{21}a_{13}).

Example 2.1.2
Consider the matrix A given by
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.
The minors M_{11} and M_{12} are given by
M_{11} = \begin{vmatrix} 5 & 6 \\ 8 & 9 \end{vmatrix} = 45 − 48 = −3,
M_{12} = \begin{vmatrix} 4 & 6 \\ 7 & 9 \end{vmatrix} = 36 − 42 = −6,
and therefore their cofactors are given by
C_{11} = (−1)^{1+1} M_{11} = M_{11} = −3,
C_{12} = (−1)^{1+2} M_{12} = −M_{12} = 6.
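Minors and cofactors are computed by deleting a row and a column and taking a determinant. This NumPy sketch (illustrative only; the helper names minor and cofactor are invented here) checks Example 2.1.2:

import numpy as np

def minor(A, i, j):
    """Determinant of A with row i and column j deleted (0-based indices)."""
    sub = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return np.linalg.det(sub)

def cofactor(A, i, j):
    return (-1) ** (i + j) * minor(A, i, j)

A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
print(minor(A, 0, 0), cofactor(A, 0, 0))  # M11 = -3, C11 = -3
print(minor(A, 0, 1), cofactor(A, 0, 1))  # M12 = -6, C12 = 6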

Example 2.1.3
The cofactor and minor of an entry a_{ij} differ only in sign: C_{ij} = ±M_{ij}. For example, the signs for the cofactors of a 3 × 3 and a 4 × 4 matrix follow the patterns
\begin{bmatrix} + & - & + \\ - & + & - \\ + & - & + \end{bmatrix}, \begin{bmatrix} + & - & + & - \\ - & + & - & + \\ + & - & + & - \\ - & + & - & + \end{bmatrix}.

Definition 2.1.1
The determinant of the square matrix A = [a_{ij}] of order n, denoted by det(A) or |A|, is defined by the following cofactor expansion:
det(A) := a_{11}C_{11} + a_{12}C_{12} + \dots + a_{1n}C_{1n}.

Example 2.1.4
The determinant of the matrix
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 0 & 1 \\ 4 & 5 & 2 \end{bmatrix}
is given by
|A| = 1 \begin{vmatrix} 0 & 1 \\ 5 & 2 \end{vmatrix} − 2 \begin{vmatrix} 2 & 1 \\ 4 & 2 \end{vmatrix} + 3 \begin{vmatrix} 2 & 0 \\ 4 & 5 \end{vmatrix} = 1(0 − 5) − 2(4 − 4) + 3(10 − 0) = 25.
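The defining cofactor expansion translates directly into a recursive function. A sketch (illustrative only; in practice NumPy's np.linalg.det uses a different, far more efficient algorithm):

import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        sub = np.delete(A[1:], j, axis=1)  # delete row 0 and column j
        total += (-1) ** j * A[0, j] * det_cofactor(sub)
    return total

A = np.array([[1., 2., 3.], [2., 0., 1.], [4., 5., 2.]])
print(det_cofactor(A))  # 25.0, as in Example 2.1.4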

Example 2.1.5
The determinant of
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
is given by
det(A) = a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13}
= a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} − a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}
= a_{11}(a_{22}a_{33} − a_{32}a_{23}) − a_{12}(a_{21}a_{33} − a_{31}a_{23}) + a_{13}(a_{21}a_{32} − a_{31}a_{22}).
Regrouping the six terms, we also get
det(A) = a_{11}a_{22}a_{33} + a_{12}a_{31}a_{23} + a_{13}a_{21}a_{32} − a_{11}a_{32}a_{23} − a_{12}a_{21}a_{33} − a_{13}a_{31}a_{22}
= a_{11}(a_{22}a_{33} − a_{32}a_{23}) − a_{21}(a_{12}a_{33} − a_{13}a_{32}) + a_{31}(a_{12}a_{23} − a_{13}a_{22})
= a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} − a_{21} \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{31} \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}
= a_{11}C_{11} + a_{21}C_{21} + a_{31}C_{31}.
This is the cofactor expansion of the determinant using the first column. It can be shown that the determinant can be found by a cofactor expansion using any row or column.

Theorem 2.1.1
The determinant of a matrix A = [a_{ij}] of order n can be computed as the sum of products of the entries in any one row or column with their corresponding cofactors:
det(A) = a_{i1}C_{i1} + a_{i2}C_{i2} + \dots + a_{in}C_{in}
or
det(A) = a_{1j}C_{1j} + a_{2j}C_{2j} + \dots + a_{nj}C_{nj}.
(The first is the cofactor expansion using the ith row and the second is the cofactor expansion using the jth column.)

Example 2.1.6
Consider the matrix A given in Example 2.1.4:
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 0 & 1 \\ 4 & 5 & 2 \end{bmatrix}.
The determinant of A obtained by cofactor expansion using the second column is
|A| = −2 \begin{vmatrix} 2 & 1 \\ 4 & 2 \end{vmatrix} + 0 \begin{vmatrix} 1 & 3 \\ 4 & 2 \end{vmatrix} − 5 \begin{vmatrix} 1 & 3 \\ 2 & 1 \end{vmatrix} = −2(4 − 4) + 0 − 5(1 − 6) = 25.

In computing a determinant, we use a row or column having the largest number of zeros.
Example 2.1.7
Consider the determinant of
A = \begin{bmatrix} 1 & 2 & 0 & 4 \\ 2 & 1 & 2 & 1 \\ 0 & 3 & 0 & 2 \\ 1 & 0 & 0 & 1 \end{bmatrix}.
The third column has only one nonzero entry, so it is best to use this column to find the determinant. In fact, by the cofactor expansion using the third column, we have
det(A) = (−1)^{2+3} \cdot 2 \begin{vmatrix} 1 & 2 & 4 \\ 0 & 3 & 2 \\ 1 & 0 & 1 \end{vmatrix}.
Expanding this using the first column, we get
det(A) = −2\left[ 1 \begin{vmatrix} 3 & 2 \\ 0 & 1 \end{vmatrix} − 0 \begin{vmatrix} 2 & 4 \\ 0 & 1 \end{vmatrix} + 1 \begin{vmatrix} 2 & 4 \\ 3 & 2 \end{vmatrix} \right] = −2[1(3 − 0) + 1(4 − 12)] = 10.

Example 2.1.8
If two rows of a determinant |A| are identical, then |A| = 0.

PROOF. We prove this for a 3 × 3 matrix with identical second and third rows. In this case, the determinant is
|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{21} & a_{22} & a_{23} \end{vmatrix}.
Clearly the cofactor of a_{11} is given by
C_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{22} & a_{23} \end{vmatrix} = a_{22}a_{23} − a_{22}a_{23} = 0.
Similarly, the cofactors C_{12} and C_{13} of a_{12} and a_{13} are also zero. Thus
det(A) = a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} = 0. □

Example 2.1.9
Consider the determinant
|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
and consider the sum
a_{11}C_{21} + a_{12}C_{22} + a_{13}C_{23},
obtained by multiplying the first-row entries by the cofactors of the corresponding entries in the second row. This sum can be written as the determinant
|A'| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}.
Since two rows of this determinant are equal, its value is zero and therefore
a_{11}C_{21} + a_{12}C_{22} + a_{13}C_{23} = 0.
In general, for a square matrix A = [a_{ij}] of order n,
a_{i1}C_{j1} + a_{i2}C_{j2} + \dots + a_{in}C_{jn} = \begin{cases} \det A & \text{if } i = j \\ 0 & \text{if } i ≠ j. \end{cases}

Definition 2.1.2
For a square matrix A = [a_{ij}] of order n, the matrix
\begin{bmatrix} C_{11} & C_{12} & \dots & C_{1n} \\ C_{21} & C_{22} & \dots & C_{2n} \\ \vdots & \vdots & & \vdots \\ C_{n1} & C_{n2} & \dots & C_{nn} \end{bmatrix}
is called the matrix of cofactors. The transpose of this matrix is called the adjoint of A and is denoted by Adj(A).

Example 2.1.10
Consider the matrix
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ 3 & 2 & 1 \end{bmatrix}.
The matrix of cofactors is given by
\begin{bmatrix} 1 & 1 & -5 \\ 4 & -8 & 4 \\ -7 & 5 & -1 \end{bmatrix}
and therefore the adjoint of A is given by
Adj(A) = \begin{bmatrix} 1 & 4 & -7 \\ 1 & -8 & 5 \\ -5 & 4 & -1 \end{bmatrix}.
The value of the determinant is
det(A) = 1(3 − 2) − 2(2 − 3) + 3(4 − 9) = −12.
Now
A Adj(A) = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ 3 & 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 4 & -7 \\ 1 & -8 & 5 \\ -5 & 4 & -1 \end{bmatrix} = \begin{bmatrix} -12 & 0 & 0 \\ 0 & -12 & 0 \\ 0 & 0 & -12 \end{bmatrix} = \begin{bmatrix} \det(A) & 0 & 0 \\ 0 & \det(A) & 0 \\ 0 & 0 & \det(A) \end{bmatrix}.
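Assembling the cofactors into the adjoint gives a costly but instructive inverse formula. A sketch (illustrative only; the helper name adjoint is invented here) reproducing Example 2.1.10:

import numpy as np

def adjoint(A):
    """Transpose of the matrix of cofactors."""
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            sub = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(sub)
    return C.T

A = np.array([[1., 2., 3.], [2., 3., 1.], [3., 2., 1.]])
adj = adjoint(A)
print(adj)                     # [[ 1  4 -7] [ 1 -8  5] [-5  4 -1]]
print(A @ adj)                 # det(A) * I = -12 * I
print(adj / np.linalg.det(A))  # the inverse of A, by Theorem 2.1.2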

Theorem 2.1.2
For any square matrix A of order n,
A Adj(A) = Adj(A) A = |A| I,
where I is the identity matrix of order n. For a nonsingular matrix A, we have
A^{-1} = \frac{1}{|A|} Adj(A).

PROOF. Let A = [a_{ij}], so that Adj(A) = [C_{ij}]^T, whose (j, k) entry is C_{kj}. Then
A Adj(A) = \left[ \sum_{j=1}^{n} a_{ij} C_{kj} \right].
Since, by Example 2.1.9,
\sum_{j=1}^{n} a_{ij} C_{kj} = a_{i1}C_{k1} + a_{i2}C_{k2} + \dots + a_{in}C_{kn} = \begin{cases} \det A & \text{if } i = k \\ 0 & \text{if } i ≠ k, \end{cases}
we see that
A Adj(A) = \begin{bmatrix} \det(A) & 0 & \dots & 0 \\ 0 & \det(A) & \dots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \dots & \det(A) \end{bmatrix}.
This proves that A Adj(A) = det(A) I. Similarly, we can show that Adj(A) A = det(A) I. Thus, when det(A) ≠ 0, we have
A \left( \frac{1}{\det(A)} Adj(A) \right) = \left( \frac{1}{\det(A)} Adj(A) \right) A = I
and therefore
A^{-1} = \frac{1}{\det(A)} Adj(A). □

Example 2.1.11
Let A be given by
A = \begin{bmatrix} 1 & 1 & -1 \\ 1 & -1 & 1 \\ -1 & 1 & 1 \end{bmatrix}.
The determinant of A is given by
det(A) = 1(−1 − 1) − 1(1 + 1) − 1(1 − 1) = −4.
The matrix of cofactors is given by
\begin{bmatrix} -2 & -2 & 0 \\ -2 & 0 & -2 \\ 0 & -2 & -2 \end{bmatrix}.
The adjoint of A is therefore given by
Adj(A) = \begin{bmatrix} -2 & -2 & 0 \\ -2 & 0 & -2 \\ 0 & -2 & -2 \end{bmatrix}
and its inverse by
A^{-1} = \frac{1}{2} \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}.

Theorem 2.1.3
A matrix A is invertible if and only if det(A) ≠ 0.

PROOF. The proof is given later (Theorem 2.3.11).

Theorem 2.1.4
The determinant of a triangular matrix (upper triangular, lower triangular or diagonal) is the product of its diagonal entries, and therefore the matrix is invertible if and only if all its diagonal entries are nonzero.

PROOF. We prove the theorem for a lower triangular matrix; the proof is similar for diagonal and upper triangular matrices. Consider the lower triangular matrix
A = \begin{bmatrix} a_{11} & 0 & \dots & 0 \\ a_{21} & a_{22} & \dots & 0 \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix}.
By repeatedly expanding along the first row, we see that
det(A) = \begin{vmatrix} a_{11} & 0 & 0 & \dots & 0 \\ a_{21} & a_{22} & 0 & \dots & 0 \\ a_{31} & a_{32} & a_{33} & \dots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \dots & a_{nn} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & 0 & \dots & 0 \\ a_{32} & a_{33} & \dots & 0 \\ \vdots & \vdots & & \vdots \\ a_{n2} & a_{n3} & \dots & a_{nn} \end{vmatrix} = \dots = a_{11} a_{22} a_{33} \dots a_{nn}.
Thus the value of the determinant is the product of the diagonal entries, and therefore, by Theorem 2.1.3, the matrix is invertible if and only if all its diagonal entries are nonzero. □

Example 2.1.12
The inverse of
A = \begin{bmatrix} λ_1 & 0 & 0 \\ 0 & λ_2 & 0 \\ 0 & 0 & λ_3 \end{bmatrix}
exists if and only if λ_1 λ_2 λ_3 ≠ 0. In this case, the inverse is given by
A^{-1} = \begin{bmatrix} 1/λ_1 & 0 & 0 \\ 0 & 1/λ_2 & 0 \\ 0 & 0 & 1/λ_3 \end{bmatrix}.
This can also be verified directly.


Example 2.1.13
The inverse of the lower triangular matrix A given by
A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 2 & 1 & 0 \\ 4 & 3 & 2 & 1 \end{bmatrix}
is given by
A^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ 1 & -2 & 1 & 0 \\ 0 & 1 & -2 & 1 \end{bmatrix}.
The inverse is also a lower triangular matrix. This is true in general, as shown in the next theorem.
Consider the upper triangular matrix A:
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ 0 & a_{22} & a_{23} & a_{24} & a_{25} \\ 0 & 0 & a_{33} & a_{34} & a_{35} \\ 0 & 0 & 0 & a_{44} & a_{45} \\ 0 & 0 & 0 & 0 & a_{55} \end{bmatrix}
and consider the minor of a_{24}. It is given by
M_{24} = \begin{vmatrix} a_{11} & a_{12} & a_{13} & a_{15} \\ 0 & 0 & a_{33} & a_{35} \\ 0 & 0 & 0 & a_{45} \\ 0 & 0 & 0 & a_{55} \end{vmatrix}.
Since this determinant is upper triangular and has a zero on its main diagonal, M_{24} = 0. In this example, we can verify that M_{ij} = 0 for all i < j.
Theorem 2.1.5
The inverse (if it exists) of an upper (lower) triangular matrix is again upper (lower) triangular. The inverse (if it exists) of a diagonal matrix is again diagonal.

PROOF. We first prove the result for an upper triangular matrix. Let A = [a_{ij}] be an upper triangular matrix, let i < j, and let M_{ij} be the minor of a_{ij}. The minor is again an upper triangular determinant. Its ith row is the (i + 1)st row of A with the jth entry removed; since A is upper triangular, the first i entries of that row are zero, and since i < j, removing the jth entry does not change this. Hence M_{ij} has a zero entry on its diagonal, and therefore M_{ij} = 0 for i < j. Thus the cofactor matrix is lower triangular and hence its transpose, the adjoint, is upper triangular. Since
A^{-1} = \frac{1}{\det(A)} Adj(A),
it follows that A^{-1} is upper triangular.
For a lower triangular matrix A, A^T is upper triangular and hence its inverse (A^T)^{-1} is also upper triangular. Hence its transpose ((A^T)^{-1})^T is lower triangular. Since A^{-1} = ((A^T)^{-1})^T, it follows that A^{-1} is lower triangular.
The inverse of a diagonal matrix A = diag(λ_1, ..., λ_n) is
A^{-1} = diag(1/λ_1, ..., 1/λ_n),
which is again diagonal. □

2.2. CRAMER'S RULE

Theorem 2.2.1 Cramer's Rule
If A is a square matrix of order n and det(A) ≠ 0, then the linear system Ax = b has a unique solution given by
x_1 = \frac{\det(A_1)}{\det(A)}, x_2 = \frac{\det(A_2)}{\det(A)}, ..., x_n = \frac{\det(A_n)}{\det(A)},
where A_i is the matrix obtained from A by replacing the ith column by the corresponding entries of b.

PROOF. Let A be a square matrix with det(A) ≠ 0. Then A^{-1} exists and the linear system Ax = b has the unique solution x = A^{-1}b. Since
A^{-1} = \frac{1}{\det(A)} Adj(A),
we have
x = A^{-1}b = \frac{1}{\det(A)} Adj(A) b.
Now let A = [a_{ij}], Adj(A) = [C_{ij}]^T and b = [b_1 b_2 \dots b_n]^T. Then
x = \frac{1}{\det(A)} \begin{bmatrix} C_{11} & C_{21} & \dots & C_{n1} \\ C_{12} & C_{22} & \dots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \dots & C_{nn} \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} = \frac{1}{\det(A)} \begin{bmatrix} b_1 C_{11} + b_2 C_{21} + \dots + b_n C_{n1} \\ b_1 C_{12} + b_2 C_{22} + \dots + b_n C_{n2} \\ \vdots \\ b_1 C_{1n} + b_2 C_{2n} + \dots + b_n C_{nn} \end{bmatrix}.
Hence
x_i = \frac{b_1 C_{1i} + b_2 C_{2i} + \dots + b_n C_{ni}}{\det(A)}.
The numerator is precisely the cofactor expansion, along the ith column, of the determinant
\begin{vmatrix} a_{11} & \dots & a_{1,i-1} & b_1 & a_{1,i+1} & \dots & a_{1n} \\ a_{21} & \dots & a_{2,i-1} & b_2 & a_{2,i+1} & \dots & a_{2n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & \dots & a_{n,i-1} & b_n & a_{n,i+1} & \dots & a_{nn} \end{vmatrix} = \det(A_i),
and therefore x_i = det(A_i)/det(A). □

Example 2.2.1
Solve the system of equations
2x + 3y − 4z = −3,
7x − 8y + 9z = 10,
−2x + z = 1.
The linear system can be written as Ax = b, where
A = \begin{bmatrix} 2 & 3 & -4 \\ 7 & -8 & 9 \\ -2 & 0 & 1 \end{bmatrix}, x = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, b = \begin{bmatrix} -3 \\ 10 \\ 1 \end{bmatrix}.
Then we have
det(A) = \begin{vmatrix} 2 & 3 & -4 \\ 7 & -8 & 9 \\ -2 & 0 & 1 \end{vmatrix} = −27,
det(A_1) = \begin{vmatrix} -3 & 3 & -4 \\ 10 & -8 & 9 \\ 1 & 0 & 1 \end{vmatrix} = −11,
det(A_2) = \begin{vmatrix} 2 & -3 & -4 \\ 7 & 10 & 9 \\ -2 & 1 & 1 \end{vmatrix} = −31,
det(A_3) = \begin{vmatrix} 2 & 3 & -3 \\ 7 & -8 & 10 \\ -2 & 0 & 1 \end{vmatrix} = −49.
Therefore we have
x = det(A_1)/det(A) = 11/27,
y = det(A_2)/det(A) = 31/27,
z = det(A_3)/det(A) = 49/27.
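Cramer's rule is a few lines of NumPy. This sketch (illustrative only; the helper name cramer is invented here) reproduces Example 2.2.1:

import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule (assumes det(A) != 0)."""
    detA = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b          # replace the ith column by b
        x[i] = np.linalg.det(Ai) / detA
    return x

A = np.array([[2., 3., -4.], [7., -8., 9.], [-2., 0., 1.]])
b = np.array([-3., 10., 1.])
print(cramer(A, b))           # [11/27, 31/27, 49/27]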

Example 2.2.2
Solve the system of equations
x + y − z = 3,
x − y + z = 1,
−x + y + z = 2.
The linear system can be written as Ax = b, where
A = \begin{bmatrix} 1 & 1 & -1 \\ 1 & -1 & 1 \\ -1 & 1 & 1 \end{bmatrix}, x = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, b = \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}.
Then we have
det(A) = \begin{vmatrix} 1 & 1 & -1 \\ 1 & -1 & 1 \\ -1 & 1 & 1 \end{vmatrix} = −4,
det(A_1) = \begin{vmatrix} 3 & 1 & -1 \\ 1 & -1 & 1 \\ 2 & 1 & 1 \end{vmatrix} = −8,
det(A_2) = \begin{vmatrix} 1 & 3 & -1 \\ 1 & 1 & 1 \\ -1 & 2 & 1 \end{vmatrix} = −10,
det(A_3) = \begin{vmatrix} 1 & 1 & 3 \\ 1 & -1 & 1 \\ -1 & 1 & 2 \end{vmatrix} = −6.
Therefore we have
x = det(A_1)/det(A) = 2,
y = det(A_2)/det(A) = 5/2,
z = det(A_3)/det(A) = 3/2.

2.3. PROPERTIES OF DETERMINANTS

Theorem 2.3.1
If a square matrix A has a row (or a column) of zeros, then det(A) = 0.

PROOF. Let A = [a_{ij}] be a square matrix of order n and [C_{ij}] be its cofactor matrix. If the ith row of A is a row of zeros, then by expanding along this row, we get
det(A) = 0 · C_{i1} + 0 · C_{i2} + \dots + 0 · C_{in} = 0. □

Example 2.3.1
The determinant of the matrix
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 0 \\ 4 & 5 & 6 \end{bmatrix}
is zero.

Theorem 2.3.2
For any square matrix A,
det(A) = det(A^T).

PROOF. The value of det(A^T) found by the cofactor expansion using the first column of A^T is the same as the value of det(A) found by the cofactor expansion using the first row of A. Thus det(A) = det(A^T). □

Theorem 2.3.3
If B is the matrix obtained by multiplying a single row (or column) of a square matrix A by a constant λ, then
det(B) = λ det(A).

PROOF. Let A = [a_{ij}] be a square matrix and [C_{ij}] be its cofactor matrix. Let B be obtained by multiplying the ith row of A by λ. Then by the cofactor expansion of B using the ith row, we have
det(B) = λa_{i1}C_{i1} + λa_{i2}C_{i2} + \dots + λa_{in}C_{in} = λ(a_{i1}C_{i1} + a_{i2}C_{i2} + \dots + a_{in}C_{in}) = λ det(A). □

Theorem 2.3.4
If B is the matrix obtained by adding a constant multiple of one row (or column) of a square matrix A to another row (or column), then
det(B) = det(A).

PROOF. Let B be obtained by adding λ times the jth row of A to the ith row of A. Evaluating the determinant of B by cofactor expansion along the ith row, we get
det(B) = (a_{i1} + λa_{j1})C_{i1} + (a_{i2} + λa_{j2})C_{i2} + \dots + (a_{in} + λa_{jn})C_{in}
= a_{i1}C_{i1} + a_{i2}C_{i2} + \dots + a_{in}C_{in} + λ(a_{j1}C_{i1} + a_{j2}C_{i2} + \dots + a_{jn}C_{in})
= a_{i1}C_{i1} + a_{i2}C_{i2} + \dots + a_{in}C_{in} = det(A),
where the sum multiplied by λ vanishes by Example 2.1.9. □

Corollary 2.3.1
If a square matrix A has two rows (or two columns) which are proportional, then det(A) = 0.

PROOF. Suppose the ith row of A is k times the jth row. By adding −k times the jth row to the ith row, the ith row becomes a row of zeros, and by Theorem 2.3.4 this operation does not change the determinant. Thus the determinant of A is zero. □

Theorem 2.3.5
If B is the matrix obtained by interchanging two rows (or two columns) of a square matrix A, then
det(B) = −det(A).

PROOF. We prove this result by induction on the order of the matrix A. For n = 1, the result is vacuously true. For n = 2, the result follows since
\begin{vmatrix} a_{21} & a_{22} \\ a_{11} & a_{12} \end{vmatrix} = a_{21}a_{12} − a_{11}a_{22} = −\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}.
Assume that the result is true for all matrices of order n − 1, and let n ≥ 3. Let B be obtained by interchanging the ith and jth rows of A, and let the cofactor matrices of A and B be [C_{ij}] and [C'_{ij}] respectively. Since n ≥ 3, there is a k with 1 ≤ k ≤ n and k ≠ i, j. Note that C'_{kl} is obtained from C_{kl} by interchanging the same two rows used to get B from A. Since the cofactors are determinants of matrices of order n − 1, the induction hypothesis gives
C'_{kl} = −C_{kl}.
Using the cofactor expansion of B along this kth row, we get
det(B) = a_{k1}C'_{k1} + \dots + a_{kn}C'_{kn} = −[a_{k1}C_{k1} + \dots + a_{kn}C_{kn}] = −det(A).
Thus the result is true for matrices of order n, and the result follows by the principle of mathematical induction. □

Example 2.3.2
Consider the determinant of a matrix A:
det(A) = \begin{vmatrix} 1 & 2 & -1 & 3 \\ 1 & 1 & 2 & 0 \\ -1 & 4 & 2 & 1 \\ -1 & 2 & 3 & 1 \end{vmatrix}.
By taking −1 as a common factor out of the second row, we get
det(A) = −\begin{vmatrix} 1 & 2 & -1 & 3 \\ -1 & -1 & -2 & 0 \\ -1 & 4 & 2 & 1 \\ -1 & 2 & 3 & 1 \end{vmatrix}.
By adding the first row to each of the other rows, we get
det(A) = −\begin{vmatrix} 1 & 2 & -1 & 3 \\ 0 & 1 & -3 & 3 \\ 0 & 6 & 1 & 4 \\ 0 & 4 & 2 & 4 \end{vmatrix}.
Expanding this using the first column, we have
det(A) = −\begin{vmatrix} 1 & -3 & 3 \\ 6 & 1 & 4 \\ 4 & 2 & 4 \end{vmatrix}.
Now, adding the 2nd column to the third column and 3 times the first column to the second column, we get
det(A) = −\begin{vmatrix} 1 & 0 & 0 \\ 6 & 19 & 5 \\ 4 & 14 & 6 \end{vmatrix}.
Thus, expanding using the first row, we see that
det(A) = −(19 × 6 − 14 × 5) = −(114 − 70) = −44.

Example 2.3.3
Consider the matrix
A = \begin{bmatrix} 1 & a & b+c \\ 1 & b & a+c \\ 1 & c & a+b \end{bmatrix}.
Its determinant is
det(A) = \begin{vmatrix} 1 & a & b+c \\ 1 & b & a+c \\ 1 & c & a+b \end{vmatrix}.
By adding the second column to the third, we get
det(A) = \begin{vmatrix} 1 & a & a+b+c \\ 1 & b & a+b+c \\ 1 & c & a+b+c \end{vmatrix}.
Taking the common factor a + b + c out of the last column and using the fact that a determinant with two identical rows or columns is zero, we get
det(A) = (a + b + c) \begin{vmatrix} 1 & a & 1 \\ 1 & b & 1 \\ 1 & c & 1 \end{vmatrix} = 0.

Example 2.3.4 Vandermonde determinant
Consider the Vandermonde matrix
V = \begin{bmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{bmatrix}.
By adding −1 times the first row to the second and third rows, we see that the determinant of V is given by
det(V) = \begin{vmatrix} 1 & a & a^2 \\ 0 & b-a & b^2-a^2 \\ 0 & c-a & c^2-a^2 \end{vmatrix}.
Taking b − a out of the second row and c − a out of the third row as common factors, we get
det(V) = (b − a)(c − a) \begin{vmatrix} 1 & a & a^2 \\ 0 & 1 & b+a \\ 0 & 1 & c+a \end{vmatrix} = (b − a)(c − a)[(c + a) − (b + a)] = (b − a)(c − a)(c − b) = (a − b)(b − c)(c − a).

Example 2.3.5
Consider the n × n determinant
det(A) = \begin{vmatrix} a & b & b & \dots & b \\ b & a & b & \dots & b \\ b & b & a & \dots & b \\ \vdots & \vdots & \vdots & & \vdots \\ b & b & b & \dots & a \end{vmatrix}.
If a = b, then all the rows are equal and therefore the determinant is zero. Hence assume that a ≠ b. Add all the rows (except the first) to the first row; every entry of the first row becomes a + (n − 1)b, so taking this common factor out of the first row gives
det(A) = [(n − 1)b + a] \begin{vmatrix} 1 & 1 & 1 & \dots & 1 \\ b & a & b & \dots & b \\ b & b & a & \dots & b \\ \vdots & \vdots & \vdots & & \vdots \\ b & b & b & \dots & a \end{vmatrix}.
Subtract the first column from all the remaining columns to get
det(A) = [(n − 1)b + a] \begin{vmatrix} 1 & 0 & 0 & \dots & 0 \\ b & a-b & 0 & \dots & 0 \\ b & 0 & a-b & \dots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ b & 0 & 0 & \dots & a-b \end{vmatrix} = [(n − 1)b + a](a − b)^{n-1}.
The last equality follows since the determinant of a triangular matrix is the product of its diagonal entries.

Theorem 2.3.6
For any square matrix A of order n, we have
det(λA) = λ^n det(A).

PROOF. The determinant of a matrix obtained by multiplying a single row of a matrix B by a constant λ is λ det(B), and λA is obtained by multiplying all n rows of A by λ, so the result follows. □

Example 2.3.6
Consider the identity matrix I_3. Then det(I_3) = 1 and therefore
det(λI_3) = λ^3.
Further,
det(I_3 + I_3) = det(2I_3) = 2^3 = 8
while
det(I_3) + det(I_3) = 2,
and therefore
det(I_3 + I_3) ≠ det(I_3) + det(I_3).
This shows that
det(A + B) ≠ det(A) + det(B)
in general.

Theorem 2.3.7
Let A, B and C be square matrices which differ only in the ith row, and let the ith row of C be the sum of the ith rows of A and B. Then
det(C) = det(A) + det(B).

PROOF. Notice that the cofactors C_{ij} of the entries in the ith row are identical for all three matrices, since these cofactors do not involve the ith row. Let a_{ij}, b_{ij} and c_{ij} denote the ith-row entries of A, B and C respectively, so that c_{ij} = a_{ij} + b_{ij}. Then
det(C) = \sum_{j=1}^{n} c_{ij} C_{ij} = \sum_{j=1}^{n} (a_{ij} + b_{ij}) C_{ij} = \sum_{j=1}^{n} a_{ij} C_{ij} + \sum_{j=1}^{n} b_{ij} C_{ij} = det(A) + det(B),
where n is the order of the matrices. □
We now prove two theorems about invertibility of matrices.
Theorem 2.3.8
Let A and B be square matrices. If either BA = I or AB = I, then B = A^{-1}.

PROOF. Let BA = I, and consider the linear system Ax = 0. If x_0 is a solution, then Ax_0 = 0 and
x_0 = Ix_0 = (BA)x_0 = B(Ax_0) = B0 = 0.
Thus the linear system Ax = 0 has only the trivial solution, and therefore A is invertible. Post-multiplying BA = I by A^{-1}, we get (BA)A^{-1} = IA^{-1}, that is, B(AA^{-1}) = A^{-1}, or B = A^{-1}.
Similarly we can show that AB = I implies B = A^{-1}. □

Theorem 2.3.9
Let A and B be square matrices. If AB is invertible, then both A and B are invertible.

PROOF. If AB is invertible, then there is a matrix C such that (AB)C = C(AB) = I. From (AB)C = I we get A(BC) = I, so by Theorem 2.3.8, A is invertible. From C(AB) = I we get (CA)B = I, so B is also invertible. □

Example 2.3.7
Let E be an elementary matrix of order n. If E is obtained from I_n by multiplying a row by k, then det(E) = k. For example, if E is obtained by multiplying the second row by k, then
E = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & k & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix}.
Since the determinant of a diagonal matrix is the product of its diagonal entries, it follows that det(E) = k.
If E is obtained by interchanging two rows of I_n, then
det(E) = −det(I) = −1.
If E is obtained by adding a multiple of one row of I_n to another, then det(E) = 1. For example, let E be obtained by adding k times the last row of I_n to the first row. Then, by Theorem 2.3.7,
det(E) = \begin{vmatrix} 1 & 0 & 0 & \dots & k \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{vmatrix} = \begin{vmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{vmatrix} + k \begin{vmatrix} 0 & 0 & 0 & \dots & 1 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{vmatrix} = 1.
The last determinant is 0 since its first and last rows are identical.

Theorem 2.3.10
Let A be a square matrix of order n and E an elementary matrix of order n. Then
det(EA) = det(E) det(A).

PROOF. If E is obtained by multiplying the ith row of I_n by k, then EA is the matrix obtained by multiplying the ith row of A by k. Therefore det(EA) = k det(A), and since det(E) = k, we have det(EA) = det(E) det(A).
If E is obtained by interchanging two rows i and j of I_n, then det(E) = −det(I) = −1. In this case, EA is obtained from A by interchanging the rows i and j. Therefore
det(EA) = −det(A) = det(E) det(A).
If E is obtained by adding a multiple of one row of I_n to another, then det(E) = 1, and EA is obtained by the same operation on A. Thus
det(EA) = det(A) = det(E) det(A). □

If E_1, E_2, ..., E_r are elementary matrices of order n, then for any square matrix A of order n, we have
det(E_1 \dots E_r A) = det(E_1) \dots det(E_r) det(A).



Theorem 2.3.11
A square matrix A is invertible if and only if det(A) ≠ 0.

PROOF. Let R be the reduced row-echelon form of the matrix A, and let E_1, E_2, ..., E_r be the elementary matrices corresponding to the elementary row operations that reduce A to R. Then R = E_r \dots E_1 A and therefore det(R) = det(E_r) \dots det(E_1) det(A). Since the determinants of elementary matrices are nonzero, it follows that det(A) ≠ 0 if and only if det(R) ≠ 0.
If A is invertible, then the reduced row-echelon form of A is I. Since det(I) ≠ 0, we have det(A) ≠ 0.
Conversely, let det(A) ≠ 0. Then det(R) ≠ 0, so there are no zero rows in R. Therefore R must be I, and hence A is invertible. □

Theorem 2.3.12
If A and B are two square matrices of order n, then
det(AB) = det(A) det(B).

PROOF. If A is singular, then AB is also singular (by Theorem 2.3.9). In this case, det(AB) = 0 = det(A) det(B).
Assume that A is non-singular. Then there exist elementary matrices E_1, ..., E_r such that A = E_1 \dots E_r. Hence
det(AB) = det(E_1 \dots E_r B) = det(E_1) \dots det(E_r) det(B) = det(E_1 \dots E_r) det(B) = det(A) det(B). □

Example 2.3.8
Consider the matrices
A = \begin{bmatrix} 1 & 0 & 2 \\ 2 & 1 & 0 \\ 0 & 1 & 2 \end{bmatrix} and B = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 1 & 2 & 1 \end{bmatrix}.
Clearly
AB = \begin{bmatrix} 4 & 5 & 2 \\ 4 & 4 & 1 \\ 2 & 6 & 3 \end{bmatrix}.
Also
det(A) = 1(2 − 0) − 0 + 2(2 − 0) = 6,
det(B) = 2(2 − 2) − 1(0 − 1) + 0 = 1,
det(AB) = 4(12 − 6) − 5(12 − 2) + 2(24 − 8) = 6.
Therefore we have det(A) det(B) = 6 = det(AB).

Theorem 2.3.13
If the square matrix A is invertible, then
det(A^{-1}) = \frac{1}{\det(A)}.

PROOF. Since A is invertible, det(A) ≠ 0. Also AA^{-1} = I implies that det(A) det(A^{-1}) = det(AA^{-1}) = det(I) = 1, and therefore
det(A^{-1}) = \frac{1}{\det(A)}. □
Chapter 3
System of Linear Equations

After joining a bachelors degree in Mathematics, I visited my relatives in a village. I told one of the villagers that I was now studying mathematics, and the villager exclaimed and asked me the following question: "There are a certain number of flowers in a pond. A certain number of birds came there and sat one each on the flowers. When they sat like this, one bird had no flower left for it. So they decided to sit two on each flower, and in this case there was no bird to sit on one of the flowers. So how many flowers were there, and how many birds came?" I could figure out the answer by some trial and error at the time. How will you solve it?
If you let the number of birds be b and the number of flowers be f, then the first situation is described by the equation
b = f + 1.
In the second situation, 2 birds were sitting on each of f − 1 flowers, so this situation is described by the equation
2(f − 1) = b, or 2f − b = 2.
Thus b and f satisfy the equations
b − f = 1 and 2f − b = 2.
By adding the two equations, we see that f = 3 and therefore, using the first equation, b = 4. There were three flowers in the pond and four birds came there!

3.1. SYSTEM OF LINEAR EQUATIONS

A straight line in the xy-plane is given by an equation of the form
a_1 x + a_2 y = b,
where a_1, a_2 and b are real numbers and a_1, a_2 are not both zero. An equation of this form is called a linear equation in the variables x and y. In general, a linear equation in n variables x_1, x_2, ..., x_n is an equation of the form
(3.1.1) a_1 x_1 + a_2 x_2 + \dots + a_n x_n = b,
where a_1, a_2, ..., a_n and b are (real) numbers. The variables x_1, x_2, ..., x_n are sometimes called the "unknowns".
Definition 3.1.1
A solution of the equation (3.1.1) is a sequence of numbers
s1 , s2 , . . . , sn such that the equation (3.1.1) is satisfied when we sub-
stitute x1 = s1 , x2 = s2 , . . . , xn = sn . The set of all solutions of the
equation (3.1.1) is called its solution set or general solution of the
equation (3.1.1).

Example 3.1.1
Consider the linear equation in two variables x and y given by
2x − y = 1.
Let x = t . Then y = 2x − 1 = 2t − 1. Thus the general
solution of the equation 2x − y = 1 is given by
x = t, y = 2t − 1.
The variable t is called a parameter.

Example 3.1.2
The solutions of the equation
x_1 + x_2 − 2x_3 = 4
are obtained by taking x_2 = t, x_3 = s and solving for x_1:
x_1 = 4 − x_2 + 2x_3 = 4 − t + 2s.
Hence the solution set of the equation x_1 + x_2 − 2x_3 = 4 is
x_1 = 4 − t + 2s, x_2 = t, x_3 = s.
The variables t and s are called parameters.

Example 3.1.3
Find the solution set of the equation 2x + 4y = 11 in integers.
If x and y are integers, then 2x + 4y is an even integer
and it cannot be equal to the odd integer 11. Hence the
equation has no solution in integers.

A system of linear equations (or a linear system) of m equations in n variables x_1, x_2, ..., x_n is a system of the form

(3.1.2)
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n = b_1,
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n = b_2,
\vdots
a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n = b_m,

where the a_{ij} and b_i (1 ≤ i ≤ m, 1 ≤ j ≤ n) are real numbers. A sequence of numbers s_1, s_2, ..., s_n is a solution of the linear system (3.1.2) if x_1 = s_1, x_2 = s_2, ..., x_n = s_n is a solution of every equation in the system (3.1.2).
Let
A = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix}, b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}, x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.
Then
Ax = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} = b.
The matrix A is called the matrix of the linear system (3.1.2), and the linear system (3.1.2) can be written as Ax = b.
Definition 3.1.2
The system of linear equations (3.1.2) is called consistent if it has a solution and inconsistent if it has no solution.
Let a_1, a_2, ..., a_n be column vectors. A sum x_1 a_1 + x_2 a_2 + \dots + x_n a_n is called a linear combination of a_1, a_2, ..., a_n.
Theorem 3.1.1
The linear system Ax = b is consistent if and only if the vector b is a linear combination of the column vectors of A.

PROOF. Let a_1, a_2, ..., a_n be the column vectors of A, given by
a_1 = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}, a_2 = \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}, ..., a_n = \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}.
Then
Ax = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n \end{bmatrix} = x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \dots + x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} = x_1 a_1 + x_2 a_2 + \dots + x_n a_n.
Therefore Ax = b is equivalent to x_1 a_1 + x_2 a_2 + \dots + x_n a_n = b, and the result follows from this equation. □

Definition 3.1.3
The system of linear equations (3.1.2) of m equations in n variables is
(1) square if the number of equations is the same as the number of variables: m = n;
(2) underdetermined if there are fewer equations than variables: m < n;
(3) overdetermined if there are more equations than variables: m > n.

Example 3.1.4
Consider the system of linear equations of two equations in two
variables x and y given by
x + y = 2, 2x + y = 3.
Then x = 1, y = 1 is a solution of the system of linear
equations and the system of equations is consistent.

Example 3.1.5
Consider the system of linear equations of two equations in two
variables x and y given by
x + y = 1, 2x + 2y = 3.

The system can be written in the equivalent form as


x + y = 1, x + y = 3/2.
Clearly no x and y can satisfy both the equations. Thus,
there is no solution to the given linear system and the
system of equations is inconsistent.

Example 3.1.6
If the system of linear equations is given by
x + y = 1, 2x + 2y = 2,
then it is equivalent to the linear system
x + y = 1, x + y = 1.
In this case, it reduces to the single equation x + y = 1, and x = t, y = 1 − t for any real number t is a solution of the given linear system. In this case, there are infinitely many solutions.
Consider a linear system of two equations in the variables x and y given by
l_1 : a_1 x + b_1 y = c_1,
l_2 : a_2 x + b_2 y = c_2.
The solutions of this system of linear equations are the points of intersection of the lines l_1 and l_2. There are three possibilities (see Figure 3.1.1):
(1) The two lines are parallel but not the same. In this case, there is no solution to the given linear system.
(2) The two lines intersect at a point. In this case, there is exactly one solution.
(3) The two lines are the same. In this case, all the points on the line are solutions to the given system, and therefore there are infinitely many solutions.

The general linear system (3.1.2) likewise has either no solution, a unique solution, or infinitely many solutions.

[Figure 3.1.1: Linear systems of 2 equations in 2 variables with no solution, a unique solution, and infinitely many solutions.]

The linear system (3.1.2) can be represented by the matrix
\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} & b_1 \\ a_{21} & a_{22} & \dots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} & b_m \end{bmatrix}.
Example 3.1.7
The augmented matrix for the linear system of three equation in
the variables x, y and z given by
2x + 3y−4z =−3
7x − 8y+9z =10,
−2x +z = 1
is the matrix
 
2 3 −4 −3
 7 −8 9 10 .
−2 0 1 1

Definition 3.1.4
A system of linear equations Ax = b of n equations in n variables is in strict triangular form if the matrix A is an upper triangular matrix with non-zero diagonal entries. In other words, a system is in strict triangular form if it is of the form

(3.1.3)
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \dots + a_{1n}x_n = b_1,
a_{22}x_2 + a_{23}x_3 + \dots + a_{2n}x_n = b_2,
a_{33}x_3 + \dots + a_{3n}x_n = b_3,
\vdots
a_{nn}x_n = b_n,

where a_{11}, ..., a_{nn} are all non-zero.
A system of linear equations in strict triangular form is easy to solve. From the last equation in (3.1.3), one solves for x_n; using it in the previous equation, one gets x_{n-1}. Continuing this way up to the first equation, one gets x_1.
Example 3.1.8
The system of linear equations
2x + y + 2z = 2, y + z = 1, z = −1
is in strict triangular form as the matrix of the system
 
2 1 2
0 1 1
0 0 1
is an upper triangular matrix. The last equation gives z =
−1 and using this in the second equation, we get y = 2.
Using y = 2, z = −1 in the first equation, we get x = 1.
Therefore the given system has unique solution given by
x = 1, y = 2, z = −1.
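
To make the back-substitution procedure concrete, here is a minimal sketch in
Python (assuming NumPy is available; the function back_substitute is our own,
not from the text). It solves the system of Example 3.1.8.

import numpy as np

def back_substitute(U, b):
    # Solve Ux = b for upper triangular U with nonzero diagonal,
    # working from the last equation upward as described above.
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

U = np.array([[2.0, 1, 2], [0, 1, 1], [0, 0, 1]])
b = np.array([2.0, 1, -1])
print(back_substitute(U, b))   # prints [ 1.  2. -1.]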

Definition 3.1.5
Two systems of linear equations are called equivalent if they have
the same solution set.
A system of equations obtained by applying one or more of the
following elementary operations is equivalent to the given system:

(1) Multiplication of an equation by a nonzero constant;


(2) Interchange of two equations;
(3) Addition of a multiple of one equation to another equation.

By applying the above three operations, a system of linear equations


can be transformed into an equivalent system that is easier to solve.
Example 3.1.9
The system of linear equations
x + y = 3, x + 2y = 5
is equivalent to the system
x + y = 3, y = 2.
The first equation in the second linear system is the
same as the one in the first system. The second equation
is obtained by addition of −1 times the first equation to
the second equation. Using y = 2 in the first equation
x + y = 3, we get x = 1. Thus, x = 1 and y = 2 is the
solution to the given system of equations.

In terms of the augmented matrix, these elementary operations become
the following elementary row operations:
(1) Multiplication of a row by a nonzero constant;
(2) Interchange of two rows;
(3) Addition of a multiple of one row to another row.
Example 3.1.10
Solve the following system of linear equations using elementary
row operations
x + y + z = 4,
2x + y − z = 1,
x + 2y + z = 5.
The augmented matrix of the system of linear equa-
tions is  
1 1 1 4
2 1 −1 1 .
1 2 1 5
By subtracting 2 times the first row from the second row
and subtracting the first row from the third row, we obtain
 
1 1 1 4
0 −1 −3 −7 .
0 1 0 1

By adding the second row to the third row, we now obtain


 
1 1 1 4
0 −1 −3 −7 .
0 0 −3 −6
The linear system corresponding to this augmented ma-
trix is
x + y + z = 4, −y − 3z = −7, −3z = −6.
Solving the last equation, we get z = 2 and using this
in −y − 3z = −7, we get y = 1. Using y = 1, z = 2 in
x + y + z = 4, we get x = 1. Therefore the solution is
x = 1, y = 1, z = 2.
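
These row operations are easy to carry out numerically. The following sketch
(assuming NumPy; not part of the text) replays the three operations above on
the augmented matrix of Example 3.1.10.

import numpy as np

# Augmented matrix of the system; float entries so that division works later.
M = np.array([[1.0, 1, 1, 4],
              [2.0, 1, -1, 1],
              [1.0, 2, 1, 5]])
M[1] -= 2 * M[0]   # R2 <- R2 - 2 R1
M[2] -= M[0]       # R3 <- R3 - R1
M[2] += M[1]       # R3 <- R3 + R2
print(M)
# [[ 1.  1.  1.  4.]
#  [ 0. -1. -3. -7.]
#  [ 0.  0. -3. -6.]]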

3.2. R OW- ECHELON AND REDUCED ROW- ECHELON FORM

Let us begin by solving

x + y = 2, y + z = 2, x + z = 2.

The augmented matrix of this linear system is


 
1 1 0 2
0 1 1 2 .
1 0 1 2

The notation Ri ←− α Ri is used to indicate the row operation of


multiplying ith row by the nonzero constant α , the notation Ri ↔ R j is
used for the row operation of interchanging ith row with jth row while
the notation R j ←− R j + α Ri is used for the row operation of adding α
times ith row to the jth row.
By a sequence of elementary row operations, we obtain
   
1 1 0 2 1 1 0 2
0 1 1 2  ∼ 0 1 1 2 R3 ←− R3 − R1
1 0 1 2 0 −1 1 0
 
1 1 0 2
∼ 0 1 1 2 R3 ←− R3 + R2
0 0 2 2
 
1 1 0 2
∼ 0 1 1 2  R3 ←− 1/2R3
0 0 1 1

and the linear system corresponding to the augmented matrix in the last
step is
x + y = 2, y + z = 2, 2z = 2.

The last equation gives z = 1. Using it in the second equation gives


y = 1. Using y = 1 in the first equation, we get x = 1. Thus the solution
to the given linear system is x = y = z = 1. A matrix of the form
 
1 1 0 2
0 1 1 2
0 0 1 1

is said to be in row-echelon form.


Definition 3.2.1
A matrix A is in row echelon form if A has the following properties:
(1) If a row does not consist entirely of zeros, then the first
nonzero number in the row is 1. This is called a leading 1.
(2) If there are any rows that consist entirely of zeros, then
they are grouped together at the bottom of the matrix.
(3) In any two successive rows that do not consist entirely of
zeros, the leading 1 in the lower row occurs farther to the
right than the leading 1 in the higher row.

Example 3.2.1
The matrices

1 ∗ ∗      1 ∗ ∗      0 1 ∗ ∗ ∗
0 0 1  ,   0 1 ∗  ,   0 0 1 ∗ ∗
0 0 0      0 0 1      0 0 0 0 1
                      0 0 0 0 0
                      0 0 0 0 0
are in row-echelon form. In these examples, ∗ can be any
number.

If we apply some more elementary row operations as indicated, we get


   
1 1 0 2 1 1 0 2
0 1 1 2 ∼ 0 1 1 2
1 0 1 2 0 0 1 1
 
1 1 0 2
∼ 0 1 0 1 R2 ←− R2 − R3
0 0 1 1
 
1 0 0 1
∼ 0 1 0 1 R1 ←− R1 − R2
0 0 1 1
and the linear system corresponding to the augmented matrix in the last
step is
x = 1, y = 1, z = 1.
The solution to the system is again x = y = z = 1. A matrix of the form
 
1 0 0 1
0 1 0 1
0 0 1 1
is said to be in reduced row-echelon form.
Definition 3.2.2
A matrix A is in reduced row-echelon form if A has the following
properties:
(1) It is in row echelon form;
(2) Each column that contains a leading 1 has zeros everywhere
else in that column.

Example 3.2.2
The matrices

1 ∗ ∗ 0 ∗      0 1 0 ∗ 0
0 0 0 1 ∗      0 0 1 ∗ 0      0 0 0 0
0 0 0 0 0  ,   0 0 0 0 1  ,   0 0 0 0
               0 0 0 0 0
               0 0 0 0 0
are in reduced row-echelon form. In these examples, ∗
can be any number.

If the augmented matrix of a system of linear equations is put in
reduced row-echelon form by a sequence of elementary row operations,
then the solution of the system can be easily obtained.
Example 3.2.3
If the augmented matrix of a linear system (in the variables
x1 , x2 . . . , x5 ) after transforming to reduced row-echelon
form is  
1 6 0 0 4 −2
0 0 1 0 3 1
0 0 0 1 5 2
0 0 0 0 0 0
then the linear system corresponding to this matrix is
x1 + 6x2 + 4x5 = −2
x3 + 3x5 = 1
x4 + 5x5 = 2.
The last row of the augmented matrix gives 0 = 0. By
taking x5 = t , we see that x3 = 1 − 3t , x4 = 2 − 5t and
x1 + 6x2 = −2 − 4t . By taking x2 = s, we have x1 = −2 −
4t − 6s. Thus the solution set is
x1 = −2 − 4t − 6s,
x2 = s
x3 = 1 − 3t,
x4 = 2 − 5t,
x5 = t.
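
Computer algebra systems can produce the reduced row-echelon form directly.
As a sketch (assuming SymPy is available), Matrix.rref() returns the reduced
row-echelon form together with the indices of the pivot columns; the non-pivot
columns correspond to the free variables x2 and x5 above.

from sympy import Matrix

M = Matrix([[1, 6, 0, 0, 4, -2],
            [0, 0, 1, 0, 3,  1],
            [0, 0, 0, 1, 5,  2],
            [0, 0, 0, 0, 0,  0]])
R, pivots = M.rref()
print(pivots)   # (0, 2, 3): the leading 1s sit in the columns of x1, x3, x4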

3.3. G AUSS /G AUSS -J ORDAN ELIMINATION METHODS

Example 3.3.1
Solve the system of equations
− 2x3 + 7x5 = 12
2x1 + 4x2 − 10x3 + 6x4 + 12x5 = 28
2x1 + 4x2 − 5x3 + 6x4 − 5x5 = −1.

The augmented matrix of the given system of linear


equations is given by
 
0 0 −2 0 7 12
2 4 −10 6 12 28 .
2 4 −5 6 −5 −1
Step 1. Locate the leftmost column that does not consist entirely of zeros.
The first column has this property in this case.
Step 2. Interchange the top row with another row,
if necessary, to bring a nonzero entry to the top of the
column found in Step 1.
In this case, exchange the first and second rows in the
preceding matrix.
 
2 4 −10 6 12 28
0 0 −2 0 7 12 .
2 4 −5 6 −5 −1
Step 3. If the entry that is now at the top of the col-
umn found in Step 1 is a, multiply the first row by 1/a in
order to introduce a leading 1.
In this case, multiply the first row by 1/2.
 
1 2 −5 3 6 14
0 0 −2 0 7 12 .
2 4 −5 6 −5 −1
Step 4. Add suitable multiples of the top row to the
rows below so that all entries below the leading 1 become
zeros.
Add -2 times the first row of the preceding matrix to
the third row.
 
1 2 −5 3 6 14
0 0 −2 0 7 12 .
0 0 5 0 −17 −29
Step 5. Now cover the top row in the matrix and
begin again with Step 1 applied to the submatrix that re-
mains. Continue in this way until the entire matrix is in
row-echelon form.

The third column in the submatrix is a column that is
not entirely zero. The top entry in this column is −2,
so we multiply this row by −1/2.
 
1 2 −5 3 6 14
0 0 1 0 −7/2 −6 .
0 0 5 0 −17 −29
By adding -5 times the first row in the submatrix to
the second row in the submatrix, we get
 
1 2 −5 3 6 14
0 0 1 0 −7/2 −6 .
0 0 0 0 1/2 1
Now we cover the first and second rows of the matrix
and begin again with Step 1. There is only one row
remaining. The first column that is not entirely zero is
the 5th column. The top entry in this column is 1/2 and,
to make it a leading 1, we multiply the row by 2.
 
1 2 −5 3 6 14
0 0 1 0 −7/2 −6 .
0 0 0 0 1 2
This matrix above is in row-echelon form.
Step 6. Beginning with the last nonzero row and
working upward, add suitable multiples of each row to
the rows above to introduce zeros above the leading 1’s.
Multiply the last row by 7/2 and add it to the second
row to introduce zero above the leading 1 in the last row.
Similarly multiply the last row by -6 and add it to the first
row.
 
1 2 −5 3 0 2
0 0 1 0 0 1 .
0 0 0 0 1 2
Now the entry above the leading 1 in the second row
should be made zero. Add 5 times the second row to the
first row.

 
1 2 0 3 0 7
0 0 1 0 0 1 .
0 0 0 0 1 2
The last matrix is in reduced row-echelon form.
The linear system corresponding to the augmented
matrix above is
x1 + 2x2 + 3x4 =7
x3 =1
x5 = 2.
Hence the solution set is given by
x1 = 7 − 2r − 3s,
x2 = r,
x3 = 1,
x4 = s,
x5 = 2.
This method of obtaining the solution by reducing the
augmented matrix of the given linear system to reduced
row-echelon form is called the Gauss-Jordan elimina-
tion method.
The linear system corresponding to the augmented
matrix found in Step 5 is
x1 + 2x2 − 5x3 + 3x4 + 6x5 = 14
x3 − (7/2)x5 = −6
x5 = 2.
From the last equation we have x5 = 2 and using it in the
middle equation we get
x3 = −6 + (7/2)x5 = −6 + 7 = 1.
Using these values in the first equation we get
x1 + 2x2 + 3x4 = 14 + 5 − 12 = 7.
In this way also, we see that the solution set is given by
x1 = 7 − 2r − 3s,
x2 = r,

x3 = 1,
x4 = s,
x5 = 2.
This method of obtaining the solution by reducing the
augmented matrix of the given linear system to row-
echelon form is called the Gauss elimination method.
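
The six steps above translate directly into a program. Here is a minimal
Gauss-Jordan sketch in Python/NumPy (our own function, not from the text;
it always picks the first nonzero entry as pivot, as in Step 2).

import numpy as np

def gauss_jordan(M):
    # Reduce the augmented matrix M to reduced row-echelon form,
    # following Steps 1-6 of the text.
    M = M.astype(float)
    rows, cols = M.shape
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i, c] != 0), None)
        if pivot is None:                 # Step 1: skip all-zero columns
            continue
        M[[r, pivot]] = M[[pivot, r]]     # Step 2: bring a nonzero entry up
        M[r] /= M[r, c]                   # Step 3: create a leading 1
        for i in range(rows):             # Steps 4 and 6: clear the column
            if i != r:
                M[i] -= M[i, c] * M[r]
        r += 1
        if r == rows:
            break
    return M

M = np.array([[0, 0, -2, 0, 7, 12],
              [2, 4, -10, 6, 12, 28],
              [2, 4, -5, 6, -5, -1]])
print(gauss_jordan(M))   # recovers the reduced row-echelon form found above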

Example 3.3.2
Solve the system of equations
x3 + x4 + x5 = 3
x1 + x2 + x3 − x4 − x5 = 1
2x1 − x2 − x4 + 2x5 = 2.
The augmented matrix of the given system of linear
equations is given by
 
0 0 1 1 1 3
1 1 1 −1 −1 1 .
2 −1 0 −1 2 2
Step 1. Locate the leftmost column that does not consist
entirely of zeros. The first column has this property
in this case.
Step 2. Interchange the top row with another row,
if necessary, to bring a nonzero entry to the top of the
column found in Step 1.
In this case, exchange the first and second rows in the
preceding matrix.
 
1 1 1 −1 −1 1
0 0 1 1 1 3 .
2 −1 0 −1 2 2
Step 3. If the entry that is now at the top of the col-
umn found in Step 1 is a, multiply the first row by 1/a in
order to introduce a leading 1.
In this case, the entry at the top of the column is 1.
Step 4. Add suitable multiples of the top row to the
rows below so that all entries below the leading 1 become
zeros.

Add -2 times the first row of the preceding matrix to


the third row.
 
1 1 1 −1 −1 1
0 0 1 1 1 3 .
0 −3 −2 1 4 0
Step 5. Now cover the top row in the matrix and
begin again with Step 1 applied to the submatrix that re-
mains. Continue in this way until the entire matrix is in
row-echelon form.
The second column in the submatrix is a column that is
not entirely zero. The top entry in this column is 0,
so we interchange the second and third rows of the
matrix.
 
1 1 1 −1 −1 1
0 −3 −2 1 4 0 .
0 0 1 1 1 3
Since the top entry is now −3, multiply the second row by −1/3.

1 1 1 −1 −1 1
0 1 2/3 −1/3 −4/3 0
0 0 1 1 1 3
Now we cover the first and second rows of the matrix
and begin again with Step 1. There is only one row
remaining. The first column that is not entirely zero is
the 3rd column. The top entry in this column is 1, and
it is already a leading 1.
Thus the matrix above is in row-echelon form.
Step 6. Beginning with the last nonzero row and
working upward, add suitable multiples of each row to
the rows above to introduce zeros above the leading 1’s.
Multiply the last row by −2/3 and add it to the second
row to introduce a zero above the leading 1 in the last row.
Similarly, add −1 times the last row to the first row.
 
1 1 0 −2 −2 −2
0 1 0 −1 −2 −2 .
0 0 1 1 1 3

Now the entry above the leading 1 in the second row


should be made zero. Add -1 times the second row to the
first row.
 
1 0 0 −1 0 0
0 1 0 −1 −2 −2 .
0 0 1 1 1 3
The last matrix is in reduced row-echelon form.
The linear system corresponding to the augmented
matrix above is
x1 − x4 =0
x2 − x4 − 2x5 = −2
x3 + x4 + x5 = 3.
Hence the solution set is given by
x1 = r,
x2 = r + 2s − 2,
x3 = 3 − r − s,
x4 = r,
x5 = s.
This solution is obtained by Gauss-Jordan elimina-
tion method.
The linear system corresponding to the augmented
matrix found in Step 5 is
x1 + x2 + x3 − x4 − x5 = 1
x2 + (2/3)x3 − (1/3)x4 − (4/3)x5 = 0
x3 + x4 + x5 = 3.
By taking x4 = r, x5 = s, we see that
x3 = 3 − x4 − x5 = 3 − r − s
and therefore
x2 = −(2/3)x3 + (1/3)x4 + (4/3)x5
   = (1/3)[(−6 + 2r + 2s) + r + 4s]
   = (1/3)[3r + 6s − 6] = r + 2s − 2.
Using the values of x2, x3, x4 and x5 in the first equation,
we get
x1 + (r + 2s − 2) + (3 − r − s) − r − s = 1,
which simplifies to
x1 − r + 1 = 1, that is, x1 = r.
In this way also, we see that the solution set is given by
x1 = r,
x2 = r + 2s − 2,
x3 = 3 − r − s,
x4 = r,
x5 = s.
This solution is obtained by Gauss elimination method.

A system of linear equations is homogeneous if the constant terms


are all zero, that is, the system has the form
a11 x1 + a12 x2 + · · · + a1n xn =0
a21 x1 + a22 x2 + · · · + a2n xn =0
···
am1 x1 + am2 x2 + · · · + amn xn =0.
Clearly
x1 = x2 = x3 = · · · = xn = 0
is a solution to the homogeneous linear system. It is called a trivial
solution. Therefore, a homogeneous linear system is always consistent.
Any other solutions are called non-trivial solutions. The interesting
question for the homogeneous system is the existence of non-trivial so-
lutions.
Example 3.3.3
Solve the system of equations
x1 + x2 − 3x3 = 0
x2 − x3 + x4 = 0
x1 + x2 + x3 + 4x4 = 0

by using the Gauss/Gauss-Jordan elimination methods.


The augmented matrix of the given system of homo-
geneous linear equations is given by
 
1 1 −3 0 0
0 1 −1 1 0 .
1 1 1 4 0
Step 1. Locate the leftmost column that does not consist
entirely of zeros. The first column has this property
in this case.
Step 2. Interchange the top row with another row,
if necessary, to bring a nonzero entry to the top of the
column found in Step 1.
In this case, the top entry of the first column is already
nonzero.
Step 3. If the entry that is now at the top of the col-
umn found in Step 1 is a, multiply the first row by 1/a in
order to introduce a leading 1.
In this case, the entry at the top of the column is 1.
Step 4. Add suitable multiples of the top row to the
rows below so that all entries below the leading 1 become
zeros.
Add -1 times the first row of the preceding matrix to
the third row.
 
1 1 −3 0 0
0 1 −1 1 0 .
0 0 4 4 0
Step 5. Now cover the top row in the matrix and
begin again with Step 1 applied to the submatrix that re-
mains. Continue in this way until the entire matrix is in
row-echelon form.
The second column in the submatrix is a column that is
not entirely zero, and the top entry in this column is 1.
We can therefore cover up the first and second rows
and proceed to Step 1 again. In this case the 3rd column
is a nonzero column. The top entry is 4; to make this
entry a leading 1, we multiply this row by 1/4.

 
1 1 −3 0 0
0 1 −1 1 0 .
0 0 1 1 0
Thus the matrix above is in row-echelon form.
Step 6. Beginning with the last nonzero row and
working upward, add suitable multiples of each row to
the rows above to introduce zeros above the leading 1’s.
Add the third row to the second row, and 3 times the
third row to the first row, in order to introduce zeros
above the leading 1 in the last row.
 
1 1 0 3 0
0 1 0 2 0 .
0 0 1 1 0
Now the entry above the leading 1 in the second row
should be made zero. Add -1 times the second row to the
first row.
 
1 0 0 1 0
0 1 0 2 0 .
0 0 1 1 0
The last matrix is in reduced row-echelon form.
The linear system corresponding to the augmented
matrix above is
x1 + x4 = 0
x2 + 2x4 = 0
x3 + x4 = 0
By setting x4 = −t , we see that the solution set is given
by
x1 = t, x2 = 2t, x3 = t, x4 = −t.
This solution is obtained by Gauss-Jordan elimina-
tion method.
The linear system corresponding to the augmented
matrix found in Step 5 is
x1 + x2 − 3x3 = 0
x2 − x3 + x4 = 0

x3 + x4 = 0
By taking x4 = −t, we get x3 = t. Also
x2 = x3 − x4 = t + t = 2t
and
x1 = 3x3 − x2 = 3t − 2t = t.
Therefore the solution set is again given by
x1 = t, x2 = 2t, x3 = t, x4 = −t.
This solution is obtained by Gauss elimination method.

Theorem 3.3.1
A homogeneous linear system with more unknowns than the equa-
tions has infinitely many solutions.
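
As a quick check (assuming SymPy), the solution space of the homogeneous
system of Example 3.3.3 can be computed with nullspace(), which returns a
basis for the set of solutions of Ax = 0.

from sympy import Matrix

A = Matrix([[1, 1, -3, 0],
            [0, 1, -1, 1],
            [1, 1,  1, 4]])
print(A.nullspace())   # one basis vector: (-1, -2, -1, 1),
                       # i.e. the solutions t(1, 2, 1, -1) found above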

Theorem 3.3.2
If R is the reduced row-echelon form of an n × n matrix A, then
either R has a row of zeros or R is the identity matrix.

P ROOF. Let the reduced row-echelon form of an n × n matrix A
be given by

    r11 r12 . . . r1n
R = r21 r22 . . . r2n
    . . . . . . . . . . . . . . . . .
    rn1 rn2 . . . rnn
Either the last row of this matrix consists entirely of zeros, or it
does not. If it does not, then, since zero rows are grouped at the
bottom, the matrix has no zero rows at all, and every row has a
leading 1. As there are exactly n rows, and each leading 1 occurs
to the right of the leading 1 in the row above it, all the leading 1s
must lie on the main diagonal. Thus the matrix is the identity
matrix.

3.4. E LEMENTARY ROW OPERATIONS



Definition 3.4.1
A matrix of order n is called an elementary matrix if it can be
obtained from the identity matrix of order n by a single elementary
row operation.

Example 3.4.1
The matrices

0 1      1 0 0      1 1 0      1 0 0
1 0  ,   0 1 0  ,   0 1 0  ,   0 2 0
         0 0 1      0 0 1      0 0 1
are all elementary matrices. The first matrix is obtained
from I2 by interchanging its rows. The second matrix,
the identity matrix of order 3, is obtained by multiplying
the first row of I3 by 1. The third matrix is obtained
from I3 by adding the second row to the first row, while
the last matrix is obtained by multiplying the second row of
I3 by 2.

Theorem 3.4.1
If E is an elementary matrix obtained by performing an elementary
row operation on Im, and A is any m × n matrix, then the matrix EA
is obtained from A by performing the same elementary row operation
on A.

P ROOF. The following are the elementary row operations:


(1) Multiplication of a row through a nonzero constant.
(2) Interchange two rows.
(3) Addition of a multiple of one row to another.

Let E be an elementary matrix of order m obtained from Im by
multiplying its 3rd row by a constant λ:

    1 0 0 . . . 0
    0 1 0 . . . 0
E = 0 0 λ . . . 0 .
    . . . . . . . . .
    0 0 0 . . . 1
Then for any matrix

    a11 a12 a13 . . . a1n
    a21 a22 a23 . . . a2n
A = a31 a32 a33 . . . a3n ,
    . . . . . . . . . . . .
    am1 am2 am3 . . . amn
we have

     1 0 0 . . . 0   a11 a12 a13 . . . a1n
     0 1 0 . . . 0   a21 a22 a23 . . . a2n
EA = 0 0 λ . . . 0   a31 a32 a33 . . . a3n
     . . . . . . .   . . . . . . . . . . .
     0 0 0 . . . 1   am1 am2 am3 . . . amn

     a11  a12  a13  . . . a1n
     a21  a22  a23  . . . a2n
   = λa31 λa32 λa33 . . . λa3n .
     . . . . . . . . . . . . .
     am1  am2  am3  . . . amn

Thus the matrix EA is obtained from A by multiplying its 3rd row
by the constant λ.
Similarly we can verify our theorem for the row operations of
interchanging two rows. We verify this in the case of exchang-
ing the first and second row. In this case, the elementary matrix
obtained from Im by exchanging the first and second row is given

by:

    0 1 0 . . . 0
    1 0 0 . . . 0
E = 0 0 1 . . . 0 .
    . . . . . . . . .
    0 0 0 . . . 1
Then, for any matrix A, we have

     0 1 0 . . . 0   a11 a12 a13 . . . a1n
     1 0 0 . . . 0   a21 a22 a23 . . . a2n
EA = 0 0 1 . . . 0   a31 a32 a33 . . . a3n
     . . . . . . .   . . . . . . . . . . .
     0 0 0 . . . 1   am1 am2 am3 . . . amn

     a21 a22 a23 . . . a2n
     a11 a12 a13 . . . a1n
   = a31 a32 a33 . . . a3n .
     . . . . . . . . . . .
     am1 am2 am3 . . . amn

Thus the matrix EA is obtained from A by exchanging the first
and second rows of A.
We now verify the theorem for the third row operation, the addition
of a multiple of one row to another. Take the particular row
operation R1 = R1 + αR3. The elementary matrix obtained from Im
by this row operation is given by:

    1 0 α . . . 0
    0 1 0 . . . 0
E = 0 0 1 . . . 0 .
    . . . . . . . . .
    0 0 0 . . . 1

Then, for any matrix A, we have

     1 0 α . . . 0   a11 a12 a13 . . . a1n
     0 1 0 . . . 0   a21 a22 a23 . . . a2n
EA = 0 0 1 . . . 0   a31 a32 a33 . . . a3n
     . . . . . . .   . . . . . . . . . . .
     0 0 0 . . . 1   am1 am2 am3 . . . amn

     a11 + αa31  a12 + αa32  a13 + αa33  . . .  a1n + αa3n
     a21         a22         a23         . . .  a2n
   = a31         a32         a33         . . .  a3n .
     . . . . . . . . . . . . . . . . . . . . . . . . . . .
     am1         am2         am3         . . .  amn

This is the same as the matrix obtained from A by applying the row
operation R1 = R1 + αR3.
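
A quick numerical illustration of Theorem 3.4.1 (a sketch assuming NumPy;
not from the text): building E from I3 by a row operation and multiplying
reproduces the same operation on A.

import numpy as np

I = np.eye(3)
E = I.copy()
E[2] += 2 * E[0]          # E is elementary: R3 <- R3 + 2 R1 applied to I3

A = np.array([[1.0, 2, -1],
              [2.0, 3, 4],
              [3.0, 2, 1]])
print(E @ A)              # third row becomes [5, 6, -1], i.e. R3 + 2 R1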

Theorem 3.4.2
If E is the elementary matrix obtained from I by using any one of
the row operation Ri ↔ R j , Ri ← α Ri or Ri ← Ri + α R j , then AE
is the matrix obtained from A by performing the column operations
Ci ↔ C j , Ci ← α Ci or C j ← C j + α Ci respectively.

Example 3.4.2
Let E be the elementary matrix obtained from I3 by adding the
second row to its first row. Let A be given by
   
a b c d 1 1 0
A = e f g h , E = 0 1 0 .
i j k l 0 0 1
The product EA is given by

     1 1 0   a b c d   a+e b+f c+g d+h
EA = 0 1 0   e f g h =  e   f   g   h
     0 0 1   i j k l    i   j   k   l

and it is the same as the matrix obtained from A by adding
the second row to its first row.
Similarly, let F be the elementary matrix obtained from I4
by the same row operation (it must be of order 4 here, since
A has four columns). The product AF is given by

     a b c d   1 1 0 0   a a+b c d
AF = e f g h   0 1 0 0 = e e+f g h
     i j k l   0 0 1 0   i i+j k l
               0 0 0 1

and it is the same as the matrix obtained from A by adding
the first column to its second column.
Corresponding to the elementary row operations applied to a matrix
A to get A′ , we have inverse operations that produces A from A′ .
Consider the elementary operations:
(1) Multiplication of the ith row by a nonzero constant λ.
(2) Interchange of rows i and j.
(3) Addition of λ times the ith row to the jth row.
The inverse operations corresponding to them are given below:
(1) Multiplication of the ith row by 1/λ.
(2) Interchange of rows i and j.
(3) Addition of −λ times the ith row to the jth row.

Example 3.4.3
Let A be the matrix given by
 
1 2 −1
A = 2 3 4  .
3 2 1
The matrix B below is obtained by adding 2 times the first
row to the third row:
 
1 2 −1
B = 2 3 4  .
5 6 −1
By adding −2 times the first row to the third row in
the matrix B, we get the matrix A back.

Theorem 3.4.3
Every elementary matrix is invertible and its inverse is also an
elementary matrix.

P ROOF. Let E be the elementary matrix obtained from I by a row


operation and F be the matrix obtained from I by the corresponding
inverse operation. Then by using Theorem 3.4.1, we see that
EF = FE = I. This shows that E is invertible. Since F is
obtained from I by a single elementary row operation, it is also an
elementary matrix.

Theorem 3.4.4
If A is a square matrix of order n, the following statements are all
equivalent.
(1) A is invertible.
(2) Ax = 0 has only the trivial solution.
(3) The reduced row-echelon form of A is In .
(4) A is a product of elementary matrices.

P ROOF. We will prove this by showing (1) ⇒ (2) ⇒ (3) ⇒ (4) ⇒


(1).
(1) ⇒ (2). Assume that A is invertible. By multiplying Ax = 0
by A−1 , we get A−1 Ax = A−1 0 or Ix = 0 or x = 0. Thus Ax = 0
has only the trivial solution.
(2) ⇒ (3). Assume that Ax = 0 has only the trivial solution.
Then no variable can be free, so every column of the reduced
row-echelon form of [A|0] obtained by the Gauss-Jordan elimination
method contains a leading 1; that is, this form is the matrix [In |0],
and therefore the reduced row-echelon form of A is the identity
matrix.
(3) ⇒ (4). Assume that the reduced row-echelon form of A
is In . Then In is obtained from A by a sequence of elementary
row operations. Thus there are elementary matrices E1 , E2 , · · · , Ek
such that
Ek Ek−1 · · · E2 E1 A = In .

Since elementary matrices are invertible and their inverses are


also elementary, we have
A = E1−1 E2−1 · · · Ek−1 .
Thus A is a product of elementary matrices.
(4) ⇒ (1). Assume that A is a product of elementary matrices.
Since elementary matrices are invertible and product of invertible
matrices is again invertible, it follows that A is invertible.

3.4.1. Inverses by Row Operations


Let A be an invertible matrix. Since a matrix is invertible if and only if it
is a product of elementary matrices, we may write
A = E1 E2 · · · Ek .
Then we have
E_k^{-1} E_{k-1}^{-1} · · · E_1^{-1} A = I
and also
E_k^{-1} E_{k-1}^{-1} · · · E_1^{-1} I = A^{-1} .
Thus if a certain sequence of elementary row operations reduces the matrix A
to the identity matrix, the same sequence of operations produces A−1 from I.
Hence, to find the inverse of a matrix, we apply elementary row operations to
the matrix [A|I] to reduce it to the form [I|B]. Then B = A−1 .
Example 3.4.4
Let A be the matrix given by
 
1 2 3
A = 2 1 4  .
3 4 1
Now consider the matrix [A|I] given by
 
1 2 3 1 0 0
2 1 4 0 1 0 .
3 4 1 0 0 1
By using R2 ← R2 − 2R1 and R3 ← R3 − 3R1 , we get
 
1 2 3 1 0 0
0 −3 −2 −2 1 0 .
0 −2 −8 −3 0 1

By using R2 ← (−1/3)R2 , we get


 
1 2 3 1 0 0
0 1 2/3 2/3 −1/3 0 .
0 −2 −8 −3 0 1
By using R3 ← R3 + 2R2 , we get
 
1 2 3 1 0 0
0 1 2/3 2/3 −1/3 0 .
0 0 −20/3 −5/3 −2/3 1
By using R3 ← (−3/20)R3 , we get
 
1 2 3 1 0 0
0 1 2/3 2/3 −1/3 0 .
0 0 1 1/4 1/10 −3/20
By using R1 ← R1 − 3R3 , R2 ← R2 − (2/3)R3 , we get
 
1 2 0 1/4 −3/10 9/20
0 1 0 1/2 −2/5 1/10  .
0 0 1 1/4 1/10 −3/20
Finally by using R1 ← R1 − 2R2 , we get
 
1 0 0 −3/4 1/2 1/4
0 1 0 1/2 −2/5 1/10  .
0 0 1 1/4 1/10 −3/20
Therefore
 
−3/4 1/2 1/4
A−1 =  1/2 −2/5 1/10  .
1/4 1/10 −3/20
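
The result can be checked numerically (a sketch assuming NumPy;
np.linalg.inv is a library routine rather than the row-reduction procedure
of the text, but it must produce the same matrix).

import numpy as np

A = np.array([[1.0, 2, 3],
              [2.0, 1, 4],
              [3.0, 4, 1]])
Ainv = np.linalg.inv(A)
print(np.allclose(Ainv, [[-3/4,  1/2,   1/4],
                         [ 1/2, -2/5,   1/10],
                         [ 1/4,  1/10, -3/20]]))   # True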

Example 3.4.5
The inverse of

1 1 2
1 2 1
2 1 1

is given by

      −1 −1  3
(1/4) −1  3 −1 .
       3 −1 −1

Theorem 3.4.5
Every linear system Ax = b has no solution, or has exactly one
solution, or infinitely many solutions.

P ROOF. It is enough to show that if there is more than one solution,
then there are infinitely many solutions. So let x1 and x2
be two different solutions of Ax = b. Then we have Ax1 = b and
Ax2 = b. Then for any real number λ , we see that
A(x1 + λ (x2 − x1 )) = Ax1 + λ (Ax2 − Ax1 )
= b + λ (b − b)
= b.
Thus the vector x1 + λ (x2 −x1 ) is also a solution of Ax = b. Since
x1 ̸= x2 , it follows that the solutions x1 + λ (x2 − x1 ) are all dis-
tinct. This completes the proof.

Theorem 3.4.6
If A is invertible, then the linear system Ax = b has exactly one
solution.

P ROOF. Since A is invertible, by multiplying Ax = b by A−1 we


get
A−1 (Ax) = A−1 b,
and by using associative law, we have
(A−1 A)x = A−1 b.
Since A−1 A = I and Ix = x, we obtain
x = A−1 b.
This shows that the system Ax = b has exactly one solution given
by x = A−1 b.

Example 3.4.6

Given a non-zero complex number z = a + ib, the multiplicative inverse
of z is given by
1/z = a/(a² + b²) − i b/(a² + b²).
This can be proved as follows. Let 1/z = c + id. Then
(a + ib)(c + id) = 1
or, by comparing the real and imaginary parts,
ac − bd = 1 and ad + bc = 0.
This can be written in matrix form as

a −b   c     1
b  a   d  =  0 .

Therefore, inverting the coefficient matrix,

c        1       a  b   1      a/(a² + b²)
d  =  a² + b²   −b  a   0  =  −b/(a² + b²) .

Example 3.4.7
The condition for three points z, z1, z2 to lie on the same straight
line in the complex plane is

| z   z̄   1 |
| z1  z̄1  1 | = 0.
| z2  z̄2  1 |
The general form of the equation of a straight line is
Āz + A z̄ + c = 0, where A ≠ 0 and c is a real constant. If the
points z, z1, z2 lie on the same straight line, then
Āz + A z̄ + c = 0, Āz1 + A z̄1 + c = 0, Āz2 + A z̄2 + c = 0,
and this can be written in matrix form as

z   z̄   1   Ā     0
z1  z̄1  1   A  =  0 .
z2  z̄2  1   c     0

If the matrix were non-singular, the only solution would be
Ā = A = c = 0, contradicting A ≠ 0. So the matrix must be
singular, that is, its determinant must be zero. This proves
the statement.

3.5. T RIANGULAR FACTORIZATION

Let A be a square matrix. If A can be reduced to an upper triangular
matrix U (with diagonal elements not necessarily 1s) by Gauss elimination
without any row exchange, then U is obtained from A by a sequence of
elementary matrices E1 , E2 , · · · , En corresponding to the row operation
of adding a multiple of one row to another row:
U = En · · · E1 A.
The matrices E1 , E2 , · · · , En are lower triangular and their inverses are
also lower triangular. Hence
A = (En · · · E1 )−1U = (E1−1 · · · En−1 )U = LU.

Theorem 3.5.1
If a square matrix A can be reduced to upper triangular matrix using
Gauss elimination without row exchange, then A = LU where L is
a lower triangular matrix with all diagonal entries equal to 1 and U
is an upper triangular matrix.

Example 3.5.1
Let  
0 1
A= .
1 0
If    
1 0 a b
L= , U= ,
c 1 0 d
then  
a b
LU = .
ac bc + d
Therefore LU = A requires a = 0; but in that case all the entries in
the first column of LU are zero, so LU cannot equal A. Thus, the
matrix A has no LU decomposition. Since the matrix A is non-singular,
we conclude that not every non-singular matrix has an LU decomposition.

Example 3.5.2
If a ≠ 0, then

a b     1   0    a     b
c d  =  c/a 1    0  (ad − bc)/a .

Example 3.5.3
If A = [ai j ] is a non-singular square matrix of order n with a11 = 0,
then he matrix A cannot have LU decomposition.
If L = [li j ] is a lower triangular matrix with lii = 1 and U = [u jk ]
is an upper triangular matrix, then a11 = u11 and so u11 = 0. Since
the determinant of a triangular matrix is the product of diagonal
elements, det(U) = u11 · · · unn = 0 and therefore the matrix U is
singular. Therefore det(A) = det LU = det L detU = 0 and hence A
is singular.
If A has LU decomposition A = LU, then there are elementary ma-
trices E1 , · · · , En such that
L = (E1−1 · · · En−1 ), U = En · · · E1 A.
Hence LU decomposition can be obtained from A = IA by applying the
sequence of elementary row and column operations as below:
A = IA
= (IE1−1 )(E1 A)
= (IE1−1 E2−1 )(E2 E1 A)
. . .
= (IE1−1 · · · En−1 )(En · · · E1 A)
= LU.
This method is illustrated in the following examples.
Example 3.5.4
Let  
1 1 0
A = 1 2 1  .
2 1 1

Write   
1 0 0 1 1 0
A = 0 1 0  1 2 1 
0 0 1 2 1 1
Reduce the second matrix by using the row operations R2 = R2 − R1
and R3 = R3 − 2R1 and the first matrix by the corresponding inverse
column operations C1 = C1 +C2 and C1 = C1 + 2C3 to get
  
1 0 0 1 1 0
A = 1 1 0 0 1 1
2 0 1 0 −1 1
Now apply the row operation R3 = R3 + R2 to the second matrix
and the inverse column operation C2 = C2 −C3 to the first matrix to
get   
1 0 0 1 1 0
A = 1 1 0 0 1 1
2 −1 1 0 0 2

Example 3.5.5
Let  
2 1 3
A = 4 3 7  .
6 7 17
Write   
1 0 0 2 1 3
A = 0 1 0  4 3 7  .
0 0 1 6 7 17
Reduce the second matrix by using the row operations R2 = R2 −
2R1 and R3 = R3 − 3R1 and the first matrix by the corresponding
inverse column operations C1 = C1 + 2C2 and C1 = C1 + 3C3 to get
  
1 0 0 2 1 3
A = 2 1 0  0 1 1  .
3 0 1 0 4 8
Now apply the row operation R3 = R3 − 4R2 to the second matrix
and the inverse column operation C2 = C2 + 4C3 to the first matrix

to get   
1 0 0 2 1 3
A = 2 1 0  0 1 1  .
3 4 1 0 0 4
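
The same elimination can be coded directly. Below is a minimal LU sketch
in Python/NumPy (our own helper, without row exchanges, so it fails when a
zero pivot appears, exactly as in Example 3.5.1).

import numpy as np

def lu_nopivot(A):
    # Gauss elimination without row exchanges; the multipliers used to
    # clear each column are stored in L, the reduced matrix is U.
    A = A.astype(float)
    n = A.shape[0]
    L, U = np.eye(n), A.copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i] -= L[i, k] * U[k]       # R_i <- R_i - l_ik R_k
    return L, U

L, U = lu_nopivot(np.array([[2, 1, 3], [4, 3, 7], [6, 7, 17]]))
print(L)   # [[1,0,0],[2,1,0],[3,4,1]], as in Example 3.5.5
print(U)   # [[2,1,3],[0,1,1],[0,0,4]]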

Example 3.5.6
Let  
    1 1 0 0
A = 1 2 1 1 .
    2 1 1 0
    1 0 1 1
Write

    1 0 0 0   1 1 0 0
A = 0 1 0 0   1 2 1 1 .
    0 0 1 0   2 1 1 0
    0 0 0 1   1 0 1 1
Reduce the second matrix by using the row operations R2 = R2 −
R1 , R3 = R3 − 2R1 , R4 = R4 − R1 and the first matrix by the corre-
sponding inverse column operations C1 = C1 +C2 , C1 = C1 + 2C3 ,
C1 = C1 +C4 to get
  
    1 0 0 0   1  1 0 0
A = 1 1 0 0   0  1 1 1 .
    2 0 1 0   0 −1 1 0
    1 0 0 1   0 −1 1 1
Now apply the row operation R3 = R3 + R2 , R4 = R4 + R2 to the
second matrix and the inverse column operation C2 = C2 −C3 , C2 =
C2 −C4 to the first matrix to get
  
    1  0 0 0   1 1 0 0
A = 1  1 0 0   0 1 1 1 .
    2 −1 1 0   0 0 2 1
    1 −1 0 1   0 0 2 2
Finally apply the row operation R4 = R4 − R3 to the second matrix
and the inverse column operation C3 = C3 +C4 to the first matrix to

get

    1  0 0 0   1 1 0 0
A = 1  1 0 0   0 1 1 1 .
    2 −1 1 0   0 0 2 1
    1 −1 1 1   0 0 0 1

If A has an LU decomposition A = LU, then the linear system Ax = b
can be solved by solving two triangular systems of linear equations:
first Ly = b and then Ux = y.
Example 3.5.7
Solve: x1 + 2x2 + x3 = 0, 2x1 + x2 + x3 = 1, −x1 + x2 − 2x3 = 1. The
system of linear equations in matrix form is Ax = b, where

     1  2  1        x1        0
A =  2  1  1 ,  x = x2 ,  b = 1 .
    −1  1 −2        x3        1
−1 1 −2 x3 1
Write   
1 0 0 1 2 1
A = 0 1 0   2 1 1 .
0 0 1 −1 1 −2
Reduce the second matrix by using the row operations R2 = R2 −
2R1 and R3 = R3 + R1 and the first matrix by the corresponding
inverse column operations C1 = C1 + 2C2 and C1 = C1 −C3 to get
  
1 0 0 1 2 1
A =  2 1 0 0 −3 −1 .
−1 0 1 0 3 −1
Now apply the row operation R3 = R3 + R2 to the second matrix
and the inverse column operation C2 = C2 −C3 to the first matrix to
get   
1 0 0 1 2 1
A= 2 1 0 0 −3 −1 = LU,
−1 −1 1 0 0 −2
where
   
1 0 0 1 2 1
L= 2 1 0 , U = 0 −3 −1 .
−1 −1 1 0 0 −2

We first solve the system Ly = b; this system becomes


    
1 0 0 y1 0
 2 1 0 y2  = 1
−1 −1 1 y3 1
and this is the same as the system:
y1 = 0, 2y1 + y2 = 1, −y1 − y2 + y3 = 1.
The solution to this system is clearly y1 = 0, y2 = 1, y3 = 2. Now
we solve Ux = y; this system is
    
1 2 1 x1 0
0 −3 −1 x2  = 1
0 0 −2 x3 2
or
x1 + 2x2 + x3 = 0, −3x2 − x3 = 1, −2x3 = 2.
Clearly x3 = −1 and this with the second equation gives x2 = 0.
First equation finally gives x1 = 1. So the given system has the
solution x1 = 1, x2 = 0, x3 = −1.
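
The two triangular solves can also be done with a library routine (a sketch
assuming SciPy; solve_triangular performs exactly the forward and back
substitution described above).

import numpy as np
from scipy.linalg import solve_triangular

L = np.array([[1.0, 0, 0], [2, 1, 0], [-1, -1, 1]])
U = np.array([[1.0, 2, 1], [0, -3, -1], [0, 0, -2]])
b = np.array([0.0, 1, 1])

y = solve_triangular(L, b, lower=True)    # forward substitution: Ly = b
x = solve_triangular(U, y, lower=False)   # back substitution: Ux = y
print(x)                                  # x = (1, 0, -1)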

Example 3.5.8
Solve: x1 + 2x2 + x3 = 3, 2x1 + x2 + 3x3 = 2, −x1 + x2 − x3 = 0. The
system of linear equations in matrix form is Ax = b, where

     1  2  1        x1        3
A =  2  1  3 ,  x = x2 ,  b = 2 .
    −1  1 −1        x3        0
Write   
1 0 0 1 2 1
A = 0 1 0   2 1 3 .
0 0 1 −1 1 −1
Reduce the second matrix by using the row operations R2 = R2 −
2R1 and R3 = R3 + R1 and the first matrix by the corresponding
inverse column operations C1 = C1 + 2C2 and C1 = C1 −C3 to get
  
1 0 0 1 2 1
A =  2 1 0 0 −3 1 .
−1 0 1 0 3 0

Now apply the row operation R3 = R3 + R2 to the second matrix


and the inverse column operation C2 = C2 −C3 to the first matrix to
get   
1 0 0 1 2 1
A= 2 1 0 0 −3 1 = LU,
−1 −1 1 0 0 1
where    
1 0 0 1 2 1
L= 2 1 0 , U = 0 −3 1 .
−1 −1 1 0 0 1
We first solve the system Ly = b; this system becomes

 1  0  0   y1     3
 2  1  0   y2  =  2
−1 −1  1   y3     0
and this is the same as the system:
y1 = 3, 2y1 + y2 = 2, −y1 − y2 + y3 = 0.
The solution to this system is clearly y1 = 3, y2 = −4, y3 = −1. Now
we solve Ux = y; this system is
    
1 2 1 x1 3
0 −3 1 x2  = −4
0 0 1 x3 −1
or
x1 + 2x2 + x3 = 3, −3x2 + x3 = −4, x3 = −1.
Clearly x3 = −1 and this with the second equation gives x2 = 1.
First equation finally gives x1 = 2. So the given system has the
solution x1 = 2, x2 = 1, x3 = −1.
Chapter 4
Euclidean Space

4.1. D EFINITION AND BASIC PROPERTIES

An ordered pair of two objects x and y is the pair written as (x, y) with
the property that (x, y) = (y, x) if and only if x = y. Ordered pair can be
defined in different ways; for example,
(a, b) := {{a}, {a, b}},
(a, b) := {a, {a, b}}.
Either of these definition shows that (x, y) = (y, x) if and only if x =
y. One can also extend the definition of ordered pair to ordered triple
and ordered n-tuple. Let n be a positive integer. A sequence of n real
numbers (a1 , a2 , a3 , . . . , an ) is called an ordered n-tuple. The set of
all ordered n-tuples, denoted by Rn , is called the Euclidean n-space.
Clearly, ordered pairs are just 2-tuples, and we call a 3-tuple an
ordered triple. We call the elements of Rn points in Rn or vectors
in Rn .
Definition 4.1.1 Equality and coordinatewise operations
Let u = (u1 , u2 , u3 , . . . , un ), v = (v1 , v2 , v3 , . . . , vn ) be two vectors in
Rn . The two vectors u and v are equal if
u1 = v1 , u2 = v2 , . . . , un = vn .
The sum u + v and the difference u − v are defined by
u + v := (u1 + v1 , u2 + v2 , . . . , un + vn )

u − v := (u1 − v1 , u2 − v2 , . . . , un − vn )
and the scalar multiple λ u (where λ is a real number) is defined by
λ u := (λ u1 , λ u2 , . . . , λ un ).
The addition, subtraction and scalar multiple are respectively called
the coordinatewise addition, subtraction and scalar multiplication.
The zero vector, denoted by 0, is defined by
0 = (0, 0, . . . , 0).
When λ = −1, the scalar multiple −u given by
−u = (−u1 , −u2 , −u3 , . . . , −un )
is called the negative of u or the additive inverse of u. Note that u − v =
u + (−v) and u − u = 0.
Theorem 4.1.1
Let u, v and w be vectors in Rn and λ and µ be scalars. Then we
have the following
(1) u + v = v + u
(2) (u + v) + w = u + (v + w)
(3) u + 0 = 0 + u = u
(4) u + (−u) = (−u) + u = 0
(5) λ (µ u) = (λ µ )u
(6) λ (u + v) = λ u + λ v
(7) (λ + µ )u = λ u + µ u
(8) 1u = u
(1) is the commutative law for addition; (2) is the associative law for
addition; 0 is the identity element for addition; −u is the additive
inverse of u.
P ROOF. We prove only the commutative law for addition. Let
u = (u1 , u2 , u3 , . . . , un ), v = (v1 , v2 , v3 , . . . , vn ).
Then
u + v = (u1 , u2 , u3 , . . . , un ) + (v1 , v2 , v3 , . . . , vn )
= (u1 + v1 , u2 + v2 , . . . , un + vn )

= (v1 + u1 , v2 + u2 , . . . , vn + un )
= v + u.

Example 4.1.1
Let u = (1, 3, −1), v = (4, −2, 5) be two vectors in R3 . Then
u + v = (1, 3, −1) + (4, −2, 5)
= (1 + 4, 3 + (−2), −1 + 5) = (5, 1, 4),
v + u = (4, −2, 5) + (1, 3, −1) = (5, 1, 4),
3u = 3(1, 3, −1) = (3, 9, −3),
3v = 3(4, −2, 5) = (12, −6, 15),
3u + 3v = (3, 9, −3) + (12, −6, 15) = (15, 3, 12),
3(u + v) = 3(5, 1, 4) = (15, 3, 12).
Clearly u + v = v + u, 3(u + v) = 3u + 3v. Also
u − v = (1, 3, −1) − (4, −2, 5) = (−3, 5, −6).

Example 4.1.2
If x + u = v, then by adding −u both sides of the equation, we get
(x + u) + (−u) = v + (−u).
By using associative law, we get
x + (u + (−u)) = v − u.
Thus we have
x+0 = v−u
or
x = v − u.

4.2. D OT PRODUCT AND NORM

The generalization of the dot product of vectors in R3 to Rn is given in the
following:

Definition 4.2.1
The Euclidean inner product of two vectors u = (u1 , u2 , u3 , . . . , un ),
v = (v1 , v2 , v3 , . . . , vn ) in Rn , denoted by u · v, is defined by
u · v = u1 v1 + u2 v2 + · · · + un vn = ∑_{i=1}^{n} ui vi .

If we write the vectors u, v as column vectors

    u1        v1
    u2        v2
u =  .  , v =  .  ,
    un        vn

then
uT v = (u1 u2 · · · un)(v1 , v2 , . . . , vn)T = [u1 v1 + u2 v2 + · · · + un vn] = [u · v],
a 1 × 1 matrix which we identify with the scalar u · v.
Also it follows that vT u = u · v. Thus we have
u · v = uT v = vT u.
Theorem 4.2.1
If u, v and w are vectors in Rn and λ is a scalar, then
(1) u · v = v · u
(2) (u + v) · w = u · w + v · w
(3) (λ u) · v = λ (u · v)
(4) u · u ≥ 0 and u · u = 0 if and only if u = 0.

P ROOF. Let u = (u1 , u2 , u3 , . . . , un ),


v = (v1 , v2 , v3 , . . . , vn ) and w = (w1 , w2 , w3 , . . . , wn ).

(1) Since ui vi = vi ui , we have
u · v = ∑_{i=1}^{n} ui vi = ∑_{i=1}^{n} vi ui = v · u.
(2) Since
u + v = (u1 + v1 , u2 + v2 , . . . , un + vn ),
we have
(u + v) · w = ∑_{i=1}^{n} (ui + vi )wi = ∑_{i=1}^{n} (ui wi + vi wi )
            = ∑_{i=1}^{n} ui wi + ∑_{i=1}^{n} vi wi = u · w + v · w.
(3) Since λu = (λu1 , λu2 , λu3 , . . . , λun ), we have
(λu) · v = ∑_{i=1}^{n} (λui )vi = λ ∑_{i=1}^{n} ui vi = λ(u · v).
(4) Clearly u · u = ∑_{i=1}^{n} ui² ≥ 0. If u · u = 0, then ui = 0 for
every i and therefore u = 0. Conversely, if u = 0, then u · u = 0.

Example 4.2.1
Using the above theorem, we can compute the Euclidean inner
product in the same way we compute ordinary product. For ex-
ample,
(u + v) · (u − v) = u · (u − v) + v · (u − v)
= u · u − u · v + v · u − v · v
= u · u − v · v.

Definition 4.2.2
The Euclidean norm or Euclidean length of a vector
u = (u1 , u2 , u3 , . . . , un ) in Rn , denoted by ∥u∥, is defined by
∥u∥ := √(u · u) = √(u1² + u2² + · · · + un²) = ( ∑_{i=1}^{n} ui² )^{1/2} .
The distance between two vectors u, v in Rn , denoted by d(u, v), is
defined by
d(u, v) := ∥u − v∥ = √((u1 − v1)² + (u2 − v2)² + · · · + (un − vn)²)
         = ( ∑_{i=1}^{n} (ui − vi)² )^{1/2} .

Example 4.2.2
For two vectors u, v in Rn , we have
∥u + v∥2 = (u + v) · (u + v)
= u · (u + v) + v · (u + v)
= u·u+u·v+v·u+v·v
= u · u + 2(u · v) + v · v
= ∥u∥2 + 2(u · v) + ∥v∥2
and
∥u − v∥2 = (u − v) · (u − v)
= u · (u − v) − v · (u − v)
= u·u−u·v−v·u+v·v
= u · u − 2(u · v) + v · v
= ∥u∥2 − 2(u · v) + ∥v∥2 .
Adding the two equalities, we get
∥u + v∥2 + ∥u − v∥2 = 2[∥u∥2 + ∥v∥2 ].

Also we get
∥u + v∥2 − ∥u − v∥2 = 4u · v
and therefore the Euclidean inner product can be expressed in term
of the norms as
1
u · v = [∥u + v∥2 − ∥u − v∥2 ].
4

Example 4.2.3
Let u = (1, 2, −1, 3) and v = (2, 0, 3, 1) be two vectors in R4 . Then
∥u∥ = √(1² + 2² + (−1)² + 3²) = √15,
∥v∥ = √(2² + 0² + 3² + 1²) = √14,
u · v = 1 × 2 + 2 × 0 + (−1) × 3 + 3 × 1 = 2.
Note that
|u · v| = 2 ≤ √15 √14 = ∥u∥ ∥v∥.
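
These quantities are one-liners numerically (a sketch assuming NumPy):

import numpy as np

u = np.array([1.0, 2, -1, 3])
v = np.array([2.0, 0, 3, 1])
print(u @ v)                          # 2.0, the Euclidean inner product
print(u @ u)                          # 15.0, i.e. ||u||^2
print(abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v))   # True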

Theorem 4.2.2
If u is a vector in Rn and λ is a scalar, then
(1) ∥u∥ ≥ 0 and ∥u∥ = 0 if and only if u = 0
(2) ∥λ u∥ = |λ | ∥u∥.

P ROOF. We prove only that ∥λu∥ = |λ| ∥u∥. Since
λu = (λu1 , λu2 , · · · , λun ), we have
∥λu∥ = ( ∑_{i=1}^{n} (λui)² )^{1/2} = ( λ² ∑_{i=1}^{n} ui² )^{1/2}
     = |λ| ( ∑_{i=1}^{n} ui² )^{1/2} = |λ| ∥u∥.

Example 4.2.4
By taking λ = −1 in ∥λ u∥ = |λ | ∥u∥, we obtain the following: For
any u ∈ Rn ,
∥ − u∥ = ∥u∥.

Theorem 4.2.3 Cauchy-Schwarz inequality


Let u = (u1 , u2 , u3 , . . . , un ), v = (v1 , v2 , v3 , . . . , vn ) be two
vectors in Rn . Then
|u · v| ≤ ∥u∥ ∥v∥,
or equivalently
| ∑_{i=1}^{n} ui vi | ≤ ( ∑_{i=1}^{n} ui² )^{1/2} ( ∑_{i=1}^{n} vi² )^{1/2} .

P ROOF. Let a = ∥v∥², b = u · v and c = ∥u∥². Note that a ≥ 0.
If a = 0, then v = 0; in this case u · v = 0 and ∥v∥ = 0, and the
inequality is satisfied. So assume that a > 0. Now
0 ≤ ∥u + λv∥² = ∥u∥² + 2λ(u · v) + λ²∥v∥² = aλ² + 2bλ + c
for all λ. By taking λ = −b/a, we see that
a(b²/a²) − 2(b²/a) + c ≥ 0,
or
c − b²/a ≥ 0.
Since a > 0, we obtain b² ≤ ac, or equivalently
(u · v)² ≤ ∥u∥² ∥v∥².
By taking square roots on both sides, we get the desired result.

Example 4.2.5
When n = 2, the Cauchy-Schwarz inequality becomes
(u1 v1 + u2 v2)² ≤ (u1² + u2²)(v1² + v2²).
By taking u1 = a, u2 = b, v1 = cos θ and v2 = sin θ, we get
|a cos θ + b sin θ| ≤ √((a² + b²)(cos² θ + sin² θ)) = √(a² + b²),
or equivalently (a cos θ + b sin θ)² ≤ a² + b².

Example 4.2.6
By taking v1 = v2 = · · · = vn = 1 in the Cauchy-Schwarz inequality, we
have
(u1 + u2 + · · · + un )² ≤ n(u1² + u2² + · · · + un²).

Theorem 4.2.4 Triangle inequality


If u and v are two vectors in Rn , then
∥u + v∥ ≤ ∥u∥ + ∥v∥·

P ROOF. By using the Cauchy-Schwarz inequality, we get


∥u + v∥2 = ∥u∥2 + 2(u · v) + ∥v∥2
≤ ∥u∥2 + 2∥u∥∥v∥ + ∥v∥2
= (∥u∥ + ∥v∥)2 .
By taking square roots, the result follows.

Example 4.2.7
Let u and v be two vectors in Rn . Then
| ∥u∥ − ∥v∥ | ≤ ∥u − v∥·

P ROOF. By using the triangle inequality, we get


∥u∥ = ∥(u − v) + v∥

≤ ∥u − v∥ + ∥v∥
or
∥u∥ − ∥v∥ ≤ ∥u − v∥·
Similarly
∥v∥ = ∥(v − u) + u∥
≤ ∥v − u∥ + ∥u∥
= ∥ − (u − v)∥ + ∥u∥
= ∥u − v∥ + ∥u∥
or
∥v∥ − ∥u∥ ≤ ∥u − v∥
or
−(∥u∥ − ∥v∥) ≤ ∥u − v∥·
Thus we have | ∥u∥ − ∥v∥ | ≤ ∥u − v∥.

Definition 4.2.3
Two vectors u and v in Rn are orthogonal if u · v = 0.

Example 4.2.8
Consider u = (1, 1, 1, 1, 1, 1) and v = (1, −1, 1, −1, 1, −1) in R6 .
Then
u · v = 1 − 1 + 1 − 1 + 1 − 1 = 0
and therefore u and v are orthogonal in R6 . Note that ∥u∥ = √6 = ∥v∥,
∥u + v∥ = ∥(2, 0, 2, 0, 2, 0)∥ = √(4 + 0 + 4 + 0 + 4 + 0) = √12,
and therefore
∥u + v∥² = 12 = 6 + 6 = ∥u∥² + ∥v∥² .

Theorem 4.2.5 Pythagorean Theorem


Two vectors u and v in Rn are orthogonal if and only if
∥u + v∥2 = ∥u∥2 + ∥v∥2 ·

P ROOF. Since ∥u + v∥2 = ∥u∥2 + 2(u · v) + ∥v∥2 , we have ∥u +


v∥2 = ∥u∥2 + ∥v∥2 if and only if u · v = 0 or equivalently if and
only if u and v are orthogonal.

Theorem 4.2.6
Two vectors u and v in Rn are orthogonal if and only if
∥u + v∥ = ∥u − v∥·

P ROOF. Since
u · v = (1/4)[∥u + v∥² − ∥u − v∥²],
u and v in Rn are orthogonal if and only if ∥u + v∥² − ∥u − v∥² = 0,
or equivalently, if and only if ∥u + v∥ = ∥u − v∥.

4.3. L INEAR TRANSFORMATIONS

Let A and B be two sets. Then a function f : A → B is a rule that
associates with each element a ∈ A exactly one element b ∈ B; the
element b is called the image of a and is written as b = f (a). The set
A is called the domain of f and B is called the codomain of f . The set
{ f (a) : a ∈ A} of all images of points in A is called the range
of f . Two functions f and g are equal if they have the same domain and
f (a) = g(a) for all a in the domain.
Example 4.3.1
The following are examples of functions.
(1) f : R → R where f (x) = x2 + x,
(2) f : R3 → R where f (x, y, z) = x2 + xy + z2 ,
(3) f : Rn → R where f (x1 , x2 , . . . , xn ) = x1 + x2^2 + x3^3 + · · · + xn^n ,
(4) f : R → R2 where f (x) = (x2 , x + 1),
(5) f : R2 → R2 where f (x, y) = (x2 , x + y2 ),
(6) f : Rm → Rn where f (x1 , x2 , . . . , xm ) = (x1 , x1 +
x2 , . . . , xn−1 + xn ) (n ≤ m).

A function f : Rn → Rm is called a map or transformation and we


say that f maps Rn to Rm or f is a mapping from Rn to Rm . A mapping

f : Rn → Rn is called an operator on Rn . For i = 1, 2, . . . , m, let fi :


Rn → R be functions. For each (x1 , x2 , x3 , · · · , xn ) ∈ Rn , let

w1 = f1 (x1 , x2 , x3 , . . . , xn )
w2 = f2 (x1 , x2 , x3 , . . . , xn )
. . .
wm = fm (x1 , x2 , x3 , . . . , xn ).

Thus to each (x1 , x2 , x3 , · · · , xn ) ∈ Rn we can associate
(w1 , w2 , w3 , · · · , wm ) ∈ Rm , and therefore these equations define
a function f : Rn → Rm where

f (x1 , x2 , x3 , · · · , xn ) = (w1 , w2 , w3 , · · · , wm ).

Example 4.3.2
Consider the functions
w1 = f1 (x1 , x2 , x3 ) = x1 + x1 x2 + x3
w2 = f2 (x1 , x2 , x3 ) = x2 + x3 .
Then we can define a function f : R3 → R2 by
f (x1 , x2 , x3 ) = (w1 , w2 ) = (x1 + x1 x2 + x3 , x2 + x3 ).

If the functions fi defining f are all linear equations, then f : Rn →


Rm is called a linear transformation. When n = m, we call a linear
transformation as a linear operator. Thus a linear transformation T :
Rn → Rm is given by

T (x1 , x2 , x3 , · · · , xn ) = (w1 , w2 , w3 , · · · , wm )

where

w1 = a11 x1 + a12 x2 + · · · + a1n xn
w2 = a21 x1 + a22 x2 + · · · + a2n xn
. . .
wm = am1 x1 + am2 x2 + · · · + amn xn .

These equations can be written as

w1     a11 a12 · · · a1n   x1
w2  =  a21 a22 · · · a2n   x2
 .     . . . . . . . . .    .
wm     am1 am2 · · · amn   xn .
If we write

    w1        x1         a11 a12 · · · a1n
    w2        x2         a21 a22 · · · a2n
w =  .  , x =  .  ,  A = . . . . . . . . . ,
    wm        xn         am1 am2 · · · amn
then we have w = Ax. The matrix A is called the standard matrix of the
linear transformation T and T is called multiplication by A. In this case,
we write w = T (x) = Ax.
Example 4.3.3
The transformation T : R4 → R3 defined by
w1 =2x1 +x2 +4x3 −2x4 ,
w2 =3x1 +x2 − x3 +2x4 ,
w3 = x1 +x2 + x3 +3x4 .
is a linear transformation from R4 to R3 . This linear transformation
can be written as

w1     2 1  4 −2   x1
w2  =  3 1 −1  2   x2 .
w3     1 1  1  3   x3
                   x4

Thus the standard matrix of this linear transformation is given by


 
2 1 4 −2
A = 3 1 −1 2  .
1 1 1 3
When (x1 , x2 , x3 , x4 ) = (1, 2, −1, 0), we have
w1 =2+2−4−0 = 0,
w2 =3+2+1+0 = 6,
w3 =1+2−1+0 = 2.
This can also be computed using the standard matrix of the linear
transformation T :

w1     2 1  4 −2    1      0
w2  =  3 1 −1  2    2   =  6 .
w3     1 1  1  3   −1      2
                    0
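
In code, applying the linear transformation is just a matrix-vector product
(a sketch assuming NumPy):

import numpy as np

A = np.array([[2, 1, 4, -2],
              [3, 1, -1, 2],
              [1, 1, 1, 3]])
x = np.array([1, 2, -1, 0])
print(A @ x)   # [0 6 2], i.e. T(1, 2, -1, 0) = (0, 6, 2)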

Theorem 4.3.1
A transformation T : Rn → Rm is linear if and only if
T (λ u + µ v) = λ T (u) + µ T (v)
for all vectors u, v ∈ Rn and for every scalars λ , µ .

P ROOF. Assume that T is a linear transformation and A its stan-


dard matrix. Then
T (λ u + µ v) = A(λ u + µ v)
= A(λ u) + A(µ v)
= λ Au + µ Av
= λ T (u) + µ T (v).
Conversely assume that the transformation T has the property
that
T (λ u + µ v) = λ T (u) + µ T (v)

for all vectors u, v ∈ Rn and for every scalars λ , µ . It follows


easily that
T (λ1 u1 + λ2 u2 + · · · + λk uk )
= λ1 T (u1 ) + λ2 T (u2 ) + · · · + λk T (uk )
for any vectors u1 , u2 , . . . , uk ∈ Rn and any scalars λ1 , . . . , λk . We
complete the proof by showing that T is a multiplication by a
matrix A. Let e1 , e2 , . . . , en be the vectors given by

     1        0                0
     0        1                0
e1 = 0 , e2 = 0 , · · · , en = 0 .
     .        .                .
     0        0                1
Let A be the matrix whose columns are T (e1 ), T (e2 ), . . ., T (en ).
Then
Ax = x1 T (e1 ) + x2 T (e2 ) + · · · + xn T (en )
= T (x1 e1 + x2 e2 + · · · + xn en )
= T (x).
Thus T is multiplication by A and therefore it is linear.

Definition 4.3.1
A linear transformation T : Rn → Rm is one-to-one if
x ̸= y ⇒ T (x) ̸= T (y)
for all x, y ∈ Rn or equivalently
T (x) = T (y) ⇒ x = y
for all x, y ∈ Rn .

Example 4.3.4
Let T (x) = Ax be a linear transformation. Then
T (x) = T (y) ⇒ Ax = Ay ⇒ x = y
provided A is invertible. Thus a linear transformation T is one-to-
one if its matrix A is invertible.

Example 4.3.5
Let Ti : R2 → R2 be given by Ti (x) = Ai x where

     1 0        0 −1        −1  0
A1 = 0 1 , A2 = −1 0 , A3 =  0 −1 ,

     0  1       −1 0        0 −1
A4 = −1 0 , A5 =  0 1 , A6 = −1 0 ,

     1  0       0 1         α 0
A7 = 0 −1 , A8 = 1 0 , A9 =  0 α ,

      1 0         0 0         0 0
A10 = 0 0 , A11 = 0 1 , A12 = 0 0 .
Chapter 5
Real Vector Spaces

5.1. V ECTOR SPACES

Definition 5.1.1
A field (F, +, ·) is a nonempty set F together with the two binary
operations
+ : F × F → F, · : F × F → F,
called addition and multiplication respectively, satisfying the ax-
ioms for all x, y, z ∈ F:
(1) x + (y + z) = (x + y) + z, (associative law for addition)
(2) there is an element 0 ∈ F satisfying x + 0 = 0 + x = x,
(additive identity element)
(3) for every x ∈ F, there is an element −x ∈ F satisfying x +
(−x) = (−x) + x = 0, (existence of additive inverse)
(4) x + y = y + x, (commutative law/Abelian law for addi-
tion)
(5) x · (y · z) = (x · y) · z, (associative law for multiplication)
(6) there is an element 1 ∈ F (different from 0) satisfying x ·
1 = 1 · x = x, (multiplicative identity element)
(7) for every x ∈ F\{0}, there is an element x−1 ∈ F satisfying
x · x−1 = x−1 · x = 1, (existence of multiplicative inverse)
(8) x · y = y · x. (commutative law/Abelian law for multipli-
cation)
(9) x · (y + z) = x · y + x · z (distributive law)

Often the product x · y is written as xy. When the two operations


+ and · are clear from the context, the field (F, +, ·) is denoted by
F itself by omitting the binary operations.

Example 5.1.1
The set Z of all integers is not a field with the usual addition and
multiplication: the non-zero integers other than 1 and −1 have no
multiplicative inverse in Z.

Example 5.1.2
The set R of all real numbers and the set C of all complex numbers are
fields with the usual addition and multiplication. The set Q of all
rational numbers and the set Q[i] of all Gaussian rational numbers (the
complex numbers p + iq where p and q are rational numbers) are
also fields with the usual addition and multiplication.

Example 5.1.3
A positive integer p with exactly two positive divisors, namely 1
and p itself, is called a prime number. Let p be a prime number. The
set Zp := {0, 1, 2, . . . , p − 1} with addition and multiplication defined
by
x ⊕ y = x + y (mod p), x ⊙ y = xy (mod p)
is a field.
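
As a small sketch in Python (plain integer arithmetic, nothing from the
text): the field operations of Z7, including the multiplicative inverse
obtained from Fermat's little theorem.

p = 7
x, y = 3, 5
print((x + y) % p)         # 1, i.e. 3 ⊕ 5 = 1 in Z_7
print((x * y) % p)         # 1, i.e. 3 ⊙ 5 = 1, so 5 is the inverse of 3
print(pow(x, p - 2, p))    # 5, the inverse of 3 via x^(p-2) mod p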

Definition 5.1.2
A vector space V over a field F is a nonempty set V , whose elements
are called vectors, together with the two operations
+ : V ×V → V, · : F ×V → V,
called addition and scalar multiplication respectively, satisfying the
following axioms for all u, v, w ∈ V and for all scalars λ , µ ∈ F:
(1) If u, v ∈ V , then u + v ∈ V .
(2) u + v = v + u
(3) (u + v) + w = u + (v + w)

(4) There is an element 0 ∈ V such that for any u ∈ V , u + 0 =


0 + u = u.
(5) For each u ∈ V , there is an element −u ∈ V such that u +
(−u) = (−u)+ u = 0. This element is called the (additive)
inverse of u.
(6) If u ∈ V and λ a scalar, then λ u ∈ V .
(7) λ (u + v) = λ u + λ v
(8) (λ + µ )u = λ u + µ u
(9) λ (µ u) = (λ µ )u
(10) 1u = u
We say that V is a real vector space if V is a vector space over the
field R, while it is called a complex vector space if it is a vector
space over the field C.
Example 5.1.4
The set Rn with the coordinatewise addition and scalar multiplica-
tion is a real vector space. For two vectors u = (u1 , u2 , . . . , un ) and
v = (v1 , v2 , . . . , vn ) in Rn , the coordinatewise addition and scalar
multiplications are defined by
u+v := (u1 +v1 , u2 +v2 , . . . , un +vn ), λ u := (λ u1 , λ u2 , . . . , λ un ),

Example 5.1.5
The set V1 = {(x, 1) : x ∈ R} with the coordinatewise addition and
scalar multiplication is not a vector space. For, if (x, 1) ∈ V and
(y, 1) ∈ V , then (x, 1) + (y, 1) = (x + y, 2) ̸∈ V .
However, the set V2 = {(x, 0) : x ∈ R} is a vector space. The
sum of (x, 0), (y, 0) ∈ V2 is (x + y, 0) and it is clearly in V2 . The
scalar product λ (x, 0) = (λ x, 0) is also in V2 . The additive identity
is (0, 0) and the inverse of (x, 0) is (−x, 0). Other properties are
easily verified.

Example 5.1.6
The set V = {(x, 1) : x ∈ R} with addition and scalar multiplication
defined by
(x, 1) + (y, 1) = (x + y, 1), λ (x, 1) = (λ x, 1)

is vector space. The vector (0, 1) is the additive identity; the inverse
of (x, 1) is (−x, 1). All the real vector space axioms are satisfied.
We verify only: λ (u + v) = λ u + λ v where u = (x, 1), v = (y, 1).
Clearly
u + v = (x, 1) + (y, 1) = (x + y, 1)
and therefore
λ (u + v) = λ (x + y, 1)
= (λ (x + y), 1)
= (λ x + λ y, 1)
= (λ x, 1) + (λ y, 1)
= λ (x, 1) + λ (y, 1)
= λ u + λ v.

Example 5.1.7
The set M2 of all square matrices of order 2 with usual matrix addi-
tion, matrix scalar multiplication is a real vector space. The vector
space axioms are verified below.
Let

    u11 u12        v11 v12
u = u21 u22 ,  v = v21 v22 .
Then
(1) The addition of u and v is given by

        u11 u12   v11 v12   u11 + v11  u12 + v12
u + v = u21 u22 + v21 v22 = u21 + v21  u22 + v22 .
Clearly u + v ∈ M2 .
(2) We have earlier proved the commutative law for matrix ad-
dition. We repeat the proof. By the definition of addition,
we have

        u11 u12   v11 v12
u + v = u21 u22 + v21 v22

        u11 + v11  u12 + v12
      = u21 + v21  u22 + v22

        v11 + u11  v12 + u12
      = v21 + u21  v22 + u22

      = v + u.
(3) Associative law for matrix addition is already proved.
(4) The zero matrix
 
0 0
0=
0 0
satisfies u + 0 = 0 + u = u for all u ∈ M2 .
(5) For  
u11 u12
u= ∈ M2 ,
u21 u22
the matrix
 
−u11 −u12
−u = ∈ M2
−u21 −u22
satisfies
u + (−u) = (−u) + u = 0.
(6) By the definition of matrix scalar multiplication, we have
   
u11 u12 λ u11 λ u12
λu = λ = ∈ M2 .
u21 u22 λ u21 λ u22
(7) By the definition of matrix addition and matrix scalar
multiplication, we have

             u11 u12   v11 v12
λ(u + v) = λ(u21 u22 + v21 v22)

             u11 + v11  u12 + v12
         = λ(u21 + v21  u22 + v22)

           λ(u11 + v11)  λ(u12 + v12)
         = λ(u21 + v21)  λ(u22 + v22)

           λu11 + λv11  λu12 + λv12
         = λu21 + λv21  λu22 + λv22

           λu11 λu12   λv11 λv12
         = λu21 λu22 + λv21 λv22

             u11 u12     v11 v12
         = λ u21 u22 + λ v21 v22

         = λu + λv.
(8) The proof of (λ + µ )u = λ u + µ u is similar.
(9) It is clear that
 
µ u11 µ u12
λ (µ u) = λ
µ u21 µ u22
 
λ µ u11 λ µ u12
=
λ µ u21 λ µ u22
 
u11 u12
= (λ µ )
u21 u22
= (λ µ )u.
(10) Clearly
   
u11 u12 1u11 1u12
1u = 1 = = u.
u21 u22 1u21 1u22
Thus M2 is a vector space.

Example 5.1.8
The set Mm×n of all matrices (with real entries) of order m × n with
usual matrix addition, matrix scalar multiplication is a real vector
space.

Example 5.1.9
The set V of all points in any plane through the origin is a real vec-
tor space with usual addition and scalar multiplication. Any plane
through the origin has the equation ax + by + cz = 0. At least one
of the constants a, b, c is non-zero. Let c ̸= 0. Then the equation can
be written as z = (−a/c)x + (−b/c)y = Ax + By. Thus the set V is
given by
V = {(u1 , u2 , Au1 + Bu2 )|u1 , u2 ∈ R}.
(1) If u, v ∈ V , then
u = (u1 , u2 , Au1 + Bu2 ), v = (v1 , v2 , Av1 + Bv2 )
and therefore
u+v
= (u1 , u2 , Au1 + Bu2 ) + (v1 , v2 , Av1 + Bv2 )
= (u1 + v1 , u2 + v2 , A(u1 + v1 ) + B(u2 + v2 ))
∈ V.
(2) We omit the proofs of u + v = v + u, (u + v) + w = u +
(v + w).
(3) The element 0 = (0, 0, 0) ∈ V satisfies, for any u ∈ V , u + 0 =
0 + u = u.
(4) For each u = (u1 , u2 , Au1 + Bu2 ) ∈ V , the element −u =
(−u1 , −u2 , −Au1 − Bu2 ) ∈ V satisfies u + (−u) = (−u) +
u = 0.
(5) If u, v ∈ V and λ , µ are scalars, then it is easy to prove that
λ u ∈ V and that λ (u + v) = λ u + λ v, (λ + µ )u = λ u + µ u,
λ (µ u) = (λ µ )u and 1u = u hold.
Therefore V is a vector space.

Example 5.1.10
Let F [a, b] be the set of all functions f : [a, b] → R. If f , g ∈ F [a, b]
and λ be any scalar in R, then let f + g, λ f be the functions defined
by
( f + g)(x) = f (x) + g(x), (λ f )(x) = λ f (x)
for all x ∈ [a, b]. Under these operations, F [a, b] is a real vector
space.
Indeed, the function 0 : [a, b] → R defined by 0(x) = 0 for all x ∈
[a, b] acts as the additive identity while the function − f : [a, b] → R
defined by (− f )(x) = − f (x) for all x ∈ [a, b] is the additive inverse
of f ∈ F [a, b]. Other properties of the vectors follow from the
corresponding properties of real numbers. For example, f + g =
g + f follows from
( f + g)(x) = f (x) + g(x) = g(x) + f (x) = (g + f )(x).

Example 5.1.11
The set C [a, b] of all continuous functions f : [a, b] → R with addi-
tion and scalar multiplication defined by
( f + g)(x) = f (x) + g(x), (λ f )(x) = λ f (x)
for all x ∈ [a, b] is a real vector space.

Example 5.1.12
Let Pn [x] be the set of all polynomials of degree less than n:
Pn [x] := \left\{ p : p(x) = \sum_{k=0}^{n−1} ak x^k , \ a0 , . . . , an−1 ∈ R \right\}.
Note that the constant polynomials p(x) = a0 are also included in
Pn [x]. If p, q ∈ Pn [x] and λ be any scalar in R, then let p + q, λ p
be the functions defined by
(p + q)(x) = p(x) + q(x), (λ p)(x) = λ p(x).
Under these operations, Pn [x] is a real vector space.

Example 5.1.13
Let V be the set with a single element 0. Let addition and scalar
multiplication be defined by
0 + 0 = 0, λ 0 = 0.
Then V is a vector space and it is called a zero vector space.

Theorem 5.1.1 Cancellation law
Let V be a vector space. For any vector u, v, w ∈ V , we have
u + w = v + w ⇒ u = v.

P ROOF. Since w ∈ V , there is an element −w ∈ V such that w +
(−w) = 0 = (−w) + w. Now by adding −w to the equation u +
w = v + w, we get
(u + w) + (−w) = (v + w) + (−w)
or by using associative law,
u + (w + (−w)) = v + (w + (−w))
or
u+0 = v+0
or
u = v.

Theorem 5.1.2
Let V be a vector space. For any vector u ∈ V and scalar λ , we have
(1) 0u = 0;
(2) λ 0 = 0;
(3) (−1)u = −u;
(4) If λ u = 0, then λ = 0 or u = 0.

P ROOF. (1) Since
0u + 0u = (0 + 0)u = 0u = 0 + 0u,
by cancellation law, we have 0u = 0.
(2) Since 0 + 0 = 0, we have
λ 0 + λ 0 = λ (0 + 0) = λ 0 = 0 + λ 0.
By cancellation law, we have λ 0 = 0.
(3) Since
u + (−u) = 0
and
u + (−1)u = 1u + (−1)u = (1 + (−1))u = 0u = 0,
we have
u + (−1)u = u + (−u) = 0.
By using cancellation law, we have
(−1)u = −u.
(4) If λ = 0, then there is nothing to prove. Therefore as-
sume that λ ̸= 0. Since λ u = 0, we have
u = 1u = ((1/λ )λ )u = (1/λ )(λ u) = (1/λ )0 = 0.

5.2. S UBSPACES
Let V be a vector space and W a subset of V . For two elements x, y ∈ W ,
we can talk about their sum x+y using the same addition operation in V .
Similarly, we can talk about λ x using the scalar product in V . Our aim
is to investigate if the two operations make W a vector space on its
own. For example, if V = R2 , and W1 = {(x1 , 1) : x1 ∈ R}, then, for two
elements x = (x1 , 1), y = (y1 , 1), their sum x + y = (x1 + y1 , 2) ̸∈ W1 and
therefore the addition operation is not a binary operation on W1 . Thus,
W1 is not a vector space. However, in certain cases, W may become a
vector space. Consider W2 = {(x1 , 0) : x1 ∈ R}. In this case, for the two
elements x = (x1 , 1), y = (y1 , 1), their sum x + y = (x1 + y1 , 0) ∈ W2 and
the scalar product λ x = (λ x1 , 0) ∈ W2 . The other properties for a vector
space are also satisfied and W2 becomes a vector space on its own.
Definition 5.2.1
A subset W of a vector space V is called a subspace if W is itself a
vector space under the addition and scalar multiplication defined on V .

Theorem 5.2.1
Let W be a non-empty subset of a vector space V . Then W is a
subspace of V if and only if
(1) u + v ∈ W for all u, v ∈ W ,
(2) λ u ∈ W for all u ∈ W and for all scalar λ .

P ROOF. Let W be a subspace of V . Then all vector space axioms
hold in W and therefore u + v, λ u ∈ W for all u, v ∈ W and for
any scalar λ .
Conversely, let the conditions (1) and (2) hold. Then we need
to show that W is a subspace of V . In other words, we have to show
that W is a vector space on its own. In view of (1), the first axiom
of vector space holds. The commutative and associative axioms
hold in V and therefore they hold in W . By taking λ = 0 in (2),
it follows that 0 ∈ W . Similarly, by taking λ = −1 in (2), we see
that −u ∈ W whenever u ∈ W . The other axioms hold in V and
therefore they hold in W too. This completes the proof.

Theorem 5.2.2
Let W be a non-empty subset of a vector space V . Then W is a
subspace of V if and only if λ u + µ v ∈ W for all u, v ∈ W and for
all scalar λ , µ .

P ROOF. Let W be a subspace of V . Then all vector space axioms
hold in W . If u, v ∈ W and λ , µ are scalars, then λ u, µ v ∈ W and
therefore λ u + µ v ∈ W .
Conversely, if λ u + µ v ∈ W for all u, v ∈ W and for all scalars
λ , µ , then by taking λ = µ = 1 and µ = 0 respectively, we get
(1) u + v ∈ W for all u, v ∈ W ,
(2) λ u ∈ W for all u ∈ W and for all scalars λ .
By the previous theorem, it follows that W is a subspace of V .

Example 5.2.1
Let V = R3 be the vector space under usual addition and scalar
multiplication and W = {(x, y, 0) : x, y ∈ R}. Let u = (x1 , y1 , 0) and
v = (x2 , y2 , 0) be two vectors in W and λ , µ be scalars. Then
λ u + µ v = λ (x1 , y1 , 0) + µ (x2 , y2 , 0)
= (λ x1 + µ x2 , λ y1 + µ y2 , 0) ∈ W.
Thus W is a subspace of V .

Example 5.2.2
Let V = R3 be the vector space under usual addition and scalar
multiplication and
W = {(x, y, ax + by) : x, y ∈ R}.
Let u = (x1 , y1 , ax1 + by1 ) and v = (x2 , y2 , ax2 + by2 ) be two vectors
in W and λ , µ be scalars. Then
λ u + µ v = λ (x1 , y1 , ax1 + by1 ) + µ (x2 , y2 , ax2 + by2 )
= (λ x1 + µ x2 , λ y1 + µ y2 , a(λ x1 + µ x2 ) + b(λ y1 + µ y2 )) ∈ W.
Thus W is a subspace of V .
Since the points (x, y, z) ∈ W satisfy
ax + by = z,
we have proved that the set of points in a plane passing through the
origin is a subspace of R3 .

Example 5.2.3
Consider the vector space V = Mm of all square matrices of order m.
Let W be the subset consisting of symmetric matrices of order m.
Since the sum of two symmetric matrices is symmetric and scalar
multiple of symmetric matrix is symmetric, W is a subspace of V .
Similarly the subsets consisting of lower triangular matrices,
upper triangular matrices and diagonal matrices are all subspaces
of V .
The subset consisting of invertible matrices of order m is not a
vector space since the sum of two invertible matrices need not be
invertible.

Example 5.2.4
Consider the vector space V of all functions f : [a, b] → R under
the addition and scalar multiplication of functions. Since the sum
of two continuous functions is continuous and a scalar multiple of a continuous
function is also continuous, the subset C[a, b] consisting of contin-
uous functions is a subspace of V .

Theorem 5.2.3
The set of all solutions of a homogeneous linear system Ax = 0
(with m equations and n unknowns) is a subspace of Rn .

P ROOF. Let x, x′ be two solutions of the linear system Ax = 0.
Then Ax = 0 and Ax′ = 0. For any scalar λ and µ , we have
A(λ x + µ x′ ) = λ Ax + µ Ax′ = λ 0 + µ 0 = 0.
This shows that λ x + µ x′ is also a solution of Ax = 0. Thus the
set of all solutions of the homogeneous linear system Ax = 0 is a
subspace of Rn .

Example 5.2.5
Consider solutions of the homogeneous linear system
x + y + z = 0, y−z = 0
given by
x = −2t, y = t, z = t.
The set of all solutions W = {(−2t,t,t) : t ∈ R} is a subspace of
R3 . It can be verified directly. Let u = (−2t,t,t) and v = (−2s, s, s)
and λ and µ be scalars. Then
λ u + µ v = (−2(λ t + µ s), λ t + µ s, λ t + µ s)
is again in W .

Definition 5.2.2
A vector v is a linear combination of the vectors v1 , v2 , . . . , vk if
there are scalars λ1 , λ2 , . . . , λk such that
v = λ1 v1 + λ2 v2 + . . . + λk vk .

Example 5.2.6
Consider the vector space Rn with usual addition and scalar multi-
plication. Consider the following vectors
e1 = (1, 0, 0, · · · , 0)
e2 = (0, 1, 0, · · · , 0)
e3 = (0, 0, 1, · · · , 0)
..
.
en = (0, 0, 0, · · · , 1).
Any vector v = (v1 , v2 , · · · , vn ) ∈ Rn can be written as
v = v1 e1 + v2 e2 + v3 e3 + · · · + vn en .

Example 5.2.7
Consider the vector space R3 . Let e1 = (1, 0, 0), e2 = (1, 1, 0) and
e3 = (1, 1, 1). Let v = (v1 , v2 , v3 ) ∈ R3 be any vector. Since
λ1 e1 + λ2 e2 + λ3 e3 = (λ1 + λ2 + λ3 , λ2 + λ3 , λ3 ),
v = λ1 e1 + λ2 e2 + λ3 e3
if
λ1 + λ2 + λ3 = v1 , λ2 + λ3 = v2 , λ3 = v3
or equivalently if
λ1 = v1 − v2 , λ2 = v2 − v3 , λ3 = v3 .
Thus
v = (v1 − v2 )e1 + (v2 − v3 )e2 + v3 e3 .

Example 5.2.8
Consider the vector space R3 . Let v1 = (3, −1, 2), v2 = (−1, 2, −1).
Consider the vectors u = (7, −4, 5) and w = (1, 3, 1).
If u is a linear combination of v1 and v2 , then there are scalars
λ1 and λ2 such that
u = λ1 v1 + λ2 v2 .
Since
λ1 v1 + λ2 v2 = (3λ1 − λ2 , −λ1 + 2λ2 , 2λ1 − λ2 )
we must have
3λ1 − λ2 = 7, −λ1 + 2λ2 = −4, 2λ1 − λ2 = 5.
The solution of the linear system is given by λ1 = 2 and λ2 = −1.
Thus
u = 2v1 − v2 .
The linear system resulting from
w = λ1 v1 + λ2 v2
is
3λ1 − λ2 = 1, −λ1 + 2λ2 = 3, 2λ1 − λ2 = 1;
this system is inconsistent and therefore w is not a linear combina-
tion of v1 and v2 .

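Whether a vector is a linear combination of given vectors is a question about the consistency of a linear system, so it can also be checked by machine. The following sketch in Python (it uses the NumPy library, which is not part of this text, and the variable names are ours) reproduces the computation of Example 5.2.8.

import numpy as np

# Columns of M are v1 and v2; u = lam1*v1 + lam2*v2 is the linear system M @ lam = u.
M = np.array([[3.0, -1.0],
              [-1.0, 2.0],
              [2.0, -1.0]])
u = np.array([7.0, -4.0, 5.0])
w = np.array([1.0, 3.0, 1.0])

for b in (u, w):
    lam, _, _, _ = np.linalg.lstsq(M, b, rcond=None)
    # The system is consistent exactly when the least-squares solution fits exactly.
    if np.allclose(M @ lam, b):
        print(b, "is a linear combination; coefficients:", lam)
    else:
        print(b, "is not a linear combination of v1 and v2")

For u the script prints the coefficients 2 and −1, in agreement with the example; for w it reports that no such combination exists.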
Theorem 5.2.4
Let V be a vector space. The set W of all linear combinations of
v1 , v2 , . . . , vk ∈ V is a subspace of V . Any subspace W ′ containing
v1 , v2 , . . . , vk contains the subspace W .

P ROOF. Since vi = 0v1 + · · · + 0vi−1 + 1vi + 0vi+1 + · · · + 0vk ,
vi ∈ W . Thus W is non-empty. Let u, v ∈ W . Then there are
scalars λi , µi (i = 1, 2, . . . , k) such that
u = λ1 v1 + λ2 v2 + . . . + λk vk
v = µ1 v1 + µ2 v2 + . . . + µk vk .
Thus we have
λ u + µ v = λ (λ1 v1 + λ2 v2 + . . . + λk vk )
+ µ (µ1 v1 + µ2 v2 + . . . + µk vk )
= (λ λ1 + µ µ1 )v1 + (λ λ2 + µ µ2 )v2 + . . .
+ (λ λk + µ µk )vk .
Thus λ u+ µ v is a linear combination of v1 , v2 , . . ., vk ∈ V . There-
fore λ u + µ v ∈ W and W is therefore a subspace of V .
Since W ′ is a subspace of V , every linear combination of ele-
ments in W ′ is again in W ′ . Since v1 , v2 , . . ., vk ∈ W ′ , their linear
combination is again in W ′ . Thus W is contained in W ′ .

5.3. S PAN
Definition 5.3.1
Let S = {v1 , v2 , . . . , vk } be a subset of a vector space V . Then the
subspace W consisting of all linear combinations of elements in S is
called the space spanned by v1 , v2 , . . ., vk or simply span of S and
is denoted by span(S) or span{v1 , v2 , . . . , vk }. If W is the span of S,
we say that S spans W .

Example 5.3.1
Let V be a vector space. Let 0 ̸= v ∈ V and S = {v}. Then
span(S) = {λ v : λ ∈ R}.
Let v1 , v2 ∈ V and v1 ̸= λ v2 for any λ . If S = {v1 , v2 }, then
span S = {λ v1 + µ v2 : λ , µ ∈ R}.

Example 5.3.2
Consider the vector space Rn . Consider the set S consisting of the
following vectors
e1 = (1, 0, 0, · · · , 0)
e2 = (0, 1, 0, · · · , 0)
e3 = (0, 0, 1, · · · , 0)
..
.
en = (0, 0, 0, · · · , 1).
For any vector v = (v1 , v2 , · · · , vn ) ∈ Rn , we have
v = v1 e1 + v2 e2 + v3 e3 + · · · + vn en .
Thus span(S) = Rn .

Example 5.3.3
Consider v1 = (1, 0, 0), v2 = (0, 1, 1) and v3 = (1, 0, 1) in R3 . For
any vector w = (w1 , w2 , w3 ) ∈ R3 , the equation
λ1 v1 + λ2 v2 + λ3 v3 = (λ1 + λ3 , λ2 , λ2 + λ3 ) = (w1 , w2 , w3 )
holds if λ1 + λ3 = w1 , λ2 = w2 , λ2 + λ3 = w3 , or equivalently if
λ1 = w1 + w2 − w3 , λ2 = w2 , λ3 = w3 − w2 . Thus any vector in R3
is a linear combination of v1 , v2 and v3 . Thus
span{v1 , v2 , v3 } = R3 .

Example 5.3.4
Consider v1 = (1, 0, 0), v2 = (0, 1, 0) in R3 . Then
λ1 v1 + λ2 v2
= (λ1 , λ2 , 0).
Therefore
span{v1 , v2 } = {(x, y, 0) : x, y ∈ R};
in this case, span{v1 , v2 } ̸= R3 .

Example 5.3.5
The set {1, x, x2 , . . . , xn } spans the vector space of all polynomials
of degree less than or equal to n.

5.4. L INEAR DEPENDENCE AND INDEPENDENCE

Definition 5.4.1
Let V be a vector space. Let S = {v1 , v2 , . . . , vk } be a subset of V .
If there are scalars λ1 , λ2 , . . . , λk , not all zero, such that
λ1 v1 + λ2 v2 + . . . + λk vk = 0,
then the set S is linearly dependent. If S is not linearly dependent,
then it is linearly independent. Thus S is linearly independent if
λ1 v1 + λ2 v2 + . . . + λk vk = 0,
then
λ1 = λ2 = · · · = λk = 0.

Example 5.4.1
A set with a single vector v is linearly dependent if and only if there
is a scalar λ ̸= 0 such that λ v = 0. This holds if and only if v = 0.
Thus a single element set {v} is linearly dependent if and only if
v = 0.
The set {v} is linearly independent if and only if v ̸= 0.

Example 5.4.2
Consider a two element subset {v1 , v2 } of a vector space V . The set
is linearly dependent if there are scalars λ1 and λ2 , not both zero,
such that
λ1 v1 + λ2 v2 = 0.
Since λ1 and λ2 are not both zero, at least one of them, say λ1 , is
nonzero. Then
v1 = −(λ2 /λ1 )v2 .
Thus v1 is a scalar multiple of v2 .
If v1 is a scalar multiple of v2 , then v1 = λ v2 or v1 − λ v2 = 0.
Thus {v1 , v2 } is linearly dependent.
Thus, the set {v1 , v2 } is linearly dependent if and only if one
vector is a scalar multiple of the other vector. The set {v1 , v2 } is linearly
independent if and only if neither vector is a scalar multiple of the other
vector.
The set {(1, 0, 0), (0, 1, 0)} is linearly independent while
{(1, 1, 2), (2, 2, 4)} is linearly dependent.

Example 5.4.3
Consider the vector space Rn . Consider the set S consisting of the
following vectors
e1 = (1, 0, 0, · · · , 0)
e2 = (0, 1, 0, · · · , 0)
e3 = (0, 0, 1, · · · , 0)
..
.
en = (0, 0, 0, · · · , 1).
Then
λ1 e1 + λ2 e2 + . . . + λn en = 0
becomes
(λ1 , λ2 , . . . , λn ) = (0, 0, . . . , 0)
or
λ1 = λ2 = · · · = λn = 0.
Thus S is a linearly independent set.

Example 5.4.4
Consider v1 = (1, 0, 0), v2 = (0, 1, 1) and v3 = (1, 0, 1) in R3 . Since
λ1 v1 + λ2 v2 + λ3 v3
= (λ1 + λ3 , λ2 , λ2 + λ3 ),
the equation
λ1 v1 + λ2 v2 + λ3 v3 = 0
becomes
λ1 + λ3 = 0, λ2 = 0, λ2 + λ3 = 0.
Solving these equations, we get
λ1 = λ2 = λ3 = 0.
Thus S = {v1 , v2 , v3 } is a linearly independent set.

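Linear independence in Rn can also be tested by machine: place the vectors as the rows of a matrix and compute its rank; the set is linearly independent exactly when the rank equals the number of vectors. A sketch in Python using the NumPy library (not part of this text):

import numpy as np

# Rows are v1, v2, v3 of Example 5.4.4.
V = np.array([[1, 0, 0],
              [0, 1, 1],
              [1, 0, 1]])
print(np.linalg.matrix_rank(V))  # prints 3: the three vectors are linearly independent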
Example 5.4.5
The set {1, x, x2 , . . . , xn } is linearly independent in the vector
space of all polynomials of degree less than or equal to n. Let
λ0 , λ1 , λ2 , . . . , λn be scalars such that
λ0 1 + λ1 x + λ2 x2 + · · · + λn xn = 0.
Since this is true for all x, we get λ0 = 0 by taking x = 0. Now by
differentiating the above equation, we get
λ1 + 2λ2 x + · · · + nλn xn−1 = 0.
By taking x = 0 in this equation, we get λ1 = 0. By proceeding this
way, we see that
λ0 = λ1 = λ2 = . . . = λn = 0
and therefore {1, x, x2 , . . . , xn } is a linearly independent set.

Example 5.4.6
Consider v1 = (1, 0, 0), v2 = (0, 1, 1) and v3 = (1, 1, 1) in R3 . Since
λ1 v1 + λ2 v2 + λ3 v3
= (λ1 + λ3 , λ2 + λ3 , λ2 + λ3 ),
the equation
λ1 v1 + λ2 v2 + λ3 v3 = 0
becomes
λ1 + λ3 = 0, λ2 + λ3 = 0, λ2 + λ3 = 0.
Solving these equations, we get
λ1 = λ2 = t, λ3 = −t.
In particular, we can take λ1 = λ2 = 1 and λ3 = −1. In this case,
we have
v1 + v2 − v3 = 0.
Thus S = {v1 , v2 , v3 } is a linearly dependent set.

Example 5.4.7
If S is a subset of a vector space V and 0 ∈ S, then S is linearly
dependent. For, let v1 , v2 , . . . , vk−1 , 0 be the vectors in S. By taking
λ1 = λ2 = · · · = λk−1 = 0 and λk = 1, we see that
λ1 v1 + λ2 v2 + . . . + λk−1 vk−1 + λk 0 = 0.
Since not all λi ’s are zero (in this case, λk ̸= 0), we see that S is
linearly dependent.

Theorem 5.4.1
A subset S = {v1 , v2 , . . . , vk } with two or more vectors of a vector
space V is linearly dependent if and only if at least one of the vectors
in S is a linear combination of the other vectors in S.

P ROOF. Let S be linearly dependent. Then there are scalars
λ1 , λ2 , . . . , λk , not all zero, such that
λ1 v1 + λ2 v2 + . . . + λk vk = 0.
Let λi ̸= 0. Then
λi vi = −λ1 v1 − λ2 v2 − · · · − λi−1 vi−1 − λi+1 vi+1 − · · · − λk vk
or
vi = −(λ1 /λi )v1 − (λ2 /λi )v2 − · · · − (λi−1 /λi )vi−1 − (λi+1 /λi )vi+1 − · · · − (λk /λi )vk .
This shows that vi is a linear combination of the other vectors in S.
Conversely, let one of the vectors, say v1 , be a linear combination
of the other vectors in S. Then there are scalars λ2 , λ3 , . . . , λk such
that
v1 = λ2 v2 + . . . + λk vk .
Thus
−v1 + λ2 v2 + . . . + λk vk = 0.
Since not all coefficients in the linear combination are zero, it follows
that S is linearly dependent.

Example 5.4.8
Consider v1 = (1, 0, 0), v2 = (1, 0, 1) and v3 = (3, 0, 1) in R3 . Since
2v1 + v2 = 2(1, 0, 0) + (1, 0, 1)
= (2, 0, 0) + (1, 0, 1)
= (3, 0, 1)
= v3 ,
the set {v1 , v2 , v3 } is linearly dependent in R3 .

Theorem 5.4.2
Any subset S of Rn having more than n vectors is linearly depen-
dent.

P ROOF. Let S = {v1 , v2 , . . . , vk } be a subset of Rn and k > n. Con-
sider the equation
λ1 v1 + λ2 v2 + . . . + λk vk = 0.
We need to show that there is a solution where not all λi ’s are zero.
Each side of the above equation is a vector in Rn . By equating
the corresponding coordinates, we get n homogeneous equations
in k unknowns. Since a homogeneous linear system with more
unknowns than the number of equations has a nontrivial solution,
we have non-zero solutions. Thus the set S is linearly dependent.

5.5. BASIS AND DIMENSION

Definition 5.5.1
A subset S of a vector space V is called a basis for V if (a) S is
linearly independent and (b) S spans V .

Example 5.5.1
Consider the vector space Rn . Consider the set S consisting of the
following vectors
e1 = (1, 0, 0, · · · , 0)
e2 = (0, 1, 0, · · · , 0)
e3 = (0, 0, 1, · · · , 0)
..
.
en = (0, 0, 0, · · · , 1).
This S is a linearly independent set and also spans Rn . Thus it is a
basis for Rn . It is called the standard basis for Rn .

Example 5.5.2
Consider v1 = (1, 0, 0), v2 = (0, 1, 1) and v3 = (1, 0, 1) in R3 . We
have shown that the set S = {v1 , v2 , v3 } spans R3 and is linearly inde-
pendent. Thus S is a basis for R3 .

Example 5.5.3
The set S = {1, x, x2 , . . . , xn } is linearly independent in the vector
space Pn of all polynomials of degree less than or equal to n. Also
it spans Pn and therefore S is a basis for Pn .

Theorem 5.5.1
Let S = {v1 , v2 , . . . , vk } be a basis for a vector space V . Then every
element v ∈ V can be expressed as a unique linear combination of
elements in S:
v = λ1 v1 + λ2 v2 + · · · + λk vk .

P ROOF. Since S is a basis for V , V = span(S). Thus every vec-
tor in V is a linear combination of elements in S. We need to
show that v can be expressed uniquely as linear combination of
elements in S. Or equivalently, if v is expressed as linear combi-
nation in two ways
v = λ1 v1 + λ2 v2 + · · · + λk vk
and
v = µ1 v1 + µ2 v2 + · · · + µk vk ,
then we need to show that
λ1 = µ1 , . . . , λk = µk .
By subtracting the two expression for v, we now get
(λ1 − µ1 )v1 + · · · + (λk − µk )vk = 0.
Since S is a basis for V , the set S is linearly independent. Hence
λ1 − µ1 = 0, . . . , λk − µk = 0
or
λ1 = µ1 , . . . , λk = µk .

Definition 5.5.2
Let S = {v1 , v2 , . . . , vk } be a basis for a vector space V . Then every
element v ∈ V can be expressed as a unique linear combination of
elements in S:
v = λ1 v1 + λ2 v2 + · · · + λk vk .
The scalars λ1 , λ2 , . . . , λk are called the coordinates of the vector v
with respect to the basis S. The vector (v)S := (λ1 , λ2 , . . . , λk ) ∈ Rk
is the coordinate vector of v relative to S.

Example 5.5.4
The coordinate vector of v = (v1 , v2 , v3 ) relative to the standard ba-
sis of R3 is (v)S = (v1 , v2 , v3 ), which is the same as the vector v.
Consider the set S = {v1 , v2 , v3 } where v1 = (1, 0, 0), v2 =
(0, 1, 1) and v3 = (1, 0, 1) are in R3 . We have shown that S is a
basis for R3 . For any vector w = (w1 , w2 , w3 ), we have
w = (w1 + w2 − w3 )v1 + w2 v2 + (w3 − w2 )v3 .
Therefore the coordinate vector of w relative to the basis S is given
by
(w)S = (w1 + w2 − w3 , w2 , w3 − w2 ).

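Finding a coordinate vector amounts to solving a linear system whose coefficient matrix has the basis vectors as its columns. A sketch in Python using the NumPy library (not part of this text), for the basis S of the previous example:

import numpy as np

# Columns are v1 = (1,0,0), v2 = (0,1,1), v3 = (1,0,1) of the basis S above.
B = np.array([[1, 0, 1],
              [0, 1, 0],
              [0, 1, 1]], dtype=float)
w = np.array([2.0, 5.0, 7.0])   # an arbitrary vector of R^3
print(np.linalg.solve(B, w))    # [0. 5. 2.], i.e. (w1+w2-w3, w2, w3-w2)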
Example 5.5.5
Consider the vector space M2 of all matrices of order 2. Consider
also the set
S = {E1 , E2 , E3 , E4 }
where
E1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, E2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, E3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, E4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.
Then any matrix
v = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
can be written as a linear combination of E1 , E2 , E3 and E4 :
v = aE1 + bE2 + cE3 + dE4 .
Hence span(S) = M2 . Also if there are scalars λ1 , . . . , λ4 such that
λ1 E1 + λ2 E2 + λ3 E3 + λ4 E4 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},
then we have
\begin{bmatrix} λ1 & λ2 \\ λ3 & λ4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},
and therefore
λ1 = . . . = λ4 = 0.
Thus S is linearly independent and therefore S is a basis for M2 . For
the matrix
v = \begin{bmatrix} a & b \\ c & d \end{bmatrix},
we have
(v)S = (a, b, c, d).

Definition 5.5.3
A vector space V is finite-dimensional if it has a basis with finite
number of elements. The vector space {0} is also considered as
finite-dimensional (even though there is no basis for this vector
space). Otherwise it is called infinite-dimensional.
The number of elements in a basis for a finite-dimensional vec-
tor space V is called the dimension of the vector space and is de-
noted by dim(V ).

The above definition of dimension makes sense only when all the bases
of a finite-dimensional vector space have the same number of elements.
In fact, this is true and will be proved shortly.
Theorem 5.5.2
Let V be finite-dimensional vector space. Let S be a basis for V with
n vectors. Any set with more than n elements is linearly dependent.

P ROOF. Let S = {v1 , v2 , . . . , vn } be a basis for V and S′ =
{w1 , w2 , . . . , wm } be any set and m > n. We will show that S′ is
linearly dependent. To do this we must find scalars λ1 , λ2 , . . . , λm
not all zero such that
(5.5.1) λ1 w1 + λ2 w2 + · · · + λm wm = 0.
Since S is a basis for V , all the vectors in S′ are linear combi-
nation of vectors in S. Thus we have scalars ai j such that
w1 = a11 v1 + a21 v2 + · · · + an1 vn
w2 = a12 v1 + a22 v2 + · · · + an2 vn
.. .. .. .. ..
. . . . .
wm = a1m v1 + a2m v2 + · · · + anm vn .
Since vi ’s are linearly independent and
0 = λ1 w1 + λ2 w2 + · · · + λm wm
= (λ1 a11 + λ2 a12 + · · · + λm a1m )v1
+ (λ1 a21 + λ2 a22 + · · · + λm a2m )v2
+···
+ (λ1 an1 + λ2 an2 + · · · + λm anm )vn ,
we have
λ1 a11 + λ2 a12 + · · · + λm a1m = 0
λ1 a21 + λ2 a22 + · · · + λm a2m = 0
..
.
λ1 an1 + λ2 an2 + · · · + λm anm = 0.
Since there are n equations in m variables and m > n, there are
nontrivial solution to the above linear system. Thus there are
scalars λ1 , λ2 , . . . , λm not all zero such that (5.5.1) holds. This proves
that S′ is linearly dependent.

Theorem 5.5.3
Let V be finite-dimensional vector space. Let S be a basis for V
with n vectors. Any set with less than n elements does not span V .

P ROOF. Let S = {v1 , v2 , . . . , vn } be a basis for V and S′ =
{w1 , w2 , . . . , wm } be any set and m < n. We will show that S′ does
not span V . Assume, on the contrary, that S′ spans V . Then we
show that this implies that S is linearly dependent, a contradiction.
Since S′ spans V , all the vectors in S are linear combination of
vectors in S′ . Thus we have scalars ai j such that
v1 = a11 w1 + a21 w2 + · · · + am1 wm
v2 = a12 w1 + a22 w2 + · · · + am2 wm
.. .. .. .. ..
. . . . .
vn = a1n w1 + a2n w2 + · · · + amn wm .
Note that
λ1 v1 + λ2 v2 + · · · + λn vn
=(λ1 a11 + λ2 a12 + · · · + λn a1n )w1
+ (λ1 a21 + λ2 a22 + · · · + λn a2n )w2
+···
+ (λ1 am1 + λ2 am2 + · · · + λn amn )wm .
Consider the equation
(5.5.2) λ1 v1 + λ2 v2 + · · · + λn vn = 0.
Also consider the system of equations
λ1 a11 + λ2 a12 + · · · + λn a1n = 0
λ1 a21 + λ2 a22 + · · · + λn a2n = 0
···
λ1 am1 + λ2 am2 + · · · + λn amn = 0
Since there are m equations in n variables and n > m, there are
nontrivial solution to the above linear system. Thus there are
scalars λ1 , λ2 , . . . , λn not all zero such that (5.5.2) holds. This proves
that S is linearly dependent, a contradiction to the assumption that
S is a basis for V . Thus it follows that S′ does not span V .

Theorem 5.5.4
Any two bases for a finite-dimensional vector space V have the same
number of elements.

P ROOF. Let S be a basis for V having n elements. Let S′ be an-
other basis for V with n′ elements. Since S is a basis for V , any set
with more than n elements is linearly dependent. But S′ is linearly
independent, so we must have n′ ≤ n.
Since S is a basis for V , any set with less than n elements does
not span V . But S′ spans V and therefore we must have n′ ≥ n.
Thus we have n = n′ .

Example 5.5.6
Since Rn has a basis with n elements, dim(Rn ) = n. Also the vec-
tor space Pn of all polynomials of degree at most n has a basis with n + 1
elements and therefore dim(Pn ) = n + 1. The vector space M2 of
all square matrices of order 2 has a basis with four elements and
therefore dim(M2 ) = 4. In general, dim(Mm×n ) = mn.

Theorem 5.5.5
Let V be a vector space and S be a nonempty subset of V . If S
is a linearly independent subset of V and v ̸∈ span(S), then the set
S′ = S ∪ {v} is also linearly independent.

P ROOF. Let S = {v1 , v2 , . . . , vk }. We need to show that S′ =
{v1 , v2 , . . . , vk , v} is linearly independent. In other words, we need
to show that
λ1 v1 + λ2 v2 + · · · + λk vk + λk+1 v = 0
implies
λ1 = λ2 = · · · = λk = λk+1 = 0.
The scalar λk+1 = 0. For, if this scalar is nonzero, then
v = −(λ1 /λk+1 )v1 − (λ2 /λk+1 )v2 − · · · − (λk /λk+1 )vk
and therefore v ∈ span(S), contrary to the assumption that v ̸∈
span(S).
Since λk+1 = 0, we have
λ1 v1 + λ2 v2 + · · · + λk vk = 0
and by linear independence of S, it follows that
λ1 = λ2 = · · · = λk = 0.
This proves that S′ is linearly independent.

Theorem 5.5.6
Let V be a vector space and S be a nonempty subset of V . If v ∈ S
and v ∈ span(S − {v}), then
span(S) = span(S − {v}).

P ROOF. Let S = {v, v1 , v2 , . . . , vk }. Since v ∈ span(S − {v}), v is
a linear combination of vectors in S − {v} = {v1 , v2 , . . . , vk }; that
is,
v = λ1 v1 + λ2 v2 + · · · + λk vk .
To prove the theorem, we need to show that every vector that is
expressible as linear combination of v, v1 , v2 , . . . , vk is expressible
as a linear combination of v1 , v2 , . . . , vk . Consider a vector u ∈
span S. Then
u = µ0 v + µ1 v1 + µ2 v2 + · · · + µk vk
= µ0 (λ1 v1 + λ2 v2 + · · · + λk vk )
+ µ1 v1 + µ2 v2 + · · · + µk vk
= (µ0 λ1 + µ1 )v1 + . . . + (µ0 λk + µk )vk .
Thus u is a linear combination of v1 , v2 , . . . , vk .

Theorem 5.5.7
Let V be a finite-dimensional vector space and S be a finite subset
of V .
(a) If S spans V but is not a basis for V , then a basis for V can be
obtained from S by removing appropriate vectors.
(b) If S is linearly independent but is not a basis for V , then S
can be enlarged to a basis for V by adding appropriate vectors
to S.

P ROOF OF (a). If S spans V but not a basis for V , then S is lin-
early dependent and therefore there is a vector v ∈ S which is a linear
combination of other vectors in S. The set S′ obtained by remov-
ing v from S still spans V . If this set S′ is linearly independent,
then S′ is a basis for V . Otherwise there is a vector v′ in S′ which
is a linear combination of other vectors in S′ . The set S′′ obtained by
removing v′ from S′ still spans V . If this set S′′ is linearly indepen-
dent, then S′′ is a basis for V . This procedure is continued until
we get a subset of S that is linearly independent and spans V .

P ROOF OF (b). If S is a linearly independent set in V but not a basis
for V , then S does not span V . The set S can be extended to
another set S′ by adding an appropriate vector v so that S′ is linearly
independent. If S′ spans V , then S′ is a basis for V . If S′ does
not span V , then the set S′ can be extended to another set S′′ by
adding an appropriate vector v′ so that S′′ is linearly independent. If
S′′ spans V , then S′′ is a basis for V . This procedure is continued
until we get a linearly independent set that spans V .

Theorem 5.5.8
Let V be an n-dimensional vector space and S be a subset of V with
exactly n vectors. Then S is a basis for V if either S spans V or S is
linearly independent.

P ROOF. Let V be a vector space of dimension n. First, let S span V .
Then to show that S is a basis, we need to show that S is linearly
independent. If S is not linearly independent, then some vector v
is linear combination of other vectors in S. The set S′ obtained
from S by removing the vector v still spans V , which is not pos-
sible since S′ has only n − 1 elements and V is of dimension n.
Thus S should be linearly independent and therefore S is a basis
for V . (A set with fewer than n elements in an n-dimensional vector
space V cannot span V .)
Now let S be linearly independent. We will show that S spans V so
that S becomes a basis for V . If S does not span V , then we can find
a vector v such that the set S′ obtained by adding v is still linearly
independent, which is not possible since S′ has n + 1 elements
and V of dimension n. (A set with more than n elements in a
n-dimensional vector space V cannot be linearly independent in
V .)

Example 5.5.7
Consider the set S with three vectors v1 = (1, 2, 3), v2 = (2, 5, 3)
and v3 = (1, 0, 8). Let λ1 , λ2 and λ3 be scalars such that λ1 v1 +
λ2 v2 + λ3 v3 = 0. Therefore we have
λ1 + 2λ2 + λ3 = 0,
2λ1 + 5λ2 = 0,
3λ1 + 3λ2 + 8λ3 = 0.
Since the matrix
A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 5 & 0 \\ 3 & 3 & 8 \end{bmatrix}
has nonzero determinant (det(A) = −1), it is invertible and there-
fore there is only a trivial solution to the above equations. Thus the
set S is linearly independent. Therefore S is a basis for R3 .

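The determinant test used in this example is easy to check by machine; a sketch in Python using the NumPy library (not part of this text):

import numpy as np

A = np.array([[1, 2, 1],
              [2, 5, 0],
              [3, 3, 8]])
print(round(np.linalg.det(A)))  # -1: nonzero, so S is a basis for R^3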
Example 5.5.8
Consider the set S with four vectors v1 = (1, 0, 0, 0), v2 =
(a, 1, 0, 0), v3 = (b, c, 1, 0) and v4 = (d, e, f , 1). Let λ1 , λ2 , λ3 and
λ4 be scalars such that λ1 v1 + λ2 v2 + λ3 v3 + λ4 v4 = 0. Therefore we
have
λ1 + aλ2 + bλ3 + d λ4 = 0,
λ2 + cλ3 + eλ4 = 0,
λ3 + f λ4 = 0,
λ4 = 0.
Since the matrix
A = \begin{bmatrix} 1 & a & b & d \\ 0 & 1 & c & e \\ 0 & 0 & 1 & f \\ 0 & 0 & 0 & 1 \end{bmatrix}
has nonzero determinant (det(A) = 1), it is invertible and therefore
there is only a trivial solution to the above equations. Thus the set
S is linearly independent. Therefore S is a basis for R4 .

Example 5.5.9
Consider the set S with three vectors v1 = (1, 1, 1), v2 = (1, 1, 0)
and v3 = (1, 0, 0). Since
v3 = (1, 0, 0), v2 − v3 = (0, 1, 0), v1 − v2 = (0, 0, 1),
any vector v = (λ1 , λ2 , λ3 ) ∈ R3 can be written as
v = λ1 v3 + λ2 (v2 − v3 ) + λ3 (v1 − v2 )
= λ3 v1 + (λ2 − λ3 )v2 + (λ1 − λ2 )v3 .
Thus S spans R3 and therefore it is a basis for R3 .

5.6. R OW SPACE , COLUMN SPACE AND NULL SPACE

Definition 5.6.1
Let A = [ai j ] be an m × n matrix. The vectors
r1 = (a11 a12 a13 · · · a1n ),
r2 = (a21 a22 a23 · · · a2n ),
..
.
rm = (am1 am2 am3 · · · amn ),
obtained from the rows of A, are called the row vectors of A. The
vectors
c1 = \begin{bmatrix} a11 \\ a21 \\ a31 \\ \vdots \\ am1 \end{bmatrix}, \quad c2 = \begin{bmatrix} a12 \\ a22 \\ a32 \\ \vdots \\ am2 \end{bmatrix}, \quad \cdots , \quad cn = \begin{bmatrix} a1n \\ a2n \\ a3n \\ \vdots \\ amn \end{bmatrix},
obtained from the columns of A, are called the column vectors of
A. The vector space spanned by the row vectors of A is called the
row space of A and the vector space spanned by the column vectors
is the column space of A. The vector space of all solutions of the
linear system Ax = 0 is the null space of A. Note that row space
and null spaces are subspaces of Rn while the column space is a
subspace of Rm .

Example 5.6.1
Consider the matrix
A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}.
The vectors r1 = (1 0 1) and r2 = (1 1 1) are the row vectors of
A while c1 = (1 1)T and c2 = (0 1)T and c3 = c1 are the column
vectors of A. The subspace
R = {λ r1 + µ r2 : λ , µ ∈ R}
is the row space of A. The subspace
C = {λ c1 + µ c2 : λ , µ ∈ R}
is the column space of A.
The solutions of the system Ax = 0 are given by the equation
x + z = 0, x+y+z = 0
or equivalently by
x + z = 0, y = 0.
By solving this, we get x = t, y = 0, z = −t. Thus the set of all
solutions of Ax = 0 is N = {(t, 0, −t) : t ∈ R}. This is clearly a
vector subspace of R3 and is the null space of A.

Theorem 5.6.1
A linear system Ax = b is consistent if and only if b is in the column
space of A.

P ROOF. Let A = [ai j ] be a matrix of order m × n and c1 , c2 , . . . , cn
be column vectors of A. Then for any x = (x1 x2 x3 . . . xn )T , we
have
Ax = (c1 c2 . . . cn )(x1 x2 . . . xn )T
= x1 c1 + x2 c2 + · · · + xn cn .
Since the column space of A is just the set of all linear combina-
tions of the column vectors, we see that Ax = b is consistent if
and only if
b = x1 c1 + x2 c2 + · · · + xn cn .
Thus Ax = b is consistent if and only if b is in the column space of A.

Theorem 5.6.2
A linear system Ax = 0 has only the trivial solution if and only if the
column vectors of A are linearly independent.

P ROOF. Let c1 , c2 , . . . , cn be column vectors of A so that
Ax = x1 c1 + x2 c2 + · · · + xn cn .
Suppose Ax = 0 has only the trivial solution. Consider
x1 c1 + x2 c2 + · · · + xn cn = 0.
Since this is same as Ax = 0, it follows that x = 0 or x1 = x2 =
· · · = xn = 0. Thus the column vectors are linearly independent.
Conversely assume that the column vectors of A are linearly
independent. Consider Ax = 0. This is same as
x1 c1 + x2 c2 + · · · + xn cn = 0.
By linear independence of the column vectors, we get x1 = x2 =
· · · = xn = 0 or x = 0. Thus Ax = 0 has only the trivial solution.

Example 5.6.2
Consider the linear system Ax = b where
A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \quad x = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \quad and \quad b = \begin{bmatrix} 4 \\ −1 \\ −1 \end{bmatrix}.
The augmented matrix for the system is
\begin{bmatrix} 1 & 0 & 1 & 4 \\ 1 & 1 & 0 & −1 \\ 0 & 1 & 1 & −1 \end{bmatrix}.
By subtracting the first row from the second, second row from the
third and then dividing the third row by 2, we get the following
matrix
\begin{bmatrix} 1 & 0 & 1 & 4 \\ 0 & 1 & −1 & −5 \\ 0 & 0 & 1 & 2 \end{bmatrix}.
Thus the linear system becomes x + z = 4, y − z = −5 and z = 2.
Thus we have
x = 2, y = −3, z = 2.
Note that the vector b is in the column space of A, for
\begin{bmatrix} 4 \\ −1 \\ −1 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} − 3 \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} + 2 \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.
The vector
b = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
is in the column space of A and therefore the system Ax = b is
consistent and the solution is given by x = 1, y = 1, z = 0.

Theorem 5.6.3
Let x0 be any solution of consistent linear system Ax = b and
x1 , x2 , . . . , xk be basis vectors of the null space of A. Every solu-
tion x of Ax = b can be written as
(5.6.1) x = x0 + λ1 x1 + · · · + λk xk ,
where λ1 , . . . , λk are scalars. Conversely, the vector x given by (5.6.1)
is a solution of Ax = b for every choice of scalars λ1 , . . . , λk .

P ROOF. Since x0 is a solution of Ax = b, we have Ax0 = b. If x
is any solution of Ax = b, then A(x − x0 ) = Ax − Ax0 = b − b = 0
and therefore x − x0 is a solution of Ax = 0; in other words, x − x0
is in the null space of A. Since x1 , x2 , . . . , xk are the basis vectors
of the null space of A, there are scalars λ1 , . . . , λk such that
x − x0 = λ1 x1 + · · · + λk xk
or
x = x0 + λ1 x1 + · · · + λk xk .
This proves (5.6.1).
To prove the converse, let us compute Ax. Since x0 is a
solution of Ax = b, we have Ax0 = b. Since x1 , x2 , . . . , xk are
basis vectors of the null space of A, they are solutions of Ax = 0
and therefore Axr = 0 for every r = 1, 2, . . . , k. Now
Ax = A(x0 + λ1 x1 + · · · + λk xk )
= Ax0 + λ1 Ax1 + · · · + λk Axk
= b+0+0+···+0
= b.
Thus x given by (5.6.1) is a solution of Ax = b for every choice of
scalars λ1 , . . . , λk .
Any solution x = x0 of the linear system Ax = b is called a partic-
ular solution of Ax = b. The solution x given by (5.6.1) is the general
solution of Ax = b. The solution x = λ1 x1 + · · · + λk xk is the general
solution of Ax = 0.
Example 5.6.3
Consider the linear system
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.
Clearly x0 = (1 1 1)T is a solution of the given linear system. The
solution space of the homogeneous linear system
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
is {(0,t, −t)T : t ∈ R} and its basis vector is x1 = (0, 1, −1)T . Thus
the general solution of the given nonhomogeneous linear system is
x = x0 + tx1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + t \begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 + t \\ 1 − t \end{bmatrix}.
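This decomposition of the general solution is easily verified numerically for several values of t; a sketch in Python using the NumPy library (not part of this text):

import numpy as np

A = np.array([[1, 0, 0],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)
b = np.array([1.0, 2.0, 3.0])
x0 = np.array([1.0, 1.0, 1.0])   # a particular solution
x1 = np.array([0.0, 1.0, -1.0])  # basis vector of the null space of A

for t in (0.0, 2.5, -7.0):
    assert np.allclose(A @ (x0 + t * x1), b)  # every such x solves Ax = b
print("all checks passed")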

Theorem 5.6.4
Let B be a matrix obtained from the matrix A by a sequence of
elementary row operations. Then the null space of the matrix A and
the null space of B are same.

P ROOF. Since the solutions of the linear system Ax = 0 are the same
as the solutions of the linear system Bx = 0, the result follows.

Theorem 5.6.5
Let B be a matrix obtained from the matrix A by a sequence of
elementary row operations. Then the row space of a matrix A and
the row space of B are same.

P ROOF. It is enough to prove that a row operation does not
change the row space of a given matrix. There are three kinds of
row operations. First, consider the row operation of interchanging
two rows of A to get A′ . Clearly the rows of A and A′ are the same
and therefore their row spaces are the same.
Now let A′ be obtained from A by multiplying ith row of A by
a nonzero scalar λ or by adding λ times jth row of A to the ith
row of A. Let r1 , r2 , . . . , rk and r′1 , r′2 , . . . , r′k be the row vectors of
A and A′ respectively. In this case, we have
r′1 = r1 ,
..
.
r′i−1 = ri−1 ,
r′i = λ ri or ri + λ r j ,
r′i+1 = ri+1
..
.
r′k = rk .
Any vector which is a linear combination of r1 , r2 , . . . , rk is a lin-
ear combination of r′1 , r′2 , . . . , r′k and conversely. Since the row
space is the set of all linear combinations of row vectors, it follows
that the row spaces of A and A′ are the same.

Theorem 5.6.6
Let B be a matrix obtained from the matrix A by a sequence of
elementary row operations. Then a subset of column vectors of
A is linearly independent if and only if the corresponding column
vectors of B are linearly independent. The same is true for bases
also.

P ROOF. The result is a consequence of two results: the null space
is not changed by row operations, and a homogeneous linear sys-
tem has only the trivial solution if and only if the column vectors are
linearly independent.

Theorem 5.6.7
Let R be the reduced row echelon form of a matrix A. Then the row
vectors of R with leading 1’s form a basis for the row space of A
and the column vectors of A corresponding to the column vectors of R
with leading 1’s form a basis for the column space of A.

Corollary 5.6.1
The dimensions of the row space and the column space are equal.

Definition 5.6.2
The common value of the dimensions of row space and column
space of a matrix A is the rank of A. The dimension of the null
space of A is the nullity of A.
It follows that the rank of A is the number of leading 1’s in the
reduced row echelon form of the matrix A.
Theorem 5.6.8
The matrix A and its transpose AT have the same rank.

P ROOF. Since the row space of A is the same as the column space
of AT , we have: rank of A = dimension of row space of A = dimension
of column space of AT = rank of AT .

Example 5.6.4
Consider the matrix
A = \begin{bmatrix} 1 & 2 & 0 & 0 & 2 & −1 \\ 2 & 4 & 1 & 0 & 7 & −1 \\ 1 & 2 & 0 & 1 & 4 & 0 \\ 1 & 2 & 1 & 1 & 7 & 1 \end{bmatrix}.
The reduced row echelon form of the matrix is given by
R = \begin{bmatrix} 1 & 2 & 0 & 0 & 2 & −1 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
The row vectors
r1 = (1 2 0 0 2 −1)
r2 = (0 0 1 0 3 1)
r3 = (0 0 0 1 2 1)
form a basis for the row space of A. Note that the first, third and fourth
columns correspond to leading 1’s. Therefore the column vectors
c1 = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 1 \end{bmatrix}, \quad c2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \quad c3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}
form a basis for the column space of A.
Note that the basis found for the column space consists of column vec-
tors of A, while the basis found above for the row space does not consist
of rows of A. We can find a basis for the row space consisting of rows
of A by reducing AT to reduced row echelon form.
The reduced row echelon form of AT is
\begin{bmatrix} 1 & 0 & 0 & −2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
The leading 1’s occur in the first, second and third columns; the cor-
responding columns of AT , that is, the first, second and third rows of
A, constitute a basis for the row space of A. That is, the vectors
r′1 = (1 2 0 0 2 −1)
r′2 = (2 4 1 0 7 −1)
r′3 = (1 2 0 1 4 0)
form a basis for the row space of A. In this example, the rank of A
is 3.
The solutions of the linear system Ax = 0 are obtained by solving
the equations (obtained from the reduced row echelon form of A):
x1 + 2x2 + 2x5 − x6 = 0
x3 + 3x5 + x6 = 0
x4 + 2x5 + x6 = 0.
By taking the unknowns that do not correspond to leading 1’s as
parameters:
x2 = r, x5 = s, x6 = t,
we get
x1 = t − 2s − 2r, x3 = −t − 3s, x4 = −t − 2s.
Thus the solution vector is given by
\begin{bmatrix} x1 \\ x2 \\ x3 \\ x4 \\ x5 \\ x6 \end{bmatrix} = \begin{bmatrix} t − 2s − 2r \\ r \\ −t − 3s \\ −t − 2s \\ s \\ t \end{bmatrix} = r \begin{bmatrix} −2 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + s \begin{bmatrix} −2 \\ 0 \\ −3 \\ −2 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 1 \\ 0 \\ −1 \\ −1 \\ 0 \\ 1 \end{bmatrix}.
Since the vectors
\begin{bmatrix} −2 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} −2 \\ 0 \\ −3 \\ −2 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 0 \\ −1 \\ −1 \\ 0 \\ 1 \end{bmatrix}
are linearly independent, these vectors form a basis for the null
space of A. Thus the null space has dimension 3 (which is the same as
the number of variables that do not correspond to leading 1’s).
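Computations of this kind (reduced row echelon form, pivot columns, rank and a basis for the null space) can be reproduced with a computer algebra system. A sketch using the SymPy library (not part of this text):

from sympy import Matrix

A = Matrix([[1, 2, 0, 0, 2, -1],
            [2, 4, 1, 0, 7, -1],
            [1, 2, 0, 1, 4,  0],
            [1, 2, 1, 1, 7,  1]])

R, pivots = A.rref()    # reduced row echelon form and pivot columns
print(pivots)           # (0, 2, 3): the first, third and fourth columns
print(A.rank())         # 3
print(A.nullspace())    # three vectors spanning the null space, so the nullity is 3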
We state the following theorem without proof.
Theorem 5.6.9
If A is an m × n matrix, then the rank of A is the number of leading
1’s in the reduced row echelon form of A or, equivalently, the num-
ber of leading variables in the solution of Ax = 0; and the nullity of
A is the number of variables that do not correspond to leading
1’s or, equivalently, the number of parameters in the general solution
of Ax = 0.

Theorem 5.6.10 Dimension theorem
If A is any matrix of order m × n, then
rank of A + nullity of A = n.

P ROOF. From the previous theorem, it is clear that the sum of
the rank of A and nullity of A is the number of variables in the
equation Ax = 0. Since this is n, the proof is completed.

Theorem 5.6.11 Consistency Theorem
For any linear system Ax = b, the following are equivalent:
(1) Ax = b is consistent.
(2) b is in the column space of A.
(3) The rank of A and the rank of the augmented matrix [A|b]
are equal.

P ROOF. We have earlier proved that Ax = b is consistent if and
only if b is in the column space of A. We will now prove that b is
in the column space of A if and only if A and [A|b] have the same
rank.
Let b be in the column space of A. We will show that the
column space of A and [A|b] are same. Let r be the rank of A.
Then there is a basis S for the column space of A consisting of r
column vectors of A. Thus span(S) is the column space of A. Any
vector in the column space of [A|b] is a linear combination of
vectors in S and b. Since b is in the column space of A, any vector
in the column space of [A|b] is a linear combination of vectors in
S and conversely. Thus
span(S) = span(S ∪ {b}).
Thus the column spaces of A and [A|b] are the same and hence the rank
of A and [A|b] are equal.
Conversely, let the rank of A and the rank of the augmented
matrix [A|b] be equal. Let r be the rank of A. Then there are r
column vectors of A that form a basis for the column space of
A. These vectors are also a basis for the column space of [A|b] and
therefore b belongs to the column space of A.

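The rank criterion of the Consistency Theorem is convenient in machine computation: Ax = b is consistent exactly when A and the augmented matrix [A|b] have the same rank. A sketch in Python using the NumPy library (not part of this text), applied to the system of Example 5.6.2:

import numpy as np

A = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1]], dtype=float)
b = np.array([[4.0], [-1.0], [-1.0]])

aug = np.hstack([A, b])   # the augmented matrix [A|b]
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(aug))   # True: consistent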
Example 5.6.5
Find λ & µ so that the system of linear equations
x+y+z = 6
x + 2y + 3z = 10
x + 2y + λ z = µ
has no solution, a unique solution, or an infinite number of solutions.
The given system is
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 2 & λ \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 6 \\ 10 \\ µ \end{bmatrix}.
The augmented matrix is
[A|b] = \begin{bmatrix} 1 & 1 & 1 & 6 \\ 1 & 2 & 3 & 10 \\ 1 & 2 & λ & µ \end{bmatrix}
∼ \begin{bmatrix} 1 & 1 & 1 & 6 \\ 0 & 1 & 2 & 4 \\ 0 & 1 & λ−1 & µ−6 \end{bmatrix} (R2 := R2 − R1 , R3 := R3 − R1 )
∼ \begin{bmatrix} 1 & 1 & 1 & 6 \\ 0 & 1 & 2 & 4 \\ 0 & 0 & λ−3 & µ−10 \end{bmatrix} (R3 := R3 − R2 ).
Note that rank of A is 2 if λ = 3 and 3 if λ ̸= 3 while the rank of
[A|b] is 2 if λ = 3 and µ = 10 and 3 otherwise.
The system has no solution when the rank of [A|b] is not equal to
the rank of A, i.e., when λ = 3 and µ ̸= 10.
The system has a solution if the rank of A is equal to the rank of
[A|b], i.e., when λ = 3 and µ = 10 or when λ ̸= 3. The solution is
unique if λ ̸= 3, and there are infinitely many solutions when λ = 3 and µ = 10.
Chapter 6
Eigenvalues and Eigenvectors

6.1. E IGENVALUES AND EIGENVECTORS

Let A = [ai j ] be a square matrix of order n. The matrix A − λ I is called
the characteristic matrix of A. The equation |A − λ I| = 0 is called the
characteristic equation of A. The determinant |A − λ I| is always a
polynomial in λ and it is called the characteristic polynomial of A.
The roots of the characteristic equation are called characteristic roots
or eigenvalues of A. Note that λ is an eigenvalue of A if and only if A − λ I is
singular. Let λ be an eigenvalue of A. Then a vector x ̸= 0 is called
an eigenvector corresponding to λ if Ax = λ x. If x is an eigenvector
corresponding to λ of the matrix A, then Ax = λ x and since, for any
k ̸= 0, A(kx) = kAx = kλ x = λ (kx), it follows that kx is also an eigenvector.
Example 6.1.1
Find the eigenvalues and eigenvectors of the matrix
\begin{bmatrix} 3 & 1 & 1 \\ 1 & 3 & −1 \\ 1 & −1 & 3 \end{bmatrix}.
The characteristic equation, |A − λ I| = 0, is
\begin{vmatrix} 3−λ & 1 & 1 \\ 1 & 3−λ & −1 \\ 1 & −1 & 3−λ \end{vmatrix} = 0
which becomes
λ 3 − 9λ 2 + 24λ − 16 = 0.
The roots of the above equation (or the eigenvalues of the matrix)
are 1, 4, 4.
The eigenvector corresponding to the eigenvalue λ is given by
the matrix equation
\begin{bmatrix} 3−λ & 1 & 1 \\ 1 & 3−λ & −1 \\ 1 & −1 & 3−λ \end{bmatrix} \begin{bmatrix} x1 \\ x2 \\ x3 \end{bmatrix} = 0.
Case (i) (λ = 1) In this case, we have to solve
\begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & −1 \\ 1 & −1 & 2 \end{bmatrix} \begin{bmatrix} x1 \\ x2 \\ x3 \end{bmatrix} = 0
or
2x1 + x2 + x3 = 0, x1 + 2x2 − x3 = 0, x1 − x2 + 2x3 = 0.
Subtracting the second equation from the first, we get the third equation. Hence
using the first two equations, we get
x1 /3 = x2 /(−3) = x3 /(−3) = k/3.
Therefore (k, −k, −k), k ̸= 0 is an eigenvector corresponding to λ =
1.
Case (ii) (λ = 4) In this case, we have to solve
\begin{bmatrix} −1 & 1 & 1 \\ 1 & −1 & −1 \\ 1 & −1 & −1 \end{bmatrix} \begin{bmatrix} x1 \\ x2 \\ x3 \end{bmatrix} = 0
or
−x1 + x2 + x3 = 0, x1 − x2 − x3 = 0, x1 − x2 − x3 = 0.
Note that all the three equations are same. Hence we use the first
equation alone. By putting x3 = 0, we have x1 = x2 = k, where
k is a constant. Similarly by putting x2 = 0, we have x1 = x3 =
k. Hence for nonzero k, (k, k, 0) and (k, 0, k) are the eigenvectors
corresponding to λ = 4. Note that for the eigenvalue λ = 4 we
have two linearly independent eigenvectors.

Example 6.1.2
Find the eigenvalues and eigenvectors of
\begin{bmatrix} 2 & −2 & 2 \\ 1 & 1 & 1 \\ 1 & 3 & −1 \end{bmatrix}.
The characteristic equation of the given matrix is
\begin{vmatrix} 2−λ & −2 & 2 \\ 1 & 1−λ & 1 \\ 1 & 3 & −1−λ \end{vmatrix} = 0.
This reduces to (2 − λ )(λ 2 − 4) = 0 which gives λ = −2, 2, 2. The
eigenvectors corresponding to λ are given by
\begin{bmatrix} 2−λ & −2 & 2 \\ 1 & 1−λ & 1 \\ 1 & 3 & −1−λ \end{bmatrix} \begin{bmatrix} x1 \\ x2 \\ x3 \end{bmatrix} = 0.
Case (i) (λ = −2) The corresponding eigenvector is given by
the equations
4x1 − 2x2 + 2x3 = 0, x1 + 3x2 + x3 = 0, x1 + 3x2 + x3 = 0.
Note that the second and third equations are same. Hence using the
first and second, we have
x1 /(−4) = x2 /(−1) = x3 /7
and hence the corresponding eigenvector is (−4k, −k, 7k) where
k ̸= 0.
Case (ii) (λ = 2) The corresponding eigenvector is given by the
equations
−2x2 + 2x3 = 0, x1 − x2 + x3 = 0, x1 + 3x2 − 3x3 = 0.
Note the difference of the second and third equations gives an equa-
tion proportional to the first equation. Hence using the second and
third, we have
x1 /0 = x2 /4 = x3 /4 = k/4
and hence the corresponding eigenvector is (0, k, k) where k ̸= 0.

Example 6.1.3
Find the eigenvalues and eigenvectors of
A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}.
The characteristic equation of the given matrix is
|A − λ I| = \begin{vmatrix} 1−λ & 0 & 1 \\ 0 & 1−λ & 1 \\ 1 & 1 & −λ \end{vmatrix} = 0
or
λ 3 − 2λ 2 − λ + 2 = 0.
The eigenvalues are λ = −1, 1, 2. The corresponding eigenvectors
are
x1 = \begin{bmatrix} −1 \\ −1 \\ 2 \end{bmatrix}, \quad x2 = \begin{bmatrix} −1 \\ 1 \\ 0 \end{bmatrix}, \quad x3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.

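Eigenvalues and eigenvectors are routinely computed numerically. A sketch in Python using the NumPy library (not part of this text) for the matrix of Example 6.1.3; note that numerical routines return eigenvectors scaled to unit length, so they are scalar multiples of the vectors found above.

import numpy as np

A = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]], dtype=float)
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # approximately 2, -1, 1 (in some order)
print(eigenvectors)   # the columns are the corresponding unit eigenvectors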
Lemma 6.1.1
Let A be a square matrix and x ̸= 0, λ be a number. The vector x is
an eigenvector corresponding to λ if and only if Ax = λ x.

P ROOF. Clearly if x is an eigenvector corresponding to λ , then
Ax = λ x. Conversely, if Ax = λ x, then we have to show that λ
is an eigenvalue of A or equivalently that A − λ I is singular. If
A − λ I is nonsingular, (A − λ I)x = 0 has the unique solution x = 0
which violates x ̸= 0. Therefore A − λ I is singular or equivalently
λ is an eigenvalue of A.

Theorem 6.1.1
The eigenvectors corresponding to distinct eigenvalues are linearly
independent.

P ROOF. Let x1 , x2 , . . ., xn be the eigenvectors corresponding to
the distinct eigenvalues λ1 , λ2 , . . ., λn of a square matrix A. As-
sume that the eigenvectors are linearly dependent. Then there are
scalars α1 , α2 , . . ., αn such that
α1 x1 + α2 x2 + · · · + αn xn = 0.
Among all such relations let
(6.1.1) αi1 xi1 + αi2 xi2 + · · · + αir xir = 0
be the one which is shortest and αi j ̸= 0 for all j. Then multiplying
(6.1.1) by A we have
αi1 Axi1 + αi2 Axi2 + · · · + αir Axir = 0
or
αi1 λi1 xi1 + αi2 λi2 xi2 + · · · + αir λir xir = 0.
Multiplying (6.1.1) by λi1 and subtracting it from the above equa-
tion, we have
(λi2 − λi1 )αi2 xi2 + (λi3 − λi1 )αi3 xi3 + · · · + (λir − λi1 )αir xir = 0
which is a nonzero relation shorter than the shortest one, a contradiction.
Therefore the vectors cannot be linearly dependent; they are linearly independent.

Theorem 6.1.2
Let λ1 , λ2 , . . ., λn be eigenvalues of a square matrix A of order n.
Then we have the following:
(i) λ1m , λ2m , . . ., λnm are eigenvalues of Am for any positive
integer m. In general, if p(x) is a polynomial, then p(λi ),
i = 1, 2, . . . , n are the eigenvalues of p(A).
(ii) A is nonsingular if and only if no eigenvalue is zero.
Equivalently A is singular if and only if at least one eigen-
value of A is zero.
(iii) If A is nonsingular, 1/λ1 , 1/λ2 , . . ., 1/λn are the eigen-
values of A−1 . Also |A|/λ1 , |A|/λ2 , . . ., |A|/λn are the
eigenvalues of Adj A.
(iv) A and AT have the same eigenvalues.
(v) If A is a triangular matrix, the diagonal entries are the
eigenvalues of A.
(vi) The sum of the eigenvalues of A is tr A. Recall that the
trace of a matrix A, denoted by tr A, is the sum of the diag-
onal elements of a matrix A. The sum of the eigenvalues
of a skew-symmetric matrix is zero.
(vii) The product of the eigenvalues of A is the determinant |A|.
(viii) Similar matrices have the same eigenvalues.

P ROOF. In the proof of the Theorem, we use Lemma 6.1.1. (i) Since
Axi = λi xi , we have A2 xi = A(λi xi ) = λi2 xi . Therefore λi2 is an
eigenvalue of A2 . Similarly λim is an eigenvalue of Am . Let
p(x) = a0 xn + a1 xn−1 + · · · + an
be the given polynomial. Then
p(A) = a0 An + a1 An−1 + · · · + an I
and therefore
p(A)xi = a0 An xi + · · · + an xi
= a0 λin xi + a1 λin−1 xi + · · · + an xi
= (a0 λin + a1 λin−1 + · · · + an )xi
= p(λi )xi .
Therefore p(λi ) is an eigenvalue of p(A).
(ii) A is singular ⇐⇒ |A| = 0 ⇐⇒ |A − 0I| = 0 ⇐⇒ 0 is an
eigenvalue of A.
(iii) Since A is nonsingular, all eigenvalues are nonzero. Also
Ax = λ x implies A−1 Ax = A−1 λ x or A−1 x = (1/λ )x.
(iv) Since
|AT − λ I| = |(A − λ I)T | = |A − λ I|,
the result follows.
(v) Let A = [ai j ]. Since A − λ I is triangular and the determi-
nant of triangular matrix is the product of its diagonal entries,
|A − λ I| = (a11 − λ )(a22 − λ ) . . . (ann − λ ).
Therefore the characteristic equation is
(a11 − λ )(a22 − λ ) . . . (ann − λ ) = 0
and the eigenvalues are the diagonal entries a11 , . . . , ann .
(vi) and (vii) can be proved by expanding the determinant |A−
λ I|.
(viii) Let A and B be similar matrices. Then there is a nonsin-
gular matrix P such that B = P−1 AP. Since
|B − λ I| = |P−1 AP − λ P−1 IP|
= |P−1 (A − λ I)P|
= |P−1 ||P||A − λ I|
= |A − λ I|,
the result follows. Note that we have used the fact that
|A−1 ||A| = |A−1 A| = |I| = 1.

Theorem 6.1.3
(i) Eigenvalues of real orthogonal matrices are real or com-
plex conjugates in pairs and have absolute value one. Also
if λ is an eigenvalue of an orthogonal matrix, so is 1/λ .
(ii) All the eigenvalues of a real symmetric matrix are real.
(iii) All the eigenvalues of a real skew-symmetric matrix are
pure imaginary.

P ROOF. (i) Let λ be an eigenvalue of a real orthogonal matrix A
and let x ̸= 0 be the corresponding eigenvector. Then Ax = λ x,
\bar{A} = A and AAT = AT A = I. Now
\bar{x}^T x = \bar{x}^T A^T Ax = \bar{x}^T \bar{A}^T Ax = \overline{(Ax)}^T (Ax)
= \overline{(λ x)}^T (λ x) = \bar{λ} λ \bar{x}^T x = |λ |^2 \bar{x}^T x.
Since \bar{x}^T x is positive, we have |λ |^2 = 1 or |λ | = 1.
Also, since the characteristic equation of a real matrix has real coef-
ficients, the roots are real or occur in complex conjugate pairs. Let λ
be an eigenvalue of the orthogonal matrix A. Then 1/λ is an
eigenvalue of A−1 = AT . But A and AT have the same eigenvalues.
Therefore 1/λ is an eigenvalue of A.
(ii) Since A is real symmetric, we have \bar{A} = A and AT = A. Let
λ be an eigenvalue of A and x be the corresponding eigenvector.
Taking the conjugate transpose of Ax = λ x gives
\bar{x}^T \bar{A}^T = \bar{λ} \bar{x}^T or \bar{x}^T A = \bar{λ} \bar{x}^T .
Post-multiplying by x,
\bar{x}^T Ax = \bar{λ} \bar{x}^T x,
and since \bar{x}^T Ax = \bar{x}^T (λ x) = λ \bar{x}^T x, this gives
λ \bar{x}^T x = \bar{λ} \bar{x}^T x.
Since \bar{x}^T x is positive, we have λ = \bar{λ}. Therefore λ is real.
Proof of (iii) is similar to proof of (ii).

Theorem 6.1.4
An eigenvector cannot correspond to two different eigenvalues.

P ROOF. If x is an eigenvector of two different eigenvalues λ and
µ of a matrix A, then Ax = λ x and Ax = µ x and therefore λ x =
µ x or (λ − µ )x = 0. Since λ ̸= µ , this shows that x = 0, which is not
possible. Therefore a vector cannot correspond to two different
eigenvalues.

Example 6.1.4
Find the eigenvalues of A2 where
A = \begin{bmatrix} 2 & −2 & 2 \\ 1 & 1 & 1 \\ 1 & 3 & −1 \end{bmatrix}.
The eigenvalues of A2 are the squares of the eigenvalues of A. Since
the eigenvalues of A are -2, 2, 2, the eigenvalues of the matrix A2
are 4, 4, 4.

Example 6.1.5
Find the sum and product of the eigenvalues of the matrix
\begin{bmatrix} 2 & 3 & −2 \\ −2 & 1 & 1 \\ 1 & 0 & 2 \end{bmatrix}.
The sum of eigenvalues is the trace of the matrix, that is,
2+1+2=5. The product of the eigenvalues is the determinant of the
matrix:
\begin{vmatrix} 2 & 3 & −2 \\ −2 & 1 & 1 \\ 1 & 0 & 2 \end{vmatrix} = 21.

6.2. C AYLEY -H AMILTON THEOREM

Theorem 6.2.1 Cayley-Hamilton theorem
Every square matrix satisfies its characteristic equation.

P ROOF. Note that the characteristic equation of a matrix A of or-
der n, |A − λ I| = 0, is a polynomial equation of degree n. Let
this polynomial equation be
p(λ ) = a0 λ n + a1 λ n−1 + · · · + an = 0.
We have to prove that
p(A) = a0 An + a1 An−1 + · · · + an I = 0.
Let B be the adjoint of A − λ I. The matrix B can be written as
polynomial in λ of degree n − 1 where each of the coefficient is a
matrix of order n:
B = B0 λ n−1 + B1 λ n−2 + · · · + Bn−1 .
Since for any matrix A, (Adj A)A = A(Adj A) = |A|I, we have
(A − λ I) Adj(A − λ I) = |A − λ I|I.
Therefore
(A − λ I)(B0 λ n−1 + B1 λ n−2 + · · · + Bn−1 )
= (a0 λ n + a1 λ n−1 + · · · + an )I.
Comparing the coefficients of powers of λ , we have
−B0 = a0 I
AB0 − B1 = a1 I
AB1 − B2 = a2 I
... ...
ABn−1 = an I.
Pre-multiplying the above equations by An , An−1 , . . ., I respec-
tively and adding them we have
a0 An + a1 An−1 + · · · + an I = 0.
This proves the result.
If A is nonsingular, then the constant term an is nonzero. Therefore,
we have
I = −(1/an )(a0 An + a1 An−1 + · · · + an−1 A)
and multiplying by A−1 we have
A−1 = −(1/an )(a0 An−1 + a1 An−2 + · · · + an−1 I).
Using this equation we can compute A−1 by using the powers of A.
Example 6.2.1
Verify the Cayley-Hamilton Theorem for the matrix
A = \begin{bmatrix} 1 & 0 & −2 \\ 2 & 2 & 4 \\ 0 & 0 & 2 \end{bmatrix}.
Find the inverse of A, if it exists.
The characteristic equation of the above matrix is given by
\begin{vmatrix} 1−λ & 0 & −2 \\ 2 & 2−λ & 4 \\ 0 & 0 & 2−λ \end{vmatrix} = 0
and this becomes
λ 3 − 5λ 2 + 8λ − 4 = 0.
Since
A2 = \begin{bmatrix} 1 & 0 & −6 \\ 6 & 4 & 12 \\ 0 & 0 & 4 \end{bmatrix} and A3 = \begin{bmatrix} 1 & 0 & −14 \\ 14 & 8 & 28 \\ 0 & 0 & 8 \end{bmatrix},
it is seen that
A3 − 5A2 + 8A − 4I = 0.
Since 4I = A3 − 5A2 + 8A, we get, after multiplication by A−1 /4,
A−1 = (1/4)(A2 − 5A + 8I) = \begin{bmatrix} 1 & 0 & 1 \\ −1 & 1/2 & −2 \\ 0 & 0 & 1/2 \end{bmatrix}.

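Both the Cayley-Hamilton relation and the resulting formula for A−1 can be verified by machine; a sketch in Python using the NumPy library (not part of this text):

import numpy as np

A = np.array([[1, 0, -2],
              [2, 2,  4],
              [0, 0,  2]], dtype=float)
I = np.eye(3)

# Characteristic polynomial: t^3 - 5t^2 + 8t - 4.
P = A @ A @ A - 5 * (A @ A) + 8 * A - 4 * I
print(np.allclose(P, 0))             # True: A satisfies its characteristic equation

A_inv = (A @ A - 5 * A + 8 * I) / 4  # inverse from the Cayley-Hamilton relation
print(np.allclose(A @ A_inv, I))     # True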
Example 6.2.2
Verify Cayley-Hamilton Theorem for the matrix J of order n where
J = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}.
The characteristic equation is

1 − λ 1 ··· 1

1 1−λ ··· 1
. .. = 0.
.. ..
· · ·
. .
1 1 ··· 1−λ
Writing the first column of this determinant as the sum of all the
columns and taking the common factors in the first column we have

1 1 ··· 1

1 1 − λ · · · 1
(n − λ ) .. .. .. = 0.
. . ··· .
1 1 ··· 1−λ
By subtracting the first row from each of the remaining rows, we get
\[ (n-\lambda)\begin{vmatrix} 1 & 1 & \cdots & 1 \\ 0 & -\lambda & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & -\lambda \end{vmatrix} = 0, \]
which becomes
\[ (n-\lambda)(-\lambda)^{n-1} = 0 \]
or
\[ \lambda^n - n\lambda^{n-1} = 0. \]
Note that $J^2 = nJ$ and hence $J^k = n^{k-1}J = nJ^{k-1}$. In particular we have $J^n - nJ^{n-1} = 0$. Thus J satisfies its characteristic equation.
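A numerical spot check for one value of n (NumPy, our addition):

    import numpy as np

    n = 4
    J = np.ones((n, n))
    print(np.linalg.eigvals(J))   # n once and 0 repeated n-1 times
    Jn = np.linalg.matrix_power(J, n)
    Jn1 = np.linalg.matrix_power(J, n - 1)
    print(np.allclose(Jn - n * Jn1, 0))   # True: J satisfies its characteristic equation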
6.3. DIAGONALIZATION

A matrix is diagonalizable if it is similar to a diagonal matrix. That is, the matrix A is diagonalizable if there exists a nonsingular matrix M such that $M^{-1}AM$ is a diagonal matrix.
Theorem 6.3.1
A square matrix of order n is diagonalizable if and only if it has n
linearly independent eigenvectors.
PROOF. If A is diagonalizable, then there is a nonsingular matrix M such that $M^{-1}AM = D$ where $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ is the diagonal matrix whose diagonal entries are $d_1, d_2, \ldots, d_n$ respectively. Let $x_1, x_2, \ldots, x_n$ be the columns of the matrix M. Since M is nonsingular, the $x_i$ are nonzero and also linearly independent. Since $AM = MD$, we have
\[ [Ax_1\; Ax_2\; \ldots\; Ax_n] = A[x_1\; x_2\; \ldots\; x_n] = [x_1\; x_2\; \ldots\; x_n]\,\mathrm{diag}(d_1, \ldots, d_n) = [d_1x_1\; d_2x_2\; \ldots\; d_nx_n] \]
and therefore $Ax_i = d_ix_i$ for $i = 1, 2, \ldots, n$. Therefore each $x_i$ is an eigenvector corresponding to $d_i$.
To prove the converse, assume that A has n linearly independent eigenvectors $x_i$, $i = 1, 2, \ldots, n$. Let $\lambda_i$ be the eigenvalue corresponding to $x_i$, and let $M = [x_1\; x_2\; \ldots\; x_n]$ be the matrix formed by these eigenvectors; M is nonsingular because its columns are linearly independent. Since $Ax_i = \lambda_ix_i$, we have
\[ AM = A[x_1\; x_2\; \ldots\; x_n] = [Ax_1\; Ax_2\; \ldots\; Ax_n] = [\lambda_1x_1\; \lambda_2x_2\; \ldots\; \lambda_nx_n] = M\,\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n), \]
so $M^{-1}AM = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. This proves that A is diagonalizable.
A matrix is diagonalizable if it has n linearly independent eigenvectors, and is invertible if it has n nonzero eigenvalues. Therefore diagonalizability is concerned with the eigenvectors while invertibility is concerned with the eigenvalues. Since eigenvectors corresponding to distinct eigenvalues are linearly independent, diagonalization can (possibly) fail only when the matrix has repeated eigenvalues. (Note the case of the identity matrix: all its eigenvalues are equal, yet there are n linearly independent eigenvectors.) If an eigenvalue is repeated p times, what we need for diagonalizability is p linearly independent eigenvectors corresponding to it.
If A is diagonalizable, then $M^{-1}AM = D$ where D is diagonal, and therefore
\[ A = MDM^{-1}, \]
and hence
\[ A^2 = MDM^{-1}MDM^{-1} = MDIDM^{-1} = MD^2M^{-1}. \]
In general we have
\[ A^m = MD^mM^{-1}. \]
Since $D = \mathrm{diag}(d_1, \ldots, d_n)$, we have $D^m = \mathrm{diag}(d_1^m, \ldots, d_n^m)$, so the above formula for $A^m$ is useful in computing higher powers of A.
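As a sketch of this idea in code (Python with NumPy; the function name is ours, and we assume A is in fact diagonalizable):

    import numpy as np

    def matrix_power_by_diagonalization(A, m):
        # Columns of M are eigenvectors; d holds the eigenvalues.
        d, M = np.linalg.eig(A)
        # A^m = M D^m M^{-1}: raise the eigenvalues to the m-th power.
        return M @ np.diag(d**m) @ np.linalg.inv(M)

    A = np.array([[4., 1.], [2., 3.]])   # eigenvalues 5 and 2
    print(np.allclose(matrix_power_by_diagonalization(A, 5),
                      np.linalg.matrix_power(A, 5)))   # True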
Also, if A is diagonalizable, then A can be found from the corresponding eigenvalues and eigenvectors by using the formula $A = MDM^{-1}$, where D is the diagonal matrix whose diagonal entries are the eigenvalues and M is the matrix whose columns are the corresponding eigenvectors.
Example 6.3.1
Diagonalize the matrix
\[ A = \begin{bmatrix} -2 & 0 & 6 \\ -1 & 1 & 2 \\ -2 & 0 & 5 \end{bmatrix}. \]
The characteristic equation of A is given by
\[ \begin{vmatrix} -2-\lambda & 0 & 6 \\ -1 & 1-\lambda & 2 \\ -2 & 0 & 5-\lambda \end{vmatrix} = 0, \]
or
\[ (\lambda-1)(\lambda^2 - 3\lambda + 2) = (\lambda-1)^2(\lambda-2) = 0. \]
Therefore the eigenvalues are λ = 1, 1, 2.
When λ = 1, the corresponding eigenvectors are the solutions of
\[ \begin{bmatrix} -3 & 0 & 6 \\ -1 & 0 & 2 \\ -2 & 0 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 0. \]
The above system reduces to the single equation
\[ -x_1 + 2x_3 = 0. \]
Therefore $x_1 = 2k$, $x_2 = l$, $x_3 = k$ where k and l are arbitrary constants. Since the rank of A − I is one, the system has two linearly independent solutions. In fact, $x_1 = [2, 0, 1]^T$ and $x_2 = [0, 1, 0]^T$ are two linearly independent eigenvectors.
When λ = 2, the corresponding eigenvector is a solution of
\[ \begin{bmatrix} -4 & 0 & 6 \\ -1 & -1 & 2 \\ -2 & 0 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 0. \]
The solution is given by $x_3 = [3, 1, 2]^T$. Consider the matrix
\[ M = [x_1\; x_2\; x_3] = \begin{bmatrix} 2 & 0 & 3 \\ 0 & 1 & 1 \\ 1 & 0 & 2 \end{bmatrix}. \]
The inverse of M is given by
\[ M^{-1} = \begin{bmatrix} 2 & 0 & -3 \\ 1 & 1 & -2 \\ -1 & 0 & 2 \end{bmatrix}. \]
Now
\[
M^{-1}AM = \begin{bmatrix} 2 & 0 & -3 \\ 1 & 1 & -2 \\ -1 & 0 & 2 \end{bmatrix}\begin{bmatrix} -2 & 0 & 6 \\ -1 & 1 & 2 \\ -2 & 0 & 5 \end{bmatrix}\begin{bmatrix} 2 & 0 & 3 \\ 0 & 1 & 1 \\ 1 & 0 & 2 \end{bmatrix}
= \begin{bmatrix} 2 & 0 & -3 \\ 1 & 1 & -2 \\ -2 & 0 & 4 \end{bmatrix}\begin{bmatrix} 2 & 0 & 3 \\ 0 & 1 & 1 \\ 1 & 0 & 2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}.
\]
Example 6.3.2
Find the eigenvalues and eigenvectors of
\[ A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{bmatrix}. \]
The characteristic equation of the given matrix is
\[ \lambda^3 + \lambda^2 - 4\lambda - 4 = 0. \]
The eigenvalues are λ = −2, −1, 2. The corresponding eigenvectors are
\[ x_1 = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix}, \quad x_3 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}. \]
The matrix of eigenvectors is
\[ M = \begin{bmatrix} 0 & 1 & 2 \\ -1 & -1 & 1 \\ 1 & -1 & 1 \end{bmatrix} \]
and its inverse is given by
\[ M^{-1} = \frac{1}{6}\begin{bmatrix} 0 & -3 & 3 \\ 2 & -2 & -2 \\ 2 & 1 & 1 \end{bmatrix}. \]
A computation shows that
\[ M^{-1}AM = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 2 \end{bmatrix}. \]
Notice that the matrix A is symmetric and the eigenvectors $x_1$, $x_2$, $x_3$ are orthogonal.
Example 6.3.3
Find a matrix A whose eigenvalues are 1, 2 and 3 and the corresponding eigenvectors are
\[ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}. \]
Consider the matrix M of eigenvectors and the diagonal matrix D of eigenvalues
\[ M = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & -1 & 0 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}. \]
The inverse of M is given by
\[ M^{-1} = \begin{bmatrix} 1/2 & 0 & 1/2 \\ 1/2 & 0 & -1/2 \\ -1/2 & 1 & 1/2 \end{bmatrix} \]
and therefore
\[ A = MDM^{-1} = \begin{bmatrix} 3/2 & 0 & -1/2 \\ -1/2 & 3 & 1/2 \\ -1/2 & 0 & 3/2 \end{bmatrix} \]
is the required matrix with the given eigenvalues and eigenvectors. (One can check directly that $A[1,0,1]^T = [1,0,1]^T$, $A[1,1,-1]^T = 2[1,1,-1]^T$ and $A[0,1,0]^T = 3[0,1,0]^T$.)
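A numerical check of the construction (NumPy, our addition):

    import numpy as np

    M = np.array([[1., 1., 0.], [0., 1., 1.], [1., -1., 0.]])   # eigenvectors as columns
    D = np.diag([1., 2., 3.])
    A = M @ D @ np.linalg.inv(M)
    print(A)                      # [[ 1.5  0.  -0.5] [-0.5  3.   0.5] [-0.5  0.   1.5]]
    print(np.linalg.eigvals(A))   # 1, 2, 3 in some order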
A matrix P is orthogonal if $PP^T = P^TP = I$. Thus for an orthogonal matrix P we have
\[ P^T = P^{-1}. \]
A matrix A is orthogonally diagonalizable if there is an orthogonal matrix P such that the matrix $P^TAP$ is diagonal. The matrix P is said to orthogonally diagonalize the matrix A.
A set of vectors $\{x_1, \ldots, x_n\}$ is orthonormal if
\[ \|x_i\| = 1 \text{ for all } i \]
and
\[ x_i \cdot x_j = 0 \text{ for all } i \neq j. \]
In other words, the vectors are orthogonal and of unit norm.
If A is orthogonally diagonalizable, then there is an orthogonal matrix P such that $P^TAP = D$, where D is a diagonal matrix. Thus we have
\[ PDP^T = P(P^TAP)P^T = (PP^T)A(PP^T) = IAI = A. \]
Since the transpose of a diagonal matrix is itself, we have
\[ A^T = (PDP^T)^T = (P^T)^TD^TP^T = PDP^T = A \]
and therefore A is symmetric.
Theorem 6.3.2
Let A be a square matrix of order n. Then the following are equivalent.
(1) A is orthogonally diagonalizable.
(2) A has an orthonormal set of n eigenvectors.
(3) A is symmetric.

PROOF. The proof is omitted.
Example 6.3.4
Orthogonally diagonalize the matrix
\[ A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{bmatrix}. \]
The characteristic equation of the given matrix is
\[ \lambda^3 + \lambda^2 - 4\lambda - 4 = 0. \]
The eigenvalues are λ = −2, −1, 2. The corresponding eigenvectors are
\[ x_1 = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix}, \quad x_3 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}. \]
The corresponding orthonormal eigenvectors are given by
\[ v_1 = \begin{bmatrix} 0 \\ -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}, \quad v_2 = \begin{bmatrix} \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \end{bmatrix}, \quad v_3 = \begin{bmatrix} \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{bmatrix}. \]
The matrix of eigenvectors is
\[ P = \begin{bmatrix} 0 & \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \\ -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \end{bmatrix}. \]
Now it is easy to see that
\[ P^TP = PP^T = I \]
and
\[ P^TAP = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 2 \end{bmatrix}. \]
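For symmetric matrices, np.linalg.eigh returns an orthogonal matrix of eigenvectors directly, so the whole example can be reproduced in a few lines (NumPy, our addition):

    import numpy as np

    A = np.array([[1., 1., 1.], [1., -1., 1.], [1., 1., -1.]])
    w, P = np.linalg.eigh(A)          # eigenvalues ascending: -2, -1, 2
    print(np.round(P.T @ P, 12))      # identity: P is orthogonal
    print(np.round(P.T @ A @ P, 12))  # diag(-2, -1, 2)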
6.4. QUADRATIC FORMS
A quadratic form q in n variables $x_1, x_2, \ldots, x_n$ is a homogeneous polynomial of degree 2:
\[ q = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_ix_j. \]
Note that, interchanging the summation indices,
\[ q = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ji}x_ix_j \]
and therefore
\[ q = \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{a_{ij}+a_{ji}}{2}\,x_ix_j = \sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij}x_ix_j, \]
where $A_{ij} = \frac{a_{ij}+a_{ji}}{2}$ satisfies
\[ A_{ji} = \frac{a_{ji}+a_{ij}}{2} = A_{ij}. \]
Also, if $x = [x_1, x_2, \ldots, x_n]^T$, then
\[ q = x^TAx \]
where $A = [A_{ij}]$, and note that A is symmetric. Therefore any quadratic form can be written as $q = x^TAx$ where A is symmetric. The matrix A is the matrix of the quadratic form q.
For example,
\[ q = 3x_1^2 + 4x_1x_2 + 5x_2^2 \]
can be written as
\[ q = 3x_1^2 + 2x_1x_2 + 2x_2x_1 + 5x_2^2 \]
and therefore
\[ q = \begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} 3 & 2 \\ 2 & 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}. \]
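The symmetrization $A_{ij} = (a_{ij}+a_{ji})/2$ is one line in code; a sketch (NumPy, our addition):

    import numpy as np

    # q = 3x1^2 + 4x1x2 + 5x2^2 with a non-symmetric coefficient matrix C;
    # A = (C + C^T)/2 is the symmetric matrix of the same quadratic form.
    C = np.array([[3., 4.], [0., 5.]])
    A = (C + C.T) / 2                  # [[3., 2.], [2., 5.]]
    x = np.array([1., 2.])
    print(x @ C @ x, x @ A @ x)        # 31.0 31.0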
6.5. LINEAR TRANSFORMATION AND CANONICAL FORM

Let $q = x^TAx$ be a quadratic form in n variables $x_1, x_2, \ldots, x_n$. Let $Y = [y_1, y_2, \ldots, y_n]^T$ and $x = PY$ where P is a nonsingular matrix. This transformation expresses each $x_i$ as a linear combination of $y_1, y_2, \ldots, y_n$. Since P is nonsingular, $Y = P^{-1}x$ and therefore every $y_i$ is a linear combination of $x_1, x_2, \ldots, x_n$. We call the transform $x = PY$ a nonsingular linear transform. By using this transform we have
\[ q = (PY)^TA(PY) = (Y^TP^T)A(PY) = Y^T(P^TAP)Y = Y^TBY \]
where $B = P^TAP$. Since A is symmetric, so is B. Therefore the transform $x = PY$ transforms the quadratic form q in x to a quadratic form in Y.
If $B = P^TAP = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ is a diagonal matrix, then the quadratic form q becomes
\[ q = Y^T\,\mathrm{diag}(d_1, d_2, \ldots, d_n)\,Y = d_1y_1^2 + d_2y_2^2 + \cdots + d_ny_n^2, \]
a quadratic form having only square terms. Such quadratic forms are known as canonical.
If we reduce the given quadratic form q to canonical form by an orthogonal matrix P, we call this an orthogonal reduction to canonical form. Note that the matrix of the quadratic form is symmetric and therefore it can be diagonalized by an orthogonal matrix. (Prove this!) Therefore every quadratic form can be reduced to canonical form by an orthogonal transform.
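Since the matrix of the form is symmetric, np.linalg.eigh provides the orthogonal P, and the canonical coefficients are the eigenvalues; a sketch (NumPy, our addition):

    import numpy as np

    A = np.array([[3., 2.], [2., 5.]])   # q = 3x1^2 + 4x1x2 + 5x2^2
    d, P = np.linalg.eigh(A)             # P orthogonal, P^T A P = diag(d)
    print(d)                             # canonical form: d[0]*y1^2 + d[1]*y2^2
    print(np.round(P.T @ A @ P, 12))     # diag(d)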
Definition 6.5.1
A quadratic form $q = x^TAx$ is
(1) positive definite if $x^TAx > 0$ for all $x \neq 0$,
(2) positive semi-definite if $x^TAx \geq 0$ for all $x \neq 0$ (and $x^TAx = 0$ for some $x \neq 0$),
(3) negative definite if $x^TAx < 0$ for all $x \neq 0$,
(4) negative semi-definite if $x^TAx \leq 0$ for all $x \neq 0$ (and $x^TAx = 0$ for some $x \neq 0$),
(5) indefinite otherwise ($x^TAx$ is positive for some x and negative for some other x).
We have the following theorem:
Theorem 6.5.1
A quadratic form $q = x^TAx$ is
(1) positive definite if all the eigenvalues of A are positive.
(2) positive semi-definite if all the eigenvalues of A are nonnegative (and at least one of them is zero).
(3) negative definite if all the eigenvalues of A are negative.
(4) negative semi-definite if all the eigenvalues of A are nonpositive (and at least one of them is zero).
(5) indefinite if some of the eigenvalues of A are positive and some others are negative.
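In code the classification reduces to inspecting the signs of the eigenvalues; a sketch (Python with NumPy; the function name and tolerance are ours):

    import numpy as np

    def classify_quadratic_form(A, tol=1e-12):
        # The signs of the eigenvalues of the symmetric matrix A decide the type.
        w = np.linalg.eigvalsh(A)
        if np.all(w > tol):
            return "positive definite"
        if np.all(w >= -tol):
            return "positive semi-definite"
        if np.all(w < -tol):
            return "negative definite"
        if np.all(w <= tol):
            return "negative semi-definite"
        return "indefinite"

    print(classify_quadratic_form(np.array([[3., 2.], [2., 5.]])))    # positive definite
    print(classify_quadratic_form(np.array([[1., 0.], [0., -2.]])))   # indefinite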
The rank of the quadratic form $q = x^TAx$ is the rank of A. The number of positive terms in any canonical form of q is the index of q. (Note that the total number of terms in any canonical form is equal to the number of nonzero eigenvalues of the matrix A.) The rank and index are denoted by r and p, and the quantity s = r − p is the signature of q.
Theorem 6.5.2
A quadratic form $q = x^TAx$ is
(1) positive definite if s = 0 and r = n.
(2) positive semi-definite if s = 0 and r < n.
(3) negative definite if p = 0 and r = n.
(4) negative semi-definite if p = 0 and r < n.
Let $D_i$ be the determinant formed by the first i rows and columns of A. That is,
\[ D_i = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1i} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ii} \end{vmatrix}. \]
Note that $D_1 = a_{11}$,
\[ D_2 = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \quad\text{and}\quad D_n = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = |A|. \]
In terms of these determinants we have another useful result.
Theorem 6.5.3
A quadratic form $q = x^TAx$ is
(1) positive definite if $D_i > 0$ for $i = 1, 2, \ldots, n$.
(2) positive semi-definite if $D_i \geq 0$ for $i = 1, 2, \ldots, n$ and at least one $D_i = 0$.
(3) negative definite if $(-1)^iD_i > 0$ for $i = 1, 2, \ldots, n$.
(4) negative semi-definite if $(-1)^iD_i \geq 0$ for $i = 1, 2, \ldots, n$ and at least one $D_i = 0$.
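The determinants $D_i$ are straightforward to compute; a sketch (NumPy; the helper name is ours):

    import numpy as np

    def leading_principal_minors(A):
        # D_i = determinant of the top-left i-by-i block, i = 1, ..., n.
        n = A.shape[0]
        return [np.linalg.det(A[:i, :i]) for i in range(1, n + 1)]

    A = np.array([[3., 2.], [2., 5.]])
    D = leading_principal_minors(A)
    print(D)                        # [3.0, 11.0]: all positive, so positive definite
    print(all(d > 0 for d in D))    # True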
Example 6.5.1
Find the eigenvalues and the determinant of A + kI in terms of the
eigenvalues of A.
Since $|(A + kI) - \lambda I| = |A - (\lambda - k)I|$, λ is an eigenvalue of A + kI if and only if λ − k is an eigenvalue of A. If $\lambda_i$, $i = 1, \ldots, n$, are the eigenvalues of A, then $\lambda_i + k$, $i = 1, \ldots, n$, are the eigenvalues of A + kI. Also $|A + kI| = (\lambda_1 + k)\cdots(\lambda_n + k)$.
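Numerically (NumPy, our addition):

    import numpy as np

    A = np.array([[2., 1.], [0., 3.]])   # eigenvalues 2 and 3
    k = 5.0
    B = A + k * np.eye(2)
    print(np.linalg.eigvals(B))          # 7 and 8: shifted by k
    print(np.linalg.det(B))              # (2+5)(3+5) = 56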
Example 6.5.2
Show that A and A + I cannot be similar.
By the previous example, the eigenvalues of A + I are those of A shifted by 1, so A and A + I have different eigenvalues (the sums of their eigenvalues differ by n). But similar matrices have the same set of eigenvalues. Therefore A and A + I cannot be similar.
Example 6.5.3
Prove that the sum of the eigenvalues of A + B equals the sum of all the individual eigenvalues of A and B.
Note that the sum of the eigenvalues of a matrix equals the trace of the matrix. The result follows since the trace of A + B is equal to the sum of the trace of A and the trace of B.
Example 6.5.4
Verify the Cayley-Hamilton Theorem for $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.
The characteristic equation is
\[ (a-\lambda)(d-\lambda) - bc = 0 \]
or
\[ \lambda^2 - (a+d)\lambda + ad - bc = 0. \]
Note that we have to verify that $A^2 - (a+d)A + (ad-bc)I = 0$. Since
\[ A^2 = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a^2+bc & ab+bd \\ ac+cd & bc+d^2 \end{bmatrix}, \]
we have
\begin{align*}
A^2 - (a+d)A + (ad-bc)I
&= \begin{bmatrix} a^2+bc & ab+bd \\ ac+cd & bc+d^2 \end{bmatrix} - (a+d)\begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} ad-bc & 0 \\ 0 & ad-bc \end{bmatrix}\\
&= \begin{bmatrix} a^2+bc-a^2-ad+ad-bc & ab+bd-ab-bd \\ ac+cd-ac-cd & bc+d^2-ad-d^2+ad-bc \end{bmatrix} = 0.
\end{align*}
Example 6.5.5
Verify the Cayley-Hamilton Theorem for $A = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix}$.
The characteristic equation is $(a-\lambda)(d-\lambda)(f-\lambda) = 0$ or
\[ \lambda^3 - (a+d+f)\lambda^2 + (ad+af+df)\lambda - adf = 0. \]
Since
\[ A^2 = \begin{bmatrix} a^2 & ab+bd & ac+be+cf \\ 0 & d^2 & de+ef \\ 0 & 0 & f^2 \end{bmatrix} \]
and
\[ A^3 = \begin{bmatrix} a^3 & a^2b+abd+bd^2 & a^2c+abe+acf+bde+bef+cf^2 \\ 0 & d^3 & d^2e+def+ef^2 \\ 0 & 0 & f^3 \end{bmatrix}, \]
we get
\[ A^3 - (a+d+f)A^2 + (ad+af+df)A - adf\,I = 0. \]