
Basic Linear Algebra

An Introduction with an Intuitive Approach

Duc Van Khanh Tran

Table of Contents

About the Author

Preface

Part I: Vectors

Chapter 1: What are Vectors?
1.1 Introduction to Vectors
1.2 Graphical Representation of Vectors
Exercises

Chapter 2: Addition, Subtraction, and Scalar Multiplication
2.1 Addition and Subtraction
2.2 Scalar Multiplication
2.3 Graphical Representation
Exercises

Chapter 3: Linear Combination and Linear Independence
3.1 Linear Combination and Span of Vectors
3.2 Linear Independence and Dependence
3.3 Vector Spaces and Subspaces
3.4 Vector Space as Span of Vectors
Exercises

Chapter 4: Norms of Vectors
4.1 Definition and Properties of Norms
4.2 Graphical Meaning of $\ell^2$ Norm
4.3 Unit Vectors and Normalization of Vectors
4.4 Graphical Meaning of $\ell^1$ Norm
Exercises

Chapter 5: Dot Product and Orthogonality
5.1 Transpose of Vectors
5.2 Dot Product of Vectors
5.3 Graphical Representation of Dot Product
5.4 Orthogonal Vectors
Exercises

Part II: Matrices

Chapter 6: What are Matrices?
6.1 Introduction to Matrices
6.2 Main Diagonal, Trace, and Transpose
6.3 Special Types of Matrices
Exercises

Chapter 7: Addition, Subtraction, and Multiplication
7.1 Addition and Subtraction
7.2 Matrix Multiplication with Scalars and Vectors
7.3 Multiplication of Two Matrices
7.4 Distributive and Associative Properties of Matrix Multiplication
7.5 Transpose of Product of Matrices
Exercises

Chapter 8: Row Operations on Matrices
8.1 Three Types of Row Operations
8.2 Row Operation as Matrix Multiplication
Exercises

Chapter 9: Reduced Row Echelon Form and Rank of Matrices
9.1 Reduced Row Echelon Form
9.2 Rank of Matrices
Exercises

Chapter 10: The Four Fundamental Subspaces
10.1 What are Four Fundamental Subspaces?
10.2 Column Space
10.3 Row Space
10.4 Null and Left Null Spaces
Exercises

Chapter 11: Inverse of Matrices
11.1 Invertible and Singular Matrices
11.2 Finding Inverse of an Invertible Matrix
Exercises

Chapter 12: Determinant of Matrices
12.1 Properties of Determinant
12.2 Determinant Formula for 2 × 2 Matrices
12.3 Determinant Formula for 3 × 3 Matrices
Exercises

Chapter 13: Systems of Linear Equations
13.1 Systems of Linear Equations as Matrix Linear Equations
13.2 Characteristic of Matrix Linear Equations
13.3 Solving Matrix Linear Equations
Exercises

Appendices
Appendix A: Cauchy-Schwarz Inequality
Appendix B: Circle in Taxicab Geometry
Appendix C: Beyond Vectors and Matrices: Blades and Tensors
Solutions to Exercise Problems
Reviews
About the Author

Hi! My name is Duc Van Khanh Tran, and I am a Vietnamese undergraduate at the University of Texas at Austin majoring in Pure Mathematics.

When I was in elementary school, I studied at Morinosato Elementary School in Kanazawa, Ishikawa province, Japan for about two years. For middle school, I studied at Le Quy Don Middle School in Ho Chi Minh City, Vietnam.

When I was in 9th grade, I came to the USA and started high school at Brentwood Christian School in Austin, Texas. The classes in my high school were more relaxed and less stressful than the classes in my middle school, so I began to understand and enjoy the things I studied at school. After a while, I realized that I enjoyed math the most out of all the subjects. There was also a math team at my high school, and participating in it made me love mathematics even more. That is how I "fell in love" with mathematics.

When I was in 11th grade, I came across @daily_math, a very popular math account on Instagram. Inspired by the @daily_math page, I created my own math account on Instagram, @dvkt_math, which currently has about 27,000 followers. Besides sharing mathematics through an Instagram page, I decided to also write some math books. Before writing this book, I wrote two others: An Introduction to Calculus: With Hyperbolic Functions, Limits, Derivatives, and More, published a little before my high school graduation, and Integrals and Sums Fiesta: An Integral Part of a Math Enthusiast's Life, published during my first year in college.
Preface

Linear Algebra is usually taught only in college; in high school it is covered insufficiently or not at all. However, Linear Algebra is actually not very complicated, and high school students can start learning it as well. This book introduces some important basic concepts of Linear Algebra with high school Algebra and Geometry as the only prerequisites.

However, this book is by no means restricted to high school students. Any learner who wants to get started on Linear Algebra in an intuitive manner is welcome to read this book. In college, Linear Algebra is sometimes introduced rigorously without much intuition at all. For students who are not yet ready for that rigor, it helps to have some intuition for the subject as a base on which the rigor can later be built.

This book is divided into two parts: the first part is about vectors, and the second part is about matrices. Part I starts by introducing what vectors are and then goes into some important concepts related to vectors. Similarly, Part II starts by introducing what matrices are and then discusses some important basic concepts about matrices. At the end of the book, there are appendices on additional topics related to material from Part I or Part II.

As mentioned before, this book introduces some basic concepts of Linear Algebra in an intuitive manner. So, not every topic that would usually appear in an introductory college-level Linear Algebra course is covered in this book, and some topics covered here are not treated in as much depth as they would be in such a course. If you choose to continue learning Linear Algebra, you can go deeper into the topics covered in this book as well as explore further topics.

I hope you enjoy reading this book and learning Linear Algebra!

Duc Van Khanh Tran

Texas, USA, 2022
Part I

Vectors

Chapter 1

What are Vectors?
1.1 Introduction to Vectors

Let us start our journey of linear algebra with an introduction to vectors. What are vectors? Basically, a vector is a list of numbers such as
$$\begin{bmatrix} 1 \\ 2 \end{bmatrix}, \tag{1.1}$$
$$\begin{bmatrix} 3 \\ 5 \\ 6 \end{bmatrix}, \tag{1.2}$$
etc. The vector in (1.1) has two elements, 1 and 2, and the vector in (1.2) has three elements, 3, 5, and 6. A vector with one element, such as
$$\begin{bmatrix} 8 \end{bmatrix},$$
is simply a number, also called a scalar. There can also be vectors with more than three elements. A vector is usually denoted as a letter with a small arrow above, such as $\vec{v}$. For example, we can denote the vectors in (1.1) and (1.2) as
$$\vec{v} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$
and
$$\vec{u} = \begin{bmatrix} 3 \\ 5 \\ 6 \end{bmatrix}.$$
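For readers who would like to experiment on a computer, here is one common way to store such vectors in code. This is a minimal sketch in Python using the NumPy library; nothing in this book assumes any programming.

```python
import numpy as np

# The vectors from (1.1) and (1.2), stored as one-dimensional NumPy arrays.
v = np.array([1, 2])
u = np.array([3, 5, 6])

print(len(v))  # 2 -- the vector has two elements
print(len(u))  # 3 -- the vector has three elements
```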

1.2 Graphical Representation of Vectors

Graphically, a vector with two elements
$$\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$$
is an arrow that goes $v_1$ units in the horizontal direction (the $x$-direction of the $xy$-coordinate plane) and $v_2$ units in the vertical direction (the $y$-direction of the $xy$-coordinate plane).

The arrow goes to the right if $v_1$ is positive and to the left if $v_1$ is negative. In the vertical direction, the arrow goes up if $v_2$ is positive and down if $v_2$ is negative.
 
Figure 1.1: Graphical representation of $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ if $v_1 > 0$ and $v_2 > 0$

Figure 1.2: Graphical representation of $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ if $v_1 < 0$ and $v_2 < 0$

Figure 1.3: Graphical representation of $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ if $v_1 > 0$ and $v_2 < 0$

Figure 1.4: Graphical representation of $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ if $v_1 < 0$ and $v_2 > 0$
An important point to note here is that direction matters. So, a vector is an object with a direction and a magnitude. The magnitude (or length) of vectors is discussed in chapter 4. If the arrow starts at the origin, i.e. the point $(0, 0)$, then the vector
$$\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$$
represents the point $(v_1, v_2)$ in the $xy$-plane (or in the two-dimensional space), i.e. the point with $x$-coordinate $v_1$ and $y$-coordinate $v_2$. We will talk about space in more detail later. For example, the vector
$$\begin{bmatrix} 1 \\ 2 \end{bmatrix}$$
represents the point $(1, 2)$.

Figure 1.5: Vector $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$

Similarly, a vector with three elements represents a point in the three-dimensional space. In the three-dimensional space, besides the $x$-coordinate and the $y$-coordinate, there is a $z$-coordinate representing the height.

Figure 1.6: Three-dimensional space
A vector
$$\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$$
represents the point $(v_1, v_2, v_3)$ in the three-dimensional space, i.e. the point with $x$-coordinate $v_1$, $y$-coordinate $v_2$, and $z$-coordinate $v_3$.

Vectors with more than three elements represent points in higher-dimensional spaces. For example, a vector with four elements represents a point in the four-dimensional space; such a point has four coordinates corresponding to the four elements of the vector. However, spaces with four or more dimensions are hard to picture in our heads, since we are only used to the three-dimensional world we live in.
9
Exercises

1. Which of the following are vectors? Which of the following are scalars?
$$\text{A. } \begin{bmatrix} 5 \\ 9 \\ 2 \end{bmatrix} \quad \text{B. } \begin{bmatrix} 3 \\ 1 \end{bmatrix} \quad \text{C. } 0.5 \quad \text{D. } \begin{bmatrix} 2.5 \\ 4.1 \\ 5 \\ 10 \end{bmatrix} \quad \text{E. } \pi \quad \text{F. } \begin{bmatrix} 5.7 \\ 12 \end{bmatrix}$$

2. Draw the vector $\begin{bmatrix} 2 \\ -1 \end{bmatrix}$. What point does it represent?

3. Draw the vector $\begin{bmatrix} -3 \\ 4 \end{bmatrix}$. What point does it represent?

4. Draw the vector $\begin{bmatrix} -2 \\ -3 \end{bmatrix}$. What point does it represent?

5. What point does the vector $\begin{bmatrix} 1 \\ 3 \\ 6 \end{bmatrix}$ represent? What is its $y$-coordinate?

6. What point does the vector $\begin{bmatrix} -4 \\ 5 \\ -2 \end{bmatrix}$ represent? What is its $x$-coordinate?

7. What point does the vector $\begin{bmatrix} 8 \\ 15 \\ -20 \end{bmatrix}$ represent? What is its $z$-coordinate?

8. The vector $\begin{bmatrix} 3 \\ -2 \\ -7 \\ 5 \\ -11 \end{bmatrix}$ represents a point in the space with how many dimensions?
Chapter 2

Addition, Subtraction, and Scalar Multiplication
2.1 Addition and Subtraction

Now that we have learned what vectors are, let us learn how to perform basic arithmetic operations on them. When we add two vectors together, we add them element-by-element. That is, we add the first element (from top to bottom) of the first vector to the first element of the second vector, the second element of the first vector to the second element of the second vector, and so on. Note that we can only add two vectors with the same number of elements (in the same dimension). Let us look at a few examples.
   
1 0
Example 2.1: + =?
4 3
       
1 0 1+0 1
+ = = .
4 3 4+3 7
   
1 5
Example 2.2: 2 + −8 =?
3 12
       
1 5 1+5 6
2 + −8 = 2 + (−8) = −6 .
3 12 3 + 12 15
   
−3 4
 5  −9
 1  +  10  =?
Example 2.3:    

7 −7
       
−3 4 (−3) + 4 1
 5  −9 5 + (−9) −4
 + =  =  .
 1   10   1 + 10   11 
7 −7 7 + (−7) 0

Similarly to addition of numbers, vector addition satisfies the commutative and associative properties. That is, for vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$, we have
$$\vec{u} + \vec{v} = \vec{v} + \vec{u}$$
and
$$\vec{u} + (\vec{v} + \vec{w}) = (\vec{u} + \vec{v}) + \vec{w}.$$
We can prove these properties by doing some simple algebra and using the properties of addition of numbers. Let
$$\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}, \quad \text{and} \quad \vec{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}.$$
Then, we have
$$\vec{u} + \vec{v} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix} = \begin{bmatrix} v_1 + u_1 \\ v_2 + u_2 \\ \vdots \\ v_n + u_n \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = \vec{v} + \vec{u}$$
and
$$\vec{u} + (\vec{v} + \vec{w}) = \begin{bmatrix} u_1 + (v_1 + w_1) \\ u_2 + (v_2 + w_2) \\ \vdots \\ u_n + (v_n + w_n) \end{bmatrix} = \begin{bmatrix} (u_1 + v_1) + w_1 \\ (u_2 + v_2) + w_2 \\ \vdots \\ (u_n + v_n) + w_n \end{bmatrix} = (\vec{u} + \vec{v}) + \vec{w}.$$

Subtraction of vectors works in the same way as addition. We subtract element-by-element, and we can only subtract vectors in the same dimension.

Example 2.4: $\begin{bmatrix} 9 \\ 5 \end{bmatrix} - \begin{bmatrix} 4 \\ 1 \end{bmatrix} = ?$

$$\begin{bmatrix} 9 \\ 5 \end{bmatrix} - \begin{bmatrix} 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 9 - 4 \\ 5 - 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 4 \end{bmatrix}.$$

Example 2.5: $\begin{bmatrix} 16 \\ 8 \\ -2 \end{bmatrix} - \begin{bmatrix} -1 \\ 12 \\ 3 \end{bmatrix} = ?$

$$\begin{bmatrix} 16 \\ 8 \\ -2 \end{bmatrix} - \begin{bmatrix} -1 \\ 12 \\ 3 \end{bmatrix} = \begin{bmatrix} 16 - (-1) \\ 8 - 12 \\ (-2) - 3 \end{bmatrix} = \begin{bmatrix} 17 \\ -4 \\ -5 \end{bmatrix}.$$
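As an optional computational aside: element-by-element addition and subtraction are exactly what NumPy arrays do with + and -. A minimal sketch, again in Python with NumPy:

```python
import numpy as np

u = np.array([16, 8, -2])
v = np.array([-1, 12, 3])

# u - v reproduces example 2.5; u + v works element-by-element the same way.
print(u - v)  # [17 -4 -5]
print(u + v)  # [15 20  1]
```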
2.2 Scalar Multiplication

Recall that a scalar is a number. When we multiply a vector by a scalar, we distribute the scalar and multiply it by each element of the vector. Let us look at a few examples.

Example 2.6: $3 \cdot \begin{bmatrix} -2 \\ 5 \end{bmatrix} = ?$

$$3 \cdot \begin{bmatrix} -2 \\ 5 \end{bmatrix} = \begin{bmatrix} 3 \cdot (-2) \\ 3 \cdot 5 \end{bmatrix} = \begin{bmatrix} -6 \\ 15 \end{bmatrix}.$$

Example 2.7: $-4 \cdot \begin{bmatrix} 1 \\ 3 \\ -2 \\ 5 \\ 0 \end{bmatrix} = ?$

$$-4 \cdot \begin{bmatrix} 1 \\ 3 \\ -2 \\ 5 \\ 0 \end{bmatrix} = \begin{bmatrix} (-4) \cdot 1 \\ (-4) \cdot 3 \\ (-4) \cdot (-2) \\ (-4) \cdot 5 \\ (-4) \cdot 0 \end{bmatrix} = \begin{bmatrix} -4 \\ -12 \\ 8 \\ -20 \\ 0 \end{bmatrix}.$$

Similarly to multiplication of numbers, scalar multiplication of vectors satisfies the distributive property. That is, for a scalar $a$ and vectors $\vec{u}$ and $\vec{v}$, we have
$$a(\vec{u} + \vec{v}) = a \cdot \vec{u} + a \cdot \vec{v}.$$
We can prove this by using the distributive property of multiplication of numbers. Let
$$\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad \text{and} \quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.$$
Then, we have
$$a(\vec{u} + \vec{v}) = a \cdot \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix} = \begin{bmatrix} a(u_1 + v_1) \\ a(u_2 + v_2) \\ \vdots \\ a(u_n + v_n) \end{bmatrix} = \begin{bmatrix} a \cdot u_1 + a \cdot v_1 \\ a \cdot u_2 + a \cdot v_2 \\ \vdots \\ a \cdot u_n + a \cdot v_n \end{bmatrix} = a \cdot \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} + a \cdot \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = a \cdot \vec{u} + a \cdot \vec{v}.$$
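Continuing the optional computational aside, scalar multiplication is the * operator between a number and a NumPy array; a minimal sketch:

```python
import numpy as np

v = np.array([1, 3, -2, 5, 0])

# The scalar distributes over every element, as in example 2.7.
print(-4 * v)  # [ -4 -12   8 -20   0]
```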

2.3 Graphical Representation

Next, let us take a look at the graphical representations of these arithmetic operations on vectors.

First, let us look at the graphical representation of vector addition $\vec{u} + \vec{v}$. If we put the starting point of $\vec{v}$ at the endpoint of $\vec{u}$, then $\vec{u} + \vec{v}$ is the vector going from the starting point of $\vec{u}$ to the endpoint of $\vec{v}$.

Figure 2.1: Graphical representation of vector addition

Let us look at the two-dimensional case more closely. For vectors
$$\vec{u} = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \quad \text{and} \quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},$$
we have
$$\vec{u} + \vec{v} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \end{bmatrix}.$$
Assuming $u_1$, $u_2$, $v_1$, and $v_2$ are all positive, graphing the vectors $\vec{u}$, $\vec{v}$, and $\vec{u} + \vec{v}$ would give us the following picture.
Figure 2.2: Addition of two-dimensional vectors

We can see that the components of $\vec{u} + \vec{v}$ are indeed $u_1 + v_1$ and $u_2 + v_2$. From the graphical representation of vector addition, we can derive the graphical representation of vector subtraction $\vec{u} - \vec{v}$. Since $\vec{v} + (\vec{u} - \vec{v}) = \vec{u}$, the graphical representation of vector subtraction $\vec{u} - \vec{v}$ is as follows.

Figure 2.3: Graphical representation of vector subtraction

Now let us look at the graphical representation of scalar multiplication of vectors. When we multiply a vector $\vec{v}$ by a scalar $a$ with $a > 0$, we stretch or shrink the vector by a factor of $a$: if $a > 1$, we are stretching the vector, and if $a < 1$, we are shrinking it.

Figure 2.4: Graphical representation of $a \cdot \vec{v}$ when $a > 1$

Figure 2.5: Graphical representation of $a \cdot \vec{v}$ when $0 < a < 1$

For example, $2 \cdot \vec{v}$ would stretch $\vec{v}$ by a factor of 2, and $\frac{1}{2} \cdot \vec{v}$ would shrink $\vec{v}$ by a factor of $\frac{1}{2}$. Let us look at the two-dimensional case more closely. Let
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},$$
then we have
$$a \cdot \vec{v} = \begin{bmatrix} a \cdot v_1 \\ a \cdot v_2 \end{bmatrix}.$$
Assuming $v_1$ and $v_2$ are positive and $a > 1$, graphing the vectors $\vec{v}$ and $a \cdot \vec{v}$ would give us the following picture.

Figure 2.6: Scalar multiplication of a two-dimensional vector

We can see that $\triangle ABC$ is similar to $\triangle ADE$. So, to stretch a vector, we indeed stretch each component of the vector, $v_1$ and $v_2$ in this case.

If $a < 0$, the scalar multiplication $a \cdot \vec{v}$ flips the vector $\vec{v}$ to the opposite direction and then stretches or shrinks it depending on whether $|a| > 1$ or $|a| < 1$.

Figure 2.7: Graphical representation of $a \cdot \vec{v}$ when $a < 0$
Exercises

1. $\begin{bmatrix} -4 \\ 0 \\ 6 \end{bmatrix} + \begin{bmatrix} -1 \\ 5 \\ -9 \end{bmatrix} = ?$

2. $\begin{bmatrix} -3 \\ 5 \\ 9 \\ -7 \end{bmatrix} - \begin{bmatrix} -5 \\ 4 \\ 10 \\ -2 \end{bmatrix} = ?$

3. $2 \cdot \begin{bmatrix} 8 \\ 15 \\ -6 \end{bmatrix} = ?$

4. $-4 \cdot \begin{bmatrix} 1 \\ -3 \\ 4 \\ 0 \\ 2 \\ -2 \end{bmatrix} = ?$

5. $3 \cdot \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 5 \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} = ?$

6. $6 \cdot \begin{bmatrix} 5 \\ 2 \\ 0 \end{bmatrix} - 2 \cdot \begin{bmatrix} 15 \\ 10 \\ -3 \end{bmatrix} = ?$

7. For any scalars $a$ and $b$ and vector $\vec{v}$, show that $(a + b) \cdot \vec{v} = a\vec{v} + b\vec{v}$.
Chapter 3

Linear Combination and Linear Independence
3.1 Linear Combination and Span of Vectors

Let us talk about linear combinations of vectors, applying what we learned in the last chapter. What is a linear combination of vectors? A linear combination of vectors is a sum of scalar multiples of vectors. For example, let us look back at questions 5 and 6 in the exercises of the last chapter:
$$3 \cdot \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 5 \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
is a linear combination of $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, and
$$6 \cdot \begin{bmatrix} 5 \\ 2 \\ 0 \end{bmatrix} - 2 \cdot \begin{bmatrix} 15 \\ 10 \\ -3 \end{bmatrix}$$
is a linear combination of $\begin{bmatrix} 5 \\ 2 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 15 \\ 10 \\ -3 \end{bmatrix}$ because
$$6 \cdot \begin{bmatrix} 5 \\ 2 \\ 0 \end{bmatrix} - 2 \cdot \begin{bmatrix} 15 \\ 10 \\ -3 \end{bmatrix} = 6 \cdot \begin{bmatrix} 5 \\ 2 \\ 0 \end{bmatrix} + (-2) \cdot \begin{bmatrix} 15 \\ 10 \\ -3 \end{bmatrix}.$$

In general, a linear combination of $n$ vectors $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$ is of the form
$$c_1 \vec{v}_1 + c_2 \vec{v}_2 + \cdots + c_n \vec{v}_n$$
for some scalars $c_1, c_2, \cdots, c_n$. If there is only one vector $\vec{v}$, then a linear combination of $\vec{v}$ is just a scalar multiple of $\vec{v}$: $c\vec{v}$.

The span of vectors $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$, denoted as
$$\operatorname{span} \{\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n\},$$
is the set of all possible linear combinations of $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$. If a vector $\vec{u}$ can be written as a linear combination of $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$, then we write
$$\vec{u} \in \operatorname{span} \{\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n\}.$$
The notation "$\in$" means "belongs to." For example, we can write "$\sqrt{2} \in \mathbb{R}$" to mean "$\sqrt{2}$ belongs to the set of real numbers" or "$\sqrt{2}$ is a real number."

Let us look at a few more examples.

Example 3.1: Write $\begin{bmatrix} 5 \\ -6 \end{bmatrix}$ as a linear combination of $\begin{bmatrix} 4 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 8 \end{bmatrix}$.

$$\begin{bmatrix} 5 \\ -6 \end{bmatrix} = 2 \begin{bmatrix} 4 \\ 1 \end{bmatrix} - \begin{bmatrix} 3 \\ 8 \end{bmatrix}.$$

Example 3.2: Write $\begin{bmatrix} 10 \\ 7 \\ -2 \end{bmatrix}$ as a linear combination of $\begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix}$, $\begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix}$, and $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$.

$$\begin{bmatrix} 10 \\ 7 \\ -2 \end{bmatrix} = 3 \begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix} - 2 \begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix} + 8 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}.$$

Example 3.3: Verify that $\begin{bmatrix} 12 \\ -5 \\ 9 \end{bmatrix} \in \operatorname{span} \left\{ \begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} \right\}$.

$$\begin{bmatrix} 12 \\ -5 \\ 9 \end{bmatrix} = 2 \begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix} + 3 \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}.$$
That means $\begin{bmatrix} 12 \\ -5 \\ 9 \end{bmatrix}$ can be written as a linear combination of $\begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$, so
$$\begin{bmatrix} 12 \\ -5 \\ 9 \end{bmatrix} \in \operatorname{span} \left\{ \begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} \right\}.$$
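The verification in example 3.3 takes one line in code; a minimal sketch (Python/NumPy, as in the earlier optional asides):

```python
import numpy as np

v1 = np.array([3, -4, 0])
v2 = np.array([2, 1, 3])

# The linear combination 2*v1 + 3*v2 from example 3.3.
u = 2 * v1 + 3 * v2
print(u)  # [12 -5  9]
```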
 

3.2 Linear Independence and Dependence

Next, let us discuss the concept of linear independence, which is closely related to the concept of linear combination. We say that vectors are linearly independent if none of them can be written as a linear combination of the other vectors. Otherwise, the vectors are said to be linearly dependent.

For example, three vectors $\vec{v}_1$, $\vec{v}_2$, and $\vec{v}_3$ are linearly independent if
$$\vec{v}_1 \notin \operatorname{span} \{\vec{v}_2, \vec{v}_3\},$$
$$\vec{v}_2 \notin \operatorname{span} \{\vec{v}_1, \vec{v}_3\},$$
and
$$\vec{v}_3 \notin \operatorname{span} \{\vec{v}_1, \vec{v}_2\}.$$
Three vectors $\vec{v}_1$, $\vec{v}_2$, and $\vec{v}_3$ are linearly dependent if
$$\vec{v}_1 \in \operatorname{span} \{\vec{v}_2, \vec{v}_3\},$$
$$\vec{v}_2 \in \operatorname{span} \{\vec{v}_1, \vec{v}_3\},$$
or
$$\vec{v}_3 \in \operatorname{span} \{\vec{v}_1, \vec{v}_2\}.$$

Let us look at a few examples.
   
Example 3.4: Are $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ linearly independent or dependent?

We can see that $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ is not a scalar multiple of $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, which means that $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ is not a scalar multiple of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, either. So,
$$\begin{bmatrix} 1 \\ 0 \end{bmatrix} \notin \operatorname{span} \left\{ \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}$$
and
$$\begin{bmatrix} 0 \\ 1 \end{bmatrix} \notin \operatorname{span} \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\}.$$
Therefore, $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ are linearly independent.

Example 3.5: Are $\begin{bmatrix} 5 \\ -6 \end{bmatrix}$, $\begin{bmatrix} 4 \\ 1 \end{bmatrix}$, and $\begin{bmatrix} 3 \\ 8 \end{bmatrix}$ linearly independent or dependent?

From example 3.1, we know that
$$\begin{bmatrix} 5 \\ -6 \end{bmatrix} \in \operatorname{span} \left\{ \begin{bmatrix} 4 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 8 \end{bmatrix} \right\}.$$
Therefore, $\begin{bmatrix} 5 \\ -6 \end{bmatrix}$, $\begin{bmatrix} 4 \\ 1 \end{bmatrix}$, and $\begin{bmatrix} 3 \\ 8 \end{bmatrix}$ are linearly dependent.

Example 3.6: Are $\begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix}$, $\begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix}$, and $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ linearly independent or dependent?

By doing some additions and scalar multiplications with those vectors, we can see that none of them can be written as a linear combination of the other two vectors. So,
$$\begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix} \notin \operatorname{span} \left\{ \begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\},$$
$$\begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix} \notin \operatorname{span} \left\{ \begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\},$$
and
$$\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \notin \operatorname{span} \left\{ \begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix}, \begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix} \right\}.$$
Therefore, $\begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix}$, $\begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix}$, and $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ are linearly independent.

    
Example 3.7: Are $\begin{bmatrix} 12 \\ -5 \\ 9 \end{bmatrix}$, $\begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix}$, and $\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$ linearly independent or dependent?

From example 3.3, we know that
$$\begin{bmatrix} 12 \\ -5 \\ 9 \end{bmatrix} \in \operatorname{span} \left\{ \begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} \right\}.$$
Therefore, $\begin{bmatrix} 12 \\ -5 \\ 9 \end{bmatrix}$, $\begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix}$, and $\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$ are linearly dependent.

Before going to the next section, let us state the formal definition of linear independence: if the only solution to
$$c_1 \vec{v}_1 + c_2 \vec{v}_2 + \cdots + c_n \vec{v}_n = \vec{0}$$
is $c_1 = c_2 = \cdots = c_n = 0$, then $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$ are linearly independent.

Well, how is this definition related to the definition given at the beginning of this section? If all of $c_1, c_2, \cdots, c_n$ must be 0, then there is no way to rearrange the expression to write one of the vectors as a linear combination of the other vectors. If at least one of $c_1, c_2, \cdots, c_n$ can be non-zero, then we can rearrange the expression to write one of the vectors as a linear combination of the other vectors. For example, assuming that $c_1 \neq 0$, we can rearrange the expression as
$$\vec{v}_1 = -\frac{c_2}{c_1} \vec{v}_2 - \frac{c_3}{c_1} \vec{v}_3 - \cdots - \frac{c_n}{c_1} \vec{v}_n.$$
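One computational way to test linear independence, not used by this book (which develops rank in chapter 9), is to stack the vectors into a matrix and compare its rank to the number of vectors; a minimal sketch in Python/NumPy:

```python
import numpy as np

# Vectors from example 3.6 (independent) and example 3.7 (dependent),
# stacked as the columns of a matrix.
A = np.column_stack([[6, 3, 8], [4, 5, 13], [0, 1, 0]])
B = np.column_stack([[12, -5, 9], [3, -4, 0], [2, 1, 3]])

# The columns are linearly independent exactly when the rank
# equals the number of columns.
print(np.linalg.matrix_rank(A) == A.shape[1])  # True  -> independent
print(np.linalg.matrix_rank(B) == B.shape[1])  # False -> dependent
```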

3.3 Vector Spaces and Subspaces

We have been talking about two-dimensional and three-dimensional spaces, but we have not discussed the concept of "space" in detail.

What is a vector space? For the introductory purpose of this book, we only need to know that a vector space is a set of vectors closed under addition and scalar multiplication. What does "closed" mean here? It means that if $\vec{u}$ and $\vec{v}$ are in a vector space, then $\vec{u} + \vec{v}$ and $c\vec{u}$ are also in the same vector space for any scalar $c$. In other words, addition and scalar multiplication do not take vectors out of the vector space.

There are a few other axioms (or conditions) for a set of vectors to be a vector space, but they are not really important for our introductory purpose now. For example, one of the other axioms is that the vectors in a vector space must satisfy the commutative property, $\vec{u} + \vec{v} = \vec{v} + \vec{u}$, but we already know from chapter 2 that vector addition satisfies the commutative property. The other axioms are more important when we consider vector spaces in a broader sense. However, in this book we are only considering vector spaces of real geometric vectors with finitely many elements, which are real numbers.

An important point to note is that a vector space must contain the $\vec{0}$ vector. A $\vec{0}$ vector is a vector whose elements are all 0. For example, the three-dimensional $\vec{0}$ vector is
$$\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
When we do scalar multiplication, the scalar $c$ can be 0. So, if $\vec{v}$ is in a vector space, then
$$c\vec{v} = 0\vec{v} = \vec{0}$$
must also be in the same vector space.

Now why is the two-dimensional space a vector space? If it were not a vector space, we would not call it "two-dimensional space." Remember from chapter 1 that every point $(x, y)$ can be represented by a vector
$$\begin{bmatrix} x \\ y \end{bmatrix}.$$

When we add two two-dimensional vectors in the two-dimensional space together, the result is another two-dimensional vector, also in the two-dimensional space. When we multiply a two-dimensional vector by a scalar, the result is again a two-dimensional vector. So, the two-dimensional space is indeed a vector space closed under addition and scalar multiplication. Similarly, the three-dimensional space is a vector space consisting of three-dimensional vectors
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix}.$$

In general, any $n$-dimensional space is a vector space with $n$-dimensional vectors.

Also, there is a notation to denote the $n$-dimensional space. In algebra, we learned that the set of real numbers is denoted by $\mathbb{R}$. In an $n$-dimensional space, each point is a list of $n$ real numbers, so the $n$-dimensional space is denoted by $\mathbb{R}^n$. For example, we can write
$$\begin{bmatrix} 1 \\ 5 \end{bmatrix} \in \mathbb{R}^2$$
to mean that $\begin{bmatrix} 1 \\ 5 \end{bmatrix}$ is a vector in the two-dimensional space.
Each $n$-dimensional space $\mathbb{R}^n$ has subspaces. A subspace is a vector space contained in another vector space of the same or bigger size, such as $\mathbb{R}^n$. A subspace of $\mathbb{R}^n$ has $n$ dimensions or fewer. For example, $\mathbb{R}^2$ has a two-dimensional subspace ($\mathbb{R}^2$ itself) and one-dimensional subspaces (lines in $\mathbb{R}^2$), and $\mathbb{R}^3$ has a three-dimensional subspace ($\mathbb{R}^3$ itself), two-dimensional subspaces (planes in $\mathbb{R}^3$), and one-dimensional subspaces (lines in $\mathbb{R}^3$).

It should be easy to see that vectors lying on the same line are closed under addition and scalar multiplication if you think of the graphical representations of vector addition and scalar multiplication. Similarly for planes in $\mathbb{R}^3$: if we add two vectors lying on the same plane, or multiply a vector lying on a plane by a scalar, the result is another vector lying on that same plane.

Figure 3.1: A two-dimensional subspace (or a plane) in $\mathbb{R}^3$

Remember that a subspace is also a vector space. So, for lines and planes to be subspaces, they have to contain the origin, or the $\vec{0}$ vector. The $\vec{0}$ vector by itself can also be a subspace, a vector space with zero dimension, but it is not a very interesting vector space.

3.4 Vector Space as Span of Vectors

The concept of vector space is very closely related to the concepts of linear combination and linear independence. A vector space $V$ can be written as the span of a certain set of linearly independent vectors $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$, i.e.
$$V = \operatorname{span} \{\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n\}.$$
The vectors $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$ are called basis vectors of $V$. We will discuss why basis vectors have to be linearly independent later in this section. Well, how does this align with the definition of a vector space? How is the span of vectors a set of vectors closed under addition and scalar multiplication? We know that the span of $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$ is the set of vectors of the form
$$c_1 \vec{v}_1 + c_2 \vec{v}_2 + \cdots + c_n \vec{v}_n.$$
For two vectors $\vec{u}, \vec{v} \in V$, let
$$\vec{u} = c_1 \vec{v}_1 + c_2 \vec{v}_2 + \cdots + c_n \vec{v}_n$$
and
$$\vec{v} = c'_1 \vec{v}_1 + c'_2 \vec{v}_2 + \cdots + c'_n \vec{v}_n.$$
Then, we have
$$\vec{u} + \vec{v} = (c_1 + c'_1)\vec{v}_1 + (c_2 + c'_2)\vec{v}_2 + \cdots + (c_n + c'_n)\vec{v}_n \in \operatorname{span} \{\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n\}$$
and
$$c\vec{u} = c c_1 \vec{v}_1 + c c_2 \vec{v}_2 + \cdots + c c_n \vec{v}_n \in \operatorname{span} \{\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n\}.$$
So, $V = \operatorname{span} \{\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n\}$ is indeed a vector space closed under addition and scalar multiplication.

Basis Vectors for n-Dimensional Space

Next, let us discuss how a vector space can be written as the span of a certain set of vectors. Let us consider the two-dimensional space $\mathbb{R}^2$ first. Note that any two-dimensional vector
$$\begin{bmatrix} x \\ y \end{bmatrix}$$
in $\mathbb{R}^2$ can be written as a linear combination of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$:
$$\begin{bmatrix} x \\ y \end{bmatrix} = x \begin{bmatrix} 1 \\ 0 \end{bmatrix} + y \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
So,
$$\mathbb{R}^2 = \operatorname{span} \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}.$$
However,
$$\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}$$
is not the only set of basis vectors for $\mathbb{R}^2$. For example, from example 3.1, we know that $\begin{bmatrix} 5 \\ -6 \end{bmatrix}$ can be written as a linear combination of $\begin{bmatrix} 4 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 8 \end{bmatrix}$. In fact, any two-dimensional vector can be written as a linear combination of $\begin{bmatrix} 4 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 8 \end{bmatrix}$:
$$\begin{bmatrix} x \\ y \end{bmatrix} = \frac{8x - 3y}{29} \begin{bmatrix} 4 \\ 1 \end{bmatrix} + \frac{4y - x}{29} \begin{bmatrix} 3 \\ 8 \end{bmatrix}.$$
So, we can also write
$$\mathbb{R}^2 = \operatorname{span} \left\{ \begin{bmatrix} 4 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 8 \end{bmatrix} \right\},$$
and
$$\left\{ \begin{bmatrix} 4 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 8 \end{bmatrix} \right\}$$
is another possible set of basis vectors for $\mathbb{R}^2$. What is the common characteristic of the two sets
$$\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\} \quad \text{and} \quad \left\{ \begin{bmatrix} 4 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 8 \end{bmatrix} \right\}?$$
Both of these sets consist of two linearly independent vectors. In general, any set of two linearly independent two-dimensional vectors can be a set of basis vectors for the two-dimensional space $\mathbb{R}^2$.
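The coefficient formula above is easy to sanity-check numerically; a minimal sketch (Python/NumPy):

```python
import numpy as np

b1 = np.array([4, 1])
b2 = np.array([3, 8])

# Write an arbitrary point, e.g. (x, y) = (5, -6), in the basis {b1, b2}
# using the coefficients (8x - 3y)/29 and (4y - x)/29 derived above.
x, y = 5, -6
c1 = (8 * x - 3 * y) / 29
c2 = (4 * y - x) / 29
print(c1, c2)             # 2.0 -1.0, matching example 3.1
print(c1 * b1 + c2 * b2)  # [ 5. -6.]
```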

Similarly, for higher dimensions, any set of $n$ linearly independent $n$-dimensional vectors can be a set of basis vectors for the $n$-dimensional space $\mathbb{R}^n$. For example, a possible set of basis vectors for $\mathbb{R}^3$ is the set of three linearly independent three-dimensional vectors
$$\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\},$$
and we can write
$$\mathbb{R}^3 = \operatorname{span} \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}.$$
Note that any vector in $\mathbb{R}^3$ can be written as a linear combination of these three vectors:
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = x \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + y \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + z \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$
It is important to note that there are only at most $n$ linearly independent $n$-dimensional vectors. Since any $n$-dimensional vector can be written as a linear combination of $n$ linearly independent $n$-dimensional vectors, if we have $n + 1$ or more $n$-dimensional vectors, at least one of them can be written as a linear combination of the others, so they are linearly dependent. For example, there are at most three linearly independent vectors in $\mathbb{R}^3$; there cannot be 4 or 5 or more linearly independent vectors.

Basis Vectors for Subspaces

What about the basis vectors for subspaces? Let us consider the one-dimensional subspaces of $\mathbb{R}^2$, which are lines in $\mathbb{R}^2$, and think of the graphical representation of scalar multiplication of vectors. When we multiply a vector by a scalar, we stretch or shrink the vector and possibly flip it to the opposite direction. So, all the scalar multiples of a vector lie on the same line as the vector. That means a line in $\mathbb{R}^2$ is the span of a two-dimensional vector lying on that line.

Figure 3.2: A one-dimensional subspace in $\mathbb{R}^2$

For example,
$$\operatorname{span} \left\{ \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\}$$
is the line $y = \frac{1}{2}x$ in $\mathbb{R}^2$, which is a one-dimensional subspace of $\mathbb{R}^2$.

Similarly, a one-dimensional subspace, or a line, in $\mathbb{R}^3$ is the span of a three-dimensional vector lying on that line. The basis vector is three-dimensional instead of two-dimensional because the subspace is in $\mathbb{R}^3$.

Let us also consider the two-dimensional subspaces, or planes, in $\mathbb{R}^3$. What are the basis vectors of a two-dimensional subspace of $\mathbb{R}^3$? Think back to the basis vectors of $\mathbb{R}^2$ for a moment, because $\mathbb{R}^2$ is basically a plane in $\mathbb{R}^3$ at $z = 0$. A set of basis vectors for $\mathbb{R}^2$ is any set of two linearly independent two-dimensional vectors. Similarly, a plane in $\mathbb{R}^3$, not necessarily the same as $\mathbb{R}^2$, is the span of any two linearly independent three-dimensional vectors lying on that plane. Note that the basis vectors are three-dimensional because the subspace is in $\mathbb{R}^3$. For example,
$$\operatorname{span} \left\{ \begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} \right\}$$
is the plane in $\mathbb{R}^3$ with the vectors $\begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$ lying on it, which is a two-dimensional subspace of $\mathbb{R}^3$.

In general, a set of basis vectors for an $m$-dimensional subspace of $\mathbb{R}^n$ is any set of $m$ linearly independent $n$-dimensional vectors in that subspace.

Span of Linearly Dependent Vectors

An important point about basis vectors is that they have to be linearly independent. For example, as mentioned before, a plane, or a two-dimensional subspace, in $\mathbb{R}^3$ is the span of any two linearly independent three-dimensional vectors lying on that plane. So, why do the basis vectors have to be linearly independent?

Let us consider the span of two linearly dependent vectors $\vec{v}_1$ and $\vec{v}_2$. Since $\vec{v}_1$ and $\vec{v}_2$ are linearly dependent, let $\vec{v}_2 = c\vec{v}_1$ for some scalar $c$. If $\vec{u} \in \operatorname{span} \{\vec{v}_1, \vec{v}_2\}$, then $\vec{u}$ is a vector of the form
$$\vec{u} = c_1 \vec{v}_1 + c_2 \vec{v}_2 = c_1 \vec{v}_1 + c_2 c \vec{v}_1 = (c_1 + c_2 c)\vec{v}_1 \in \operatorname{span} \{\vec{v}_1\}.$$
Thus, if $\vec{v}_1$ and $\vec{v}_2$ are linearly dependent, then $\operatorname{span} \{\vec{v}_1, \vec{v}_2\}$ is the same as $\operatorname{span} \{\vec{v}_1\}$, which is a one-dimensional vector space instead of a two-dimensional vector space. That is why basis vectors have to be linearly independent.

Main Points of This Section

If this section was too complicated and too wordy for you, the main points of this section are summarized below. Although understanding the more detailed explanations is important, you should at least memorize these facts to familiarize yourself with the topic.

• Vector spaces (and subspaces) can be written as the span of a set of vectors called basis vectors.

• Basis vectors have to be linearly independent.

• The span of any $m$ linearly independent $n$-dimensional basis vectors is an $m$-dimensional subspace of $\mathbb{R}^n$, where $m \leq n$. Remember that the $n$-dimensional subspace of $\mathbb{R}^n$ is $\mathbb{R}^n$ itself.

• There are at most $n$ linearly independent vectors in $\mathbb{R}^n$.
Exercises

1. Write $\begin{bmatrix} -5 \\ 0 \\ 8 \end{bmatrix}$ as a linear combination of $\begin{bmatrix} 1 \\ 4 \\ 3 \end{bmatrix}$, $\begin{bmatrix} 4 \\ 6 \\ 2 \end{bmatrix}$, and $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$.

2. Verify that $\begin{bmatrix} 1 \\ 1 \\ 0 \\ -2 \end{bmatrix} \in \operatorname{span} \left\{ \begin{bmatrix} 5 \\ 1 \\ -2 \\ 4 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ -1 \\ 3 \end{bmatrix} \right\}$.

3. Are $\begin{bmatrix} 5 \\ 1 \\ 4 \end{bmatrix}$, $\begin{bmatrix} -2 \\ 3 \\ 8 \end{bmatrix}$, $\begin{bmatrix} 6 \\ 0 \\ 9 \end{bmatrix}$, and $\begin{bmatrix} -1 \\ -3 \\ 7 \end{bmatrix}$ linearly independent or dependent?

4. Which of the following is NOT a vector space?

A. The three-dimensional space $\mathbb{R}^3$
B. A line in $\mathbb{R}^2$ with $y$-intercept $(0, 3)$
C. A plane in $\mathbb{R}^3$ passing through $(0, 0, 0)$
D. The line $y = 5x$ in $\mathbb{R}^2$

5. There can be at most how many linearly independent vectors in $\mathbb{R}^5$?

6. Is $\left\{ \begin{bmatrix} -2 \\ 6 \end{bmatrix}, \begin{bmatrix} -3 \\ 9 \end{bmatrix} \right\}$ a set of basis vectors for $\mathbb{R}^2$? Why or why not?

7. What is the dimension of the vector space
$$\operatorname{span} \left\{ \begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix}, \begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}?$$
This vector space is a subspace of $\mathbb{R}^n$; what is $n$?

8. What is the dimension of the vector space
$$\operatorname{span} \left\{ \begin{bmatrix} 1 \\ 2 \\ 3 \\ 5 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 5 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \\ 8 \\ 6 \end{bmatrix} \right\}?$$
This vector space is a subspace of $\mathbb{R}^n$; what is $n$?
Chapter 4

Norms of Vectors
4.1 Definition and Properties of Norms

There are many vector norms. By definition, for $p \geq 1$, the $\ell^p$ norm of a vector
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$
is defined as
$$||\vec{v}||_p = \sqrt[p]{|v_1|^p + |v_2|^p + \cdots + |v_n|^p}.$$
For example, we have
$$||\vec{v}||_1 = |v_1| + |v_2| + \cdots + |v_n|,$$
$$||\vec{v}||_2 = \sqrt{|v_1|^2 + |v_2|^2 + \cdots + |v_n|^2},$$
$$||\vec{v}||_3 = \sqrt[3]{|v_1|^3 + |v_2|^3 + \cdots + |v_n|^3},$$
etc. Let us look at a few examples.

Example 4.1: $\vec{u} = \begin{bmatrix} 1 \\ -5 \\ 2 \end{bmatrix}$, $||\vec{u}||_1 = ?$

$$||\vec{u}||_1 = |1| + |-5| + |2| = 8.$$

Example 4.2: $\vec{v} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$, $||\vec{v}||_2 = ?$

$$||\vec{v}||_2 = \sqrt{|3|^2 + |4|^2} = 5.$$
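For readers computing along, NumPy's norm function implements exactly this family of norms; a minimal sketch (Python/NumPy, an optional aside as before):

```python
import numpy as np

u = np.array([1, -5, 2])
v = np.array([3, 4])

# ord=p selects the ell-p norm defined above.
print(np.linalg.norm(u, ord=1))  # 8.0, as in example 4.1
print(np.linalg.norm(v, ord=2))  # 5.0, as in example 4.2
print(np.linalg.norm(u, ord=3))  # cube root of 1 + 125 + 8
```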

The vector norms satisfy the following properties:

1) $||\vec{v}|| > 0$ for all $\vec{v} \neq \vec{0}$, and $||\vec{v}|| = 0$ if and only if $\vec{v} = \vec{0}$;

2) $||k\vec{v}|| = |k| \cdot ||\vec{v}||$ for any scalar $k$;

3) $||\vec{u} + \vec{v}|| \leq ||\vec{u}|| + ||\vec{v}||$.

Let us look at the first property. Recall that a $\vec{0}$ vector is a vector whose elements are all 0. If at least one element, say $v_i$, is non-zero, then the norm $||\vec{v}||_p$ is larger than 0 because $|v_i| > 0$ for any non-zero $v_i$.
Figure 4.1: Graph of the absolute value function $|x|$

If all elements are 0, i.e. $v_1 = v_2 = \cdots = v_n = 0$, then
$$||\vec{v}||_p = \sqrt[p]{|0|^p + |0|^p + \cdots + |0|^p} = 0.$$
Conversely, if $||\vec{v}||_p = 0$, then the only way for
$$\sqrt[p]{|v_1|^p + |v_2|^p + \cdots + |v_n|^p} = 0$$
is when
$$|v_1|^p = |v_2|^p = \cdots = |v_n|^p = 0,$$
which is when $v_1 = v_2 = \cdots = v_n = 0$. Remember that "if and only if" is a two-way implication: "$||\vec{v}|| = 0$ if and only if $\vec{v} = \vec{0}$" means "$||\vec{v}|| = 0$ if $\vec{v} = \vec{0}$" and "$\vec{v} = \vec{0}$ if $||\vec{v}|| = 0$."

Next, let us look at the second property. Since $|ab| = |a| \cdot |b|$ for any numbers $a$ and $b$,
$$||k\vec{v}||_p = \sqrt[p]{|kv_1|^p + |kv_2|^p + \cdots + |kv_n|^p} = \sqrt[p]{|k|^p |v_1|^p + |k|^p |v_2|^p + \cdots + |k|^p |v_n|^p} = \sqrt[p]{|k|^p} \cdot \sqrt[p]{|v_1|^p + |v_2|^p + \cdots + |v_n|^p} = |k| \cdot ||\vec{v}||_p.$$

The third property is called the triangle inequality, and it is an extension of the triangle inequality for numbers,
$$|x + y| \leq |x| + |y|, \tag{4.1}$$
to vectors. Let us prove the triangle inequality for numbers first. Since $|x|^2 = |x| \cdot |x| = |x^2| = x^2$, we have
$$|x + y|^2 = (x + y)^2 = x^2 + y^2 + 2xy.$$
Squaring the right-hand side of (4.1) gives
$$(|x| + |y|)^2 = |x|^2 + |y|^2 + 2|x||y| = x^2 + y^2 + 2|x||y|.$$
When $x$ and $y$ have the same sign (both positive or both negative) or one of them is 0,
$$2xy = 2|x||y|.$$
When $x$ and $y$ have different signs,
$$2xy < 2|x||y|.$$
Thus, we have
$$2xy \leq 2|x||y| \ \Leftrightarrow\ x^2 + y^2 + 2xy \leq x^2 + y^2 + 2|x||y| \ \Leftrightarrow\ |x + y|^2 \leq (|x| + |y|)^2.$$

Figure 4.2: Graph of the function $x^2$

By observing the graph of the function $x^2$, we can see that for two numbers $a, b \geq 0$, if $a^2 \geq b^2$, then $a \geq b$. Since
$$|x + y| \geq 0, \quad |x| + |y| \geq 0, \quad \text{and} \quad |x + y|^2 \leq (|x| + |y|)^2,$$
we can conclude that
$$|x + y| \leq |x| + |y|.$$
The triangle inequality for the $\ell^1$ norm follows directly from the triangle inequality for numbers. Let
$$\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix},$$
then we have
$$||\vec{u} + \vec{v}||_1 = |u_1 + v_1| + |u_2 + v_2| + \cdots + |u_n + v_n| \leq |u_1| + |v_1| + |u_2| + |v_2| + \cdots + |u_n| + |v_n| = ||\vec{u}||_1 + ||\vec{v}||_1.$$
The triangle inequality for the $\ell^2$ norm can be proven by using the triangle inequality for numbers and an inequality called the Cauchy-Schwarz inequality. Using the triangle inequality for numbers, we get
$$||\vec{u} + \vec{v}||_2 = \sqrt{|u_1 + v_1|^2 + |u_2 + v_2|^2 + \cdots + |u_n + v_n|^2} \leq \sqrt{(|u_1| + |v_1|)^2 + (|u_2| + |v_2|)^2 + \cdots + (|u_n| + |v_n|)^2}. \tag{4.2}$$
The inequality in (4.2) follows from the fact that $\sqrt{a} \geq \sqrt{b}$ if $a \geq b$ and $a, b \geq 0$, as observed in the graph of $\sqrt{x}$.

Figure 4.3: Graph of the function $\sqrt{x}$

By the definition of the $\ell^2$ norm,
$$||\vec{u}||_2 + ||\vec{v}||_2 = \sqrt{|u_1|^2 + |u_2|^2 + \cdots + |u_n|^2} + \sqrt{|v_1|^2 + |v_2|^2 + \cdots + |v_n|^2}.$$
Squaring both sides,
$$(||\vec{u}||_2 + ||\vec{v}||_2)^2 = |u_1|^2 + |u_2|^2 + \cdots + |u_n|^2 + |v_1|^2 + |v_2|^2 + \cdots + |v_n|^2 + 2\sqrt{(|u_1|^2 + |u_2|^2 + \cdots + |u_n|^2)(|v_1|^2 + |v_2|^2 + \cdots + |v_n|^2)}. \tag{4.3}$$
This is where we use the Cauchy-Schwarz inequality, which states that
$$\left( a_1^2 + a_2^2 + \cdots + a_n^2 \right) \left( b_1^2 + b_2^2 + \cdots + b_n^2 \right) \geq (a_1 b_1 + a_2 b_2 + \cdots + a_n b_n)^2.$$
The proof of the Cauchy-Schwarz inequality is in an appendix for those who are interested. Using the Cauchy-Schwarz inequality, we have
$$\left( |u_1|^2 + |u_2|^2 + \cdots + |u_n|^2 \right) \left( |v_1|^2 + |v_2|^2 + \cdots + |v_n|^2 \right) \geq (|u_1||v_1| + |u_2||v_2| + \cdots + |u_n||v_n|)^2.$$
Taking the square root of both sides,
$$\sqrt{(|u_1|^2 + |u_2|^2 + \cdots + |u_n|^2)(|v_1|^2 + |v_2|^2 + \cdots + |v_n|^2)} \geq |u_1||v_1| + |u_2||v_2| + \cdots + |u_n||v_n|. \tag{4.4}$$
Note that here we have
$$\sqrt{(|u_1||v_1| + |u_2||v_2| + \cdots + |u_n||v_n|)^2} = |u_1||v_1| + |u_2||v_2| + \cdots + |u_n||v_n|$$
because
$$|u_1||v_1| + |u_2||v_2| + \cdots + |u_n||v_n| \geq 0.$$
Remember that the square root always takes non-negative values. If a number $a$ is negative, then we would have $\sqrt{a^2} = |a|$. For example, $\sqrt{(-2)^2} = \sqrt{4} = 2 = |-2|$.

Substituting (4.4) into (4.3),
$$(||\vec{u}||_2 + ||\vec{v}||_2)^2 \geq |u_1|^2 + |u_2|^2 + \cdots + |u_n|^2 + |v_1|^2 + |v_2|^2 + \cdots + |v_n|^2 + 2|u_1||v_1| + 2|u_2||v_2| + \cdots + 2|u_n||v_n|$$
$$= (|u_1| + |v_1|)^2 + (|u_2| + |v_2|)^2 + \cdots + (|u_n| + |v_n|)^2.$$
Taking the square root of both sides,
$$||\vec{u}||_2 + ||\vec{v}||_2 \geq \sqrt{(|u_1| + |v_1|)^2 + (|u_2| + |v_2|)^2 + \cdots + (|u_n| + |v_n|)^2}. \tag{4.5}$$
Inequality is transitive: if $a \geq b$ and $b \geq c$, then $a \geq c$. Thus, from (4.2) and (4.5), we can conclude that
$$||\vec{u} + \vec{v}||_2 \leq ||\vec{u}||_2 + ||\vec{v}||_2.$$
The proof of the triangle inequality for $\ell^p$ norms for other values of $p$ is a little more complicated and requires a more general version of the Cauchy-Schwarz inequality called Hölder's inequality. The triangle inequality for $\ell^p$ norms is also known as the Minkowski inequality. We will not prove it here; the readers will find out in the next section that our main concern in this book is the $\ell^2$ norm. At the same time, we will also explore the $\ell^1$ norm a little bit.

4.2 Graphical Meaning of $\ell^2$ Norm

As we learned in geometry, the distance between two points $(x_1, y_1)$ and $(x_2, y_2)$ is
$$\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}.$$
So, the distance between a point $(x, y)$ and the origin $(0, 0)$ is
$$\sqrt{(x - 0)^2 + (y - 0)^2} = \sqrt{x^2 + y^2}.$$
Hence, for a two-dimensional vector
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},$$
the $\ell^2$ norm
$$||\vec{v}||_2 = \sqrt{|v_1|^2 + |v_2|^2} = \sqrt{v_1^2 + v_2^2}$$
is the distance between the point $(v_1, v_2)$ and the origin $(0, 0)$. Recall from chapter 1 that $\vec{v}$ is a vector going from $(0, 0)$ to $(v_1, v_2)$. So, the distance between $(0, 0)$ and $(v_1, v_2)$ is the length of the vector $\vec{v}$, and thus the $\ell^2$ norm of a vector is the length of that vector. This is also a direct consequence of the Pythagorean theorem.

Figure 4.4: Graphical interpretation of the $\ell^2$ norm in $\mathbb{R}^2$

Similarly, in $\mathbb{R}^3$, the distance between two points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ is
$$\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}.$$
So, the $\ell^2$ norm of a three-dimensional vector
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix},$$
which is
$$||\vec{v}||_2 = \sqrt{v_1^2 + v_2^2 + v_3^2},$$
is the distance between $(0, 0, 0)$ and $(v_1, v_2, v_3)$ and thus the length of $\vec{v}$, since $\vec{v}$ is a vector going from $(0, 0, 0)$ to $(v_1, v_2, v_3)$. In general, the $\ell^2$ norm of a vector
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$
is the length of $\vec{v}$ in $\mathbb{R}^n$.

Now let us take a look at the triangle inequality again. In geometry, we learned the triangle inequality, which states that the sum of the lengths of two sides of a triangle is greater than the length of the third side.

Figure 4.5: $AB + BC > AC$

This is actually exactly the same as the triangle inequality for the $\ell^2$ norm. Recall from chapter 2 that the graphical representation of vector addition is as follows.

Figure 4.6: Graphical representation of vector addition

From what we learned about the graphical meaning of the $\ell^2$ norm, we know that $\vec{u}$ has length $||\vec{u}||_2$, $\vec{v}$ has length $||\vec{v}||_2$, and $\vec{u} + \vec{v}$ has length $||\vec{u} + \vec{v}||_2$. The triangle inequality for the $\ell^2$ norm states that
$$||\vec{u}||_2 + ||\vec{v}||_2 \geq ||\vec{u} + \vec{v}||_2.$$
However, the case
$$||\vec{u}||_2 + ||\vec{v}||_2 = ||\vec{u} + \vec{v}||_2$$
only happens when $\vec{v} = a \cdot \vec{u}$ for some non-negative scalar $a$:
$$||\vec{u} + \vec{v}||_2 = ||\vec{u} + a \cdot \vec{u}||_2 = ||(1 + a) \cdot \vec{u}||_2 = (1 + a)||\vec{u}||_2 = ||\vec{u}||_2 + a||\vec{u}||_2 = ||\vec{u}||_2 + ||a \cdot \vec{u}||_2 = ||\vec{u}||_2 + ||\vec{v}||_2.$$
If $\vec{v} = a \cdot \vec{u}$, then $\vec{u} + \vec{v} = (1 + a) \cdot \vec{u}$. Thus,
$$\vec{u}, \ \vec{v}, \ \vec{u} + \vec{v} \in \operatorname{span} \{\vec{u}\},$$
which means all three vectors $\vec{u}$, $\vec{v}$, and $\vec{u} + \vec{v}$ lie on the same line. When they do not lie on the same line and instead form a triangle as in figure 4.6, the triangle inequality for the $\ell^2$ norm gives us
$$||\vec{u}||_2 + ||\vec{v}||_2 > ||\vec{u} + \vec{v}||_2,$$
which is exactly the same as the triangle inequality we learned in geometry.

4.3 Unit Vectors and Normalization of Vectors

In this section, we will discuss unit vectors briefly. It is a simple but important concept in linear algebra.

What are unit vectors? As the word "unit" suggests, a unit vector is a vector with length one. A unit vector is usually denoted as a letter with a small hat above, such as $\hat{v}$. Using what we learned about the $\ell^2$ norm, if $\hat{v}$ is a unit vector, then $||\hat{v}||_2 = 1$.

Let us look at a few examples of unit vectors, especially the important ones related to bases of vector spaces. In the last chapter, we learned that a possible set of basis vectors for $\mathbb{R}^2$ is
$$\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}.$$
There is something special about these vectors: they are unit vectors because
$$\left\| \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\|_2 = \left\| \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\|_2 = 1.$$
These basis vectors also represent the points $(1, 0)$ and $(0, 1)$ on the $x$-axis and the $y$-axis, respectively. These basis vectors are so special that they have special notations:
$$\hat{\imath} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \hat{\jmath} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
So, any vector in $\mathbb{R}^2$ can be written as
$$\begin{bmatrix} x \\ y \end{bmatrix} = x\,\hat{\imath} + y\,\hat{\jmath}.$$
A set of basis vectors consisting of unit vectors representing points on the coordinate axes like this is called the standard basis. Similarly to $\mathbb{R}^2$, the standard basis of $\mathbb{R}^3$ is
$$\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}.$$
Note that these are unit vectors because
$$\left\| \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \right\|_2 = \left\| \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\|_2 = \left\| \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\|_2 = 1.$$
These vectors also have special notations:
$$\hat{\imath} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \hat{\jmath} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \text{and} \quad \hat{k} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$
The first two have the same notations as the standard basis vectors of $\mathbb{R}^2$ because they represent the same points on the $x$-axis and the $y$-axis.

Next, let us talk about normalization of vectors. Normalizing a vector means turning it into a unit vector. When we normalize a vector, we keep its direction the same and change only its length, or $\ell^2$ norm. Let $\hat{v}$ denote the normalized version of $\vec{v}$. To normalize a vector, we divide the vector by its length:
$$\hat{v} = \frac{\vec{v}}{||\vec{v}||_2}.$$
We can check that $\hat{v}$ has length one:
$$||\hat{v}||_2 = \left\| \frac{\vec{v}}{||\vec{v}||_2} \right\|_2 = \frac{||\vec{v}||_2}{||\vec{v}||_2} = 1.$$
We can factor out $\frac{1}{||\vec{v}||_2}$, which is a scalar, from the norm by using the second property of norms. Let us look at a few examples.

Example 4.3: Normalize the vector $\vec{u} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$.

The length of $\vec{u}$ is
$$||\vec{u}||_2 = \sqrt{1^2 + (-2)^2} = \sqrt{5}.$$
The normalized vector $\hat{u}$ is
$$\hat{u} = \frac{\vec{u}}{||\vec{u}||_2} = \frac{1}{\sqrt{5}} \begin{bmatrix} 1 \\ -2 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{5} \\ -2/\sqrt{5} \end{bmatrix}.$$

Example 4.4: Normalize the vector $\vec{v} = \begin{bmatrix} 3 \\ 0 \\ 4 \end{bmatrix}$.

The length of $\vec{v}$ is
$$||\vec{v}||_2 = \sqrt{3^2 + 0^2 + 4^2} = 5.$$
The normalized vector $\hat{v}$ is
$$\hat{v} = \frac{\vec{v}}{||\vec{v}||_2} = \frac{1}{5} \begin{bmatrix} 3 \\ 0 \\ 4 \end{bmatrix} = \begin{bmatrix} 3/5 \\ 0 \\ 4/5 \end{bmatrix}.$$
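Normalization is likewise a one-line computation; a minimal sketch (Python/NumPy):

```python
import numpy as np

v = np.array([3.0, 0.0, 4.0])

# Divide the vector by its ell-2 length, as in example 4.4.
v_hat = v / np.linalg.norm(v)
print(v_hat)                  # [0.6 0.  0.8]
print(np.linalg.norm(v_hat))  # 1.0 -- unit length
```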

4.4 Graphical Meaning of $\ell^1$ Norm

In this section, we will explore the graphical meaning of the $\ell^1$ norm. Like the $\ell^2$ norm, the $\ell^1$ norm also represents a distance, but not the same distance as that of the $\ell^2$ norm.

The $\ell^1$ norm represents the distance in Taxicab geometry, one of the non-Euclidean geometries. The distance in Taxicab geometry is called the "$L^1$ distance." The distance represented by the $\ell^2$ norm, which we discussed in the last section, is the distance in Euclidean geometry, the type of geometry we learn in high school. Now, what is the difference between the $L^1$ distance and the Euclidean distance?

Figure 4.7: Euclidean distance

In Euclidean geometry, the Euclidean distance between two points $p(x_1, y_1)$ and $q(x_2, y_2)$ is the length of the line segment connecting those two points, and we have the familiar formula
$$d(p, q) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2},$$
where $d(p, q)$ denotes the Euclidean distance between $p$ and $q$. In Euclidean geometry, there is only one unique path realizing the distance between two points.

Figure 4.8: $L^1$ distance in Taxicab geometry

In Taxicab geometry, the $L^1$ distance between two points $p(x_1, y_1)$ and $q(x_2, y_2)$ is the shortest distance from $p$ to $q$ (or $q$ to $p$) moving only horizontally and vertically. If you play chess, this is like how the rooks in chess move. Unlike the Euclidean distance, the $L^1$ distance can be realized by many possible paths, as shown in figure 4.8.

Figure 4.9: $L^1$ distance formula

Since we are moving horizontally and vertically, the formula for the $L^1$ distance between $p(x_1, y_1)$ and $q(x_2, y_2)$ can be obtained by adding the absolute differences of their coordinates:
$$d_1(p, q) = |x_2 - x_1| + |y_2 - y_1|,$$
where $d_1(p, q)$ denotes the $L^1$ distance between $p$ and $q$. Thus, for a vector
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},$$
the $\ell^1$ norm
$$||\vec{v}||_1 = |v_1| + |v_2|$$
is the $L^1$ distance between the point $(v_1, v_2)$ and the origin $(0, 0)$.

In higher dimensions, the $L^1$ distance is the shortest distance when moving only in directions along the coordinate axes. For example, in three dimensions, the $L^1$ distance is the shortest distance when moving only in the $x$-direction, $y$-direction, and $z$-direction, and we have the formula
$$d_1(p, q) = |x_2 - x_1| + |y_2 - y_1| + |z_2 - z_1|$$
for the $L^1$ distance between $p(x_1, y_1, z_1)$ and $q(x_2, y_2, z_2)$. So, for a three-dimensional vector
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix},$$
the $\ell^1$ norm
$$||\vec{v}||_1 = |v_1| + |v_2| + |v_3|$$
is the $L^1$ distance between the point $(v_1, v_2, v_3)$ and the origin $(0, 0, 0)$.

In general, the $\ell^1$ norm of a vector
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$
is the $L^1$ distance between the origin and the point $(v_1, v_2, \cdots, v_n)$ in $\mathbb{R}^n$.

There is another Taxicab geometry fun fact in an appendix. You should check
it out if you find Taxicab geometry interesting.
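The rook analogy is also easy to play with in code; a minimal sketch (Python/NumPy) computing an $L^1$ distance as the $\ell^1$ norm of a displacement vector:

```python
import numpy as np

p = np.array([0, 0])    # origin
q = np.array([-15, 9])  # target square, as in exercise 7 below

# L1 (taxicab) distance = ell-1 norm of the displacement vector.
print(np.linalg.norm(q - p, ord=1))  # 24.0
```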

45
Exercises

1. $\vec{u} = \begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix}$. $||\vec{u}||_1 = ?$

2. $\vec{v} = \begin{bmatrix} 5 \\ 12 \end{bmatrix}$. $||\vec{v}||_2 = ?$

3. Normalize the vector $\vec{w} = \begin{bmatrix} 1 \\ -2 \\ -1 \end{bmatrix}$.

4. Let $\vec{u} = \begin{bmatrix} 1 \\ -3 \\ 0 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 2 \\ 4 \\ 1 \end{bmatrix}$.

a. Find $||\vec{u}||_2$, $||\vec{v}||_2$, and $||\vec{u} + \vec{v}||_2$.

b. Verify that $||\vec{u} + \vec{v}||_2 \leq ||\vec{u}||_2 + ||\vec{v}||_2$.

5. Express the distance* of the point $(12, -9)$ from the origin as a vector norm.

6. Express the distance of the point $(7, -1, 10)$ from the origin as a vector norm.

7. Imagine a rook moving from the origin to the point $(-15, 9)$ on the $xy$-plane and assume that each grid square is 1 cm by 1 cm.

a. What is the shortest distance the rook can travel?

b. Express this distance as a vector norm.

* The word "distance" in this book is assumed to be Euclidean distance unless implied by context or specified to be otherwise.
Chapter 5

Dot Product and Orthogonality
5.1 Transpose of Vectors

Before discussing the dot product, let us talk about the transpose of vectors briefly. It is a very simple concept. When we take the transpose of a vector, we rotate the vector so that it becomes horizontal: a vertical vector becomes a horizontal vector. The transpose of a vector $\vec{v}$ is denoted as $\vec{v}^T$. Let
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix},$$
then we have
$$\vec{v}^T = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}.$$
Let us look at a few examples.

Example 5.1: $\vec{u} = \begin{bmatrix} -2 \\ 7 \end{bmatrix}$, $\vec{u}^T = ?$

$$\vec{u}^T = \begin{bmatrix} -2 & 7 \end{bmatrix}.$$

Example 5.2: $\vec{v} = \begin{bmatrix} 5 \\ 0 \\ -11 \end{bmatrix}$, $\vec{v}^T = ?$

$$\vec{v}^T = \begin{bmatrix} 5 & 0 & -11 \end{bmatrix}.$$

The transpose of a vector does not have a particular geometrical meaning. A vector $\vec{v}$ and its transpose $\vec{v}^T$ both represent the same point; it is just that $\vec{v}^T$ is a horizontal vector (or a row vector) while $\vec{v}$ is a vertical vector (or a column vector).
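In code, the row/column distinction only appears once vectors are stored as two-dimensional arrays; a minimal sketch (Python/NumPy; this array-of-arrays representation is a choice of the sketch, not the book's notation):

```python
import numpy as np

# A column vector stored as a 3x1 array; .T gives the 1x3 row vector.
v = np.array([[5], [0], [-11]])
print(v.T)        # [[  5   0 -11]]
print(v.T.shape)  # (1, 3)
```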

5.2 Dot Product of Vectors

In this section, we will talk about the definition and properties of the dot product. First, what is a dot product? The dot product of $\vec{u}$ and $\vec{v}$ is denoted as $\vec{u} \cdot \vec{v}$. When we take the dot product of two vectors, we multiply the vectors element-by-element and then add all of the products together:
$$\vec{u} \cdot \vec{v} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.$$
Let us look at a few examples.

Example 5.3: $\begin{bmatrix} 2 \\ 5 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 4 \end{bmatrix} = ?$

$$\begin{bmatrix} 2 \\ 5 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 4 \end{bmatrix} = 2 \cdot (-1) + 5 \cdot 4 = 18.$$

Example 5.4: $\begin{bmatrix} 1 \\ 0 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} = ?$

$$\begin{bmatrix} 1 \\ 0 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 1 \cdot 0 + 0 \cdot 1 = 0.$$

Example 5.5: $\begin{bmatrix} 3 \\ -5 \\ 6 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 4 \\ 3 \end{bmatrix} = ?$

$$\begin{bmatrix} 3 \\ -5 \\ 6 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 4 \\ 3 \end{bmatrix} = 3 \cdot 2 + (-5) \cdot 4 + 6 \cdot 3 = 4.$$

Example 5.6: $\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ -3 \\ 5 \end{bmatrix} = ?$

$$\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ -3 \\ 5 \end{bmatrix} = (-1) \cdot (-1) + 2 \cdot (-3) + 1 \cdot 5 = 0.$$

The dot product of vectors can also be written as the multiplication of a row vector and a column vector:
$$\vec{u} \cdot \vec{v} = \vec{u}^T \vec{v}.$$
For example,
$$\begin{bmatrix} 2 \\ 5 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 4 \end{bmatrix} = \begin{bmatrix} 2 & 5 \end{bmatrix} \begin{bmatrix} -1 \\ 4 \end{bmatrix}.$$
This fact is important when we learn the multiplication of matrices in part II.
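Computationally, the dot product is np.dot (or the @ operator) in the same optional Python/NumPy setting; a minimal sketch:

```python
import numpy as np

u = np.array([3, -5, 6])
v = np.array([2, 4, 3])

# Element-by-element products summed together, as in example 5.5.
print(np.dot(u, v))  # 4
print(u @ v)         # 4, the same dot product
```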

Next, let us discuss the properties that the dot product satisfies. Like the multiplication of numbers, the dot product satisfies the commutative property,
$$\vec{u} \cdot \vec{v} = \vec{v} \cdot \vec{u},$$
and the distributive property,
$$\vec{u} \cdot (\vec{v} + \vec{w}) = \vec{u} \cdot \vec{v} + \vec{u} \cdot \vec{w}.$$

Let
$$\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}, \quad \text{and} \quad \vec{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}.$$
Then, using the commutative and distributive properties of multiplication of numbers, we have
$$\vec{u} \cdot \vec{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n = v_1 u_1 + v_2 u_2 + \cdots + v_n u_n = \vec{v} \cdot \vec{u}$$
and
$$\vec{u} \cdot (\vec{v} + \vec{w}) = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \cdot \begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{bmatrix} = u_1(v_1 + w_1) + u_2(v_2 + w_2) + \cdots + u_n(v_n + w_n)$$
$$= u_1 v_1 + u_1 w_1 + u_2 v_2 + u_2 w_2 + \cdots + u_n v_n + u_n w_n = \vec{u} \cdot \vec{v} + \vec{u} \cdot \vec{w}.$$

The dot product also satisfies an associative property with a scalar: for any vectors $\vec{u}$ and $\vec{v}$ and any scalar $a$,
$$(a\vec{u}) \cdot \vec{v} = a(\vec{u} \cdot \vec{v}).$$
Using the distributive property of multiplication of numbers,
$$(a\vec{u}) \cdot \vec{v} = \begin{bmatrix} a u_1 \\ a u_2 \\ \vdots \\ a u_n \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = a u_1 v_1 + a u_2 v_2 + \cdots + a u_n v_n = a(u_1 v_1 + u_2 v_2 + \cdots + u_n v_n) = a(\vec{u} \cdot \vec{v}).$$

There is an interesting relation between the $\ell^2$ norm and the dot product:
$$\vec{v} \cdot \vec{v} = (||\vec{v}||_2)^2$$
because
$$\vec{v} \cdot \vec{v} = v_1^2 + v_2^2 + \cdots + v_n^2 = \left( \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} \right)^2 = (||\vec{v}||_2)^2.$$

5.3 Graphical Representation of Dot Product


Now that we learned what the dot product is, let us take a look at the graphical
meaning of dot product.

The dot product $\vec{u} \cdot \vec{v}$ is the length of the projection of $\vec{u}$ onto $\vec{v}$, multiplied by the length of $\vec{v}$. Also, we know that the dot product is commutative, so the order does not matter: we can equally well project $\vec{v}$ onto $\vec{u}$ and then multiply by the length of $\vec{u}$.

Let us look at the projection part first. Remember that the vector ~u has length
||~u||2 . By using the trigonometric ratio in right triangles, we get that the length of
projection of ~u on to ~v is ||~u||2 cos θ, where θ is the smaller angle between ~u and ~v .

[Figure 5.1: Projection of $\vec{u}$ onto $\vec{v}$. The vector $\vec{u}$, of length $\|\vec{u}\|_2$, makes the angle $\theta$ with $\vec{v}$; its projection onto $\vec{v}$ has length $\|\vec{u}\|_2 \cos \theta$.]

Then, after the projection, we need to multiply by the length of $\vec{v}$, which is $\|\vec{v}\|_2$. So, we have the formula
$$\vec{u} \cdot \vec{v} = \|\vec{u}\|_2 \|\vec{v}\|_2 \cos \theta.$$
We can also use this formula to find the measure of the smaller angle between two vectors. Rearranging the formula above, we have
$$\theta = \cos^{-1} \left( \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\|_2 \, \|\vec{v}\|_2} \right).$$
Let us look at a few examples.

Example 5.7: Find the measure of the smaller angle between $\begin{pmatrix} -1 \\ 3 \end{pmatrix}$ and $\begin{pmatrix} 2 \\ 4 \end{pmatrix}$.

First, we need to find the dot product of the two vectors and the length of each vector:
$$\begin{pmatrix} -1 \\ 3 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 4 \end{pmatrix} = (-1) \cdot 2 + 3 \cdot 4 = 10,$$
$$\left\| \begin{pmatrix} -1 \\ 3 \end{pmatrix} \right\|_2 = \sqrt{(-1)^2 + 3^2} = \sqrt{10},$$
and
$$\left\| \begin{pmatrix} 2 \\ 4 \end{pmatrix} \right\|_2 = \sqrt{2^2 + 4^2} = 2\sqrt{5}.$$
Therefore, the measure of the smaller angle $\theta$ between those two vectors is
$$\theta = \cos^{-1} \left( \frac{10}{\sqrt{10} \cdot 2\sqrt{5}} \right) = \cos^{-1} \left( \frac{1}{\sqrt{2}} \right) = 45°.$$

Example 5.8: Find the measure of the smaller angle between $\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}$.

First, we need to find the dot product of the two vectors and the length of each vector:
$$\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix} = 1 \cdot 0 + 2 \cdot 1 + (-1) \cdot (-1) = 3,$$
$$\left\| \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} \right\|_2 = \sqrt{1^2 + 2^2 + (-1)^2} = \sqrt{6},$$
and
$$\left\| \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix} \right\|_2 = \sqrt{0^2 + 1^2 + (-1)^2} = \sqrt{2}.$$
Therefore, the measure of the smaller angle $\theta$ between those two vectors is
$$\theta = \cos^{-1} \left( \frac{3}{\sqrt{6} \cdot \sqrt{2}} \right) = \cos^{-1} \left( \frac{\sqrt{3}}{2} \right) = 30°.$$

Example 5.9: Find the measure of the smaller angle between $\begin{pmatrix} 0 \\ 2 \end{pmatrix}$ and $\begin{pmatrix} \sqrt{3} \\ 1 \end{pmatrix}$.

First, we need to find the dot product of the two vectors and the length of each vector:
$$\begin{pmatrix} 0 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} \sqrt{3} \\ 1 \end{pmatrix} = 0 \cdot \sqrt{3} + 2 \cdot 1 = 2,$$
$$\left\| \begin{pmatrix} 0 \\ 2 \end{pmatrix} \right\|_2 = \sqrt{0^2 + 2^2} = 2,$$
and
$$\left\| \begin{pmatrix} \sqrt{3} \\ 1 \end{pmatrix} \right\|_2 = \sqrt{\left(\sqrt{3}\right)^2 + 1^2} = 2.$$
Therefore, the measure of the smaller angle $\theta$ between those two vectors is
$$\theta = \cos^{-1} \left( \frac{2}{2 \cdot 2} \right) = \cos^{-1} \left( \frac{1}{2} \right) = 60°.$$

Note that when $\vec{u} = \vec{v}$, the angle $\theta$ between them is 0 degrees, and we have
$$\vec{v} \cdot \vec{v} = \|\vec{v}\|_2 \|\vec{v}\|_2 \cos(0°) = (\|\vec{v}\|_2)^2$$
because $\cos(0°) = 1$.

5.4 Orthogonal Vectors


Next, let us discuss the orthogonality of vectors and how to determine when two
vectors are orthogonal. The word “orthogonal” is just a fancier word for “per-
pendicular.” However, it is worth noting that the word “orthogonal” is better for
vectors in general, and the word “perpendicular” only applies to geometric vectors
(arrows) that we are learning in this book. If you continue to learn more Linear
Algebra, you will learn that there are more types of vectors than just the geometric
vectors. For the non-geometric vectors, it only makes sense to say that they are or-
thogonal, not perpendicular. Although we only discuss orthogonality of geometric

vectors here, the definition of orthogonality of vectors discussed here still applies
to non-geometric vectors in general.

We can determine whether two non-zero vectors are orthogonal by using the dot
product. If the dot product of two non-zero vectors is equal to 0, then those two
non-zero vectors are orthogonal. Let us see why this is true.

There are two ways to see why this is true. First, we can use the formula ob-
tained in the last section:

~u · ~v = ||~u||2 ||~v ||2 cos θ.

When two vectors ~u and ~v are orthogonal, the angle θ between them is 90◦ . So,

~u · ~v = ||~u||2 ||~v ||2 cos (90◦ ) = 0

when ~u and ~v are orthogonal because cos (90◦ ) = 0.

[Figure 5.2: $\vec{u} + \vec{v}$ when $\vec{u}$ and $\vec{v}$ are orthogonal: $\vec{u}$ and $\vec{v}$ form the two legs of a right triangle, and $\vec{u} + \vec{v}$ is the hypotenuse.]

Another way is by using Pythagoras' theorem. When two vectors $\vec{u}$ and $\vec{v}$ are orthogonal, we have the graph for the vector addition $\vec{u} + \vec{v}$ as in figure 5.2.

Recall that $\vec{u}$ has length $\|\vec{u}\|_2$, $\vec{v}$ has length $\|\vec{v}\|_2$, and $\vec{u} + \vec{v}$ has length $\|\vec{u} + \vec{v}\|_2$. By Pythagoras' theorem, we have
$$(\|\vec{u}\|_2)^2 + (\|\vec{v}\|_2)^2 = (\|\vec{u} + \vec{v}\|_2)^2.$$
Using the fact that the dot product of a vector with itself is the square of the $\ell_2$ norm of that vector, i.e. $\vec{v} \cdot \vec{v} = (\|\vec{v}\|_2)^2$, we have
$$\vec{u} \cdot \vec{u} + \vec{v} \cdot \vec{v} = (\vec{u} + \vec{v}) \cdot (\vec{u} + \vec{v}).$$

Using the distributive property of dot product,
$$(\vec{u} + \vec{v}) \cdot (\vec{u} + \vec{v}) = \vec{u} \cdot (\vec{u} + \vec{v}) + \vec{v} \cdot (\vec{u} + \vec{v}) = \vec{u} \cdot \vec{u} + \vec{u} \cdot \vec{v} + \vec{v} \cdot \vec{u} + \vec{v} \cdot \vec{v} = \vec{u} \cdot \vec{u} + \vec{v} \cdot \vec{v} + 2\,\vec{u} \cdot \vec{v}.$$
Thus, we get
$$\vec{u} \cdot \vec{u} + \vec{v} \cdot \vec{v} = \vec{u} \cdot \vec{u} + \vec{v} \cdot \vec{v} + 2\,\vec{u} \cdot \vec{v}.$$
Subtracting $\vec{u} \cdot \vec{u} + \vec{v} \cdot \vec{v}$ from both sides, we obtain
$$2\,\vec{u} \cdot \vec{v} = 0, \qquad \text{i.e.} \qquad \vec{u} \cdot \vec{v} = 0$$
when $\vec{u}$ and $\vec{v}$ are orthogonal.

Let us look at a few examples of orthogonal vectors. From example 5.4, we know that
$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix} = 0,$$
so $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ are orthogonal. From example 5.6, we know that
$$\begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} -1 \\ -3 \\ 5 \end{pmatrix} = 0,$$
so $\begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ -3 \\ 5 \end{pmatrix}$ are orthogonal.
The concept of orthogonal vectors can be extended to more than two vectors. When there are more than two vectors, the vectors are orthogonal if each vector is orthogonal to each of the other vectors. For example, let us look at the standard basis vectors for $\mathbb{R}^3$:
$$\left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\}.$$
These vectors are orthogonal because
$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = 0.$$

In the last chapter, we learned that the standard basis vectors are unit vectors.
Here, we saw that the standard basis vectors are orthogonal as well. Vectors that
are both unit and orthogonal like these are called orthonormal vectors. If we have a
set of orthogonal vectors, we can always obtain orthonormal vectors by normalizing
each vector. For example, we know that
 
−1
2
1

and  
−1
−3
5

55
are orthogonal, so we can obtain orthonormal vectors by normalizing
 
−1
2
1

and  
−1
−3 .
5

56
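Here is a sketch of the orthogonality test and the normalization step, using the vectors from example 5.6 (the code and variable names are ours):

    import numpy as np

    u = np.array([-1.0, 2.0, 1.0])
    v = np.array([-1.0, -3.0, 5.0])

    # Orthogonal: the dot product is 0
    print(np.dot(u, v))   # 0.0

    # Normalize each vector to get orthonormal vectors
    u_hat = u / np.linalg.norm(u)
    v_hat = v / np.linalg.norm(v)
    print(np.linalg.norm(u_hat), np.linalg.norm(v_hat))   # 1.0 1.0
    print(np.dot(u_hat, v_hat))                           # still 0.0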
Exercises

 
1. $\vec{v} = \begin{pmatrix} 2 \\ 5 \\ 0 \\ -8 \end{pmatrix}$, $\vec{v}^{\,T} = ?$

2. For any vectors ~u and ~v , show that (~u + ~v ) T = ~u T + ~v T .

   
3. $\begin{pmatrix} 1 \\ -3 \\ -4 \end{pmatrix} \cdot \begin{pmatrix} 9 \\ 7 \\ -6 \end{pmatrix} = ?$

4. $\begin{pmatrix} -1 & 6 \end{pmatrix} \begin{pmatrix} 8 \\ 3 \end{pmatrix} = ?$


5. $\begin{pmatrix} 9 & -2 & 4 & -5 \end{pmatrix} \begin{pmatrix} 1 \\ -7 \end{pmatrix} = ?$
   
6. Find the measure of the smaller angle between $\begin{pmatrix} 1 \\ 3 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ 7 \end{pmatrix}$.

7. Find the measure of the smaller angle between $\begin{pmatrix} -2 \\ 1 \\ 5 \end{pmatrix}$ and $\begin{pmatrix} 3 \\ -5 \\ 4 \end{pmatrix}$.

8. Are $\begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}$ and $\begin{pmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}$ orthogonal? If yes, are they orthonormal?

9. Are $\begin{pmatrix} 1 \\ 4 \\ 3 \end{pmatrix}$, $\begin{pmatrix} -1 \\ -2 \\ 3 \end{pmatrix}$, and $\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}$ orthogonal? If yes, are they orthonormal?

10. Are $\begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix}$, $\begin{pmatrix} -1 \\ -1 \\ 1 \\ 1 \end{pmatrix}$, $\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$, and $\begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix}$ orthogonal? If yes, are they orthonormal?

Part II

Matrices

Chapter 6

What are Matrices?

6.1 Introduction to Matrices
In chapter 1, we learned that a vector is a list of numbers. A matrix is bigger; it is a block of numbers such as
$$\begin{pmatrix} -1 & 2 \\ 3 & -4 \end{pmatrix}, \tag{6.1}$$
$$\begin{pmatrix} -3 & 0 & 5 \\ 7 & -2 & 6 \end{pmatrix}, \tag{6.2}$$
$$\begin{pmatrix} -9 & 8 & -6 \\ 1 & 0 & 7 \\ -5 & -1 & 10 \end{pmatrix}, \tag{6.3}$$
etc. A matrix is usually denoted as an uppercase letter:
$$A = \begin{pmatrix} -1 & 2 \\ 3 & -4 \end{pmatrix}.$$

We can also think of a matrix as a list of column (vertical) vectors or a list of row (horizontal) vectors. For example,
$$\begin{pmatrix} -3 & 0 & 5 \\ 7 & -2 & 6 \end{pmatrix} = \begin{pmatrix} | & | & | \\ \vec{v}_1 & \vec{v}_2 & \vec{v}_3 \\ | & | & | \end{pmatrix}$$
where
$$\vec{v}_1 = \begin{pmatrix} -3 \\ 7 \end{pmatrix}, \qquad \vec{v}_2 = \begin{pmatrix} 0 \\ -2 \end{pmatrix}, \qquad \vec{v}_3 = \begin{pmatrix} 5 \\ 6 \end{pmatrix},$$
and
$$\begin{pmatrix} -9 & 8 & -6 \\ 1 & 0 & 7 \\ -5 & -1 & 10 \end{pmatrix} = \begin{pmatrix} \text{---} & \vec{u}_1^{\,T} & \text{---} \\ \text{---} & \vec{u}_2^{\,T} & \text{---} \\ \text{---} & \vec{u}_3^{\,T} & \text{---} \end{pmatrix}$$
where
$$\vec{u}_1 = \begin{pmatrix} -9 \\ 8 \\ -6 \end{pmatrix}, \qquad \vec{u}_2 = \begin{pmatrix} 1 \\ 0 \\ 7 \end{pmatrix}, \qquad \vec{u}_3 = \begin{pmatrix} -5 \\ -1 \\ 10 \end{pmatrix}.$$
Similarly to vectors, matrices also have dimensions. We describe the dimensions of a matrix like how we describe the dimensions of paper. An m × n matrix is a matrix with m rows and n columns. For example, the matrix in (6.2) is a 2 × 3 matrix. When m = n, an n × n matrix is called a square matrix, because its block of numbers is square. For example, the matrix in (6.1) is a 2 × 2 square matrix, and the matrix in (6.3) is a 3 × 3 square matrix.

There is also a convenient notation for the elements of matrices. Elements of a matrix are usually denoted as lowercase letters with subscripts containing the row number and the column number. It is like a coordinate system where we identify a point with coordinates. For example, the element on row 2 and column 3 can be denoted as $a_{23}$. A 3 × 3 matrix A can be written as
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix},$$
and a 2 × 4 matrix B can be written as
$$B = \begin{pmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \end{pmatrix}.$$

Remember that we always put the row number before the column number, whether
they are dimensions of matrices or subscripts of matrix elements.
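These conventions carry over directly to code: in NumPy a matrix is a 2-D array, its shape is (rows, columns), and entries are indexed row-first (0-based rather than the 1-based subscripts used in this book). A minimal sketch:

    import numpy as np

    A = np.array([[-3, 0, 5],
                  [ 7, -2, 6]])   # the 2 x 3 matrix in (6.2)

    print(A.shape)   # (2, 3): 2 rows, 3 columns
    print(A[1, 2])   # row 2, column 3 in this book's 1-based notation: 6
    print(A[:, 0])   # column 1 as a vector: [-3  7]
    print(A[0, :])   # row 1 as a vector: [-3  0  5]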

6.2 Main Diagonal, Trace, and Transpose


In this section, we will talk about some really basic concepts related to matrices,
particularly square matrices.

First, let us talk about the main diagonal of a square matrix. The main diago-
nal of a matrix is the diagonal segment from the top left corner to the bottom right
corner of the matrix. In other words, the main diagonal of a matrix A contains
the elements aij such that the row number i is the same as the column number j
(i = j). For example, the main diagonal of a 4 × 4 matrix
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}$$
contains the elements $a_{11}$, $a_{22}$, $a_{33}$, and $a_{44}$.

Now what is the trace of a matrix? The trace of a matrix A, denoted as Tr(A), is the sum of all the elements on the main diagonal of A. For example,
$$\mathrm{Tr}\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = a_{11} + a_{22} + a_{33}.$$
In general, for an n × n square matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix},$$
the main diagonal of A contains the elements $a_{11}, a_{22}, \cdots, a_{nn}$, and
$$\mathrm{Tr}(A) = a_{11} + a_{22} + \cdots + a_{nn}.$$
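A quick sketch of the main diagonal and the trace with NumPy, using the matrix in (6.3):

    import numpy as np

    M = np.array([[-9,  8, -6],
                  [ 1,  0,  7],
                  [-5, -1, 10]])

    print(np.diag(M))    # main diagonal: [-9  0 10]
    print(np.trace(M))   # -9 + 0 + 10 = 1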

Next, let us discuss the transpose of a matrix. This concept also applies to non-
square matrices. In chapter 5, we learned the transpose of vectors, which are
basically matrices with only one column. When we take the transpose of a matrix,
we take the transpose of each column of that matrix. So, column 1 of A becomes
row 1 of AT , column 2 of A becomes row 2 of AT , and so on. Also, note that
as columns of A become rows of AT , the rows of A also become columns of AT .
So, taking the transpose of a matrix is basically flipping columns and rows of that
matrix.

For example, if
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \\ a_{41} & a_{42} \end{pmatrix},$$
then
$$A^T = \begin{pmatrix} a_{11} & a_{21} & a_{31} & a_{41} \\ a_{12} & a_{22} & a_{32} & a_{42} \end{pmatrix}.$$
If
$$B = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{pmatrix},$$
then
$$B^T = \begin{pmatrix} b_{11} & b_{21} \\ b_{12} & b_{22} \\ b_{13} & b_{23} \end{pmatrix}.$$
So, in general, the transpose of an m × n matrix is an n × m matrix. The matrix A above is a 4 × 2 matrix, and $A^T$ is a 2 × 4 matrix; B is a 2 × 3 matrix, and $B^T$ is a 3 × 2 matrix.

For square matrices, the transpose is much simpler. For example, if
$$C = \begin{pmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{pmatrix},$$
then
$$C^T = \begin{pmatrix} c_{11} & c_{21} & c_{31} \\ c_{12} & c_{22} & c_{32} \\ c_{13} & c_{23} & c_{33} \end{pmatrix}.$$
So, we can see that when we take the transpose of a square matrix, we are basically flipping the elements across the main diagonal. Perhaps it is easier to see with numbers than with symbols: if
$$M = \begin{pmatrix} 1 & -3 & 6 \\ -2 & 5 & 0 \\ 9 & -1 & 7 \end{pmatrix},$$
then
$$M^T = \begin{pmatrix} 1 & -2 & 9 \\ -3 & 5 & -1 \\ 6 & 0 & 7 \end{pmatrix}.$$
Notice how the main diagonal stays the same and the other numbers are reflected across the main diagonal.
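This flipping is one line in NumPy; the sketch below uses the matrix M from the example above:

    import numpy as np

    M = np.array([[ 1, -3, 6],
                  [-2,  5, 0],
                  [ 9, -1, 7]])

    print(M.T)                          # the same M^T computed above
    print(np.diag(M) == np.diag(M.T))   # main diagonal unchanged: [True True True]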

6.3 Special Types of Matrices


There are a few basic special types of matrices that you should be familiar with.
In this section, we will learn about those special matrices. Also, note that these
types of matrices only apply to square matrices.

Triangular Matrices

There are two types of triangular matrices: lower-triangular matrices and upper-
triangular matrices.

A lower-triangular matrix is a matrix whose elements above the main diagonal are all 0. So, the elements on the main diagonal and below form a triangle in the lower half of the matrix. For example,
$$\begin{pmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{pmatrix} \qquad \text{and} \qquad \begin{pmatrix} -2 & 0 & 0 & 0 \\ 5 & 8 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 7 & 0 & 9 & -6 \end{pmatrix}$$
are lower-triangular matrices. The elements on the main diagonal and below are not necessarily non-zero.

In general, a lower-triangular matrix is of the form
$$L = \begin{pmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{pmatrix}.$$
Note that $l_{ij} = 0$ whenever $i < j$.

Upper-triangular matrices are the opposite of lower-triangular matrices. An upper-triangular matrix is a matrix whose elements below the main diagonal are all 0. So, the elements on the main diagonal and above form a triangle in the upper half of the matrix. For example,
$$\begin{pmatrix} 5 & -3 \\ 0 & 2 \end{pmatrix} \qquad \text{and} \qquad \begin{pmatrix} 10 & -6 & 1 \\ 0 & 7 & 0 \\ 0 & 0 & -9 \end{pmatrix}$$
are upper-triangular matrices. The elements on the main diagonal and above are not necessarily non-zero.

In general, an upper-triangular matrix is of the form
$$U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{pmatrix}.$$
Note that $u_{ij} = 0$ whenever $i > j$.

Symmetric Matrices

A symmetric matrix is a matrix whose elements above the main diagonal are the same as the elements below the main diagonal. So, the elements of the matrix are symmetric across the main diagonal. For example,
$$\begin{pmatrix} 1 & -3 \\ -3 & 5 \end{pmatrix}, \qquad \begin{pmatrix} 4 & 3 & -2 \\ 3 & 0 & 7 \\ -2 & 7 & -1 \end{pmatrix}, \qquad \text{and} \qquad \begin{pmatrix} 0 & 1 & 2 & 3 \\ 1 & 4 & 5 & 6 \\ 2 & 5 & 7 & 8 \\ 3 & 6 & 8 & 9 \end{pmatrix}$$
are symmetric matrices. In general, if we denote the elements of a symmetric matrix S as $s_{ij}$, then we have $s_{ij} = s_{ji}$. For example, for
$$S = \begin{pmatrix} s_{11} & s_{12} & s_{13} \\ s_{21} & s_{22} & s_{23} \\ s_{31} & s_{32} & s_{33} \end{pmatrix} = \begin{pmatrix} 4 & 3 & -2 \\ 3 & 0 & 7 \\ -2 & 7 & -1 \end{pmatrix},$$
we have $s_{12} = s_{21} = 3$, $s_{13} = s_{31} = -2$, and $s_{23} = s_{32} = 7$.

Note that when we take the transpose of a symmetric matrix, we end up with the same matrix. In other words, if S is a symmetric matrix, then $S^T = S$. For example,
$$\begin{pmatrix} 4 & 3 & -2 \\ 3 & 0 & 7 \\ -2 & 7 & -1 \end{pmatrix}^T = \begin{pmatrix} 4 & 3 & -2 \\ 3 & 0 & 7 \\ -2 & 7 & -1 \end{pmatrix}.$$

Skew-symmetric Matrices

A skew-symmetric matrix is a matrix whose elements above the main diagonal are the negatives of the elements below the main diagonal and whose elements on the main diagonal are all 0. For example,
$$\begin{pmatrix} 0 & 3 \\ -3 & 0 \end{pmatrix} \qquad \text{and} \qquad \begin{pmatrix} 0 & -6 & 8 \\ 6 & 0 & 1 \\ -8 & -1 & 0 \end{pmatrix}$$
are skew-symmetric matrices. In general, if we denote the elements of a skew-symmetric matrix A as $a_{ij}$, then we have $a_{ij} = -a_{ji}$. For example, for
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} 0 & 3 \\ -3 & 0 \end{pmatrix},$$
we have $a_{12} = -a_{21} = 3$, $a_{11} = -a_{11} = 0$, and $a_{22} = -a_{22} = 0$.

Note that when we take the transpose of a skew-symmetric matrix, we end up with the negative of the original matrix. In other words, if A is a skew-symmetric matrix, then $A^T = -A$. We have not talked about multiplying a matrix with a scalar yet, but it is basically the same as multiplying a vector with a scalar: we multiply the scalar with each element of the matrix. For example,
$$\begin{pmatrix} 0 & -6 & 8 \\ 6 & 0 & 1 \\ -8 & -1 & 0 \end{pmatrix}^T = \begin{pmatrix} 0 & 6 & -8 \\ -6 & 0 & -1 \\ 8 & 1 & 0 \end{pmatrix} = -\begin{pmatrix} 0 & -6 & 8 \\ 6 & 0 & 1 \\ -8 & -1 & 0 \end{pmatrix}.$$

Diagonal Matrices

A diagonal matrix is a matrix whose elements not on the main diagonal are all 0. For example,
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} \qquad \text{and} \qquad \begin{pmatrix} -5 & 0 & 0 & 0 \\ 0 & 8 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 6 \end{pmatrix}$$
are diagonal matrices. The elements on the main diagonal are not necessarily non-zero.

In general, a diagonal matrix is of the form
$$D = \begin{pmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_{nn} \end{pmatrix}.$$
Note that $d_{ij} = 0$ whenever $i \neq j$.

It is also important to know that a diagonal matrix is also a lower-triangular ma-


trix, an upper-triangular matrix, and a symmetric matrix. However, not all of the
lower-triangular matrices, upper-triangular matrices, and symmetric matrices are
diagonal matrices. It is like how squares are rectangles, but not all rectangles are
squares.

A diagonal matrix is a lower-triangular matrix because the elements above the


main diagonal are all 0. It is an upper-triangular matrix because the elements be-
low the main diagonal are all 0. It is also a symmetric matrix because the elements
above and below the main diagonal are all the same: they are all 0.

Identity Matrix

The identity matrix is a special case of diagonal matrices. It is the diagonal matrix whose elements on the main diagonal are all 1:
$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$
For example, the 3 × 3 identity matrix is
$$I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
The identity matrix is the matrix version of number 1. We will see why when
learning about the matrix multiplication in the next chapter.

Orthogonal Matrices

Last but not least, orthogonal matrices are also an important type of matrices. An orthogonal matrix is a matrix whose columns are orthonormal. That means the column vectors must be orthogonal to each other and have unit length. The name "orthogonal" is somewhat misleading because the columns are not just orthogonal; they are orthonormal.

For example,
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
is an orthogonal matrix because the column vectors
$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$
are orthonormal, and
$$\begin{pmatrix} 1/2 & -1/2 & 1/2 & 1/2 \\ 1/2 & -1/2 & -1/2 & -1/2 \\ 1/2 & 1/2 & -1/2 & 1/2 \\ 1/2 & 1/2 & 1/2 & -1/2 \end{pmatrix}$$
is an orthogonal matrix because the column vectors
$$\begin{pmatrix} 1/2 \\ 1/2 \\ 1/2 \\ 1/2 \end{pmatrix}, \qquad \begin{pmatrix} -1/2 \\ -1/2 \\ 1/2 \\ 1/2 \end{pmatrix}, \qquad \begin{pmatrix} 1/2 \\ -1/2 \\ -1/2 \\ 1/2 \end{pmatrix}, \qquad \begin{pmatrix} 1/2 \\ -1/2 \\ 1/2 \\ -1/2 \end{pmatrix}$$
are orthonormal.
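These definitions all translate into one-line checks in code. The sketch below verifies that the 4 × 4 matrix above is orthogonal by testing whether $Q^T Q$ is the identity matrix, a test that amounts to taking all the pairwise dot products of the columns at once (it uses matrix multiplication, which we only define in the next chapter, so treat this as a preview):

    import numpy as np

    Q = 0.5 * np.array([[ 1, -1,  1,  1],
                        [ 1, -1, -1, -1],
                        [ 1,  1, -1,  1],
                        [ 1,  1,  1, -1]])

    # Columns are orthonormal exactly when Q^T Q is the identity matrix
    print(np.allclose(Q.T @ Q, np.eye(4)))   # True

    # A symmetric matrix satisfies S^T = S
    S = np.array([[1, -3], [-3, 5]])
    print(np.array_equal(S.T, S))            # True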
Exercises
 
1. What are the dimensions of the matrix $\begin{pmatrix} 3 & 0 & -6 & 7 \\ 5 & -2 & 11 & 9 \\ 8 & 12 & -1 & 10 \end{pmatrix}$?

2. Which of the following is NOT a square matrix?
A. $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$  B. $\begin{pmatrix} 2 & 4 & 6 & 8 \\ -9 & -7 & -5 & -3 \\ 1 & 0 & -1 & 10 \\ 1 & 2 & 3 & 4 \end{pmatrix}$  C. $\begin{pmatrix} 2 & -5 \\ -3 & 9 \\ 6 & 10 \end{pmatrix}$

3. Let
$$A = \begin{pmatrix} 3 & -7 & 8 \\ -1 & 0 & 5 \end{pmatrix}$$
and denote the elements of A as $a_{ij}$. $a_{12} = ?$ $a_{21} = ?$ $a_{23} = ?$

4. Let
$$B = \begin{pmatrix} 2 & -3 & 6 \\ -1 & 0 & 5 \\ 8 & 12 & -9 \end{pmatrix}.$$
a. From top to bottom, what are the elements on the main diagonal of B?
b. Tr(B) = ?

 
5. $C = \begin{pmatrix} 3 & 0 & -7 \\ 8 & -5 & 2 \\ 1 & 6 & 9 \end{pmatrix}$. $C^T = ?$

6. For any matrix A, (AT )T =?

7. Categorize each of the following matrices as lower-triangular matrix, upper-triangular matrix, symmetric matrix, skew-symmetric matrix, and/or diagonal matrix.
A. $\begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}$  B. $\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$  C. $\begin{pmatrix} -2 & 0 & 0 \\ 7 & 8 & 0 \\ 5 & -3 & 1 \end{pmatrix}$  D. $\begin{pmatrix} 1 & 10 \\ 10 & 3 \end{pmatrix}$

8. Given $\begin{pmatrix} 1 \\ -1 \\ -1 \end{pmatrix}$, $\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}$, and $\begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix}$, check that these vectors are orthogonal, and use these three column vectors to construct a 3 × 3 orthogonal matrix.

Chapter 7

Addition, Subtraction, and


Multiplication

7.1 Addition and Subtraction
In this chapter, we will learn how to do arithmetic operations for matrices. First,
in this section, we will learn how to add or subtract matrices.

Addition of matrices is the same as addition of vectors: we add matrices element-


by-element. That is, we add the row 1 column 1 element of the first matrix to the
row 1 column 1 element of the second matrix, the row 1 column 2 element of the
first matrix to the row 1 column 2 element of the second matrix, and so on. Note
that we can only add two matrices of the same dimensions. Let us look at a few
examples.
   
Example 7.1: $\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} + \begin{pmatrix} -3 & 0 & 9 \\ 7 & -6 & 1 \end{pmatrix} = ?$
$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} + \begin{pmatrix} -3 & 0 & 9 \\ 7 & -6 & 1 \end{pmatrix} = \begin{pmatrix} 1 + (-3) & 2 + 0 & 3 + 9 \\ 4 + 7 & 5 + (-6) & 6 + 1 \end{pmatrix} = \begin{pmatrix} -2 & 2 & 12 \\ 11 & -1 & 7 \end{pmatrix}.$$

Example 7.2: $\begin{pmatrix} 10 & -3 \\ 6 & 0 \end{pmatrix} + \begin{pmatrix} 2 & -1 \\ -6 & -7 \end{pmatrix} = ?$
$$\begin{pmatrix} 10 & -3 \\ 6 & 0 \end{pmatrix} + \begin{pmatrix} 2 & -1 \\ -6 & -7 \end{pmatrix} = \begin{pmatrix} 10 + 2 & -3 + (-1) \\ 6 + (-6) & 0 + (-7) \end{pmatrix} = \begin{pmatrix} 12 & -4 \\ 0 & -7 \end{pmatrix}.$$

Example 7.3: $\begin{pmatrix} 12 & -11 \\ -3 & 5 \\ 9 & 2 \end{pmatrix} + \begin{pmatrix} -10 & 8 \\ -7 & 6 \\ -15 & 1 \end{pmatrix} = ?$
$$\begin{pmatrix} 12 & -11 \\ -3 & 5 \\ 9 & 2 \end{pmatrix} + \begin{pmatrix} -10 & 8 \\ -7 & 6 \\ -15 & 1 \end{pmatrix} = \begin{pmatrix} 12 + (-10) & -11 + 8 \\ -3 + (-7) & 5 + 6 \\ 9 + (-15) & 2 + 1 \end{pmatrix} = \begin{pmatrix} 2 & -3 \\ -10 & 11 \\ -6 & 3 \end{pmatrix}.$$

Like the addition of numbers and vectors, matrix addition satisfies the commutative
and associative properties. That is, for matrices A, B, and C with the same
dimensions,

A+B =B+A

and

A + (B + C) = (A + B) + C.

Similarly to addition, subtraction of matrices is also done element-by-element. Let


us look at a few examples.

Example 7.4: $\begin{pmatrix} -5 & 3 & 2 \\ 0 & 1 & 7 \\ 6 & -8 & 9 \end{pmatrix} - \begin{pmatrix} 2 & 5 & -3 \\ 7 & -6 & 4 \\ 9 & -9 & 10 \end{pmatrix} = ?$
$$\begin{pmatrix} -5 & 3 & 2 \\ 0 & 1 & 7 \\ 6 & -8 & 9 \end{pmatrix} - \begin{pmatrix} 2 & 5 & -3 \\ 7 & -6 & 4 \\ 9 & -9 & 10 \end{pmatrix} = \begin{pmatrix} -5 - 2 & 3 - 5 & 2 - (-3) \\ 0 - 7 & 1 - (-6) & 7 - 4 \\ 6 - 9 & -8 - (-9) & 9 - 10 \end{pmatrix} = \begin{pmatrix} -7 & -2 & 5 \\ -7 & 7 & 3 \\ -3 & 1 & -1 \end{pmatrix}.$$

Example 7.5: $\begin{pmatrix} 3 & 6 & 9 & 0 \\ 7 & -2 & -8 & 12 \end{pmatrix} - \begin{pmatrix} 5 & 11 & -1 & 0 \\ 10 & 5 & -7 & 8 \end{pmatrix} = ?$
$$\begin{pmatrix} 3 & 6 & 9 & 0 \\ 7 & -2 & -8 & 12 \end{pmatrix} - \begin{pmatrix} 5 & 11 & -1 & 0 \\ 10 & 5 & -7 & 8 \end{pmatrix} = \begin{pmatrix} 3 - 5 & 6 - 11 & 9 - (-1) & 0 - 0 \\ 7 - 10 & -2 - 5 & -8 - (-7) & 12 - 8 \end{pmatrix} = \begin{pmatrix} -2 & -5 & 10 & 0 \\ -3 & -7 & -1 & 4 \end{pmatrix}.$$
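Element-by-element addition and subtraction is exactly what the + and - operators do for NumPy arrays; a sketch reproducing examples 7.1 and 7.5:

    import numpy as np

    A = np.array([[1, 2, 3], [4, 5, 6]])
    B = np.array([[-3, 0, 9], [7, -6, 1]])
    print(A + B)   # [[-2  2 12] [11 -1  7]], as in example 7.1

    C = np.array([[3, 6, 9, 0], [7, -2, -8, 12]])
    D = np.array([[5, 11, -1, 0], [10, 5, -7, 8]])
    print(C - D)   # [[-2 -5 10  0] [-3 -7 -1  4]], as in example 7.5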

7.2 Matrix Multiplication with Scalars and Vectors
In this section, we will learn how to multiply a matrix with a scalar or a vector. In
the next section, we will use what we learn here to multiply two matrices.

Matrix Multiplication with Scalars

As mentioned before in the last chapter, matrix multiplication with scalars is the
same as vector multiplication with scalars: we multiply the scalar to each element
of the matrix. Let us look at a few examples.
 
Example 7.6: $2\begin{pmatrix} -1 & 3 \\ 5 & 2 \end{pmatrix} = ?$
$$2\begin{pmatrix} -1 & 3 \\ 5 & 2 \end{pmatrix} = \begin{pmatrix} 2 \cdot (-1) & 2 \cdot 3 \\ 2 \cdot 5 & 2 \cdot 2 \end{pmatrix} = \begin{pmatrix} -2 & 6 \\ 10 & 4 \end{pmatrix}.$$

Example 7.7: $-5\begin{pmatrix} 3 & 0 & -2 & 1 \\ -1 & 6 & 4 & -5 \\ 7 & -3 & 0 & 2 \end{pmatrix} = ?$
$$-5\begin{pmatrix} 3 & 0 & -2 & 1 \\ -1 & 6 & 4 & -5 \\ 7 & -3 & 0 & 2 \end{pmatrix} = \begin{pmatrix} -15 & 0 & 10 & -5 \\ 5 & -30 & -20 & 25 \\ -35 & 15 & 0 & -10 \end{pmatrix}.$$

Example 7.8: $-\begin{pmatrix} 3 & -5 \\ 0 & 2 \\ -6 & 8 \end{pmatrix} = ?$
$$-\begin{pmatrix} 3 & -5 \\ 0 & 2 \\ -6 & 8 \end{pmatrix} = \begin{pmatrix} -3 & 5 \\ 0 & -2 \\ 6 & -8 \end{pmatrix}.$$

Matrix Multiplication with Column Vectors

Matrix multiplication with vectors is a little bit more complicated. First, let us
learn how to multiply a matrix with a column vector. When we multiply a matrix
with a column vector, we multiply the first element (from top to bottom) of the
vector to the first column (from left to right) of the matrix, the second element of
the vector to the second column of the matrix, and so on, then we add them all up
together:
$$\begin{pmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = v_1 \vec{a}_1 + v_2 \vec{a}_2 + \cdots + v_n \vec{a}_n.$$
It is important to note that we can only multiply a matrix with a column vector
whose dimension is the same as the number of columns of the matrix. Let us look
at a few examples.
  
Example 7.9: $\begin{pmatrix} 3 & -5 \\ -2 & 6 \end{pmatrix} \begin{pmatrix} 6 \\ 2 \end{pmatrix} = ?$
$$\begin{pmatrix} 3 & -5 \\ -2 & 6 \end{pmatrix} \begin{pmatrix} 6 \\ 2 \end{pmatrix} = 6\begin{pmatrix} 3 \\ -2 \end{pmatrix} + 2\begin{pmatrix} -5 \\ 6 \end{pmatrix} = \begin{pmatrix} 18 \\ -12 \end{pmatrix} + \begin{pmatrix} -10 \\ 12 \end{pmatrix} = \begin{pmatrix} 8 \\ 0 \end{pmatrix}.$$

Example 7.10: $\begin{pmatrix} 1 & 7 & 3 \\ 0 & -2 & 4 \end{pmatrix} \begin{pmatrix} 5 \\ -1 \\ 2 \end{pmatrix} = ?$
$$\begin{pmatrix} 1 & 7 & 3 \\ 0 & -2 & 4 \end{pmatrix} \begin{pmatrix} 5 \\ -1 \\ 2 \end{pmatrix} = 5\begin{pmatrix} 1 \\ 0 \end{pmatrix} + (-1)\begin{pmatrix} 7 \\ -2 \end{pmatrix} + 2\begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 5 \\ 0 \end{pmatrix} + \begin{pmatrix} -7 \\ 2 \end{pmatrix} + \begin{pmatrix} 6 \\ 8 \end{pmatrix} = \begin{pmatrix} 4 \\ 10 \end{pmatrix}.$$

Example 7.11: $\begin{pmatrix} -1 & -5 \\ 2 & -3 \\ 8 & 6 \\ 7 & 9 \end{pmatrix} \begin{pmatrix} 3 \\ -2 \end{pmatrix} = ?$
$$\begin{pmatrix} -1 & -5 \\ 2 & -3 \\ 8 & 6 \\ 7 & 9 \end{pmatrix} \begin{pmatrix} 3 \\ -2 \end{pmatrix} = 3\begin{pmatrix} -1 \\ 2 \\ 8 \\ 7 \end{pmatrix} + (-2)\begin{pmatrix} -5 \\ -3 \\ 6 \\ 9 \end{pmatrix} = \begin{pmatrix} -3 \\ 6 \\ 24 \\ 21 \end{pmatrix} + \begin{pmatrix} 10 \\ 6 \\ -12 \\ -18 \end{pmatrix} = \begin{pmatrix} 7 \\ 12 \\ 12 \\ 3 \end{pmatrix}.$$
From the examples, we see that the column vectors are always put to the right of
the matrices. We cannot change the order and put the column vectors to the left
because matrix multiplication is not commutative.

Also, note that when we multiply a matrix with a column vector, we are adding
scalar multiples of each column of the matrix. So, multiplying a matrix with a
column vector gives a linear combination of the columns of the matrix.
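This "linear combination of columns" view is easy to see numerically; the sketch below redoes example 7.10 both ways:

    import numpy as np

    A = np.array([[1, 7, 3],
                  [0, -2, 4]])
    v = np.array([5, -1, 2])

    # Built-in matrix-vector product
    print(A @ v)                                    # [ 4 10]
    # The same thing as a linear combination of the columns of A
    print(5*A[:, 0] + (-1)*A[:, 1] + 2*A[:, 2])     # [ 4 10]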

Matrix Multiplication with Row Vectors

Next, let us discuss matrix multiplication with row vectors. It is very similar
to matrix multiplication with column vectors. We multiply each element of the
row vector to each row of the matrix, and then add them all up together:

$$\begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} \begin{pmatrix} \text{---} & \vec{a}_1^{\,T} & \text{---} \\ \text{---} & \vec{a}_2^{\,T} & \text{---} \\ & \vdots & \\ \text{---} & \vec{a}_n^{\,T} & \text{---} \end{pmatrix} = v_1 \vec{a}_1^{\,T} + v_2 \vec{a}_2^{\,T} + \cdots + v_n \vec{a}_n^{\,T}.$$
Addition and scalar multiplication of row vectors are exactly the same as addition
and scalar multiplication of column vectors. It is important to note that we can
only multiply a matrix with a row vector whose dimension is the same as the num-
ber of rows of the matrix. Let us take a look at some examples.
 
Example 7.12: $\begin{pmatrix} 3 & -5 \end{pmatrix} \begin{pmatrix} 2 & 4 \\ 1 & 3 \end{pmatrix} = ?$
$$\begin{pmatrix} 3 & -5 \end{pmatrix} \begin{pmatrix} 2 & 4 \\ 1 & 3 \end{pmatrix} = 3\begin{pmatrix} 2 & 4 \end{pmatrix} + (-5)\begin{pmatrix} 1 & 3 \end{pmatrix} = \begin{pmatrix} 6 & 12 \end{pmatrix} + \begin{pmatrix} -5 & -15 \end{pmatrix} = \begin{pmatrix} 1 & -3 \end{pmatrix}.$$

Example 7.13: $\begin{pmatrix} 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} 6 & 0 \\ -3 & 8 \\ 7 & -5 \end{pmatrix} = ?$
$$\begin{pmatrix} 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} 6 & 0 \\ -3 & 8 \\ 7 & -5 \end{pmatrix} = 1\begin{pmatrix} 6 & 0 \end{pmatrix} + 2\begin{pmatrix} -3 & 8 \end{pmatrix} + 3\begin{pmatrix} 7 & -5 \end{pmatrix} = \begin{pmatrix} 21 & 1 \end{pmatrix}.$$

Example 7.14: $\begin{pmatrix} 3 & 5 \end{pmatrix} \begin{pmatrix} 1 & 0 & 5 \\ 0 & 1 & -3 \end{pmatrix} = ?$
$$\begin{pmatrix} 3 & 5 \end{pmatrix} \begin{pmatrix} 1 & 0 & 5 \\ 0 & 1 & -3 \end{pmatrix} = 3\begin{pmatrix} 1 & 0 & 5 \end{pmatrix} + 5\begin{pmatrix} 0 & 1 & -3 \end{pmatrix} = \begin{pmatrix} 3 & 5 & 0 \end{pmatrix}.$$
From the examples, we see that the row vectors are always put to the left of the
matrices. Also, note that multiplying a matrix with a row vector gives a linear
combination of the rows of the matrix.

7.3 Multiplication of Two Matrices


Before we learn the three methods for matrix multiplication, there are some im-
portant points about matrix multiplication that we should pay attention to.

First, we need to know the condition that the dimensions of two matrices A and B must satisfy to be able to do the matrix multiplication AB. To be able to do the matrix multiplication AB, the number of columns of A must be the same as the number of rows of B. That is, we can only multiply an m × n matrix with an n × p matrix. Also, you should notice from the examples below that the product of an m × n matrix with an n × p matrix is an m × p matrix.

Second, you should remember that matrix multiplication is not commutative. That
is, generally
AB 6= BA.
If A is m × n and B is n × p, then we can do the multiplication AB. However, if we change the order, the multiplication BA is impossible to do whenever the number of columns of B, p, is not the same as the number of rows of A, m. If A is an m × n matrix and B is an n × m matrix, then we can still do the multiplication BA, but the result will generally be different from AB.

Third, as mentioned in the last chapter, the identity matrix is the matrix ver-
sion of number 1. In high school algebra, we learned that any number times 1 is
the number itself. Similarly, any m × n matrix multiplied by the n × n identity
matrix is the same m × n matrix itself. That is,

AIn = A.

Also, the n × n identity matrix multiplied by any n × p matrix is the same n × p


matrix itself. That is,
In A = A.

You should observe these important points while looking at the examples below.
There are three ways to multiply two matrices, and you can choose whichever way
you like the best although you should be familiar with all three. All three methods
produce the same final results; they are just three different ways to carry out the
multiplication.

Method 1: Multiplying Each Column

The first way to do the matrix multiplication

AB = C

is to multiply A with each column of B. The first column of C is A times the first
column of B; the second column of C is A times the second column of B; and so
on:
$$A \begin{pmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{pmatrix} = \begin{pmatrix} | & | & & | \\ A\vec{b}_1 & A\vec{b}_2 & \cdots & A\vec{b}_p \\ | & | & & | \end{pmatrix}.$$
Let us do a few examples.
  
Example 7.15: $\begin{pmatrix} 2 & 3 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 5 & -4 \\ -2 & 1 \end{pmatrix} = ?$

First, we find the first column of the product:
$$\begin{pmatrix} 2 & 3 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 5 \\ -2 \end{pmatrix} = 5\begin{pmatrix} 2 \\ 0 \end{pmatrix} + (-2)\begin{pmatrix} 3 \\ -1 \end{pmatrix} = \begin{pmatrix} 4 \\ 2 \end{pmatrix}.$$
Then, we find the second column of the product:
$$\begin{pmatrix} 2 & 3 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} -4 \\ 1 \end{pmatrix} = (-4)\begin{pmatrix} 2 \\ 0 \end{pmatrix} + 1\begin{pmatrix} 3 \\ -1 \end{pmatrix} = \begin{pmatrix} -5 \\ -1 \end{pmatrix}.$$
Therefore,
$$\begin{pmatrix} 2 & 3 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 5 & -4 \\ -2 & 1 \end{pmatrix} = \begin{pmatrix} 4 & -5 \\ 2 & -1 \end{pmatrix}.$$
  
Example 7.16: $\begin{pmatrix} 5 & -4 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 2 & 3 \\ 0 & -1 \end{pmatrix} = ?$

First, we find the first column of the product:
$$\begin{pmatrix} 5 & -4 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 0 \end{pmatrix} = 2\begin{pmatrix} 5 \\ -2 \end{pmatrix} + 0\begin{pmatrix} -4 \\ 1 \end{pmatrix} = \begin{pmatrix} 10 \\ -4 \end{pmatrix}.$$
Then, we find the second column of the product:
$$\begin{pmatrix} 5 & -4 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ -1 \end{pmatrix} = 3\begin{pmatrix} 5 \\ -2 \end{pmatrix} + (-1)\begin{pmatrix} -4 \\ 1 \end{pmatrix} = \begin{pmatrix} 19 \\ -7 \end{pmatrix}.$$
Therefore,
$$\begin{pmatrix} 5 & -4 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 2 & 3 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 10 & 19 \\ -4 & -7 \end{pmatrix}.$$
  
Example 7.17: $\begin{pmatrix} 3 & 1 & 4 \\ 2 & 7 & 2 \\ 0 & 5 & 8 \end{pmatrix} \begin{pmatrix} 3 & -5 \\ -2 & 1 \\ 2 & -4 \end{pmatrix} = ?$

First, we find the first column of the product:
$$\begin{pmatrix} 3 & 1 & 4 \\ 2 & 7 & 2 \\ 0 & 5 & 8 \end{pmatrix} \begin{pmatrix} 3 \\ -2 \\ 2 \end{pmatrix} = 3\begin{pmatrix} 3 \\ 2 \\ 0 \end{pmatrix} + (-2)\begin{pmatrix} 1 \\ 7 \\ 5 \end{pmatrix} + 2\begin{pmatrix} 4 \\ 2 \\ 8 \end{pmatrix} = \begin{pmatrix} 15 \\ -4 \\ 6 \end{pmatrix}.$$
Then, we find the second column of the product:
$$\begin{pmatrix} 3 & 1 & 4 \\ 2 & 7 & 2 \\ 0 & 5 & 8 \end{pmatrix} \begin{pmatrix} -5 \\ 1 \\ -4 \end{pmatrix} = -5\begin{pmatrix} 3 \\ 2 \\ 0 \end{pmatrix} + 1\begin{pmatrix} 1 \\ 7 \\ 5 \end{pmatrix} + (-4)\begin{pmatrix} 4 \\ 2 \\ 8 \end{pmatrix} = \begin{pmatrix} -30 \\ -11 \\ -27 \end{pmatrix}.$$
Therefore,
$$\begin{pmatrix} 3 & 1 & 4 \\ 2 & 7 & 2 \\ 0 & 5 & 8 \end{pmatrix} \begin{pmatrix} 3 & -5 \\ -2 & 1 \\ 2 & -4 \end{pmatrix} = \begin{pmatrix} 15 & -30 \\ -4 & -11 \\ 6 & -27 \end{pmatrix}.$$
 
Example 7.18: $\begin{pmatrix} 1 & 2 \\ -3 & -5 \\ -1 & 6 \end{pmatrix} \begin{pmatrix} 1 & -2 & 0 & 2 \\ 3 & 1 & -1 & 0 \end{pmatrix} = ?$

First, we find the first column of the product:
$$\begin{pmatrix} 1 & 2 \\ -3 & -5 \\ -1 & 6 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \end{pmatrix} = 1\begin{pmatrix} 1 \\ -3 \\ -1 \end{pmatrix} + 3\begin{pmatrix} 2 \\ -5 \\ 6 \end{pmatrix} = \begin{pmatrix} 7 \\ -18 \\ 17 \end{pmatrix}.$$
Then, we find the second column of the product:
$$\begin{pmatrix} 1 & 2 \\ -3 & -5 \\ -1 & 6 \end{pmatrix} \begin{pmatrix} -2 \\ 1 \end{pmatrix} = (-2)\begin{pmatrix} 1 \\ -3 \\ -1 \end{pmatrix} + 1\begin{pmatrix} 2 \\ -5 \\ 6 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 8 \end{pmatrix}.$$
Next, we find the third column of the product:
$$\begin{pmatrix} 1 & 2 \\ -3 & -5 \\ -1 & 6 \end{pmatrix} \begin{pmatrix} 0 \\ -1 \end{pmatrix} = 0\begin{pmatrix} 1 \\ -3 \\ -1 \end{pmatrix} + (-1)\begin{pmatrix} 2 \\ -5 \\ 6 \end{pmatrix} = \begin{pmatrix} -2 \\ 5 \\ -6 \end{pmatrix}.$$
Lastly, we find the fourth column of the product:
$$\begin{pmatrix} 1 & 2 \\ -3 & -5 \\ -1 & 6 \end{pmatrix} \begin{pmatrix} 2 \\ 0 \end{pmatrix} = 2\begin{pmatrix} 1 \\ -3 \\ -1 \end{pmatrix} + 0\begin{pmatrix} 2 \\ -5 \\ 6 \end{pmatrix} = \begin{pmatrix} 2 \\ -6 \\ -2 \end{pmatrix}.$$
Therefore,
$$\begin{pmatrix} 1 & 2 \\ -3 & -5 \\ -1 & 6 \end{pmatrix} \begin{pmatrix} 1 & -2 & 0 & 2 \\ 3 & 1 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 7 & 0 & -2 & 2 \\ -18 & 1 & 5 & -6 \\ 17 & 8 & -6 & -2 \end{pmatrix}.$$
 
Example 7.19: $\begin{pmatrix} -2 & 3 & 0 \\ 1 & 5 & -8 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = ?$

First, we find the first column of the product:
$$\begin{pmatrix} -2 & 3 & 0 \\ 1 & 5 & -8 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = 1\begin{pmatrix} -2 \\ 1 \end{pmatrix} + 0\begin{pmatrix} 3 \\ 5 \end{pmatrix} + 0\begin{pmatrix} 0 \\ -8 \end{pmatrix} = \begin{pmatrix} -2 \\ 1 \end{pmatrix}.$$
Then, we find the second column of the product:
$$\begin{pmatrix} -2 & 3 & 0 \\ 1 & 5 & -8 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 5 \end{pmatrix}.$$
Next, we find the third column of the product:
$$\begin{pmatrix} -2 & 3 & 0 \\ 1 & 5 & -8 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ -8 \end{pmatrix}.$$
Therefore,
$$\begin{pmatrix} -2 & 3 & 0 \\ 1 & 5 & -8 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} -2 & 3 & 0 \\ 1 & 5 & -8 \end{pmatrix}.$$
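Method 1 is straightforward to write out as a loop over the columns of B. A sketch (the function name is ours), checked against example 7.15:

    import numpy as np

    def matmul_by_columns(A, B):
        # Column j of the product is A times column j of B
        return np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

    A = np.array([[2, 3], [0, -1]])
    B = np.array([[5, -4], [-2, 1]])
    print(matmul_by_columns(A, B))   # [[ 4 -5] [ 2 -1]]
    print(A @ B)                     # same result from NumPy's built-in product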

Method 2: Multiplying Each Row

Another way to do the matrix multiplication


AB = C
is to multiply B with each row of A. The first row of C is the first row of A times
B; the second row of C is the second row of A times B; and so on:
 
$$\begin{pmatrix} \text{---} & \vec{a}_1^{\,T} & \text{---} \\ \text{---} & \vec{a}_2^{\,T} & \text{---} \\ & \vdots & \\ \text{---} & \vec{a}_m^{\,T} & \text{---} \end{pmatrix} B = \begin{pmatrix} \text{---} & \vec{a}_1^{\,T} B & \text{---} \\ \text{---} & \vec{a}_2^{\,T} B & \text{---} \\ & \vdots & \\ \text{---} & \vec{a}_m^{\,T} B & \text{---} \end{pmatrix}.$$
Let us look at a few examples.


  
Example 7.20: $\begin{pmatrix} 2 & -1 & 0 \\ 1 & 3 & -2 \\ 0 & 5 & 0 \end{pmatrix} \begin{pmatrix} 3 & 2 & 1 \\ 0 & -1 & 0 \\ 0 & 3 & -5 \end{pmatrix} = ?$

First, we find the first row of the product:
$$\begin{pmatrix} 2 & -1 & 0 \end{pmatrix} \begin{pmatrix} 3 & 2 & 1 \\ 0 & -1 & 0 \\ 0 & 3 & -5 \end{pmatrix} = 2\begin{pmatrix} 3 & 2 & 1 \end{pmatrix} + (-1)\begin{pmatrix} 0 & -1 & 0 \end{pmatrix} + 0\begin{pmatrix} 0 & 3 & -5 \end{pmatrix} = \begin{pmatrix} 6 & 5 & 2 \end{pmatrix}.$$
Then, we find the second row of the product:
$$\begin{pmatrix} 1 & 3 & -2 \end{pmatrix} \begin{pmatrix} 3 & 2 & 1 \\ 0 & -1 & 0 \\ 0 & 3 & -5 \end{pmatrix} = 1\begin{pmatrix} 3 & 2 & 1 \end{pmatrix} + 3\begin{pmatrix} 0 & -1 & 0 \end{pmatrix} + (-2)\begin{pmatrix} 0 & 3 & -5 \end{pmatrix} = \begin{pmatrix} 3 & -7 & 11 \end{pmatrix}.$$
Next, we find the third row of the product:
$$\begin{pmatrix} 0 & 5 & 0 \end{pmatrix} \begin{pmatrix} 3 & 2 & 1 \\ 0 & -1 & 0 \\ 0 & 3 & -5 \end{pmatrix} = 5\begin{pmatrix} 0 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -5 & 0 \end{pmatrix}.$$
Therefore,
$$\begin{pmatrix} 2 & -1 & 0 \\ 1 & 3 & -2 \\ 0 & 5 & 0 \end{pmatrix} \begin{pmatrix} 3 & 2 & 1 \\ 0 & -1 & 0 \\ 0 & 3 & -5 \end{pmatrix} = \begin{pmatrix} 6 & 5 & 2 \\ 3 & -7 & 11 \\ 0 & -5 & 0 \end{pmatrix}.$$
  
Example 7.21: $\begin{pmatrix} -1 & 6 \\ 2 & -5 \end{pmatrix} \begin{pmatrix} 12 & -10 & 0 \\ 5 & -2 & -3 \end{pmatrix} = ?$

First, we find the first row of the product:
$$\begin{pmatrix} -1 & 6 \end{pmatrix} \begin{pmatrix} 12 & -10 & 0 \\ 5 & -2 & -3 \end{pmatrix} = (-1)\begin{pmatrix} 12 & -10 & 0 \end{pmatrix} + 6\begin{pmatrix} 5 & -2 & -3 \end{pmatrix} = \begin{pmatrix} 18 & -2 & -18 \end{pmatrix}.$$
Then, we find the second row of the product:
$$\begin{pmatrix} 2 & -5 \end{pmatrix} \begin{pmatrix} 12 & -10 & 0 \\ 5 & -2 & -3 \end{pmatrix} = 2\begin{pmatrix} 12 & -10 & 0 \end{pmatrix} + (-5)\begin{pmatrix} 5 & -2 & -3 \end{pmatrix} = \begin{pmatrix} -1 & -10 & 15 \end{pmatrix}.$$
Therefore,
$$\begin{pmatrix} -1 & 6 \\ 2 & -5 \end{pmatrix} \begin{pmatrix} 12 & -10 & 0 \\ 5 & -2 & -3 \end{pmatrix} = \begin{pmatrix} 18 & -2 & -18 \\ -1 & -10 & 15 \end{pmatrix}.$$
 
Example 7.22: $\begin{pmatrix} 1 & -1 & 2 \\ 3 & 4 & -2 \end{pmatrix} \begin{pmatrix} -3 & 5 \\ 7 & -2 \\ 1 & 0 \end{pmatrix} = ?$

First, we find the first row of the product:
$$\begin{pmatrix} 1 & -1 & 2 \end{pmatrix} \begin{pmatrix} -3 & 5 \\ 7 & -2 \\ 1 & 0 \end{pmatrix} = 1\begin{pmatrix} -3 & 5 \end{pmatrix} + (-1)\begin{pmatrix} 7 & -2 \end{pmatrix} + 2\begin{pmatrix} 1 & 0 \end{pmatrix} = \begin{pmatrix} -8 & 7 \end{pmatrix}.$$
Then, we find the second row of the product:
$$\begin{pmatrix} 3 & 4 & -2 \end{pmatrix} \begin{pmatrix} -3 & 5 \\ 7 & -2 \\ 1 & 0 \end{pmatrix} = 3\begin{pmatrix} -3 & 5 \end{pmatrix} + 4\begin{pmatrix} 7 & -2 \end{pmatrix} + (-2)\begin{pmatrix} 1 & 0 \end{pmatrix} = \begin{pmatrix} 17 & 7 \end{pmatrix}.$$
Therefore,
$$\begin{pmatrix} 1 & -1 & 2 \\ 3 & 4 & -2 \end{pmatrix} \begin{pmatrix} -3 & 5 \\ 7 & -2 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} -8 & 7 \\ 17 & 7 \end{pmatrix}.$$
 
Example 7.23: $\begin{pmatrix} -3 & 5 \\ 7 & -2 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & -1 & 2 \\ 3 & 4 & -2 \end{pmatrix} = ?$

First, we find the first row of the product:
$$\begin{pmatrix} -3 & 5 \end{pmatrix} \begin{pmatrix} 1 & -1 & 2 \\ 3 & 4 & -2 \end{pmatrix} = (-3)\begin{pmatrix} 1 & -1 & 2 \end{pmatrix} + 5\begin{pmatrix} 3 & 4 & -2 \end{pmatrix} = \begin{pmatrix} 12 & 23 & -16 \end{pmatrix}.$$
Then, we find the second row of the product:
$$\begin{pmatrix} 7 & -2 \end{pmatrix} \begin{pmatrix} 1 & -1 & 2 \\ 3 & 4 & -2 \end{pmatrix} = 7\begin{pmatrix} 1 & -1 & 2 \end{pmatrix} + (-2)\begin{pmatrix} 3 & 4 & -2 \end{pmatrix} = \begin{pmatrix} 1 & -15 & 18 \end{pmatrix}.$$
Next, we find the third row of the product:
$$\begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & -1 & 2 \\ 3 & 4 & -2 \end{pmatrix} = 1\begin{pmatrix} 1 & -1 & 2 \end{pmatrix} + 0\begin{pmatrix} 3 & 4 & -2 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 2 \end{pmatrix}.$$
Therefore,
$$\begin{pmatrix} -3 & 5 \\ 7 & -2 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & -1 & 2 \\ 3 & 4 & -2 \end{pmatrix} = \begin{pmatrix} 12 & 23 & -16 \\ 1 & -15 & 18 \\ 1 & -1 & 2 \end{pmatrix}.$$
  
Example 7.24: $\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -3 & 4 \\ 5 & -6 \\ -7 & -8 \end{pmatrix} = ?$

First, we find the first row of the product:
$$\begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -3 & 4 \\ 5 & -6 \\ -7 & -8 \end{pmatrix} = 1\begin{pmatrix} 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 2 \end{pmatrix}.$$
Then, we find the second row of the product:
$$\begin{pmatrix} 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -3 & 4 \\ 5 & -6 \\ -7 & -8 \end{pmatrix} = \begin{pmatrix} -3 & 4 \end{pmatrix}.$$
Next, we find the third row of the product:
$$\begin{pmatrix} 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -3 & 4 \\ 5 & -6 \\ -7 & -8 \end{pmatrix} = \begin{pmatrix} 5 & -6 \end{pmatrix}.$$
Lastly, we find the fourth row of the product:
$$\begin{pmatrix} 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -3 & 4 \\ 5 & -6 \\ -7 & -8 \end{pmatrix} = \begin{pmatrix} -7 & -8 \end{pmatrix}.$$
Therefore,
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -3 & 4 \\ 5 & -6 \\ -7 & -8 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ -3 & 4 \\ 5 & -6 \\ -7 & -8 \end{pmatrix}.$$
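Method 2 can be sketched the same way, looping over the rows of A (again, the function name is ours); checked against example 7.20:

    import numpy as np

    def matmul_by_rows(A, B):
        # Row i of the product is row i of A times B
        return np.vstack([A[i, :] @ B for i in range(A.shape[0])])

    A = np.array([[2, -1, 0], [1, 3, -2], [0, 5, 0]])
    B = np.array([[3, 2, 1], [0, -1, 0], [0, 3, -5]])
    print(matmul_by_rows(A, B))   # [[ 6  5  2] [ 3 -7 11] [ 0 -5  0]]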

Method 3: Multiplying Each Row with Each Column

Last but not least, another way to do the matrix multiplication

AB = C

is to multiply each row of A with each column of B to obtain each element of C:

$$\begin{pmatrix} \text{---} & \vec{a}_1^{\,T} & \text{---} \\ \text{---} & \vec{a}_2^{\,T} & \text{---} \\ & \vdots & \\ \text{---} & \vec{a}_m^{\,T} & \text{---} \end{pmatrix} \begin{pmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{pmatrix} = \begin{pmatrix} \vec{a}_1^{\,T}\vec{b}_1 & \vec{a}_1^{\,T}\vec{b}_2 & \cdots & \vec{a}_1^{\,T}\vec{b}_p \\ \vec{a}_2^{\,T}\vec{b}_1 & \vec{a}_2^{\,T}\vec{b}_2 & \cdots & \vec{a}_2^{\,T}\vec{b}_p \\ \vdots & \vdots & \ddots & \vdots \\ \vec{a}_m^{\,T}\vec{b}_1 & \vec{a}_m^{\,T}\vec{b}_2 & \cdots & \vec{a}_m^{\,T}\vec{b}_p \end{pmatrix}.$$
So, in general, if we denote the elements of C as $c_{ij}$, then we have
$$c_{ij} = \vec{a}_i^{\,T} \vec{b}_j.$$

In other words, the row i column j element of C is the i-th row of A times the j-th column of B. Also, recall from chapter 5 that

~u T ~v = ~u · ~v .

Let us take a look at some examples.


  
Example 7.25: $\begin{pmatrix} -1 & 2 \\ 2 & 5 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = ?$

Let
$$\begin{pmatrix} -1 & 2 \\ 2 & 5 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = C.$$
For the row 1 column 1 element, we have
$$c_{11} = \begin{pmatrix} -1 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix} = (-1) \cdot 1 + 2 \cdot 0 = -1.$$
For the row 1 column 2 element, we have
$$c_{12} = \begin{pmatrix} -1 & 2 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = (-1) \cdot 0 + 2 \cdot 1 = 2.$$
For the row 2 column 1 element, we have
$$c_{21} = \begin{pmatrix} 2 & 5 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 2 \cdot 1 + 5 \cdot 0 = 2.$$
For the row 2 column 2 element, we have
$$c_{22} = \begin{pmatrix} 2 & 5 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = 2 \cdot 0 + 5 \cdot 1 = 5.$$
Therefore,
$$\begin{pmatrix} -1 & 2 \\ 2 & 5 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix} = \begin{pmatrix} -1 & 2 \\ 2 & 5 \end{pmatrix}.$$
  
Example 7.26: $\begin{pmatrix} -2 & 3 \\ 5 & -4 \end{pmatrix} \begin{pmatrix} 6 & -1 & 2 \\ 3 & 1 & -4 \end{pmatrix} = ?$

Let
$$\begin{pmatrix} -2 & 3 \\ 5 & -4 \end{pmatrix} \begin{pmatrix} 6 & -1 & 2 \\ 3 & 1 & -4 \end{pmatrix} = C.$$
For the row 1 column 1 element, we have
$$c_{11} = \begin{pmatrix} -2 & 3 \end{pmatrix} \begin{pmatrix} 6 \\ 3 \end{pmatrix} = (-2) \cdot 6 + 3 \cdot 3 = -3.$$
For the row 1 column 2 element, we have
$$c_{12} = \begin{pmatrix} -2 & 3 \end{pmatrix} \begin{pmatrix} -1 \\ 1 \end{pmatrix} = (-2) \cdot (-1) + 3 \cdot 1 = 5.$$
For the row 1 column 3 element, we have
$$c_{13} = \begin{pmatrix} -2 & 3 \end{pmatrix} \begin{pmatrix} 2 \\ -4 \end{pmatrix} = (-2) \cdot 2 + 3 \cdot (-4) = -16.$$
For the row 2 column 1 element, we have
$$c_{21} = \begin{pmatrix} 5 & -4 \end{pmatrix} \begin{pmatrix} 6 \\ 3 \end{pmatrix} = 5 \cdot 6 + (-4) \cdot 3 = 18.$$
For the row 2 column 2 element, we have
$$c_{22} = \begin{pmatrix} 5 & -4 \end{pmatrix} \begin{pmatrix} -1 \\ 1 \end{pmatrix} = 5 \cdot (-1) + (-4) \cdot 1 = -9.$$
For the row 2 column 3 element, we have
$$c_{23} = \begin{pmatrix} 5 & -4 \end{pmatrix} \begin{pmatrix} 2 \\ -4 \end{pmatrix} = 5 \cdot 2 + (-4) \cdot (-4) = 26.$$
Therefore,
$$\begin{pmatrix} -2 & 3 \\ 5 & -4 \end{pmatrix} \begin{pmatrix} 6 & -1 & 2 \\ 3 & 1 & -4 \end{pmatrix} = \begin{pmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \end{pmatrix} = \begin{pmatrix} -3 & 5 & -16 \\ 18 & -9 & 26 \end{pmatrix}.$$
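Method 3, one dot product per entry, looks like this as a sketch (checked against example 7.26; the function name is ours):

    import numpy as np

    def matmul_by_dots(A, B):
        m, p = A.shape[0], B.shape[1]
        C = np.zeros((m, p), dtype=A.dtype)
        for i in range(m):
            for j in range(p):
                # c_ij is row i of A dotted with column j of B
                C[i, j] = np.dot(A[i, :], B[:, j])
        return C

    A = np.array([[-2, 3], [5, -4]])
    B = np.array([[6, -1, 2], [3, 1, -4]])
    print(matmul_by_dots(A, B))   # [[ -3   5 -16] [ 18  -9  26]]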

7.4 Distributive and Associative Properties of Matrix Multiplication
Although matrix multiplication does not satisfy the commutative property, it still satisfies the distributive and associative properties:

A(B + C) = AB + AC

and
(AB)C = A(BC).
In this section, we will prove these properties of matrix multiplication.

Distributive Property

Before proving the distributive property of matrix multiplication in general, let


us first prove that
A(~u + ~v ) = A~u + A~v
for any m × n matrix A and n-dimensional vectors ~u and ~v .

Let
$$A = \begin{pmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{pmatrix}, \qquad \vec{u} = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}, \qquad \text{and} \qquad \vec{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}.$$
Then, we have
$$A(\vec{u} + \vec{v}) = \begin{pmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{pmatrix} \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{pmatrix} = (u_1 + v_1)\vec{a}_1 + (u_2 + v_2)\vec{a}_2 + \cdots + (u_n + v_n)\vec{a}_n$$
$$= (u_1 \vec{a}_1 + u_2 \vec{a}_2 + \cdots + u_n \vec{a}_n) + (v_1 \vec{a}_1 + v_2 \vec{a}_2 + \cdots + v_n \vec{a}_n) = A\vec{u} + A\vec{v}.$$

Next, we can use this result to prove the more general distributive property:

A(B + C) = AB + AC

for any m × n matrix A and n × p matrices B and C. In this proof, we will use
method 1 of matrix multiplication, which is multiplying the first matrix to each
column of the second matrix to obtain each column of the product.

Let
$$B = \begin{pmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{pmatrix} \qquad \text{and} \qquad C = \begin{pmatrix} | & | & & | \\ \vec{c}_1 & \vec{c}_2 & \cdots & \vec{c}_p \\ | & | & & | \end{pmatrix}.$$
Then, we have
$$A(B + C) = A\begin{pmatrix} | & | & & | \\ \vec{b}_1 + \vec{c}_1 & \vec{b}_2 + \vec{c}_2 & \cdots & \vec{b}_p + \vec{c}_p \\ | & | & & | \end{pmatrix} = \begin{pmatrix} | & | & & | \\ A(\vec{b}_1 + \vec{c}_1) & A(\vec{b}_2 + \vec{c}_2) & \cdots & A(\vec{b}_p + \vec{c}_p) \\ | & | & & | \end{pmatrix}$$
$$= \begin{pmatrix} | & | & & | \\ A\vec{b}_1 + A\vec{c}_1 & A\vec{b}_2 + A\vec{c}_2 & \cdots & A\vec{b}_p + A\vec{c}_p \\ | & | & & | \end{pmatrix} = \begin{pmatrix} | & | & & | \\ A\vec{b}_1 & A\vec{b}_2 & \cdots & A\vec{b}_p \\ | & | & & | \end{pmatrix} + \begin{pmatrix} | & | & & | \\ A\vec{c}_1 & A\vec{c}_2 & \cdots & A\vec{c}_p \\ | & | & & | \end{pmatrix} = AB + AC.$$

Thus, we proved the distributive property of matrix multiplication. It could also


be proven in a similar manner using method 2 of matrix multiplication, which is
multiplying each row of the first matrix to the second matrix to obtain each row
of the product, instead of method 1.

Similarly, we also have the property

(A + B)C = AC + BC.

Associative Property

Now let us prove the associative property of matrix multiplication. Before proving
the associative property of matrix multiplication in general, let us first prove that

(AB)~v = A(B~v )

for any m × n matrix A, n × p matrix B, and p-dimensional vector ~v .

Let
$$B = \begin{pmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{pmatrix} \qquad \text{and} \qquad \vec{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_p \end{pmatrix}.$$
Then, we have
$$(AB)\vec{v} = \begin{pmatrix} | & | & & | \\ A\vec{b}_1 & A\vec{b}_2 & \cdots & A\vec{b}_p \\ | & | & & | \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_p \end{pmatrix} = v_1 A\vec{b}_1 + v_2 A\vec{b}_2 + \cdots + v_p A\vec{b}_p$$
$$= A(v_1 \vec{b}_1 + v_2 \vec{b}_2 + \cdots + v_p \vec{b}_p) = A(B\vec{v}).$$

Note that the step where we factored out A follows from the distributive property of matrix multiplication. Next, we can use this result to prove the more general associative property:

(AB)C = A(BC)

for any m × n matrix A, n × p matrix B, and p × r matrix C. Again, in this proof,


we will use method 1 of matrix multiplication, which is multiplying the first matrix
to each column of the second matrix to obtain each column of the product.

Let
$$C = \begin{pmatrix} | & | & & | \\ \vec{c}_1 & \vec{c}_2 & \cdots & \vec{c}_r \\ | & | & & | \end{pmatrix}.$$
Then, we have
$$(AB)C = \begin{pmatrix} | & | & & | \\ (AB)\vec{c}_1 & (AB)\vec{c}_2 & \cdots & (AB)\vec{c}_r \\ | & | & & | \end{pmatrix} = \begin{pmatrix} | & | & & | \\ A(B\vec{c}_1) & A(B\vec{c}_2) & \cdots & A(B\vec{c}_r) \\ | & | & & | \end{pmatrix}$$
$$= A\begin{pmatrix} | & | & & | \\ B\vec{c}_1 & B\vec{c}_2 & \cdots & B\vec{c}_r \\ | & | & & | \end{pmatrix} = A(BC).$$

Thus, we proved the associative property of matrix multiplication. The associa-


tive property could also be proven in a similar manner using method 2 of matrix
multiplication instead of method 1.
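Both properties are easy to spot-check on small random matrices; a sketch (the shapes and seed are arbitrary choices of ours):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 5, size=(2, 3))
    B = rng.integers(-5, 5, size=(3, 4))
    C = rng.integers(-5, 5, size=(3, 4))
    D = rng.integers(-5, 5, size=(4, 2))

    print(np.array_equal(A @ (B + C), A @ B + A @ C))   # True (distributive)
    print(np.array_equal((A @ B) @ D, A @ (B @ D)))     # True (associative)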

7.5 Transpose of Product of Matrices


In this section, we will prove the interesting formula

(AB)T = B T AT ,

where A is any m × n matrix and B is any n × p matrix. To prove that formula,


we first need to prove that
(A~v )T = ~v T AT

for any m × n matrix A and n-dimensional vector ~v .

Let
$$A = \begin{pmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{pmatrix} \qquad \text{and} \qquad \vec{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}.$$
Then, we have
$$(A\vec{v})^T = (v_1 \vec{a}_1 + v_2 \vec{a}_2 + \cdots + v_n \vec{a}_n)^T.$$
Using the result $(\vec{u} + \vec{v})^T = \vec{u}^{\,T} + \vec{v}^{\,T}$ from question 2 of the exercises in chapter 5, we have
$$(A\vec{v})^T = v_1 \vec{a}_1^{\,T} + v_2 \vec{a}_2^{\,T} + \cdots + v_n \vec{a}_n^{\,T} = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} \begin{pmatrix} \text{---} & \vec{a}_1^{\,T} & \text{---} \\ \text{---} & \vec{a}_2^{\,T} & \text{---} \\ & \vdots & \\ \text{---} & \vec{a}_n^{\,T} & \text{---} \end{pmatrix}.$$
Note that
$$\vec{v}^{\,T} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}^T = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}$$
and
$$A^T = \begin{pmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{pmatrix}^T = \begin{pmatrix} \text{---} & \vec{a}_1^{\,T} & \text{---} \\ \text{---} & \vec{a}_2^{\,T} & \text{---} \\ & \vdots & \\ \text{---} & \vec{a}_n^{\,T} & \text{---} \end{pmatrix}$$
because the transpose turns columns into rows. Therefore, we proved that
$$(A\vec{v})^T = \vec{v}^{\,T} A^T.$$

Now let us prove the more general formula

(AB)T = B T AT .

We will use method 1 and method 2 of matrix multiplication in the proof.

Let
$$B = \begin{pmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{pmatrix}.$$
Then, we have
$$(AB)^T = \left( A \begin{pmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{pmatrix} \right)^T = \begin{pmatrix} | & | & & | \\ A\vec{b}_1 & A\vec{b}_2 & \cdots & A\vec{b}_p \\ | & | & & | \end{pmatrix}^T \tag{7.1}$$
$$= \begin{pmatrix} \text{---} & (A\vec{b}_1)^T & \text{---} \\ \text{---} & (A\vec{b}_2)^T & \text{---} \\ & \vdots & \\ \text{---} & (A\vec{b}_p)^T & \text{---} \end{pmatrix} = \begin{pmatrix} \text{---} & \vec{b}_1^{\,T} A^T & \text{---} \\ \text{---} & \vec{b}_2^{\,T} A^T & \text{---} \\ & \vdots & \\ \text{---} & \vec{b}_p^{\,T} A^T & \text{---} \end{pmatrix} = \begin{pmatrix} \text{---} & \vec{b}_1^{\,T} & \text{---} \\ \text{---} & \vec{b}_2^{\,T} & \text{---} \\ & \vdots & \\ \text{---} & \vec{b}_p^{\,T} & \text{---} \end{pmatrix} A^T \tag{7.2}$$
$$= B^T A^T.$$

Note that we used method 1 of matrix multiplication in (7.1) and method 2 of


matrix multiplication in (7.2). This formula can be used to prove an interesting
result in question 11 in the exercises.
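A quick numerical spot-check of the formula, as a sketch on random matrices of our choosing:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.integers(-9, 9, size=(2, 3))
    B = rng.integers(-9, 9, size=(3, 4))

    print(np.array_equal((A @ B).T, B.T @ A.T))   # True
    # Note the order must be reversed: A.T @ B.T is not even defined here,
    # since A.T is 3 x 2 and B.T is 4 x 3.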

Exercises
   
1. $3\begin{pmatrix} 1 & -2 \\ 3 & 0 \end{pmatrix} + 5\begin{pmatrix} -1 & 4 \\ 2 & 5 \end{pmatrix} = ?$
   
2. $10\begin{pmatrix} 2 & -3 \\ 3 & 1 \\ 6 & -7 \end{pmatrix} - 7\begin{pmatrix} 3 & -4 \\ 5 & 0 \\ 8 & 1 \end{pmatrix} = ?$
 
3. $\begin{pmatrix} 10 & -9 & 7 \\ -5 & 6 & 3 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = ?$
 
4. $\begin{pmatrix} -2 & 4 \end{pmatrix} \begin{pmatrix} 5 & 3 & 1 \\ 2 & 4 & 6 \end{pmatrix} = ?$
  
5. $\begin{pmatrix} 1 & -3 & 7 \\ -5 & 0 & 6 \\ 2 & 1 & 8 \end{pmatrix} \begin{pmatrix} 11 \\ 5 \\ -3 \end{pmatrix} = ?$

6. Show that any matrix times the 0 vector is the 0 vector:


    
a11 a12 · · · a1n 0 0
 a21 a22 · · · a2n  0 0
..   ..  =  ..  .
    
 .. .. ..
 . . . .  . .
am1 am2 ··· amn 0 0
 
7. $\begin{pmatrix} 1 & 2 \\ -3 & 4 \\ 5 & -6 \end{pmatrix} \begin{pmatrix} 2 & -6 & 0 & -8 \\ 1 & 0 & -1 & 7 \end{pmatrix} = ?$
  
8. $\begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix} \begin{pmatrix} 6 & -1 & 3 \\ -3 & 5 & 0 \end{pmatrix} = ?$
  
9. $\begin{pmatrix} 3 & 2 & 1 \\ 0 & -1 & 0 \\ 0 & 3 & -5 \end{pmatrix} \begin{pmatrix} 2 & -1 & 0 \\ 1 & 3 & -2 \\ 0 & 5 & 0 \end{pmatrix} = ?$

10. Show that the product of two diagonal matrices is a diagonal matrix:
    
$$\begin{pmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n \end{pmatrix} \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix} = \begin{pmatrix} c_1 d_1 & 0 & \cdots & 0 \\ 0 & c_2 d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n d_n \end{pmatrix}.$$

11. For any matrix A, show that AT A is a symmetric matrix. (Hint: a symmetric
matrix S is a matrix such that S T = ?)

Chapter 8

Row Operations on Matrices

8.1 Three Types of Row Operations
In this chapter, we will learn about the row operations, and then we will learn an
application of it in the next chapter. Simply speaking, row operations are some-
thing we do to the rows of a matrix. There are three types of row operations.

Switching Rows

When doing row operations on a matrix, we can switch two rows with each other.
Let us look at some examples.
 
Example 8.1: Switch row 1 and row 2 of $\begin{pmatrix} 1 & 5 \\ 2 & 6 \end{pmatrix}$.
$$\begin{pmatrix} 1 & 5 \\ 2 & 6 \end{pmatrix} \to \begin{pmatrix} 2 & 6 \\ 1 & 5 \end{pmatrix}.$$

Example 8.2: Switch row 1 and row 3 of $\begin{pmatrix} 3 & -1 & 0 \\ 2 & -2 & 8 \\ 1 & 5 & 10 \end{pmatrix}$.
$$\begin{pmatrix} 3 & -1 & 0 \\ 2 & -2 & 8 \\ 1 & 5 & 10 \end{pmatrix} \to \begin{pmatrix} 1 & 5 & 10 \\ 2 & -2 & 8 \\ 3 & -1 & 0 \end{pmatrix}.$$

Example 8.3: Switch row 2 and row 3 of $\begin{pmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{pmatrix}$.
$$\begin{pmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{pmatrix} \to \begin{pmatrix} 1 & 2 & 3 & -5 \\ 5 & 7 & 8 & -9 \\ -2 & 1 & 0 & 6 \end{pmatrix}.$$

Multiplying a Row by a Scalar

The second type of row operations is multiplying a row by a non-zero scalar. When
we multiply row i1 by k, we replace the original row i1 with a new row vector
obtained by multiplying the original row i1 by k. Let us do some examples.
 
Example 8.4: Multiply row 1 by 3 for the matrix $\begin{pmatrix} 1 & 2 \\ -5 & 3 \end{pmatrix}$.

We replace row 1 with 3 times row 1:
$$3\begin{pmatrix} 1 & 2 \end{pmatrix} = \begin{pmatrix} 3 & 6 \end{pmatrix}.$$
So,
$$\begin{pmatrix} 1 & 2 \\ -5 & 3 \end{pmatrix} \to \begin{pmatrix} 3 & 6 \\ -5 & 3 \end{pmatrix}.$$

Example 8.5: Multiply row 2 by −5 for the matrix $\begin{pmatrix} 3 & 6 & -9 \\ 1 & -2 & 2 \end{pmatrix}$.

We replace row 2 with −5 times row 2:
$$(-5)\begin{pmatrix} 1 & -2 & 2 \end{pmatrix} = \begin{pmatrix} -5 & 10 & -10 \end{pmatrix}.$$
So,
$$\begin{pmatrix} 3 & 6 & -9 \\ 1 & -2 & 2 \end{pmatrix} \to \begin{pmatrix} 3 & 6 & -9 \\ -5 & 10 & -10 \end{pmatrix}.$$

Example 8.6: Multiply row 3 by 2 for the matrix $\begin{pmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{pmatrix}$.

We replace row 3 with 2 times row 3:
$$2\begin{pmatrix} 6 & 3 & -2 \end{pmatrix} = \begin{pmatrix} 12 & 6 & -4 \end{pmatrix}.$$
So,
$$\begin{pmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{pmatrix} \to \begin{pmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 12 & 6 & -4 \end{pmatrix}.$$

Adding a Multiple of Another Row to a Row

Another type of row operations which we can do is adding a multiple of another


row to a row. When we add k times row i1 to row i2 , we replace the original row
i2 with a new row vector obtained by adding k times row i1 to row i2 . Let us do a
few examples.
 
Example 8.7: Add 2 times row 1 to row 2 for the matrix $\begin{pmatrix} -1 & 2 \\ 3 & -5 \end{pmatrix}$.

Adding 2 times row 1 to row 2 gives
$$2\begin{pmatrix} -1 & 2 \end{pmatrix} + \begin{pmatrix} 3 & -5 \end{pmatrix} = \begin{pmatrix} 1 & -1 \end{pmatrix}.$$
So, we will replace row 2 with this new row vector:
$$\begin{pmatrix} -1 & 2 \\ 3 & -5 \end{pmatrix} \to \begin{pmatrix} -1 & 2 \\ 1 & -1 \end{pmatrix}.$$

Example 8.8: Add −3 times row 2 to row 3 for the matrix $\begin{pmatrix} -2 & 3 \\ 0 & 5 \\ 7 & 10 \end{pmatrix}$.

Adding −3 times row 2 to row 3 gives
$$(-3)\begin{pmatrix} 0 & 5 \end{pmatrix} + \begin{pmatrix} 7 & 10 \end{pmatrix} = \begin{pmatrix} 7 & -5 \end{pmatrix}.$$
So, we will replace row 3 with this new row vector:
$$\begin{pmatrix} -2 & 3 \\ 0 & 5 \\ 7 & 10 \end{pmatrix} \to \begin{pmatrix} -2 & 3 \\ 0 & 5 \\ 7 & -5 \end{pmatrix}.$$

Example 8.9: Add row 1 to row 3 for the matrix $\begin{pmatrix} 1 & -1 & 5 \\ 3 & 7 & -8 \\ 2 & 10 & 6 \end{pmatrix}$.

Adding row 1 to row 3 gives
$$\begin{pmatrix} 1 & -1 & 5 \end{pmatrix} + \begin{pmatrix} 2 & 10 & 6 \end{pmatrix} = \begin{pmatrix} 3 & 9 & 11 \end{pmatrix}.$$
So, we will replace row 3 with this new row vector:
$$\begin{pmatrix} 1 & -1 & 5 \\ 3 & 7 & -8 \\ 2 & 10 & 6 \end{pmatrix} \to \begin{pmatrix} 1 & -1 & 5 \\ 3 & 7 & -8 \\ 3 & 9 & 11 \end{pmatrix}.$$
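All three row operations are one-liners on a NumPy array; a sketch reproducing examples 8.1, 8.5, and 8.7:

    import numpy as np

    # Example 8.1: switch row 1 and row 2 (0-based indices 0 and 1)
    A = np.array([[1, 5], [2, 6]])
    A[[0, 1]] = A[[1, 0]]
    print(A)   # [[2 6] [1 5]]

    # Example 8.5: multiply row 2 by -5
    B = np.array([[3, 6, -9], [1, -2, 2]])
    B[1] = -5 * B[1]
    print(B)   # [[ 3   6  -9] [-5  10 -10]]

    # Example 8.7: add 2 times row 1 to row 2
    C = np.array([[-1, 2], [3, -5]])
    C[1] = C[1] + 2 * C[0]
    print(C)   # [[-1  2] [ 1 -1]]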

8.2 Row Operation as Matrix Multiplication


The row operations done to a m×n matrix can be expressed as multiplying a m×m
matrix to the left of the m × n matrix. When expressing the row operations as
matrix multiplications like this, the second method of matrix multiplication learned
in chapter 7, which is multiplying each row of the first matrix to the second matrix
to obtain each row of the product, will be helpful.

Switching Rows

Let us look back at an example we did in the last section. In example 8.1, we saw that switching row 1 and row 2 of
$$\begin{pmatrix} 1 & 5 \\ 2 & 6 \end{pmatrix} \tag{8.1}$$
gives
$$\begin{pmatrix} 2 & 6 \\ 1 & 5 \end{pmatrix}.$$
Now consider the matrix multiplication
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 5 \\ 2 & 6 \end{pmatrix}.$$
The first row of the product is
$$\begin{pmatrix} 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 5 \\ 2 & 6 \end{pmatrix} = 0\begin{pmatrix} 1 & 5 \end{pmatrix} + 1\begin{pmatrix} 2 & 6 \end{pmatrix} = \begin{pmatrix} 2 & 6 \end{pmatrix},$$
so we can see that the original row 2 is now moved to the position of row 1. The second row of the product is
$$\begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 5 \\ 2 & 6 \end{pmatrix} = 1\begin{pmatrix} 1 & 5 \end{pmatrix} + 0\begin{pmatrix} 2 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 5 \end{pmatrix},$$
so the original row 1 is now moved to the position of row 2. Thus,
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 5 \\ 2 & 6 \end{pmatrix} = \begin{pmatrix} 2 & 6 \\ 1 & 5 \end{pmatrix},$$
which is the same as the matrix obtained after switching row 1 and row 2 of the original matrix in (8.1). Do you notice anything particular about the matrix
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
which we are multiplying at the left? It is the 2 × 2 identity matrix with row 1 and row 2 switched:
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \to \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

Next, let us look at another example we did in the last section. In example 8.3, we saw that switching row 2 and row 3 of
$$\begin{pmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{pmatrix} \tag{8.2}$$
gives
$$\begin{pmatrix} 1 & 2 & 3 & -5 \\ 5 & 7 & 8 & -9 \\ -2 & 1 & 0 & 6 \end{pmatrix}.$$
Now consider the matrix multiplication
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{pmatrix}.$$
The first row of the product is
$$\begin{pmatrix} 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & -5 \end{pmatrix},$$
so row 1 stays the same. The second row of the product is
$$\begin{pmatrix} 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{pmatrix} = \begin{pmatrix} 5 & 7 & 8 & -9 \end{pmatrix},$$
so the original row 3 is now moved to the position of row 2. The third row of the product is
$$\begin{pmatrix} 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{pmatrix} = \begin{pmatrix} -2 & 1 & 0 & 6 \end{pmatrix},$$
so the original row 2 is now moved to the position of row 3. Thus,
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & -5 \\ 5 & 7 & 8 & -9 \\ -2 & 1 & 0 & 6 \end{pmatrix},$$
which is the same as the matrix obtained after switching row 2 and row 3 of the original matrix in (8.2). Again, do you notice anything particular about the matrix
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$
which we are multiplying at the left? It is the 3 × 3 identity matrix with row 2 and row 3 switched:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$$

So, here is the general rule: when we switch row $i_1$ and row $i_2$ of an m × n matrix A, it is the same as multiplying an m × m matrix $E_1$ to the left of A, where $E_1$ is the m × m identity matrix with row $i_1$ and row $i_2$ switched.

For example, if we switch row 2 and row 4 of a 4 × 3 matrix, it is the same as multiplying the 4 × 4 identity matrix with row 2 and row 4 switched,
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix},$$
to the left of the 4 × 3 matrix.

Multiplying a Row by a Scalar

Next, let us see how this type of row operations can be expressed as a matrix multiplication. In example 8.5, we saw that multiplying row 2 by −5 for the matrix
$$\begin{pmatrix} 3 & 6 & -9 \\ 1 & -2 & 2 \end{pmatrix} \tag{8.3}$$
gives
$$\begin{pmatrix} 3 & 6 & -9 \\ -5 & 10 & -10 \end{pmatrix}.$$
Now consider the matrix multiplication
$$\begin{pmatrix} 1 & 0 \\ 0 & -5 \end{pmatrix} \begin{pmatrix} 3 & 6 & -9 \\ 1 & -2 & 2 \end{pmatrix}.$$
The first row of the product is
$$\begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} 3 & 6 & -9 \\ 1 & -2 & 2 \end{pmatrix} = 1\begin{pmatrix} 3 & 6 & -9 \end{pmatrix} + 0\begin{pmatrix} 1 & -2 & 2 \end{pmatrix} = \begin{pmatrix} 3 & 6 & -9 \end{pmatrix},$$
so row 1 stays the same. The second row of the product is
$$\begin{pmatrix} 0 & -5 \end{pmatrix} \begin{pmatrix} 3 & 6 & -9 \\ 1 & -2 & 2 \end{pmatrix} = 0\begin{pmatrix} 3 & 6 & -9 \end{pmatrix} + (-5)\begin{pmatrix} 1 & -2 & 2 \end{pmatrix} = \begin{pmatrix} -5 & 10 & -10 \end{pmatrix},$$
so row 2 is multiplied by −5. Thus,
$$\begin{pmatrix} 1 & 0 \\ 0 & -5 \end{pmatrix} \begin{pmatrix} 3 & 6 & -9 \\ 1 & -2 & 2 \end{pmatrix} = \begin{pmatrix} 3 & 6 & -9 \\ -5 & 10 & -10 \end{pmatrix},$$
which is the same as the matrix obtained after multiplying row 2 by −5 for the original matrix in (8.3). Do you notice anything particular about the matrix
$$\begin{pmatrix} 1 & 0 \\ 0 & -5 \end{pmatrix}$$
which we are multiplying at the left? It is the 2 × 2 identity matrix with row 2 multiplied by −5:
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \to \begin{pmatrix} 1 & 0 \\ 0 & -5 \end{pmatrix}.$$

Next, let us look at another example from the last section. In example 8.6, we saw that multiplying row 3 by 2 for the matrix
$$\begin{pmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{pmatrix} \tag{8.4}$$
gives
$$\begin{pmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 12 & 6 & -4 \end{pmatrix}.$$
Now consider the matrix multiplication
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{pmatrix}.$$
The first row of the product is
$$\begin{pmatrix} 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{pmatrix} = \begin{pmatrix} 1 & 5 & 12 \end{pmatrix},$$
so row 1 stays the same. The second row of the product is
$$\begin{pmatrix} 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{pmatrix} = \begin{pmatrix} -9 & -3 & 10 \end{pmatrix},$$
so row 2 stays the same. The third row of the product is
$$\begin{pmatrix} 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{pmatrix} = 2\begin{pmatrix} 6 & 3 & -2 \end{pmatrix} = \begin{pmatrix} 12 & 6 & -4 \end{pmatrix},$$
so row 3 is multiplied by 2. Thus,
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{pmatrix} = \begin{pmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 12 & 6 & -4 \end{pmatrix},$$
which is the same as the matrix obtained after multiplying row 3 by 2 for the original matrix in (8.4). Again, do you notice anything particular about the matrix
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$
which we are multiplying at the left? It is the 3 × 3 identity matrix with row 3 multiplied by 2:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$

So, we have the following general rule: when we multiply row $i_1$ by k for an m × n matrix A, it is the same as multiplying an m × m matrix $E_2$ to the left of A, where $E_2$ is the m × m identity matrix with row $i_1$ multiplied by k.

For example, if we multiply row 2 by −1 for a 4 × 2 matrix, it is the same as multiplying the 4 × 4 identity matrix with row 2 multiplied by −1,
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$
to the left of the 4 × 2 matrix.

Adding a Multiple of Another Row to a Row

Now let us see how to express this type of row operations as a matrix multiplication. In example 8.7, we saw that adding 2 times row 1 to row 2 for the matrix
$$\begin{pmatrix} -1 & 2 \\ 3 & -5 \end{pmatrix} \tag{8.5}$$
gives
$$\begin{pmatrix} -1 & 2 \\ 1 & -1 \end{pmatrix}.$$
Row 1 stays the same, so we have
$$\begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} -1 & 2 \\ 3 & -5 \end{pmatrix} = 1\begin{pmatrix} -1 & 2 \end{pmatrix} + 0\begin{pmatrix} 3 & -5 \end{pmatrix} = \begin{pmatrix} -1 & 2 \end{pmatrix}.$$
Row 2 is replaced with 2 times the original row 1 added to the original row 2, so we have
$$\begin{pmatrix} 2 & 1 \end{pmatrix} \begin{pmatrix} -1 & 2 \\ 3 & -5 \end{pmatrix} = 2\begin{pmatrix} -1 & 2 \end{pmatrix} + 1\begin{pmatrix} 3 & -5 \end{pmatrix} = \begin{pmatrix} 1 & -1 \end{pmatrix}.$$
Thus, we have
$$\begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} -1 & 2 \\ 3 & -5 \end{pmatrix} = \begin{pmatrix} -1 & 2 \\ 1 & -1 \end{pmatrix}.$$
So, multiplying
$$\begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}$$
to the left of the original matrix in (8.5) is the same as adding 2 times row 1 to row 2 for that matrix. Do you notice anything particular about the matrix which we are multiplying at the left? It is the 2 × 2 identity matrix with 2 times row 1 added to row 2:
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \to \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}.$$

Next, let us look at example 8.8. In that example, we saw that adding −3 times row 2 to row 3 for the matrix
$$\begin{pmatrix} -2 & 3 \\ 0 & 5 \\ 7 & 10 \end{pmatrix} \tag{8.6}$$
gives
$$\begin{pmatrix} -2 & 3 \\ 0 & 5 \\ 7 & -5 \end{pmatrix}.$$
Row 1 stays the same, so we have
$$\begin{pmatrix} 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} -2 & 3 \\ 0 & 5 \\ 7 & 10 \end{pmatrix} = \begin{pmatrix} -2 & 3 \end{pmatrix}.$$
Row 2 also stays the same, so we have
$$\begin{pmatrix} 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} -2 & 3 \\ 0 & 5 \\ 7 & 10 \end{pmatrix} = \begin{pmatrix} 0 & 5 \end{pmatrix}.$$
Row 3 is replaced with −3 times the original row 2 added to the original row 3, so we have
$$\begin{pmatrix} 0 & -3 & 1 \end{pmatrix} \begin{pmatrix} -2 & 3 \\ 0 & 5 \\ 7 & 10 \end{pmatrix} = (-3)\begin{pmatrix} 0 & 5 \end{pmatrix} + \begin{pmatrix} 7 & 10 \end{pmatrix} = \begin{pmatrix} 7 & -5 \end{pmatrix}.$$
Thus, we have
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix} \begin{pmatrix} -2 & 3 \\ 0 & 5 \\ 7 & 10 \end{pmatrix} = \begin{pmatrix} -2 & 3 \\ 0 & 5 \\ 7 & -5 \end{pmatrix}.$$
So, multiplying
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix}$$
to the left of the original matrix in (8.6) is the same as adding −3 times row 2 to row 3 for that matrix. Again, do you notice anything particular about the matrix which we are multiplying at the left? It is the 3 × 3 identity matrix with −3 times row 2 added to row 3:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix}.$$

Here is the general rule: when we add k times row $i_1$ to row $i_2$ for an m × n matrix A, it is the same as multiplying an m × m matrix $E_3$ to the left of A, where $E_3$ is the m × m identity matrix with k times row $i_1$ added to row $i_2$.

For example, if we add row 1 to row 3 for a 3 × 5 matrix, it is the same as multiplying the 3 × 3 identity matrix with row 1 added to row 3,
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix},$$
to the left of the 3 × 5 matrix.

As you have seen, all of the matrices which we multiply to the left of other matrices
to perform row operations on those matrices are identity matrices being done the
same row operations on. These matrices which perform row operations on other
matrices when multiplied to the left are called elementary matrices.

Well, what do we do when we perform two or more row operations successively


on the same matrix? How do we express it as a matrix multiplication? If there
are two or more row operations, we just need to multiply two or more elementary
matrices corresponding to those row operations.

For example, if we switch row 2 and row 3 and then multiply row 2 by 5 for a 3 × 2 matrix B, it is the same as multiplying
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},$$
which switches row 2 and row 3, and then
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
which multiplies row 2 by 5, to the left of B:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} B.$$
If we want to express it as multiplication of two matrices instead of three, we just need to multiply the two elementary matrices together:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 5 \\ 0 & 1 & 0 \end{pmatrix} B.$$
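Each elementary matrix really is the identity with the row operation applied to it, and multiplying it on the left performs that operation; a sketch of the example above (the entries of B are arbitrary values of ours):

    import numpy as np

    B = np.array([[1, 2], [3, 4], [5, 6]])   # any 3 x 2 matrix

    E1 = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]])   # switch rows 2 and 3
    E2 = np.array([[1, 0, 0], [0, 5, 0], [0, 0, 1]])   # multiply row 2 by 5

    print(E2 @ E1 @ B)   # both operations applied to B
    print(E2 @ E1)       # the single combined matrix:
    # [[1 0 0]
    #  [0 0 5]
    #  [0 1 0]]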

Exercises
 
1. Let $A = \begin{pmatrix} -1 & 5 & 2 \\ 3 & 0 & 6 \\ 7 & -9 & 10 \\ -2 & 8 & -3 \end{pmatrix}$.
a) What is the resulting matrix after switching row 2 and row 4?

b) If we express this row operation as a matrix multiplication EA A, EA =?

 
2. Let $B = \begin{pmatrix} 3 & -2 & 1 & 0 \\ 5 & -3 & 6 & 2 \end{pmatrix}$.
a) What is the resulting matrix after multiplying row 1 by 2?

b) If we express this row operation as a matrix multiplication EB B, EB =?

 
3. Let $C = \begin{pmatrix} 6 & 11 & -3 \\ 3 & -5 & 6 \\ 1 & 2 & -1 \end{pmatrix}$.
a) What is the resulting matrix after adding −3 times row 3 to row 1?

b) If we express this row operation as a matrix multiplication EC C, EC =?

 
4. Let $D = \begin{pmatrix} -2 & 7 \\ 3 & -1 \\ 6 & -8 \end{pmatrix}$.
a) What is the resulting matrix after switching row 1 and row 3 and then
adding row 3 to row 2?

b) If we express these row operations as a matrix multiplication ED D, ED =?

Chapter 9

Reduced Row Echelon Form


and Rank of Matrices

9.1 Reduced Row Echelon Form
Any matrix can be reduced to the reduced row echelon form, and the reduced row
echelon form of a matrix A is denoted as rref(A). In this section, we will learn
how to reduce a matrix to reduced row echelon form by applying row operations
we learned in the last chapter.

Here is how to reduce a matrix A to reduced row echelon form:

1. Start with the non-zero number at the top left corner (row 1 column 1 posi-
tion). If the number at the top left corner is 0, then switch rows so that the
number at the top left corner is a non-zero number.

2. Then, do row operations on the matrix so that all elements below on the same
column become 0 and the number we started with becomes 1.

3. Go down to the next row and start with the first non-zero number (from left
to right) on that row. If all elements on that row are 0, go down one more
row to see if there is a non-zero number and start with that number.

4. Then, do row operations on the matrix so that all elements above and below
on the same column become 0 and the new number we started with becomes
1.

5. Repeat step 3-4 until we reach the last row. Then, if there are any rows with
all 0’s, switch rows so that the rows with all 0’s come to the bottom.

6. The resulting matrix is the reduced row echelon form of A. The columns that contain a leading 1 with 0's everywhere else are called pivot columns, and those 1's in the pivot columns are pivots.

Now let us look at some examples to understand this better.


 
Example 9.1: Let $A = \begin{pmatrix} 2 & 1 \\ 6 & -3 \end{pmatrix}$, rref(A) = ?

First, we start with the number at the top left corner, which is 2. Then, we need to do row operations so that that 2 becomes 1 and the 6 below becomes 0.

Multiplying row 1 by $\frac{1}{2}$,
$$\begin{pmatrix} 2 & 1 \\ 6 & -3 \end{pmatrix} \to \begin{pmatrix} 1 & \frac{1}{2} \\ 6 & -3 \end{pmatrix}.$$
Adding −6 times row 1 to row 2,
$$\begin{pmatrix} 1 & \frac{1}{2} \\ 6 & -3 \end{pmatrix} \to \begin{pmatrix} 1 & \frac{1}{2} \\ 0 & -6 \end{pmatrix}.$$
Next, we go down to row 2 and start with the first non-zero number on row 2, which is −6. Then, we need to do row operations so that that −6 becomes 1 and the $\frac{1}{2}$ above becomes 0.

Multiplying row 2 by $-\frac{1}{6}$,
$$\begin{pmatrix} 1 & \frac{1}{2} \\ 0 & -6 \end{pmatrix} \to \begin{pmatrix} 1 & \frac{1}{2} \\ 0 & 1 \end{pmatrix}.$$
Adding $-\frac{1}{2}$ times row 2 to row 1,
$$\begin{pmatrix} 1 & \frac{1}{2} \\ 0 & 1 \end{pmatrix} \to \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Therefore,
$$\mathrm{rref}(A) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Column 1 and column 2 are pivot columns.
 
−2 3 −6
Example 9.2: Let B =  10 −9 18 , rref(B) =?
8 −6 12
First, we start with the number at the top left corner, which is −2. Then, we
need to do row operations so that that −2 becomes 1 and the 10 and 8 below
become 0.

Multiplying row 1 by − 21 ,

− 32
   
−2 3 −6 1 3
 10 −9 18  → 10 −9 18 .
8 −6 12 8 −6 12

Adding −10 times row 1 to row 2,

− 32 − 32
   
1 3 1 3
10 −9 18 → 0 6 −12 .
8 −6 12 8 −6 12

Adding −8 times row 1 to row 3,

1 − 32 − 32
   
3 1 3
0 6 −12 → 0 6 −12 .
8 −6 12 0 6 −12

Next, we go down to row 2 and start with the first non-zero number on row 2,
which is 6. Then, we need to do row operations so that that 6 becomes 1 and the
− 23 above and the 6 below become 0.

Multiplying row 2 by 61 ,

− 32 − 32
   
1 3 1 3
0 6 −12 → 0 1 −2  .
0 6 −12 0 6 −12

108
3
Adding 2 times row 2 to row 1,

− 32
   
1 3 1 0 0
0 1 −2  → 0 1 −2  .
0 6 −12 0 6 −12

Adding −6 times row 2 to row 3,


   
1 0 0 1 0 0
0 1 −2  → 0 1 −2 .
0 6 −12 0 0 0

Next, we go down to row 3. Since row 3, the last row, is all 0’s, there is nothing
more to do. Therefore,
 
1 0 0
rref(B) = 0 1 −2 .
0 0 0
Column 1 and column 2 are pivot columns.
 
1 −1
Example 9.3: Let C =  3 5 , rref(C) =?
−2 6
First, we start with the number at the top left corner, which is 1. Then, we need
to do row operations so that that 1 becomes 1 and the 3 and −2 below become 0.
(Obviously, since the 1 at the top left corner is already 1, we do not need to do any
row operations for that part.)

Adding −3 times row 1 to row 2,


   
1 −1 1 −1
3 5 → 0 8 .
−2 6 −2 6

Adding 2 times row 1 to row 3,


   
1 −1 1 −1
0 8  → 0 8 .
−2 6 0 4

Next, we go down to row 2 and start with the first non-zero number on row 2,
which is 8. Then, we need to do row operations so that that 8 becomes 1 and the
−1 above and the 4 below become 0.

Multiplying row 2 by 18 ,
   
1 −1 1 −1
0 8  → 0 1 .
0 4 0 4

109
Adding row 2 to row 1,    
1 −1 1 0
0 1  → 0 1 .
0 4 0 4
Adding −4 times row 2 to row 3,
   
1 0 1 0
0 1 → 0 1 .
0 4 0 0

Next, we go down to row 3. Since row 3, the last row, is all 0’s, there is nothing
more to do. Therefore,  
1 0
rref(C) = 0 1 .
0 0
Column 1 and column 2 are pivot columns.
 
3 9 6
Example 9.4: Let D = , rref(D) =?
5 15 7
First, we start with the number at the top left corner, which is 3. Then, we
need to do row operations so that that 3 becomes 1 and the 5 below becomes 0.

Multiplying row 1 by 13 ,
   
3 9 6 1 3 2
→ .
5 15 7 5 15 7

Adding −5 times row 1 to row 2,


   
1 3 2 1 3 2
→ .
5 15 7 0 0 −3

Next, we go down to row 2 and start with the first non-zero number on row 2,
which is −3. Then, we need to do row operations so that that −3 becomes 1 and
the 2 above becomes 0.

Multiplying row 2 by − 13 ,
   
1 3 2 1 3 2
→ .
0 0 −3 0 0 1

Adding −2 times row 2 to row 1,


   
1 3 2 1 3 0
→ .
0 0 1 0 0 1

Therefore,  
1 3 0
rref(D) = .
0 0 1

110
Column 1 and column 3 are pivot columns.
 
Example 9.5: Let $E = \begin{bmatrix} 1 & 5 & 3 & 6 \\ 2 & 10 & 6 & 12 \\ 5 & 25 & 8 & 9 \end{bmatrix}$, $\mathrm{rref}(E) = ?$

First, we start with the number at the top left corner, which is 1. Then, we
need to do row operations so that that 1 becomes 1 and the 2 and 5 below become
0. (Obviously, since the 1 at the top left corner is already 1, we do not need to do
any row operations for that part.)

Adding $-2$ times row 1 to row 2 gives
$$\begin{bmatrix} 1 & 5 & 3 & 6 \\ 0 & 0 & 0 & 0 \\ 5 & 25 & 8 & 9 \end{bmatrix}.$$
Adding $-5$ times row 1 to row 3 gives
$$\begin{bmatrix} 1 & 5 & 3 & 6 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -7 & -21 \end{bmatrix}.$$
Next, we go down to row 2. Since row 2 is all 0's, we go down to row 3 and start
with the first non-zero number on row 3, which is $-7$. Then, we need to do row
operations so that that $-7$ becomes 1 and the 3 above becomes 0.

Multiplying row 3 by $-\frac{1}{7}$ gives
$$\begin{bmatrix} 1 & 5 & 3 & 6 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 \end{bmatrix}.$$
Adding $-3$ times row 3 to row 1 gives
$$\begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 \end{bmatrix}.$$
Since row 2 in the middle is a 0 row, we switch row 2 and row 3 so that the 0 row
comes to the bottom:
$$\begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
Therefore,
$$\mathrm{rref}(E) = \begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
Column 1 and column 3 are pivot columns.
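If you want to check reductions like these by machine, SymPy's `Matrix.rref()` returns both the reduced row echelon form and the pivot column positions. A minimal sketch (assuming Python with the SymPy library is available; the matrices are A from example 9.1 and E from example 9.5):

```python
from sympy import Matrix

# Matrix A from example 9.1
A = Matrix([[2, 1], [6, -3]])
R, pivots = A.rref()  # R is rref(A); pivots holds the pivot column indices
print(R)              # Matrix([[1, 0], [0, 1]])
print(pivots)         # (0, 1): columns 1 and 2 (SymPy counts from 0)

# Matrix E from example 9.5
E = Matrix([[1, 5, 3, 6], [2, 10, 6, 12], [5, 25, 8, 9]])
R, pivots = E.rref()
print(R)              # Matrix([[1, 5, 0, -3], [0, 0, 1, 3], [0, 0, 0, 0]])
print(pivots)         # (0, 2): columns 1 and 3
```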

111
9.2 Rank of Matrices

Notice that when we reduce a matrix to reduced row echelon form, each pivot is
in a different column and a different row from the other pivots. In other words, each
row and each column has at most one pivot. Also, note that the rows that do not
have pivots are 0 rows.

The number and positions of the pivots of the reduced row echelon form of a
matrix play important roles in linear algebra. In this section, we will focus on the
meaning of the number of pivots, which is the rank of the matrix. For example, if
the reduced row echelon form of a matrix A has two pivots, then we say that the
matrix A has rank 2.

The rank of a matrix is the number of linearly independent rows of that matrix.
For example, let us say we have a matrix of five rows. If row 1, row 2, and row 4 are
linearly independent, and row 3 and row 5 can be written as linear combinations
of row 1, row 2, and row 4, then that matrix has three linearly independent rows,
and its rank is 3.

It should not be too hard to understand why the number of pivots of the reduced
row echelon form is equal to the number of linearly independent rows. Let us think
of an example. Say we have a matrix of three rows with row 1 and row 2 being
linearly independent and row 3 being linearly dependent with respect to row 1 and
row 2, specifically row 3 equals row 1 plus row 2. Recall that we obtain reduced row
echelon form by doing row operations on the matrix. By adding −1 times row 1 to
row 3 and then adding −1 times row 2 to row 3, row 3 in the reduced row echelon
form would be a 0 row, and only row 1 and row 2 of the reduced row echelon form
would have pivots. In general, the linearly dependent rows will be cancelled out
when doing row operations, and the linearly independent rows will remain non-zero
and contain pivots.

There is also a nice fact that the number of linearly independent rows of a matrix
is equal to the number of linearly independent columns of that matrix. However,
the proof of that is slightly complicated, so we will not prove it here.
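
This fact is easy to check numerically: NumPy's `matrix_rank` estimates the rank of a matrix, and applying it to both a matrix and its transpose illustrates that the row rank equals the column rank. A minimal sketch (assuming NumPy is available; B is the matrix from example 9.2):

```python
import numpy as np

B = np.array([[-2, 3, -6], [10, -9, 18], [8, -6, 12]])
print(np.linalg.matrix_rank(B))    # 2: two linearly independent rows
print(np.linalg.matrix_rank(B.T))  # 2: the column rank agrees
```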

To summarize, the number of pivots of the reduced row echelon form of a matrix
is the rank of the matrix, which is also the number of linearly independent rows
and the number of linearly independent columns of the matrix. Now let us look at
some examples.
 
Example 9.6: Let $A = \begin{bmatrix} 2 & 1 \\ 6 & -3 \end{bmatrix}$. What is the rank of $A$?

In example 9.1, we found that
$$\mathrm{rref}(A) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
Since there are two pivots, the rank of $A$ is 2.

Example 9.7: Let $B = \begin{bmatrix} -2 & 3 & -6 \\ 10 & -9 & 18 \\ 8 & -6 & 12 \end{bmatrix}$. How many linearly independent rows
are there in $B$?

In example 9.2, we found that
$$\mathrm{rref}(B) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}.$$
Since there are two pivots, $B$ has two linearly independent rows.

Example 9.8: Let $C = \begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -2 & 6 \end{bmatrix}$. How many linearly independent columns
are there in $C$?

In example 9.3, we found that
$$\mathrm{rref}(C) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
Since there are two pivots, $C$ has two linearly independent columns.

Example 9.9: Let $D = \begin{bmatrix} 3 & 9 & 6 \\ 5 & 15 & 7 \end{bmatrix}$. What is the rank of $D$?

In example 9.4, we found that
$$\mathrm{rref}(D) = \begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Since there are two pivots, the rank of $D$ is 2.

Example 9.10: Let $E = \begin{bmatrix} 1 & 5 & 3 & 6 \\ 2 & 10 & 6 & 12 \\ 5 & 25 & 8 & 9 \end{bmatrix}$. How many linearly independent
columns are there in $E$?

In example 9.5, we found that
$$\mathrm{rref}(E) = \begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
Since there are two pivots, $E$ has two linearly independent columns.
Exercises

1. Let $A_1 = \begin{bmatrix} 2 & -2 & 6 \\ 6 & 3 & 0 \\ 12 & -12 & 15 \end{bmatrix}$.

a) $\mathrm{rref}(A_1) = ?$

b) Which column(s) of $\mathrm{rref}(A_1)$ is/are pivot column(s)?

c) What is the rank of $A_1$?

d) How many linearly independent rows are there in $A_1$?

e) How many linearly independent columns are there in $A_1$?

2. Let $A_2 = \begin{bmatrix} -3 & 2 & 9 \\ 6 & -4 & -18 \end{bmatrix}$.

a) $\mathrm{rref}(A_2) = ?$

b) Which column(s) of $\mathrm{rref}(A_2)$ is/are pivot column(s)?

c) What is the rank of $A_2$?

d) How many linearly independent rows are there in $A_2$?

e) How many linearly independent columns are there in $A_2$?

3. Let $A_3 = \begin{bmatrix} 5 & -3 & 2 & -1 \\ -2 & 6 & -5 & 1 \\ 7 & -1 & 3 & 2 \end{bmatrix}$.

a) $\mathrm{rref}(A_3) = ?$

b) Which column(s) of $\mathrm{rref}(A_3)$ is/are pivot column(s)?

c) What is the rank of $A_3$?

d) How many linearly independent rows are there in $A_3$?

e) How many linearly independent columns are there in $A_3$?

4. Let $A_4 = \begin{bmatrix} -6 & 3 \\ 1 & -2 \\ 5 & 1 \end{bmatrix}$.

a) $\mathrm{rref}(A_4) = ?$

b) Which column(s) of $\mathrm{rref}(A_4)$ is/are pivot column(s)?

c) What is the rank of $A_4$?

d) How many linearly independent rows are there in $A_4$?

e) How many linearly independent columns are there in $A_4$?
Chapter 10

The Four Fundamental Subspaces
10.1 What are the Four Fundamental Subspaces?
The four fundamental subspaces are vector spaces related to a matrix, each of which
is a subspace of some $n$-dimensional space $\mathbb{R}^n$. The four fundamental subspaces
of a matrix are the column space, row space, null space, and left null space. Given an
$m \times n$ matrix $A$,
• The column space is the vector space of all linear combinations of columns of
$A$ and is a subspace of $\mathbb{R}^m$;
• The row space is the vector space of all linear combinations of rows of $A$ and
is a subspace of $\mathbb{R}^n$;
• The null space is the vector space of all $n$-dimensional vectors $\vec{v}$ such that
$A\vec{v} = \vec{0}$ and is a subspace of $\mathbb{R}^n$;
• The left null space is the vector space of all $m$-dimensional vectors $\vec{u}$ such
that $\vec{u}^{\,T} A = \vec{0}^{\,T}$ and is a subspace of $\mathbb{R}^m$.
Now let us learn about these four spaces in more detail. If you are not yet very
familiar with the concept of vector spaces, I recommend that you review chapter 3
before moving on with this chapter.

10.2 Column Space


The column space of A, denoted as C(A), is the vector space of all linear combina-
tions of columns of A, so it is the span of columns of A. However, remember from
chapter 3 that only the number of linearly independent basis vectors determines
the dimensions of the vector space, and the linearly dependent vectors do not affect
the vector space at all. So, more precisely, C(A) is the span of linearly independent
columns of A.

In the last chapter, we learned how to determine how many linearly independent
columns there are in a matrix. In this section, we will learn how to determine
which columns are linearly independent so that we can find the basis vectors for
the column space of that matrix.

To determine which columns of A are linearly independent, we need to use the
reduced row echelon form of A. The positions of the pivot columns of rref(A) are the
same as the positions of the linearly independent columns of A. For example, if column
1 and column 3 of rref(A) are pivot columns, then column 1 and column 3 of A are
the linearly independent columns of A.

The proof that the positions of pivot columns of rref(A) are the same as the posi-
tions of linearly independent columns of A is slightly complicated, so we will not
prove it here. Instead, think of it intuitively like this: because the pivot columns of
rref(A) are linearly independent, the columns of A at the same positions are also
linearly independent. It should not be too difficult to see why the pivot columns
are linearly independent. Each pivot column has 0's and one 1, and the 1's of
different pivot columns are at different places. For example, there is at most one
pivot column with 1 as its first element, at most one pivot column with 1 as its
second element, and so on. As an example, let us consider the following vectors,
which are some possible pivot columns of the reduced row echelon form of a matrix:
$$\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},\quad \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix},\quad \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}.$$
Is there any non-zero linear combination of these vectors which gives the 0 vector?
In other words, is it possible to have
       
$$c_1\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + c_3\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
where $c_1 \neq 0$, $c_2 \neq 0$, or $c_3 \neq 0$? The answer is no. The only possible linear
combination that gives the 0 vector is when $c_1 = c_2 = c_3 = 0$, so those three vectors
are linearly independent.

Now let us do some examples of finding the column space of a matrix as the span
of linearly independent columns.
 
Example 10.1: Let $A = \begin{bmatrix} 2 & 1 \\ 6 & -3 \end{bmatrix}$. Find $C(A)$.

In example 9.1, we found that
$$\mathrm{rref}(A) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
Since column 1 and column 2 of rref(A) are the pivot columns, column 1 and column
2 of A are linearly independent, and thus
$$C(A) = \mathrm{span}\left\{ \begin{bmatrix} 2 \\ 6 \end{bmatrix}, \begin{bmatrix} 1 \\ -3 \end{bmatrix} \right\}.$$
Because $C(A)$ is the span of two linearly independent vectors, it is a two-dimensional
vector space.

Example 10.2: Let $B = \begin{bmatrix} -2 & 3 & -6 \\ 10 & -9 & 18 \\ 8 & -6 & 12 \end{bmatrix}$. Find $C(B)$.

In example 9.2, we found that
$$\mathrm{rref}(B) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}.$$
Since column 1 and column 2 of rref(B) are the pivot columns, column 1 and
column 2 of B are linearly independent, and thus
$$C(B) = \mathrm{span}\left\{ \begin{bmatrix} -2 \\ 10 \\ 8 \end{bmatrix}, \begin{bmatrix} 3 \\ -9 \\ -6 \end{bmatrix} \right\}.$$
Because $C(B)$ is the span of two linearly independent vectors, it is a two-dimensional
vector space.

Example 10.3: Let $C = \begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -2 & 6 \end{bmatrix}$. Find $C(C)$.

In example 9.3, we found that
$$\mathrm{rref}(C) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
Since column 1 and column 2 of rref(C) are the pivot columns, column 1 and
column 2 of C are linearly independent, and thus
$$C(C) = \mathrm{span}\left\{ \begin{bmatrix} 1 \\ 3 \\ -2 \end{bmatrix}, \begin{bmatrix} -1 \\ 5 \\ 6 \end{bmatrix} \right\}.$$
Because $C(C)$ is the span of two linearly independent vectors, it is a two-dimensional
vector space.

Example 10.4: Let $D = \begin{bmatrix} 3 & 9 & 6 \\ 5 & 15 & 7 \end{bmatrix}$. Find $C(D)$.

In example 9.4, we found that
$$\mathrm{rref}(D) = \begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Since column 1 and column 3 of rref(D) are the pivot columns, column 1 and
column 3 of D are linearly independent, and thus
$$C(D) = \mathrm{span}\left\{ \begin{bmatrix} 3 \\ 5 \end{bmatrix}, \begin{bmatrix} 6 \\ 7 \end{bmatrix} \right\}.$$
Because $C(D)$ is the span of two linearly independent vectors, it is a two-dimensional
vector space.

Example 10.5: Let $E = \begin{bmatrix} 1 & 5 & 3 & 6 \\ 2 & 10 & 6 & 12 \\ 5 & 25 & 8 & 9 \end{bmatrix}$. Find $C(E)$.

In example 9.5, we found that
$$\mathrm{rref}(E) = \begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
Since column 1 and column 3 of rref(E) are the pivot columns, column 1 and
column 3 of E are linearly independent, and thus
$$C(E) = \mathrm{span}\left\{ \begin{bmatrix} 1 \\ 2 \\ 5 \end{bmatrix}, \begin{bmatrix} 3 \\ 6 \\ 8 \end{bmatrix} \right\}.$$
Because $C(E)$ is the span of two linearly independent vectors, it is a two-dimensional
vector space.

There are a few more things to note about the column space before we move on to
the row space.

First, the column space of an $m \times n$ matrix is a subspace of $\mathbb{R}^m$. Note that each
column of an $m \times n$ matrix has $m$ elements, so it is an $m$-dimensional vector. Remember
from chapter 3 that the span of $m$-dimensional linearly independent vectors is a
subspace of $\mathbb{R}^m$. So, the column space of an $m \times n$ matrix is a subspace of $\mathbb{R}^m$. For
example, the column space of a $3 \times 2$ matrix is a subspace of $\mathbb{R}^3$.

Second, the column space of A is the same as the row space of $A^T$. As we will
learn in the next section, the row space of a matrix is the vector space of all linear
combinations of rows of that matrix. When we take the transpose of a matrix, the
columns of A become rows of $A^T$. So, linear combinations of columns of A are the
same as linear combinations of rows of $A^T$.
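
SymPy can carry out this whole procedure for us: its `columnspace()` method returns the columns sitting at the pivot positions of the reduced row echelon form, which form a basis of the column space. A minimal sketch (assuming SymPy is available; E is the matrix from example 10.5):

```python
from sympy import Matrix

E = Matrix([[1, 5, 3, 6], [2, 10, 6, 12], [5, 25, 8, 9]])
basis = E.columnspace()  # basis vectors of C(E)
for v in basis:
    print(v.T)  # transposed only so each basis vector prints on one line
# [1, 2, 5] and [3, 6, 8]: columns 1 and 3 of E, as in example 10.5
```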

10.3 Row Space


The row space of A, denoted as R(A), is the vector space of all linear combinations
of rows of A. Like how C(A) is the span of linearly independent columns, R(A)
is the span of linearly independent rows because only the number of linearly inde-
pendent basis vectors determines the dimensions of the vector space.

However, for R(A), we do not need to use linearly independent rows of A as the
basis vectors although we can. Instead, we can write R(A) as the span of non-zero
rows of rref(A).

Remember that we obtain rref(A) by doing row operations to A. When we do row
operations, we switch rows, multiply a row by a scalar, and/or add a multiple
of a row to another row. So, the rows of rref(A) are some linear combinations
of rows of A. Also, as explained in section 9.2 of the last chapter, the number of
non-zero rows of rref(A) is the same as the number of linearly independent rows of
A, and the non-zero rows of rref(A) are linearly independent because they would
have been cancelled out to become 0 rows if they were linearly dependent instead.

If we have a set of linearly independent vectors and another set of linearly
independent vectors which can be written as linear combinations of the vectors of
the first set, then the span of the second set is the same as the span of the first set.
Let us see why this is true with an example with two vectors. Say we have a set of
two linearly independent vectors $\vec{v}_1$ and $\vec{v}_2$ and another set of linearly independent
vectors $\vec{u}_1$ and $\vec{u}_2$, where $\vec{u}_1$ and $\vec{u}_2$ can be written as some linear combinations of
$\vec{v}_1$ and $\vec{v}_2$, i.e.
$$\vec{u}_1 = c_1\vec{v}_1 + c_2\vec{v}_2$$
and
$$\vec{u}_2 = c_3\vec{v}_1 + c_4\vec{v}_2$$
for some scalars $c_1$, $c_2$, $c_3$, and $c_4$. Then, for some scalars $a$ and $b$, a vector $\vec{w}$ in
the span of $\vec{u}_1$ and $\vec{u}_2$ is a vector of the form
$$\vec{w} = a\vec{u}_1 + b\vec{u}_2 = a(c_1\vec{v}_1 + c_2\vec{v}_2) + b(c_3\vec{v}_1 + c_4\vec{v}_2) = (ac_1 + bc_3)\vec{v}_1 + (ac_2 + bc_4)\vec{v}_2,$$
which is a linear combination of $\vec{v}_1$ and $\vec{v}_2$, so it lies in the span of $\vec{v}_1$ and $\vec{v}_2$ as well.

Thus, since the non-zero rows of rref(A) are linearly independent and are some
linear combinations of the linearly independent rows of A, the span of non-zero
rows of rref(A) is the same as the span of linearly independent rows of A, and we
can write R(A) as the span of non-zero rows of rref(A).

Now let us do some examples of finding the row space of a matrix.


 
Example 10.6: Let $A = \begin{bmatrix} 2 & 1 \\ 6 & -3 \end{bmatrix}$. Find $R(A)$.

In example 9.1, we found that
$$\mathrm{rref}(A) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
Since row 1 and row 2 of rref(A) are non-zero rows of rref(A),
$$R(A) = \mathrm{span}\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}.$$
Because $R(A)$ is the span of two linearly independent vectors, it is a two-dimensional
vector space.

Example 10.7: Let $B = \begin{bmatrix} -2 & 3 & -6 \\ 10 & -9 & 18 \\ 8 & -6 & 12 \end{bmatrix}$. Find $R(B)$.

In example 9.2, we found that
$$\mathrm{rref}(B) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}.$$
Since row 1 and row 2 of rref(B) are non-zero rows of rref(B),
$$R(B) = \mathrm{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ -2 \end{bmatrix} \right\}.$$
Because $R(B)$ is the span of two linearly independent vectors, it is a two-dimensional
vector space.

Example 10.8: Let $C = \begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -2 & 6 \end{bmatrix}$. Find $R(C)$.

In example 9.3, we found that
$$\mathrm{rref}(C) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
Since row 1 and row 2 of rref(C) are non-zero rows of rref(C),
$$R(C) = \mathrm{span}\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}.$$
Because $R(C)$ is the span of two linearly independent vectors, it is a two-dimensional
vector space.

Example 10.9: Let $D = \begin{bmatrix} 3 & 9 & 6 \\ 5 & 15 & 7 \end{bmatrix}$. Find $R(D)$.

In example 9.4, we found that
$$\mathrm{rref}(D) = \begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Since row 1 and row 2 of rref(D) are non-zero rows of rref(D),
$$R(D) = \mathrm{span}\left\{ \begin{bmatrix} 1 \\ 3 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}.$$
Because $R(D)$ is the span of two linearly independent vectors, it is a two-dimensional
vector space.

Example 10.10: Let $E = \begin{bmatrix} 1 & 5 & 3 & 6 \\ 2 & 10 & 6 & 12 \\ 5 & 25 & 8 & 9 \end{bmatrix}$. Find $R(E)$.

In example 9.5, we found that
$$\mathrm{rref}(E) = \begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
Since row 1 and row 2 of rref(E) are non-zero rows of rref(E),
$$R(E) = \mathrm{span}\left\{ \begin{bmatrix} 1 \\ 5 \\ 0 \\ -3 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 3 \end{bmatrix} \right\}.$$
Because $R(E)$ is the span of two linearly independent vectors, it is a two-dimensional
vector space.

There are a few more things to note about the row space before we move on to the
null and left null spaces.

First, the row space of an $m \times n$ matrix is a subspace of $\mathbb{R}^n$. Note that each row of
an $m \times n$ matrix has $n$ elements, so it is an $n$-dimensional vector. Remember from
chapter 3 that the span of $n$-dimensional linearly independent vectors is a subspace
of $\mathbb{R}^n$. So, the row space of an $m \times n$ matrix is a subspace of $\mathbb{R}^n$. For example, the
row space of a $4 \times 5$ matrix is a subspace of $\mathbb{R}^5$.

Second, the row space of A is the same as the column space of $A^T$. As we learned
in the last section, the column space of a matrix is the vector space of all linear
combinations of columns of that matrix. When we take the transpose of a matrix,
the rows of A become columns of $A^T$. So, linear combinations of rows of A are the
same as linear combinations of columns of $A^T$.
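
As with the column space, SymPy automates this. A minimal sketch (assuming SymPy; D is the matrix from example 10.9; note that `rowspace()` returns the non-zero rows of an echelon form, which may differ from the rows of rref(D) by further row operations but spans the same space):

```python
from sympy import Matrix

D = Matrix([[3, 9, 6], [5, 15, 7]])
print(D.rowspace())       # a basis of R(D) from the non-zero echelon rows
print(D.T.columnspace())  # the same space, computed as the column space of D^T
```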

10.4 Null and Left Null Spaces


The null space of an $m \times n$ matrix A, denoted as $N(A)$, is the vector space of all
$n$-dimensional vectors $\vec{v}$ such that $A\vec{v} = \vec{0}$. The vectors $\vec{v}$ have to be $n$-dimensional
because an $m \times n$ matrix can only be multiplied by an $n$-dimensional vector. Because
$N(A)$ is a vector space of $n$-dimensional vectors, it is a subspace of $\mathbb{R}^n$.

We know that any matrix times the 0 vector is the 0 vector, so $\vec{v} = \vec{0}$ is in the
null space of any matrix A. The more interesting question is when there can be
non-zero vectors $\vec{v}$ such that $A\vec{v} = \vec{0}$. There can be non-zero vectors $\vec{v}$ such that
$A\vec{v} = \vec{0}$ when not all columns of A are linearly independent, i.e. when A has some
linearly dependent column(s).
Let
$$A = \begin{bmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{bmatrix}$$
and
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.$$
Then,
$$\begin{bmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = v_1\vec{a}_1 + v_2\vec{a}_2 + \cdots + v_n\vec{a}_n,$$
so the equation $A\vec{v} = \vec{0}$ becomes
$$v_1\vec{a}_1 + v_2\vec{a}_2 + \cdots + v_n\vec{a}_n = \vec{0}.$$
From the definition of linear independence, we know that the only solution to this
is $v_1 = v_2 = \cdots = v_n = 0$ if the column vectors $\vec{a}_1, \vec{a}_2, \ldots, \vec{a}_n$ are all linearly
independent. So, if all columns of A are linearly independent, then the only solution
to $A\vec{v} = \vec{0}$ is $\vec{v} = \vec{0}$. Otherwise, if not all columns of A are linearly independent,
then there are solutions in which not all of $v_1, v_2, \ldots, v_n$ are 0, so there can be
non-zero vectors $\vec{v}$ such that $A\vec{v} = \vec{0}$.

To find the vectors $\vec{v}$ in $N(A)$, we need to solve the equation $A\vec{v} = \vec{0}$. However,
solving $A\vec{v} = \vec{0}$ directly could be complicated, so we can solve the equation
$\mathrm{rref}(A)\vec{v} = \vec{0}$ instead because these two equations are equivalent.

The reduced row echelon form rref(A) is obtained by doing row operations to A, and
recall from chapter 8 that doing row operations to A is the same as multiplying A by
some elementary matrices. In high school algebra, we learned that to keep
an equation the same, we need to multiply both sides of the equation by the same
thing. For the equation $A\vec{v} = \vec{0}$, multiplying both sides by those elementary matrices
gives $\mathrm{rref}(A)\vec{v} = \vec{0}$ because the elementary matrices times A is rref(A), and the
elementary matrices times $\vec{0}$ is still $\vec{0}$ because any matrix times the 0 vector is the
0 vector.

Now let us do some examples of finding the null space of a matrix.


 
Example 10.11: Let $A = \begin{bmatrix} 2 & 1 \\ 6 & -3 \end{bmatrix}$. Find $N(A)$.

In example 9.1, we found that
$$\mathrm{rref}(A) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
Let
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},$$
and we need to solve the equation $\mathrm{rref}(A)\vec{v} = \vec{0}$:
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
$$v_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + v_2\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
$$\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
So, we have $v_1 = 0$ and $v_2 = 0$:
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
Thus, $N(A)$ contains only the 0 vector:
$$N(A) = \left\{ \begin{bmatrix} 0 \\ 0 \end{bmatrix} \right\}.$$
We can see that $N(A)$ contains only the 0 vector because all columns of A are
linearly independent. Since it contains only the 0 vector, it is a 0-dimensional vector
space.

Example 10.12: Let $B = \begin{bmatrix} -2 & 3 & -6 \\ 10 & -9 & 18 \\ 8 & -6 & 12 \end{bmatrix}$. Find $N(B)$.

In example 9.2, we found that
$$\mathrm{rref}(B) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}.$$
Let
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix},$$
and we need to solve the equation $\mathrm{rref}(B)\vec{v} = \vec{0}$:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
$$v_1\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + v_2\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + v_3\begin{bmatrix} 0 \\ -2 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
$$\begin{bmatrix} v_1 \\ v_2 - 2v_3 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
So, we have $v_1 = 0$ and $v_2 = 2v_3$:
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 2v_3 \\ v_3 \end{bmatrix} = v_3\begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}.$$
Thus, $N(B)$ contains any vectors $\vec{v}$ that are linear combinations of $\begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}$:
$$N(B) = \mathrm{span}\left\{ \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} \right\}.$$
Because $N(B)$ is the span of one linearly independent vector, it is a one-dimensional
vector space.

Example 10.13: Let $C = \begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -2 & 6 \end{bmatrix}$. Find $N(C)$.

In example 9.3, we found that
$$\mathrm{rref}(C) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
Let
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},$$
and we need to solve the equation $\mathrm{rref}(C)\vec{v} = \vec{0}$:
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
$$v_1\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + v_2\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
$$\begin{bmatrix} v_1 \\ v_2 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
So, we have $v_1 = 0$ and $v_2 = 0$:
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
Thus, $N(C)$ contains only the 0 vector:
$$N(C) = \left\{ \begin{bmatrix} 0 \\ 0 \end{bmatrix} \right\}.$$
We can see that $N(C)$ contains only the 0 vector because all columns of C are
linearly independent. Since it contains only the 0 vector, it is a 0-dimensional vector
space.

Example 10.14: Let $D = \begin{bmatrix} 3 & 9 & 6 \\ 5 & 15 & 7 \end{bmatrix}$. Find $N(D)$.

In example 9.4, we found that
$$\mathrm{rref}(D) = \begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Let
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix},$$
and we need to solve the equation $\mathrm{rref}(D)\vec{v} = \vec{0}$:
$$\begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
$$v_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + v_2\begin{bmatrix} 3 \\ 0 \end{bmatrix} + v_3\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
$$\begin{bmatrix} v_1 + 3v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
So, we have $v_1 = -3v_2$ and $v_3 = 0$:
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} -3v_2 \\ v_2 \\ 0 \end{bmatrix} = v_2\begin{bmatrix} -3 \\ 1 \\ 0 \end{bmatrix}.$$
Thus, $N(D)$ contains any vectors $\vec{v}$ that are linear combinations of $\begin{bmatrix} -3 \\ 1 \\ 0 \end{bmatrix}$:
$$N(D) = \mathrm{span}\left\{ \begin{bmatrix} -3 \\ 1 \\ 0 \end{bmatrix} \right\}.$$
Because $N(D)$ is the span of one linearly independent vector, it is a one-dimensional
vector space.

Example 10.15: Let $E = \begin{bmatrix} 1 & 5 & 3 & 6 \\ 2 & 10 & 6 & 12 \\ 5 & 25 & 8 & 9 \end{bmatrix}$. Find $N(E)$.

In example 9.5, we found that
$$\mathrm{rref}(E) = \begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
Let
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{bmatrix},$$
and we need to solve the equation $\mathrm{rref}(E)\vec{v} = \vec{0}$:
$$\begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
$$v_1\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + v_2\begin{bmatrix} 5 \\ 0 \\ 0 \end{bmatrix} + v_3\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + v_4\begin{bmatrix} -3 \\ 3 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
$$\begin{bmatrix} v_1 + 5v_2 - 3v_4 \\ v_3 + 3v_4 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
So, we have $v_1 = -5v_2 + 3v_4$ and $v_3 = -3v_4$:
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{bmatrix} = \begin{bmatrix} -5v_2 + 3v_4 \\ v_2 \\ -3v_4 \\ v_4 \end{bmatrix} = v_2\begin{bmatrix} -5 \\ 1 \\ 0 \\ 0 \end{bmatrix} + v_4\begin{bmatrix} 3 \\ 0 \\ -3 \\ 1 \end{bmatrix}.$$
Thus, $N(E)$ contains any vectors $\vec{v}$ that are linear combinations of $\begin{bmatrix} -5 \\ 1 \\ 0 \\ 0 \end{bmatrix}$ and
$\begin{bmatrix} 3 \\ 0 \\ -3 \\ 1 \end{bmatrix}$:
$$N(E) = \mathrm{span}\left\{ \begin{bmatrix} -5 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 0 \\ -3 \\ 1 \end{bmatrix} \right\}.$$
Because $N(E)$ is the span of two linearly independent vectors, it is a two-dimensional
vector space.

Next, let us learn about the left null space.

The left null space of an $m \times n$ matrix A is the vector space of all $m$-dimensional
vectors $\vec{u}$ such that $\vec{u}^{\,T} A = \vec{0}^{\,T}$. The vectors $\vec{u}$ have to be $m$-dimensional because
an $m \times n$ matrix can only be multiplied on the left by an $m$-dimensional row vector.
Because the left null space is a vector space of $m$-dimensional vectors, it is a subspace
of $\mathbb{R}^m$.

The left null space is actually nothing really new. The left null space of A is
the null space of $A^T$, so we just need to find the null space of $A^T$ if we want to find
the left null space of A.

Taking the transpose of both sides of the equation
$$A^T\vec{u} = \vec{0}$$
gives
$$\left(A^T\vec{u}\right)^T = \vec{0}^{\,T}$$
because the transpose of the 0 column vector is just the 0 row vector. Then, using the
formula proven in section 7.5 of chapter 7,
$$\left(A^T\vec{u}\right)^T = \vec{u}^{\,T}\left(A^T\right)^T.$$
From question 6 in the exercises of chapter 6, we know that $\left(A^T\right)^T = A$, so
$$\left(A^T\vec{u}\right)^T = \vec{u}^{\,T} A.$$
Thus, the equation
$$A^T\vec{u} = \vec{0}$$
is equivalent to the equation
$$\vec{u}^{\,T} A = \vec{0}^{\,T},$$
so the left null space of A is the same as the null space of $A^T$.
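
SymPy computes null space bases directly, and, per the fact just shown, the left null space falls out by transposing first. A minimal sketch (assuming SymPy; E is the matrix from example 10.15):

```python
from sympy import Matrix

E = Matrix([[1, 5, 3, 6], [2, 10, 6, 12], [5, 25, 8, 9]])

# Basis of N(E): two vectors, (-5, 1, 0, 0) and (3, 0, -3, 1),
# matching example 10.15
print(E.nullspace())

# Left null space of E = null space of E^T
print(E.T.nullspace())
```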

Exercises

1. Let $A_1 = \begin{bmatrix} 2 & -2 & 6 \\ 6 & 3 & 0 \\ 12 & -12 & 15 \end{bmatrix}$.

a) Find the column space of $A_1$. How many dimensions does it have?

b) Find the row space of $A_1$. How many dimensions does it have?

c) Find the null space of $A_1$. How many dimensions does it have?

d) Find the left null space of $A_1$. How many dimensions does it have?

2. Let $A_2 = \begin{bmatrix} -3 & 2 & 9 \\ 6 & -4 & -18 \end{bmatrix}$.

a) Find the column space of $A_2$. How many dimensions does it have?

b) Find the row space of $A_2$. How many dimensions does it have?

c) Find the null space of $A_2$. How many dimensions does it have?

d) Find the left null space of $A_2$. How many dimensions does it have?

3. Let $A_3 = \begin{bmatrix} 5 & -3 & 2 & -1 \\ -2 & 6 & -5 & 1 \\ 7 & -1 & 3 & 2 \end{bmatrix}$.

a) Find the column space of $A_3$. How many dimensions does it have?

b) Find the row space of $A_3$. How many dimensions does it have?

c) Find the null space of $A_3$. How many dimensions does it have?

d) Find the left null space of $A_3$. How many dimensions does it have?
Chapter 11

Inverse of Matrices

11.1 Invertible and Singular Matrices
An $n \times n$ square matrix A can have an inverse $A^{-1}$ such that
$$AA^{-1} = A^{-1}A = I_n$$
where $I_n$ is the $n \times n$ identity matrix. Also, if we have
$$A\vec{v} = \vec{u},$$
then multiplying both sides by $A^{-1}$ gives
$$A^{-1}A\vec{v} = A^{-1}\vec{u}$$
$$I_n\vec{v} = A^{-1}\vec{u}$$
$$\vec{v} = A^{-1}\vec{u}.$$
However, not all square matrices have inverses. Matrices that have inverses are
called invertible matrices, and matrices that do not have inverses are called singular
matrices.

So, when is a matrix invertible? When does the inverse of a matrix exist?

As an analogue, let us think of the inverse of a function. In high school algebra, we
learned that the inverse of a function exists when the graph of that function passes
the horizontal line test, which means any horizontal line intersects the graph at
most once.

Let us take an example where the function does not have an inverse: $f(x) = x^2$.
We can see that any horizontal line in the upper half of the xy-plane intersects the
graph $y = x^2$ twice. So, the graph does not pass the horizontal line test, and the
function does not have an inverse. This happens because for each positive y-value,
there are two x-values such that $f(x) = y$. For example, $(-3)^2 = 3^2 = 9$. So, $f$ is
taking two x-values to the same y-value. The inverse $f^{-1}$ is supposed to take a
y-value back to an x-value, but in this case $f^{-1}$ does not know which x-value to take the
y-value back to because there are two options. For example, $f^{-1}$ would not know
whether to take the y-value 9 back to the x-value 3 or $-3$. So, $f^{-1}$ does not exist.
Remember that a function can only take one value to one value, not more than one.

We can think of a matrix A and its inverse $A^{-1}$ in a similar way. As explained
earlier, if
$$A\vec{v} = \vec{u},$$
then
$$A^{-1}\vec{u} = \vec{v}$$
when $A^{-1}$ exists. We can think of this as A taking a vector $\vec{v}$ to a vector $\vec{u}$, and
$A^{-1}$ taking the vector $\vec{u}$ back to the vector $\vec{v}$.

When not all columns of A are linearly independent, there can be two or more
vectors $\vec{v}$ such that
$$A\vec{v} = \vec{u}_1$$
for some vector $\vec{u}_1$. Suppose there is a vector $\vec{v}_1$ such that
$$A\vec{v}_1 = \vec{u}_1.$$
As we learned in the last chapter, if not all columns of A are linearly independent,
then there are non-zero vectors $\vec{v}_n$ in the null space of A. Let $\vec{v}_1' = \vec{v}_1 + \vec{v}_n$, then
$$A\vec{v}_1' = A(\vec{v}_1 + \vec{v}_n) = A\vec{v}_1 + A\vec{v}_n = \vec{u}_1 + \vec{0} = \vec{u}_1.$$
So, A takes the different vectors $\vec{v}_1$ and $\vec{v}_1'$ to the same vector $\vec{u}_1$. Then, $A^{-1}$ would
not know whether to take the vector $\vec{u}_1$ back to the vector $\vec{v}_1$ or $\vec{v}_1'$, so $A^{-1}$ does
not exist.

Thus, when not all columns of A are linearly independent, $A^{-1}$ does not exist,
and A is a singular matrix. Otherwise, if all columns of A are linearly independent,
then $A^{-1}$ exists, and A is an invertible matrix.
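
This is exactly the failure you see numerically: asking a library for the inverse of a matrix with linearly dependent columns produces an error rather than an answer. A minimal sketch (assuming NumPy; B is the singular matrix from example 11.2 below):

```python
import numpy as np

B = np.array([[1.0, 3.0], [5.0, 15.0]])  # column 2 = 3 * column 1
try:
    np.linalg.inv(B)
except np.linalg.LinAlgError as err:
    print("B is singular, no inverse:", err)
```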

11.2 Finding Inverse of an Invertible Matrix


In the last section, we discussed that a square matrix A is invertible only when all
of its columns are linearly independent. For an $n \times n$ square matrix A, if all of the $n$
columns are linearly independent, then rref(A) is an $n \times n$ square matrix with
$n$ pivots, which is the $n \times n$ identity matrix. For example, the reduced row echelon
form of a $3 \times 3$ square matrix whose three columns are all linearly independent is
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

That means we can do some row operations to A to get the identity matrix. Remember
that doing row operations to A is the same as multiplying A by some elementary
matrices. Since multiplying A by the product of those elementary matrices
gives the identity matrix, the product of those elementary matrices is $A^{-1}$. So, to
find $A^{-1}$, we need to know what the product of those elementary matrices is.

Now how do we know what that product is? Well, we can rewrite each row
operation as a matrix multiplication, keep track of each elementary matrix, and
then multiply all of the elementary matrices at the end. However, there is a better
way. We can attach the identity matrix I to the right of A and do the same row
operations to both A and I at the same time. By doing this, we are multiplying I by
the same elementary matrices, and any matrix times I is the matrix itself, so the
right half ends up holding exactly the product of those elementary matrices, which
is $A^{-1}$. Let us look at some examples.

Example 11.1: Let $A = \begin{bmatrix} 2 & 1 \\ 6 & -3 \end{bmatrix}$. Is $A$ invertible? If so, find $A^{-1}$.

By reducing A to rref(A), we can see that all columns are linearly independent, so
A is invertible. To find $A^{-1}$, we first attach $I_2$ to the right of A:
$$\left[\begin{array}{cc|cc} 2 & 1 & 1 & 0 \\ 6 & -3 & 0 & 1 \end{array}\right].$$
Next, we do some row operations to reduce A to rref(A). Multiplying row 1 by
$\frac{1}{2}$ gives
$$\left[\begin{array}{cc|cc} 1 & \frac{1}{2} & \frac{1}{2} & 0 \\ 6 & -3 & 0 & 1 \end{array}\right].$$
Adding $-6$ times row 1 to row 2 gives
$$\left[\begin{array}{cc|cc} 1 & \frac{1}{2} & \frac{1}{2} & 0 \\ 0 & -6 & -3 & 1 \end{array}\right].$$
Multiplying row 2 by $-\frac{1}{6}$ gives
$$\left[\begin{array}{cc|cc} 1 & \frac{1}{2} & \frac{1}{2} & 0 \\ 0 & 1 & \frac{1}{2} & -\frac{1}{6} \end{array}\right].$$
Adding $-\frac{1}{2}$ times row 2 to row 1 gives
$$\left[\begin{array}{cc|cc} 1 & 0 & \frac{1}{4} & \frac{1}{12} \\ 0 & 1 & \frac{1}{2} & -\frac{1}{6} \end{array}\right].$$
Therefore,
$$A^{-1} = \begin{bmatrix} \frac{1}{4} & \frac{1}{12} \\ \frac{1}{2} & -\frac{1}{6} \end{bmatrix}.$$
We can check that
$$\begin{bmatrix} \frac{1}{4} & \frac{1}{12} \\ \frac{1}{2} & -\frac{1}{6} \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 6 & -3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

Example 11.2: Let $B = \begin{bmatrix} 1 & 3 \\ 5 & 15 \end{bmatrix}$. Is $B$ invertible? If so, find $B^{-1}$.

We can see that the second column is 3 times the first column, so not all columns
of B are linearly independent. So, B is not invertible. We can use reduced row
echelon form to check the linear dependence of columns, but it is not necessary in
a case like this where we can see it quickly.

Example 11.3: Let $C = \begin{bmatrix} 1 & 6 & -3 \\ -2 & 0 & -6 \\ 3 & 9 & 1 \end{bmatrix}$. Is $C$ invertible? If so, find $C^{-1}$.

By reducing C to rref(C), we can see that all columns are linearly independent, so
C is invertible. To find $C^{-1}$, we first attach $I_3$ to the right of C:
$$\left[\begin{array}{ccc|ccc} 1 & 6 & -3 & 1 & 0 & 0 \\ -2 & 0 & -6 & 0 & 1 & 0 \\ 3 & 9 & 1 & 0 & 0 & 1 \end{array}\right].$$
Next, we do some row operations to reduce C to rref(C). Adding 2 times row 1 to
row 2 gives
$$\left[\begin{array}{ccc|ccc} 1 & 6 & -3 & 1 & 0 & 0 \\ 0 & 12 & -12 & 2 & 1 & 0 \\ 3 & 9 & 1 & 0 & 0 & 1 \end{array}\right].$$
Adding $-3$ times row 1 to row 3 gives
$$\left[\begin{array}{ccc|ccc} 1 & 6 & -3 & 1 & 0 & 0 \\ 0 & 12 & -12 & 2 & 1 & 0 \\ 0 & -9 & 10 & -3 & 0 & 1 \end{array}\right].$$
Multiplying row 2 by $\frac{1}{12}$ gives
$$\left[\begin{array}{ccc|ccc} 1 & 6 & -3 & 1 & 0 & 0 \\ 0 & 1 & -1 & \frac{1}{6} & \frac{1}{12} & 0 \\ 0 & -9 & 10 & -3 & 0 & 1 \end{array}\right].$$
Adding $-6$ times row 2 to row 1 gives
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 3 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & -1 & \frac{1}{6} & \frac{1}{12} & 0 \\ 0 & -9 & 10 & -3 & 0 & 1 \end{array}\right].$$
Adding 9 times row 2 to row 3 gives
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 3 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & -1 & \frac{1}{6} & \frac{1}{12} & 0 \\ 0 & 0 & 1 & -\frac{3}{2} & \frac{3}{4} & 1 \end{array}\right].$$
Adding row 3 to row 2 gives
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 3 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & 0 & -\frac{4}{3} & \frac{5}{6} & 1 \\ 0 & 0 & 1 & -\frac{3}{2} & \frac{3}{4} & 1 \end{array}\right].$$
Adding $-3$ times row 3 to row 1 gives
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & \frac{9}{2} & -\frac{11}{4} & -3 \\ 0 & 1 & 0 & -\frac{4}{3} & \frac{5}{6} & 1 \\ 0 & 0 & 1 & -\frac{3}{2} & \frac{3}{4} & 1 \end{array}\right].$$
Therefore,
$$C^{-1} = \begin{bmatrix} \frac{9}{2} & -\frac{11}{4} & -3 \\ -\frac{4}{3} & \frac{5}{6} & 1 \\ -\frac{3}{2} & \frac{3}{4} & 1 \end{bmatrix}.$$
We can check that
$$\begin{bmatrix} \frac{9}{2} & -\frac{11}{4} & -3 \\ -\frac{4}{3} & \frac{5}{6} & 1 \\ -\frac{3}{2} & \frac{3}{4} & 1 \end{bmatrix}\begin{bmatrix} 1 & 6 & -3 \\ -2 & 0 & -6 \\ 3 & 9 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Example 11.4: Let $D = \begin{bmatrix} -2 & 3 & -6 \\ 10 & -9 & 18 \\ 8 & -6 & 12 \end{bmatrix}$. Is $D$ invertible? If so, find $D^{-1}$.

In example 9.2, we found that
$$\mathrm{rref}(D) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}.$$
Since there are only two pivot columns, not all columns of D are linearly independent,
so D is not invertible.

Example 11.5: Let $E = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 5 \end{bmatrix}$. Is $E$ invertible? If so, find $E^{-1}$.

By reducing E to rref(E), we can see that all columns are linearly independent, so
E is invertible. To find $E^{-1}$, we first attach $I_4$ to the right of E:
$$\left[\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 5 & 0 & 0 & 0 & 1 \end{array}\right].$$
Next, we do some row operations to reduce E to rref(E). Multiplying row 2 by $\frac{1}{2}$ gives
$$\left[\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & \frac{1}{2} & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 5 & 0 & 0 & 0 & 1 \end{array}\right].$$
Multiplying row 3 by $\frac{1}{3}$ gives
$$\left[\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & \frac{1}{2} & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & \frac{1}{3} & 0 \\ 0 & 0 & 0 & 5 & 0 & 0 & 0 & 1 \end{array}\right].$$
Multiplying row 4 by $\frac{1}{5}$ gives
$$\left[\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & \frac{1}{2} & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & \frac{1}{3} & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & \frac{1}{5} \end{array}\right].$$
Therefore,
$$E^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \frac{1}{2} & 0 & 0 \\ 0 & 0 & \frac{1}{3} & 0 \\ 0 & 0 & 0 & \frac{1}{5} \end{bmatrix}.$$
We can see that finding the inverse of an invertible diagonal matrix is very simple:
we just need to take the reciprocal of each main diagonal element.
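
The attach-the-identity procedure translates directly into code: row-reduce the augmented matrix $[A \mid I]$ and read the inverse off the right half. A minimal sketch (assuming SymPy; A is the matrix from example 11.1):

```python
from sympy import Matrix, eye

A = Matrix([[2, 1], [6, -3]])
aug = Matrix.hstack(A, eye(2))  # the augmented matrix [A | I]
R, _ = aug.rref()               # reduce: the left half becomes I
A_inv = R[:, 2:]                # the right half is now A^{-1}
print(A_inv)                    # Matrix([[1/4, 1/12], [1/2, -1/6]])
print(A_inv * A == eye(2))      # True
```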

Exercises

1. Let $A_1 = \begin{bmatrix} 2 & -2 & 6 \\ 6 & 3 & 0 \\ 12 & -12 & 15 \end{bmatrix}$. Is $A_1$ invertible? If so, find $A_1^{-1}$.

2. Let $A_2 = \begin{bmatrix} -5 & 7 \\ 15 & -21 \end{bmatrix}$. Is $A_2$ invertible? If so, find $A_2^{-1}$.

3. Let $A_3 = \begin{bmatrix} 2 & 3 \\ -1 & -2 \end{bmatrix}$. Is $A_3$ invertible? If so, find $A_3^{-1}$.

4. Let $A_4 = \begin{bmatrix} -3 & 2 & 1 \\ 5 & -1 & 3 \\ 6 & -4 & -2 \end{bmatrix}$. Is $A_4$ invertible? If so, find $A_4^{-1}$.

5. For any invertible matrices A and B, show that $A^{-1}(B - A)B^{-1} = A^{-1} - B^{-1}$.

6. For any invertible matrices A and B, show that $(AB)^{-1} = B^{-1}A^{-1}$.
Chapter 12

Determinant of Matrices

12.1 Properties of Determinant
In this chapter, we will learn about the determinant of a square matrix. The
determinant of a square matrix A is denoted as $\det(A)$. The determinant is, in some
sense, a function that assigns a number to each matrix, and it is uniquely defined by
a few properties. First, we will learn those properties of the determinant in this section.

Property 1: The determinant of the identity matrix is 1.

For example,
$$\det\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \det\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = 1.$$

Property 2: The determinant is linear in each row.

First, what does it mean to be linear? A function $f(x)$ is linear if
$$f(x + x') = f(x) + f(x')$$
and
$$f(nx) = nf(x)$$
for any number $n$. In high school algebra, we learned about "linear functions" of
the form $f(x) = mx + b$; strictly speaking, though, only the ones with $b = 0$, i.e.
$f(x) = mx$, satisfy the conditions $f(x + x') = f(x) + f(x')$ and $f(nx) = nf(x)$.
For the determinant of a matrix, this linearity occurs in each row.
For example, we have
$$\det\begin{bmatrix} a + a' & b + b' \\ c & d \end{bmatrix} = \det\begin{bmatrix} a & b \\ c & d \end{bmatrix} + \det\begin{bmatrix} a' & b' \\ c & d \end{bmatrix}$$
and
$$\det\begin{bmatrix} a & b \\ nc & nd \end{bmatrix} = n\det\begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

Property 3: Switching two rows changes the sign of the determinant.

Switching two rows of a matrix once changes the sign of the determinant of the
matrix. So, switching rows an even number of times keeps the determinant
the same because $(-1)^n = 1$ if $n$ is even. For example, we have
$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = -\det\begin{bmatrix} c & d \\ a & b \end{bmatrix}.$$

These three properties define what a determinant is. However, from these
properties, we can derive more important properties of the determinant.

Property 4: The determinant of a diagonal matrix is the product of all elements
on the main diagonal.

Let
$$D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}.$$
By property 2, the determinant is linear in each row, so we can factor out $d_1$ from
the first row, $d_2$ from the second row, ..., and $d_n$ from the $n$-th row:
$$\det\begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix} = d_1 d_2 \cdots d_n \det\begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = d_1 d_2 \cdots d_n \det(I_n).$$
By property 1, we know that $\det(I_n) = 1$, so
$$\det\begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix} = d_1 d_2 \cdots d_n.$$

Property 5: If a matrix has two or more identical rows, then the determinant is 0.

If a matrix A has two identical rows, then switching those two rows would still give
the same matrix A. By property 3, we know that switching two rows changes the
sign of the determinant, so we have $\det(A) = -\det(A)$, which means $\det(A) = 0$.

Property 6: Adding a multiple of another row to a row does not change the
determinant.

Say we have a matrix A with three rows:
$$A = \begin{bmatrix} \vec{a}_1^{\,T} \\ \vec{a}_2^{\,T} \\ \vec{a}_3^{\,T} \end{bmatrix}.$$
Then, adding $k$ times row 2 to row 3 gives
$$\begin{bmatrix} \vec{a}_1^{\,T} \\ \vec{a}_2^{\,T} \\ k\vec{a}_2^{\,T} + \vec{a}_3^{\,T} \end{bmatrix}.$$
Applying linearity in the third row, we have
$$\det\begin{bmatrix} \vec{a}_1^{\,T} \\ \vec{a}_2^{\,T} \\ k\vec{a}_2^{\,T} + \vec{a}_3^{\,T} \end{bmatrix} = \det\begin{bmatrix} \vec{a}_1^{\,T} \\ \vec{a}_2^{\,T} \\ k\vec{a}_2^{\,T} \end{bmatrix} + \det\begin{bmatrix} \vec{a}_1^{\,T} \\ \vec{a}_2^{\,T} \\ \vec{a}_3^{\,T} \end{bmatrix} = k\,\det\begin{bmatrix} \vec{a}_1^{\,T} \\ \vec{a}_2^{\,T} \\ \vec{a}_2^{\,T} \end{bmatrix} + \det\begin{bmatrix} \vec{a}_1^{\,T} \\ \vec{a}_2^{\,T} \\ \vec{a}_3^{\,T} \end{bmatrix}.$$
By property 5, we know that
$$\det\begin{bmatrix} \vec{a}_1^{\,T} \\ \vec{a}_2^{\,T} \\ \vec{a}_2^{\,T} \end{bmatrix} = 0$$
because there are two identical rows, so
$$\det\begin{bmatrix} \vec{a}_1^{\,T} \\ \vec{a}_2^{\,T} \\ k\vec{a}_2^{\,T} + \vec{a}_3^{\,T} \end{bmatrix} = \det\begin{bmatrix} \vec{a}_1^{\,T} \\ \vec{a}_2^{\,T} \\ \vec{a}_3^{\,T} \end{bmatrix}.$$
So, we can see how adding a multiple of another row to a row does not change the
determinant.

Property 7: If there is at least one 0 row in the matrix, then the determinant is 0.

Say we have a $2 \times 2$ matrix with a 0 row:
$$\begin{bmatrix} 0 & 0 \\ c & d \end{bmatrix}.$$
Since $0 = n \cdot 0$ for any number $n$, applying linearity in the first row gives
$$\det\begin{bmatrix} 0 & 0 \\ c & d \end{bmatrix} = \det\begin{bmatrix} n \cdot 0 & n \cdot 0 \\ c & d \end{bmatrix} = n\,\det\begin{bmatrix} 0 & 0 \\ c & d \end{bmatrix}.$$
Since this holds for every number $n$, the determinant must be 0.

Property 8: If not all rows are linearly independent, then the determinant is
0.

As explained in section 9.2 of chapter 9, if there are some linearly dependent rows,
then they will be cancelled out and become 0 rows after doing some row operations,
specifically by adding multiples of some rows to other rows. By property
6, adding a multiple of another row to a row does not change the determinant.
So, the determinant of a matrix with linearly dependent rows is the same as the
determinant of a matrix with 0 rows. By property 7, the determinant of a matrix
with a 0 row is 0. So, the determinant of a matrix with linearly dependent rows is 0.

Property 9: For a square matrix A, $\det(A^T) = \det(A)$.

We can see why $\det(A^T) = 0$ when $\det(A) = 0$. By property 8, when not all
rows of A are linearly independent, $\det(A) = 0$. A square matrix A has the same
number of rows and columns, and the number of linearly independent rows is the
same as the number of linearly independent columns. So, if not all rows of A are
linearly independent, then not all columns of A are linearly independent. Since
columns of A are rows of $A^T$, that means not all rows of $A^T$ are linearly independent.
So, $\det(A^T) = 0$ by property 8.

It is harder to see why $\det(A^T) = \det(A)$ when $\det(A)$ is non-zero. We can easily
see that the formula is true when A is a diagonal matrix: a diagonal matrix is
symmetric, so $A^T = A$, and $\det(A^T) = \det(A)$. If all columns (or rows) of A are
linearly independent, then A can be reduced to a diagonal matrix by doing some
row operations, and there are determinant properties corresponding to the row
operations. So, that gives an intuitive sense for why the formula $\det(A^T) = \det(A)$
should be true. However, the precise proof of the formula is quite complicated, so
we will not prove it here.

Now note that by applying the previous properties to the rows of $A^T$, we get
similar properties for columns of A because rows of $A^T$ are columns of A.

Property 10: The determinant is linear in each column.

Property 11: Switching two columns changes the sign of the determinant.

Property 12: If a matrix has two or more identical columns, then the determinant is 0.

Property 13: Adding a multiple of another column to a column does not change
the determinant.

Property 14: If there is at least one 0 column in the matrix, then the deter-
minant is 0.

Property 15: If not all columns are linearly independent, then the determinant
is 0.

In section 11.1 of chapter 11, we learned that A is not invertible when not all
columns of A are linearly independent. Here, we know that det(A) = 0 when not
all columns of A are linearly independent. So, A is a singular matrix if det(A) = 0.

Property 16: For square matrices A and B, det(AB) = det(A)det(B).

We can check that this is true when $\det(A) = 0$ or $\det(B) = 0$, which is when
$\det(A)\det(B) = 0$. Say $\det(A) = 0$. By property 15, $\det(A) = 0$ when not all
columns of A are linearly independent. By using method 1 of matrix multiplication,
which is multiplying A by each column of B to obtain each column of the
product AB, we can see that the columns of AB are some linear combinations of
the columns of A. So, if not all columns of A are linearly independent, then not all
columns of AB are linearly independent. By property 15, $\det(AB) = 0$. Similarly,
we can check that $\det(AB) = 0$ when $\det(B) = 0$.

When $\det(A) \neq 0$ and $\det(B) \neq 0$, we can get an intuitive sense for why the
formula $\det(AB) = \det(A)\det(B)$ should be true, just like we had an intuitive
sense for why the formula $\det(A^T) = \det(A)$ should be true. First, we can check
that the formula $\det(AB) = \det(A)\det(B)$ is true when A is a diagonal matrix.
Let us use $3 \times 3$ matrices as an example: let
$$A = \begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix}$$
and
$$B = \begin{bmatrix} \vec{b}_1^{\,T} \\ \vec{b}_2^{\,T} \\ \vec{b}_3^{\,T} \end{bmatrix}.$$
By using method 2 of matrix multiplication, which is multiplying each row of A by
B to obtain each row of the product AB, we get
$$AB = \begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix}\begin{bmatrix} \vec{b}_1^{\,T} \\ \vec{b}_2^{\,T} \\ \vec{b}_3^{\,T} \end{bmatrix} = \begin{bmatrix} d_1\vec{b}_1^{\,T} \\ d_2\vec{b}_2^{\,T} \\ d_3\vec{b}_3^{\,T} \end{bmatrix},$$
so
$$\det(AB) = \det\begin{bmatrix} d_1\vec{b}_1^{\,T} \\ d_2\vec{b}_2^{\,T} \\ d_3\vec{b}_3^{\,T} \end{bmatrix}.$$
Since the determinant is linear in each row, we can factor out $d_1$ from the first row,
$d_2$ from the second row, and $d_3$ from the third row:
$$\det\begin{bmatrix} d_1\vec{b}_1^{\,T} \\ d_2\vec{b}_2^{\,T} \\ d_3\vec{b}_3^{\,T} \end{bmatrix} = d_1 d_2 d_3 \det\begin{bmatrix} \vec{b}_1^{\,T} \\ \vec{b}_2^{\,T} \\ \vec{b}_3^{\,T} \end{bmatrix} = d_1 d_2 d_3 \det(B).$$
Since
$$\det(A) = \det\begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix} = d_1 d_2 d_3$$
by property 4, we indeed have $\det(AB) = \det(A)\det(B)$. In general,
$$\det\left(\begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}\begin{bmatrix} \vec{b}_1^{\,T} \\ \vec{b}_2^{\,T} \\ \vdots \\ \vec{b}_n^{\,T} \end{bmatrix}\right) = \det\begin{bmatrix} d_1\vec{b}_1^{\,T} \\ d_2\vec{b}_2^{\,T} \\ \vdots \\ d_n\vec{b}_n^{\,T} \end{bmatrix} = d_1 d_2 \cdots d_n \det\begin{bmatrix} \vec{b}_1^{\,T} \\ \vec{b}_2^{\,T} \\ \vdots \\ \vec{b}_n^{\,T} \end{bmatrix} = \det\begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}\det\begin{bmatrix} \vec{b}_1^{\,T} \\ \vec{b}_2^{\,T} \\ \vdots \\ \vec{b}_n^{\,T} \end{bmatrix}.$$
Thus, the formula $\det(AB) = \det(A)\det(B)$ is indeed true when A is a diagonal
matrix. Then, if all columns of A are linearly independent, A can be reduced
to a diagonal matrix by doing some row operations, and there are determinant
properties corresponding to the row operations. So, that gives an intuitive sense
for why the formula $\det(AB) = \det(A)\det(B)$ should be true.
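
Properties like 9 and 16 are easy to sanity-check numerically. A minimal sketch (assuming NumPy; the matrices are arbitrary examples, and `isclose` is used because floating-point determinants are inexact):

```python
import numpy as np

A = np.array([[2.0, 1.0], [6.0, -3.0]])
B = np.array([[1.0, -1.0], [4.0, 5.0]])

# Property 9: det(A^T) = det(A)
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))  # True

# Property 16: det(AB) = det(A) det(B)
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))   # True
```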

12.2 Determinant Formula for 2 x 2 Matrices


In this section, we will use some properties of determinant to derive the determinant
formula for any 2 × 2 matrix  
a11 a12
.
a21 a22
Since      
a11 a12 = a11 0 + 0 a12 ,
applying linearity in the first row gives
     
a11 a12 a11 0 0 a12
det = det + det . (12.1)
a21 a22 a21 a22 a21 a22
Similarly, applying linearity in the second row gives
     
a11 0 a11 0 a11 0
det = det + det
a21 a22 a21 0 0 a22

144
and      
0 a12 0 a12 0 a12
det = det + det .
a21 a22 a21 0 0 a22
By property 14,    
a11 0 0 a12
det = det =0
a21 0 0 a22
because there is a 0 column, so
   
a11 0 a11 0
det = det , (12.2)
a21 a22 0 a22
and    
0 a12 0 a12
det = det . (12.3)
a21 a22 a21 0
Substituting (12.2) and (12.3) into (12.1),
     
a11 a12 a11 0 0 a12
det = det + det .
a21 a22 0 a22 a21 0
By property 4,  
a11 0
det = a11 a22 .
0 a22
By property 3 and property 4,
   
0 a12 a21 0
det = −det = −a21 a12 .
a21 0 0 a12
Therefore,  
a11 a12
det = a11 a22 − a21 a12 .
a21 a22
Note that when we decompose a determinant into smaller determinant matrices
using linearity in each row, non-zero determinants occurred when each row and
each column only has one element of the original matrix, such as
 
a11 0
det
0 a22
and  
0 a12
det .
a21 0
When some column or row has more than one element from the original matrix,
such as  
a11 0
det
a21 0
and  
0 a12
det ,
0 a22
there will be some 0 column or row, which makes the determinant 0. In the next
section, we will derive the determinant formula for 3 × 3 matrices in a similar
manner.

145
12.3 Determinant Formula for 3 x 3 Matrices
Remember that when we decompose a determinant into smaller determinants
using linearity in each row, non-zero determinants occur only when each row
and each column has exactly one element of the original matrix. So, we just need to
care about the determinants whose rows and columns each contain only one
element:
$$\det\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \det\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix} + \det\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & 0 & a_{23} \\ 0 & a_{32} & 0 \end{bmatrix} + \det\begin{bmatrix} 0 & a_{12} & 0 \\ a_{21} & 0 & 0 \\ 0 & 0 & a_{33} \end{bmatrix} + \det\begin{bmatrix} 0 & a_{12} & 0 \\ 0 & 0 & a_{23} \\ a_{31} & 0 & 0 \end{bmatrix} + \det\begin{bmatrix} 0 & 0 & a_{13} \\ 0 & a_{22} & 0 \\ a_{31} & 0 & 0 \end{bmatrix} + \det\begin{bmatrix} 0 & 0 & a_{13} \\ a_{21} & 0 & 0 \\ 0 & a_{32} & 0 \end{bmatrix}. \tag{12.4}$$
By property 4,
$$\det\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix} = a_{11}a_{22}a_{33}. \tag{12.5}$$
By property 3 and property 4, switching row 2 and row 3 gives
$$\det\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & 0 & a_{23} \\ 0 & a_{32} & 0 \end{bmatrix} = -\det\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{32} & 0 \\ 0 & 0 & a_{23} \end{bmatrix} = -a_{11}a_{23}a_{32}. \tag{12.6}$$
By property 3 and property 4, switching row 1 and row 2 gives
$$\det\begin{bmatrix} 0 & a_{12} & 0 \\ a_{21} & 0 & 0 \\ 0 & 0 & a_{33} \end{bmatrix} = -\det\begin{bmatrix} a_{21} & 0 & 0 \\ 0 & a_{12} & 0 \\ 0 & 0 & a_{33} \end{bmatrix} = -a_{12}a_{21}a_{33}. \tag{12.7}$$
By property 3 and property 4, switching row 1 and row 2 and then switching row
1 and row 3 give
$$\det\begin{bmatrix} 0 & a_{12} & 0 \\ 0 & 0 & a_{23} \\ a_{31} & 0 & 0 \end{bmatrix} = \det\begin{bmatrix} a_{31} & 0 & 0 \\ 0 & a_{12} & 0 \\ 0 & 0 & a_{23} \end{bmatrix} = a_{12}a_{23}a_{31}. \tag{12.8}$$
By property 3 and property 4, switching row 1 and row 3 gives
$$\det\begin{bmatrix} 0 & 0 & a_{13} \\ 0 & a_{22} & 0 \\ a_{31} & 0 & 0 \end{bmatrix} = -\det\begin{bmatrix} a_{31} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{13} \end{bmatrix} = -a_{13}a_{22}a_{31}. \tag{12.9}$$
By property 3 and property 4, switching row 2 and row 3 and then switching row
1 and row 3 give
$$\det\begin{bmatrix} 0 & 0 & a_{13} \\ a_{21} & 0 & 0 \\ 0 & a_{32} & 0 \end{bmatrix} = \det\begin{bmatrix} a_{21} & 0 & 0 \\ 0 & a_{32} & 0 \\ 0 & 0 & a_{13} \end{bmatrix} = a_{13}a_{21}a_{32}. \tag{12.10}$$
Substituting (12.5), (12.6), (12.7), (12.8), (12.9), and (12.10) into (12.4),
$$\det\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32}.$$
For higher order matrices such as $4 \times 4$ or $5 \times 5$ matrices, we can find the
determinants in a similar manner by breaking them down into smaller determinants
using the linearity of the determinant in each row, or we can do some row
operations to make the matrix simpler using the determinant properties related to
row operations.
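
The six-term formula can be checked against a library determinant. A minimal sketch (assuming NumPy; the matrix and the helper `det3` are arbitrary illustrations, not from the text):

```python
import numpy as np

def det3(a):
    """3x3 determinant via the six-term formula derived above."""
    return (a[0,0]*a[1,1]*a[2,2] + a[0,1]*a[1,2]*a[2,0] + a[0,2]*a[1,0]*a[2,1]
            - a[0,2]*a[1,1]*a[2,0] - a[0,1]*a[1,0]*a[2,2] - a[0,0]*a[1,2]*a[2,1])

M = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]])
print(det3(M))           # -3.0 by the formula above
print(np.linalg.det(M))  # the same value (up to rounding) from NumPy
```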

Exercises

1. For a square matrix A, what is $\det(A^3)$ in terms of $\det(A)$?

2. For an $n \times n$ square matrix A, what is $\det(5A)$ in terms of $\det(A)$?

3. For an invertible matrix A, what is $\det(A^{-1})$ in terms of $\det(A)$?

4. $\det\begin{bmatrix} 1 & 5 \\ -2 & 7 \end{bmatrix} = ?$

5. $\det\begin{bmatrix} 8 & 3 \\ 6 & 1 \end{bmatrix} = ?$

6. $\det\begin{bmatrix} 2 & -3 & 1 \\ 0 & 5 & -6 \\ -1 & -2 & 0 \end{bmatrix} = ?$

7. $\det\begin{bmatrix} 2 & 0 & 0 & 0 \\ 6 & -5 & 0 & 0 \\ -8 & 1 & 3 & 0 \\ 9 & 20 & -12 & 10 \end{bmatrix} = ?$ (Hint: do row operations)

8. $\det\begin{bmatrix} -1 & -3 & 2 & 8 \\ 0 & 6 & 5 & -3 \\ 0 & 0 & -7 & -2 \\ 0 & 0 & 0 & 3 \end{bmatrix} = ?$ (Hint: do row operations)
Chapter 13

Systems of Linear Equations

13.1 Systems of Linear Equations as Matrix Linear Equations

In high school algebra, we learned about systems of linear equations such as
$$\begin{cases} 3x + 5y - 2z = 8 \\ x - 6y + 2z = -5 \\ 7x + y + 9z = 11 \end{cases} \tag{13.1}$$
The system of linear equations in (13.1) has three equations and three unknown
variables. In high school algebra, we mostly learned about systems where the
number of equations is the same as the number of unknowns. However, this is not
always necessarily the case. For example, we can have a system of equations like
$$\begin{cases} 5x - 8y = 3 \\ x + 2y = -3 \\ 7x - 5y = -2 \end{cases} \tag{13.2}$$
Any system of linear equations can be written as a matrix linear equation of the
form
$$A\vec{x} = \vec{b},$$
where A is the matrix containing all the coefficients on the left-hand side of the
system, $\vec{x}$ is the column vector containing all the unknown variables, and $\vec{b}$ is the
column vector containing all the numbers on the right-hand side of the system. For
example, the system of equations in (13.1) can be written as
$$\begin{bmatrix} 3x + 5y - 2z \\ x - 6y + 2z \\ 7x + y + 9z \end{bmatrix} = \begin{bmatrix} 8 \\ -5 \\ 11 \end{bmatrix}$$
$$\begin{bmatrix} 3x \\ x \\ 7x \end{bmatrix} + \begin{bmatrix} 5y \\ -6y \\ y \end{bmatrix} + \begin{bmatrix} -2z \\ 2z \\ 9z \end{bmatrix} = \begin{bmatrix} 8 \\ -5 \\ 11 \end{bmatrix}$$
$$x\begin{bmatrix} 3 \\ 1 \\ 7 \end{bmatrix} + y\begin{bmatrix} 5 \\ -6 \\ 1 \end{bmatrix} + z\begin{bmatrix} -2 \\ 2 \\ 9 \end{bmatrix} = \begin{bmatrix} 8 \\ -5 \\ 11 \end{bmatrix}$$
$$\begin{bmatrix} 3 & 5 & -2 \\ 1 & -6 & 2 \\ 7 & 1 & 9 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 8 \\ -5 \\ 11 \end{bmatrix},$$
and the system of equations in (13.2) can be written as
$$\begin{bmatrix} 5x - 8y \\ x + 2y \\ 7x - 5y \end{bmatrix} = \begin{bmatrix} 3 \\ -3 \\ -2 \end{bmatrix}$$
$$\begin{bmatrix} 5x \\ x \\ 7x \end{bmatrix} + \begin{bmatrix} -8y \\ 2y \\ -5y \end{bmatrix} = \begin{bmatrix} 3 \\ -3 \\ -2 \end{bmatrix}$$
$$x\begin{bmatrix} 5 \\ 1 \\ 7 \end{bmatrix} + y\begin{bmatrix} -8 \\ 2 \\ -5 \end{bmatrix} = \begin{bmatrix} 3 \\ -3 \\ -2 \end{bmatrix}$$
$$\begin{bmatrix} 5 & -8 \\ 1 & 2 \\ 7 & -5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3 \\ -3 \\ -2 \end{bmatrix}.$$
In this chapter, we will learn about systems of linear equations more generally as
matrix linear equations.

13.2 Characteristic of Matrix Linear Equations


When we learned about systems of linear equations, we learned that a system can
have no solution, a unique solution (only one solution), or infinitely many solutions.

No Solution vs Has Solution(s)

Let us discuss when the equation
$$A\vec{x} = \vec{b} \tag{13.3}$$
has or does not have a solution. Remember from chapter 7 that multiplying a matrix
A by a column vector gives some linear combination of columns of A. So, the
equation (13.3) has at least one solution when $\vec{b}$ is a linear combination of columns
of A. Then, remember from chapter 10 that the column space of A is the space of
all linear combinations of columns of A. So, the equation (13.3) has at least one
solution when $\vec{b}$ is in the column space of A.

Now what condition would guarantee that the vector $\vec{b}$ is always in the
column space of A? That is, what condition would guarantee that the equation
(13.3) definitely has at least one solution? The equation (13.3) is guaranteed to
have at least one solution when the rank of A is equal to the number of rows
of A.

Say A is an $m \times n$ matrix. Remember from chapter 10 that the column space
of an $m \times n$ matrix is a subspace of $\mathbb{R}^m$. If the rank of A is equal to $m$, which is
the number of rows, then there are $m$ linearly independent columns. As we learned
in chapter 10, the column space of A is the span of the linearly independent columns
of A, so it would be the span of $m$ linearly independent columns, which means it
is an $m$-dimensional vector space. Thus, if the rank of A is equal to the number of
rows of A, then the column space of A is an $m$-dimensional subspace of $\mathbb{R}^m$,
which is the whole $\mathbb{R}^m$ space itself. Since the column space of A is the whole $\mathbb{R}^m$
space, any $m$-dimensional vector $\vec{b}$ is always in the column space of A, and
the equation (13.3) definitely has at least one solution.

On the other hand, if the rank of A is less than the number of rows of A, then the
equation (13.3) might or might not have a solution depending on whether $\vec{b}$ is in the
column space of A.

Unique Solution vs Infinite Solutions

When the equation (13.3) has a solution, it could have a unique solution or infinitely
many solutions. Let us discuss when each case occurs.

The equation (13.3) can have infinitely many solutions when there are non-zero
vectors in the null space of A. Say there is a vector $\vec{x} = \vec{v}_p$ that satisfies equation
(13.3):
$$A\vec{v}_p = \vec{b}.$$
Let $\vec{v}_n$ be a non-zero vector in the null space of A. Then, $\vec{x} = \vec{v}_p + \vec{v}_n$ is another
vector which satisfies equation (13.3) because
$$A(\vec{v}_p + \vec{v}_n) = A\vec{v}_p + A\vec{v}_n = \vec{b} + \vec{0} = \vec{b}.$$
As we saw in chapter 10, if there are some non-zero vectors in the null space of A,
then their span is also in the null space of A. So, there would be infinitely many
non-zero vectors in the null space of A, and we can have infinitely many solutions
to the equation (13.3).

As we learned in chapter 10, there are non-zero vectors in the null space of A
when not all columns of A are linearly independent, that is, when the rank of A is
less than the number of columns of A. So, the equation (13.3) can have infinitely
many solutions if the rank of A is less than the number of columns of A. On the
other hand, the equation (13.3) can have at most one solution if the rank of
A is equal to the number of columns of A.
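
Since "$\vec{b}$ is in the column space of A" is the same as saying that attaching $\vec{b}$ to A as an extra column does not increase the rank, the discussion above gives a quick computational test. A minimal sketch (assuming NumPy; the function name `classify` is just an illustration, and the example system is the one from example 13.3 below):

```python
import numpy as np

def classify(A, b):
    """Classify A x = b as having no, one, or infinitely many solutions."""
    rank_A = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_aug > rank_A:
        return "no solution"            # b is outside the column space of A
    if rank_A == A.shape[1]:
        return "unique solution"        # the null space of A is just the 0 vector
    return "infinitely many solutions"  # free variables remain

A = np.array([[3.0, 9.0, 6.0], [5.0, 15.0, 7.0]])
b = np.array([9.0, 12.0])
print(classify(A, b))  # infinitely many solutions
```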

13.3 Solving Matrix Linear Equations


Now that we have learned about the characteristics of a matrix linear equation based on
the rank of the matrix on the left-hand side, let us learn how to solve a matrix
linear equation.

We will solve a matrix linear equation
$$A\vec{x} = \vec{b}$$
by using rref(A). We can obtain rref(A) by doing some row operations to A, which
is the same as multiplying some elementary matrices to the left of A. Then, to keep
the equation the same, we need to multiply the same elementary matrices to the
left of $\vec{b}$ on the right-hand side as well, so we need to do the same row operations
to $\vec{b}$. (We can think of an $m$-dimensional vector as an $m \times 1$ matrix.)

So, to solve a matrix linear equation, we first attach the vector $\vec{b}$ to the right
of A, and then we do some row operations to reduce A to rref(A). At the same
time, the vector $\vec{b}$ becomes some new vector $\vec{b}'$. Then, we solve the equation
$$\mathrm{rref}(A)\vec{x} = \vec{b}',$$
which is much simpler than the original equation.

Also, notice that reducing
$$A\vec{x} = \vec{b}$$
to
$$\mathrm{rref}(A)\vec{x} = \vec{b}'$$
by doing row operations is like how we learned to solve systems of linear equations
in high school algebra. In high school, we learned to solve a system of linear equations
by multiplying an equation by a number, which is the same as multiplying
a row of a matrix by a scalar, and by adding a multiple of an equation to another
equation, which is the same as adding a multiple of a row to another row.

Let us do some examples to get familiar with this.


    
2 1 x 7
Example 13.1: Solve = .
6 −3 y 3
First, we attach the vector on the right-hand side to the matrix on the left-hand
side:  
2 1 7
.
6 −3 3
Then, we do some row operations to reduce the matrix to its reduced row echelon
form. Multiplying first row by 12 ,

1 12 7
   
2 1 7
→ 2 .
6 −3 3 6 −3 3

Adding −6 times row 1 to row 2,

1 12 7 1 7
   
1
2 → 2 2 .
6 −3 3 0 −6 −18

Multiplying row 2 by − 16 ,
1 7 1 7
   
1 1
2 2 → 2 2 .
0 −6 −18 0 1 3

153
Adding − 21 times row 2 to row 1,
1 7
   
1 1 0 2
2 2 → .
0 1 3 0 1 3

So, the original equation is equivalent to


    
1 0 x 2
= ,
0 1 y 3

which gives    
x 2
=
y 3
when we do the multiplication on the left-hand side. This equation has a unique
solution.
    
−2 3 −6 x 8
Example 13.2: Solve  10 −9 18  y  = −2.
8 −6 12 z 5
First, we attach the vector on the right-hand side to the matrix on the left-hand
side:  
−2 3 −6 8
 10 −9 18 −2  .
8 −6 12 5
Then, we do some row operations to reduce the matrix to its reduced row echelon
form. Multiplying row 1 by − 12 ,

1 − 32
   
−2 3 −6 8 3 −4
 10 −9 18 −2 → 10 −9
  18 −2  .
8 −6 12 5 8 −6 12 5

Adding −10 times row 1 to row 2,

1 − 32 − 32
   
3 −4 1 3 −4
 10 −9 18 −2  →  0 6 −12 38  .
8 −6 12 5 8 −6 12 5

Adding −8 times row 1 to row 3,

1 − 32 − 32
   
3 −4 1 3 −4
 0 6 −12 38  →  0 6 −12 38  .
8 −6 12 5 0 6 −12 37

Multiplying row 2 by 61 ,

− 32 − 32
   
1 3 −4 1 3 −4
19
 0 6 −12 38  →  0 1 −2 3
.
0 6 −12 37 0 6 −12 37

154
3
Adding 2 times row 2 to row 1,

− 32 11
   
1 3 −4 1 0 0 2
 0 1 −2 193
→ 0 1 −2 19
3
.
0 6 −12 37 0 6 −12 37
Adding −6 times row 2 to row 3,
11 11
   
1 0 0 2 1 0 0 2
19 19
 0 1 −2 3
→ 0 1 −2 3
.
0 6 −12 37 0 0 0 −1
So, the original equation is equivalent to
     11 
1 0 0 x 2
0 1 −2 y  =  19  ,
3
0 0 0 z −1
which gives    11 
x 2
y − 2z  =  19 
3
0 −1
when we do the multiplication on the left-hand side. Therefore, this equation has
no solution because 0 6= −1.
 
  x  
3 9 6   9
Example 13.3: Solve y = .
5 15 7 12
z
First, we attach the vector on the right-hand side to the matrix on the left-hand
side:  
3 9 6 9
.
5 15 7 12
Then, we do some row operations to reduce the matrix to its reduced row echelon
form. Multiplying row 1 by 13 ,
   
3 9 6 9 1 3 2 3
→ .
5 15 7 12 5 15 7 12
Adding −5 times row 1 to row 2,
   
1 3 2 3 1 3 2 3
→ .
5 15 7 12 0 0 −3 −3

Multiplying row 2 by − 13 ,
   
1 3 2 3 1 3 2 3
→ .
0 0 −3 −3 0 0 1 1
Adding −2 times row 2 to row 1,
   
1 3 2 3 1 3 0 1
→ .
0 0 1 1 0 0 1 1

155
So, the original equation is equivalent to
 
  x  
1 3 0   1
y = ,
0 0 1 1
z

which gives    
x + 3y 1
=
z 1
when we do the multiplication on the left-hand side. So, we have

x + 3y = 1 ⇔ x = 1 − 3y

and
z = 1.
Therefore, the solutions are
       
x 1 − 3y 1 −3
y  =  y  = 0 + y  1 
z 1 1 0

for any real numbers y. This equation has infinitely many solutions.
   
Example 13.4: Solve
$$\begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -2 & 6 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 6 \\ 10 \\ -16 \end{bmatrix}.$$
First, we attach the vector on the right-hand side to the matrix on the left-hand side:
$$\begin{bmatrix} 1 & -1 & 6 \\ 3 & 5 & 10 \\ -2 & 6 & -16 \end{bmatrix}.$$
Then, we do some row operations to reduce the matrix to its reduced row echelon form. Adding $-3$ times row 1 to row 2,
$$\begin{bmatrix} 1 & -1 & 6 \\ 3 & 5 & 10 \\ -2 & 6 & -16 \end{bmatrix} \to \begin{bmatrix} 1 & -1 & 6 \\ 0 & 8 & -8 \\ -2 & 6 & -16 \end{bmatrix}.$$
Adding $2$ times row 1 to row 3,
$$\begin{bmatrix} 1 & -1 & 6 \\ 0 & 8 & -8 \\ -2 & 6 & -16 \end{bmatrix} \to \begin{bmatrix} 1 & -1 & 6 \\ 0 & 8 & -8 \\ 0 & 4 & -4 \end{bmatrix}.$$
Multiplying row 2 by $\frac{1}{8}$,
$$\begin{bmatrix} 1 & -1 & 6 \\ 0 & 8 & -8 \\ 0 & 4 & -4 \end{bmatrix} \to \begin{bmatrix} 1 & -1 & 6 \\ 0 & 1 & -1 \\ 0 & 4 & -4 \end{bmatrix}.$$
Adding row 2 to row 1,
$$\begin{bmatrix} 1 & -1 & 6 \\ 0 & 1 & -1 \\ 0 & 4 & -4 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 5 \\ 0 & 1 & -1 \\ 0 & 4 & -4 \end{bmatrix}.$$
Adding $-4$ times row 2 to row 3,
$$\begin{bmatrix} 1 & 0 & 5 \\ 0 & 1 & -1 \\ 0 & 4 & -4 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 5 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}.$$
So, the original equation is equivalent to
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ -1 \\ 0 \end{bmatrix},$$
which gives
$$\begin{bmatrix} x \\ y \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ -1 \\ 0 \end{bmatrix}$$
when we do the multiplication on the left-hand side. Therefore, the solution is
$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ -1 \end{bmatrix}.$$
This equation has a unique solution.


 
Example 13.5: Solve
$$\begin{bmatrix} 1 & 5 & 3 & 6 \\ 2 & 10 & 6 & 12 \\ 5 & 25 & 8 & 9 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} 6 \\ 12 \\ 23 \end{bmatrix}.$$
First, we attach the vector on the right-hand side to the matrix on the left-hand side:
$$\begin{bmatrix} 1 & 5 & 3 & 6 & 6 \\ 2 & 10 & 6 & 12 & 12 \\ 5 & 25 & 8 & 9 & 23 \end{bmatrix}.$$
Then, we do some row operations to reduce the matrix to its reduced row echelon form. Adding $-2$ times row 1 to row 2,
$$\begin{bmatrix} 1 & 5 & 3 & 6 & 6 \\ 2 & 10 & 6 & 12 & 12 \\ 5 & 25 & 8 & 9 & 23 \end{bmatrix} \to \begin{bmatrix} 1 & 5 & 3 & 6 & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 5 & 25 & 8 & 9 & 23 \end{bmatrix}.$$
Adding $-5$ times row 1 to row 3,
$$\begin{bmatrix} 1 & 5 & 3 & 6 & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 5 & 25 & 8 & 9 & 23 \end{bmatrix} \to \begin{bmatrix} 1 & 5 & 3 & 6 & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & -7 & -21 & -7 \end{bmatrix}.$$
Multiplying row 3 by $-\frac{1}{7}$,
$$\begin{bmatrix} 1 & 5 & 3 & 6 & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & -7 & -21 & -7 \end{bmatrix} \to \begin{bmatrix} 1 & 5 & 3 & 6 & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 & 1 \end{bmatrix}.$$
Adding $-3$ times row 3 to row 1,
$$\begin{bmatrix} 1 & 5 & 3 & 6 & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 5 & 0 & -3 & 3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 & 1 \end{bmatrix}.$$
Switching row 2 and row 3,
$$\begin{bmatrix} 1 & 5 & 0 & -3 & 3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 5 & 0 & -3 & 3 \\ 0 & 0 & 1 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
So, the original equation is equivalent to
$$\begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix},$$
which gives
$$\begin{bmatrix} x + 5y - 3w \\ z + 3w \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}$$
when we do the multiplication on the left-hand side. So, we have
$$x + 5y - 3w = 3 \Leftrightarrow x = 3 - 5y + 3w$$
and
$$z + 3w = 1 \Leftrightarrow z = 1 - 3w.$$
Therefore, the solutions are
$$\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} 3 - 5y + 3w \\ y \\ 1 - 3w \\ w \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \\ 1 \\ 0 \end{bmatrix} + y\begin{bmatrix} -5 \\ 1 \\ 0 \\ 0 \end{bmatrix} + w\begin{bmatrix} 3 \\ 0 \\ -3 \\ 1 \end{bmatrix}$$
for any real numbers $y$ and $w$. This equation has infinitely many solutions.
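The whole reduction can also be checked in one step. The following sketch (again my own, assuming SymPy is installed) computes the reduced row echelon form of the augmented matrix from Example 13.5; Matrix.rref() returns the RREF together with the indices of the pivot columns.

from sympy import Matrix

# Augmented matrix from Example 13.5
augmented = Matrix([[1, 5, 3, 6, 6],
                    [2, 10, 6, 12, 12],
                    [5, 25, 8, 9, 23]])

rref_matrix, pivot_columns = augmented.rref()
print(rref_matrix)    # rows (1, 5, 0, -3, 3), (0, 0, 1, 3, 1), (0, 0, 0, 0, 0)
print(pivot_columns)  # (0, 2): the pivots sit in columns 1 and 3 (0-indexed here)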

Exercises
    
1. Solve $\begin{bmatrix} 2 & -2 & 6 \\ 6 & 3 & 0 \\ 12 & -12 & 15 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -8 \\ 12 \\ -27 \end{bmatrix}$.

2. Solve $\begin{bmatrix} -3 & 2 & 9 \\ 6 & -4 & -18 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$.

3. Solve $\begin{bmatrix} 5 & -3 & 2 & -1 \\ -2 & 6 & -5 & 1 \\ 7 & -1 & 3 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} -7 \\ 13 \\ 0 \end{bmatrix}$.

4. Solve $\begin{bmatrix} -6 & 3 \\ 1 & -2 \\ 5 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ -3 \\ -2 \end{bmatrix}$.

Appendices

Appendix A

Cauchy-Schwarz Inequality

In this appendix, we will prove the Cauchy-Schwarz inequality, which we used to prove the triangle inequality for the $\ell_2$ norm in chapter 4. In the proof, we will use the following two basic facts about inequalities.

Fact 1: If $a \ge b$ and $c \ge d$, then $a + c \ge b + d$.

This should be intuitive: the sum of the larger numbers must be at least as large as the sum of the smaller numbers.

Fact 2: The square of any real number is greater than or equal to 0 (non-negative).

This follows from the range of the parabola $y = x^2$: we have $x^2 \ge 0$ for all real $x$.

Recall that the Cauchy-Schwarz inequality states that
$$\left(a_1^2 + a_2^2 + \cdots + a_n^2\right)\left(b_1^2 + b_2^2 + \cdots + b_n^2\right) \ge \left(a_1 b_1 + a_2 b_2 + \cdots + a_n b_n\right)^2.$$
Let us prove the case $n = 2$ first, which is
$$\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right) \ge \left(a_1 b_1 + a_2 b_2\right)^2.$$
(If $a_1 = a_2 = 0$ or $b_1 = b_2 = 0$, both sides are $0$ and the inequality holds, so we may assume $a_1^2 + a_2^2 > 0$ and $b_1^2 + b_2^2 > 0$ below.)

Using fact 2, we have
$$(x - y)^2 \ge 0$$
$$x^2 + y^2 - 2xy \ge 0$$
$$x^2 + y^2 \ge 2xy. \quad \text{(A.1)}$$
Letting
$$x_1 = \frac{a_1}{\sqrt{a_1^2 + a_2^2}} \quad \text{and} \quad y_1 = \frac{b_1}{\sqrt{b_1^2 + b_2^2}}$$
in the inequality (A.1), we get
$$\frac{a_1^2}{a_1^2 + a_2^2} + \frac{b_1^2}{b_1^2 + b_2^2} \ge \frac{2 a_1 b_1}{\sqrt{\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right)}}. \quad \text{(A.2)}$$
Letting
$$x_2 = \frac{a_2}{\sqrt{a_1^2 + a_2^2}} \quad \text{and} \quad y_2 = \frac{b_2}{\sqrt{b_1^2 + b_2^2}}$$
in the inequality (A.1), we get
$$\frac{a_2^2}{a_1^2 + a_2^2} + \frac{b_2^2}{b_1^2 + b_2^2} \ge \frac{2 a_2 b_2}{\sqrt{\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right)}}. \quad \text{(A.3)}$$
Using fact 1, adding (A.2) and (A.3) gives
$$\frac{a_1^2 + a_2^2}{a_1^2 + a_2^2} + \frac{b_1^2 + b_2^2}{b_1^2 + b_2^2} \ge \frac{2 a_1 b_1 + 2 a_2 b_2}{\sqrt{\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right)}}$$
$$2 \ge \frac{2\left(a_1 b_1 + a_2 b_2\right)}{\sqrt{\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right)}}$$
$$1 \ge \frac{a_1 b_1 + a_2 b_2}{\sqrt{\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right)}}$$
$$\sqrt{\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right)} \ge a_1 b_1 + a_2 b_2.$$
Squaring both sides, we obtain the Cauchy-Schwarz inequality for $n = 2$:
$$\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right) \ge \left(a_1 b_1 + a_2 b_2\right)^2.$$
(Squaring is safe here: if $a_1 b_1 + a_2 b_2 < 0$, the squared inequality holds immediately because its left-hand side is non-negative, and if $a_1 b_1 + a_2 b_2 \ge 0$, both sides of the last inequality are non-negative, so squaring preserves it.)
 

Similarly, the general case for any natural number $n$ can be proven by letting
$$x_1 = \frac{a_1}{\sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}}, \quad x_2 = \frac{a_2}{\sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}}, \quad \ldots, \quad x_n = \frac{a_n}{\sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}}$$
and
$$y_1 = \frac{b_1}{\sqrt{b_1^2 + b_2^2 + \cdots + b_n^2}}, \quad y_2 = \frac{b_2}{\sqrt{b_1^2 + b_2^2 + \cdots + b_n^2}}, \quad \ldots, \quad y_n = \frac{b_n}{\sqrt{b_1^2 + b_2^2 + \cdots + b_n^2}}.$$

There is also a simpler proof using the dot product. In chapter 5, we learned that
$$\vec{u} \cdot \vec{v} = \|\vec{u}\|_2 \|\vec{v}\|_2 \cos\theta$$
where $\theta$ is the smaller angle between $\vec{u}$ and $\vec{v}$. Squaring both sides,
$$(\vec{u} \cdot \vec{v})^2 = \left(\|\vec{u}\|_2\right)^2 \left(\|\vec{v}\|_2\right)^2 \cos^2\theta.$$
Because $\cos^2\theta \le 1$,
$$(\vec{u} \cdot \vec{v})^2 \le \left(\|\vec{u}\|_2\right)^2 \left(\|\vec{v}\|_2\right)^2.$$
Letting
$$\vec{u} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \quad \text{and} \quad \vec{v} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix},$$
we obtain the Cauchy-Schwarz inequality
$$\left(a_1 b_1 + a_2 b_2 + \cdots + a_n b_n\right)^2 \le \left(a_1^2 + a_2^2 + \cdots + a_n^2\right)\left(b_1^2 + b_2^2 + \cdots + b_n^2\right).$$
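As a quick numerical sanity check of the inequality (my own illustration, not part of the original proof), we can test it on many random vectors:

import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    a = rng.normal(size=5)
    b = rng.normal(size=5)
    lhs = np.dot(a, b) ** 2            # (a1*b1 + ... + an*bn)^2
    rhs = np.dot(a, a) * np.dot(b, b)  # (a1^2 + ...)(b1^2 + ...)
    assert lhs <= rhs + 1e-12          # tolerance for floating-point error
print("Cauchy-Schwarz held in all 1000 trials")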
 

Appendix B

Circle in Taxicab Geometry

Before discussing the circle in Taxicab geometry, let us talk about the circle in
Euclidean geometry first. We know how a circle looks.

Figure B.1: Circle in Euclidean geometry

However, a circle only looks this way in Euclidean geometry. How a circle looks
differs depending on which type of geometry it is in, but the definition of a circle
is always the same.

Definition of circle: A circle is the set of all points that are equidistant from a fixed point, called the center.

In figure B.1, we can see that the Euclidean distances between the center and
all points on the circle are the same since the Euclidean distance between two
points is defined to be the length of the line segment connecting those points.


Figure B.2: Circle in Taxicab geometry

However, in Taxicab geometry, the definition of distance changes, so the shape of a circle changes. In Taxicab geometry, a circle is the set of all points having the same $\ell_1$ distance from the center. That is why a circle in Taxicab geometry has the shape shown in figure B.2.

We can think about this graphically as well. Let us set the center at $(0, 0)$. The Euclidean distance between a point $(x, y)$ and the origin is
$$\sqrt{x^2 + y^2}.$$
A circle with radius $r$ in Euclidean geometry is the set of all points $(x, y)$ whose Euclidean distance from the center $(0, 0)$ is $r$. So, to graph the circle with radius $r$ in Euclidean geometry, we graph the curve $\sqrt{x^2 + y^2} = r$, or $x^2 + y^2 = r^2$.

Figure B.3: Graph of $x^2 + y^2 = r^2$

In Taxicab geometry, the $\ell_1$ distance between a point $(x, y)$ and the origin is
$$|x| + |y|.$$
A circle with radius $r$ in Taxicab geometry is the set of all points $(x, y)$ whose $\ell_1$ distance from the center $(0, 0)$ is $r$. So, to graph the circle with radius $r$ in Taxicab geometry, we graph the curve $|x| + |y| = r$.

Figure B.4: Graph of $|x| + |y| = r$
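Both curves are easy to draw. The following sketch (mine, assuming Matplotlib is available) plots the Euclidean circle and the Taxicab circle of the same radius on one set of axes:

import numpy as np
import matplotlib.pyplot as plt

r = 1.0

# Euclidean circle x^2 + y^2 = r^2, parametrized by the angle t
t = np.linspace(0, 2 * np.pi, 400)
plt.plot(r * np.cos(t), r * np.sin(t), label="Euclidean: x^2 + y^2 = r^2")

# Taxicab circle |x| + |y| = r: a square with corners on the axes
plt.plot([r, 0, -r, 0, r], [0, r, 0, -r, 0], label="Taxicab: |x| + |y| = r")

plt.gca().set_aspect("equal")
plt.legend()
plt.show()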

Appendix C

Beyond Vectors and Matrices: Blades and Tensors

In chapter 1, we learned that vectors are arrows with directions. That means vectors are one-dimensional oriented objects. They are one-dimensional because they are arrows; they are oriented because they have directions (or orientations).

Figure C.1: A vector

If we have one-dimensional oriented objects, then do we have two-dimensional oriented objects? The answer is yes!

Figure C.2: A 2-blade or a bivector (the oriented parallelogram $\vec{u} \wedge \vec{v}$ spanned by $\vec{u}$ and $\vec{v}$)

We can have an oriented parallelogram as in figure C.2, and the arc with an arrow at the end indicates the orientation of that parallelogram. Such a two-dimensional oriented object is called a 2-blade or a bivector. (A vector is a 1-blade because it is a one-dimensional oriented object.)

167
The symbol “∧” is the wedge product, a type of product used in an advanced field of algebra in mathematics called exterior algebra, just as the dot product (from chapter 5) is a type of product used in linear algebra. The wedge product of any two vectors gives a 2-blade.

Figure C.3: $\vec{v} \wedge \vec{u}$ has the opposite orientation

In figure C.2, the wedge product $\vec{u} \wedge \vec{v}$ gives a parallelogram oriented counterclockwise. Since the orientation matters, if we change the order of the wedge product, the orientation of $\vec{v} \wedge \vec{u}$ is clockwise.
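For two vectors in the plane, the 2-blade $\vec{u} \wedge \vec{v}$ is captured by a single number, the signed area $u_1 v_2 - u_2 v_1$ of the parallelogram. A tiny sketch (my own illustration, not from the book) shows the sign flip that corresponds to reversing the orientation:

def wedge_2d(u, v):
    # Signed area of the parallelogram spanned by u and v in the plane
    return u[0] * v[1] - u[1] * v[0]

u = (2.0, 0.0)
v = (1.0, 3.0)
print(wedge_2d(u, v))  # 6.0: counterclockwise orientation, like u ∧ v
print(wedge_2d(v, u))  # -6.0: clockwise orientation, like v ∧ u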

For higher dimensions, if we want to have a higher-dimensional oriented object, we just need to take the wedge product of more vectors. For example, the wedge product of three vectors gives an oriented parallelepiped. Such a three-dimensional oriented object is called a 3-blade or a trivector. A parallelepiped is a three-dimensional box whose surfaces are all parallelograms.

Figure C.4: A parallelepiped (not oriented)

In general, an $n$-dimensional oriented object, called an $n$-blade, is the wedge product of $n$ vectors.

Other than viewing a vector as an arrow, we can also view a vector as a list of numbers such as
$$\begin{bmatrix} 3 \\ 5 \\ -10 \end{bmatrix}.$$
Then, in chapter 6, we learned that a matrix is a block of numbers with two dimensions: rows and columns.

So, we can think of vectors as one-dimensional collections of numbers and matrices as two-dimensional collections of numbers. Then, do we have three-dimensional or any $n$-dimensional collections of numbers? The answer is, again, yes! For example, we can have a box of numbers with three dimensions.

Figure C.5: Tensors: a rank-1 tensor (a vector), a rank-2 tensor (a matrix), and a rank-3 tensor

Such a box of numbers with three dimensions is called a rank-3 tensor. In this sense, a vector is a rank-1 tensor, and a matrix is a rank-2 tensor. In general, an $n$-dimensional collection of numbers is a rank-$n$ tensor.
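In NumPy, for example, this hierarchy corresponds to the number of array dimensions (a sketch of my own; note that NumPy calls this ndim, and this use of the word "rank" is different from the matrix rank of chapter 9):

import numpy as np

vector = np.array([3, 5, -10])       # rank-1 tensor: one dimension
matrix = np.array([[1, 2], [3, 4]])  # rank-2 tensor: rows and columns
box = np.zeros((2, 3, 4))            # rank-3 tensor: a 2 x 3 x 4 box of numbers

print(vector.ndim, matrix.ndim, box.ndim)  # 1 2 3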

The concepts of blades and tensors are advanced topics in the algebra branch of mathematics, so of course they are way beyond the scope of this book. The purpose of this appendix is to give the readers a general idea that the concepts in linear algebra we learn in this book can be extended, and that algebra in mathematics is much more than just high school algebra.

Solutions to Exercise Problems

Chapter 1
1. Vectors: A, B, D, F; Scalars: C, E

2. The vector represents the point (2, −1).

3. The vector represents the point (−3, 4).

4. The vector represents the point (−2, −3).

5. It represents the point (1, 3, 6). Its y-coordinate is 3.

6. It represents the point (−4, 5, −2). Its x-coordinate is −4.

7. It represents the point (8, 15, −20). Its z-coordinate is −20.

8. 5 dimensions

Chapter 2
 
1. $\begin{bmatrix} -5 \\ 5 \\ -3 \end{bmatrix}$

2. $\begin{bmatrix} 2 \\ 1 \\ -1 \\ -5 \end{bmatrix}$

3. $\begin{bmatrix} 16 \\ 30 \\ -12 \end{bmatrix}$

4. $\begin{bmatrix} -4 \\ 12 \\ -16 \\ 0 \\ -8 \\ 8 \end{bmatrix}$

5. $\begin{bmatrix} 3 \\ 11 \end{bmatrix}$

6. $\begin{bmatrix} 0 \\ -8 \\ 6 \end{bmatrix}$

7. Let
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.$$
Then,
$$(a + b)\vec{v} = (a + b)\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} (a + b)v_1 \\ (a + b)v_2 \\ \vdots \\ (a + b)v_n \end{bmatrix} = \begin{bmatrix} a v_1 + b v_1 \\ a v_2 + b v_2 \\ \vdots \\ a v_n + b v_n \end{bmatrix} = \begin{bmatrix} a v_1 \\ a v_2 \\ \vdots \\ a v_n \end{bmatrix} + \begin{bmatrix} b v_1 \\ b v_2 \\ \vdots \\ b v_n \end{bmatrix} = a\vec{v} + b\vec{v}.$$

Chapter 3
       
−5 1 4 0
1.  0  = 3 4 − 2 6 + 3 0
8 3 2 1
           
1 5 2 1 
 5 2 
1 1    
 =   − 2  , so   ∈ span   ,  0  .
0 1 1

2. 
 0  −2 −1 0 −2 −1

 
−2 4 3 −2 4 3
 

3. There  can
 only be
 at
 mostthree
 linearly independent three-dimensional vec-
5 −2 6 −1
tors, so 1,  3 , 0, and −3 are linearly dependent.
4 8 9 7

4. B. It does not contain the origin (0, 0).

5. 5

6. It is not a set of basis vectors for R2 . Basis vectors have to be linearly in-

173
dependent, but    
−3 3 −2
= .
9 2 6

7. It is the span of three linearly independent vectors, so the dimension of the


vector space is three. It is the span of three-dimensional vectors, so it is a subspace
of R3 .

8. The first two vectors are linearly independent, and the third vector is lin-
early dependent with respect to the first two. It is the span of two linearly in-
dependent vectors, so the dimension of the vector space is two. It is the span of
four-dimensional vectors, so it is a subspace of R4 .

Chapter 4
1. $\|\vec{u}\|_1 = 10$

2. $\|\vec{v}\|_2 = 13$

3. $\hat{w} = \dfrac{\vec{w}}{\|\vec{w}\|_2} = \dfrac{1}{\sqrt{6}}\vec{w} = \begin{bmatrix} 1/\sqrt{6} \\ -2/\sqrt{6} \\ -1/\sqrt{6} \end{bmatrix}$

4.
a. $\|\vec{u}\|_2 = \sqrt{10}$, $\|\vec{v}\|_2 = \sqrt{21}$, $\|\vec{u} + \vec{v}\|_2 = \sqrt{11}$
b. $\sqrt{11} < \sqrt{10} + \sqrt{21}$

5. $\left\|\begin{bmatrix} 12 \\ -9 \end{bmatrix}\right\|_2$

6. $\left\|\begin{bmatrix} 7 \\ -1 \\ 10 \end{bmatrix}\right\|_2$

7.
a. $|-15| + |9| = 24$ cm
b. $\left\|\begin{bmatrix} -15 \\ 9 \end{bmatrix}\right\|_1$

Chapter 5
 
1. $\vec{v}^T = \begin{bmatrix} 2 & 5 & 0 & -8 \end{bmatrix}$

2. Let
$$\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad \text{and} \quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.$$
Then,
$$(\vec{u} + \vec{v})^T = \left(\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}\right)^T = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix}^T = \begin{bmatrix} u_1 + v_1 & u_2 + v_2 & \cdots & u_n + v_n \end{bmatrix}$$
$$= \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} + \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} = \vec{u}^T + \vec{v}^T.$$

3. 12

4. 10

5. −9
 
6. $\arccos\left(\dfrac{2}{\sqrt{5}}\right) \approx 26.57^\circ$

7. $\arccos\left(\dfrac{9}{10\sqrt{15}}\right) \approx 76.56^\circ$
8. They are orthogonal and orthonormal.

9. They are not orthogonal.

10. They are orthogonal but not orthonormal.

Chapter 6
1. 3 × 4

2. C.

3. $a_{12} = -7$, $a_{21} = -1$, $a_{23} = 5$
4.
a. The elements on the main diagonal are 2, 0, −9.

b. Tr(B) = −7

 
5. $C^T = \begin{bmatrix} 3 & 8 & 1 \\ 0 & -5 & 6 \\ -7 & 2 & 9 \end{bmatrix}$

6. When we take the transpose, the columns become rows. Then, when we take the transpose again, the rows become columns. So, $\left(A^T\right)^T = A$.

7.
A. Upper-triangular matrix

B. Lower-triangular matrix, upper-triangular matrix, symmetric matrix, skew-symmetric matrix, diagonal matrix

C. Lower-triangular matrix

D. Symmetric matrix

8. Since
$$\begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} = 0,$$
the three vectors are orthogonal to each other. An orthogonal matrix is a matrix whose columns are orthonormal vectors, so we need to normalize the vectors:
$$\begin{bmatrix} 1/\sqrt{3} \\ -1/\sqrt{3} \\ -1/\sqrt{3} \end{bmatrix}, \quad \begin{bmatrix} 2/\sqrt{6} \\ 1/\sqrt{6} \\ 1/\sqrt{6} \end{bmatrix}, \quad \begin{bmatrix} 0 \\ -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}.$$
Using these three orthonormal column vectors, we can have a 3 × 3 orthogonal matrix such as
$$\begin{bmatrix} 1/\sqrt{3} & 2/\sqrt{6} & 0 \\ -1/\sqrt{3} & 1/\sqrt{6} & -1/\sqrt{2} \\ -1/\sqrt{3} & 1/\sqrt{6} & 1/\sqrt{2} \end{bmatrix}.$$
The order of the column vectors does not matter, so we can have many more orthogonal matrices from these column vectors as well.
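One can confirm this numerically (a sketch of my own, not part of the solutions): for an orthogonal matrix $Q$, the product $Q^T Q$ is the identity.

import numpy as np

s3, s6, s2 = np.sqrt(3), np.sqrt(6), np.sqrt(2)
Q = np.array([[1/s3, 2/s6, 0.0],
              [-1/s3, 1/s6, -1/s2],
              [-1/s3, 1/s6, 1/s2]])

# Orthonormal columns mean Q^T Q equals the 3 x 3 identity matrix
print(np.allclose(Q.T @ Q, np.eye(3)))  # True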

Chapter 7
 
1. $\begin{bmatrix} -2 & 14 \\ 19 & 25 \end{bmatrix}$

2. $\begin{bmatrix} -1 & -2 \\ -5 & 10 \\ 4 & -77 \end{bmatrix}$

3. $\begin{bmatrix} 19 \\ 0 \end{bmatrix}$

4. $\begin{bmatrix} -2 & 10 & 22 \end{bmatrix}$

5. $\begin{bmatrix} -25 \\ -73 \\ 3 \end{bmatrix}$

6. $\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = 0\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + 0\begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + 0\begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$

7. $\begin{bmatrix} 4 & -6 & -2 & 6 \\ -2 & 18 & -4 & 52 \\ 4 & -30 & 6 & -82 \end{bmatrix}$

8. $\begin{bmatrix} 12 & -2 & 6 \\ 6 & -10 & 0 \end{bmatrix}$

9. $\begin{bmatrix} 8 & 8 & -4 \\ -1 & -3 & 2 \\ 3 & -16 & -6 \end{bmatrix}$

10. The first column of the product is
$$\begin{bmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n \end{bmatrix} \begin{bmatrix} d_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = d_1\begin{bmatrix} c_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + 0\begin{bmatrix} 0 \\ c_2 \\ \vdots \\ 0 \end{bmatrix} + \cdots + 0\begin{bmatrix} 0 \\ 0 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} c_1 d_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$
The second column of the product is
$$\begin{bmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n \end{bmatrix} \begin{bmatrix} 0 \\ d_2 \\ \vdots \\ 0 \end{bmatrix} = 0\begin{bmatrix} c_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + d_2\begin{bmatrix} 0 \\ c_2 \\ \vdots \\ 0 \end{bmatrix} + \cdots + 0\begin{bmatrix} 0 \\ 0 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} 0 \\ c_2 d_2 \\ \vdots \\ 0 \end{bmatrix}.$$
The same pattern continues, and the $n$-th column of the product is
$$\begin{bmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ d_n \end{bmatrix} = 0\begin{bmatrix} c_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + 0\begin{bmatrix} 0 \\ c_2 \\ \vdots \\ 0 \end{bmatrix} + \cdots + d_n\begin{bmatrix} 0 \\ 0 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ c_n d_n \end{bmatrix}.$$
Thus,
$$\begin{bmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n \end{bmatrix} \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix} = \begin{bmatrix} c_1 d_1 & 0 & \cdots & 0 \\ 0 & c_2 d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n d_n \end{bmatrix}.$$

11. If $S$ is symmetric, then $S^T = S$. To show $A^T A$ is symmetric, we need to show $\left(A^T A\right)^T = A^T A$:
$$\left(A^T A\right)^T = A^T \left(A^T\right)^T = A^T A.$$

Chapter 8
1.
a) $\begin{bmatrix} -1 & 5 & 2 \\ -2 & 8 & -3 \\ 7 & -9 & 10 \\ 3 & 0 & 6 \end{bmatrix}$
b) $E_A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$

2.
a) $\begin{bmatrix} 6 & -4 & 2 & 0 \\ 5 & -3 & 6 & 2 \end{bmatrix}$
b) $E_B = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$

3.
a) $\begin{bmatrix} 3 & 5 & 0 \\ 3 & -5 & 6 \\ 1 & 2 & -1 \end{bmatrix}$
b) $E_C = \begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

4.
a) $\begin{bmatrix} 6 & -8 \\ 1 & 6 \\ -2 & 7 \end{bmatrix}$
b) $E_D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$

Chapter 9
1.
a) $\operatorname{rref}(A_1) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
b) Column 1, column 2, and column 3
c) 3
d) 3
e) 3

2.
a) $\operatorname{rref}(A_2) = \begin{bmatrix} 1 & -\frac{2}{3} & -3 \\ 0 & 0 & 0 \end{bmatrix}$
b) Column 1
c) 1
d) 1
e) 1

3.
a) $\operatorname{rref}(A_3) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}$
b) Column 1, column 2, and column 3
c) 3
d) 3
e) 3

4.
a) $\operatorname{rref}(A_4) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$
b) Column 1 and column 2
c) 2
d) 2
e) 2

Chapter 10
1.
a) $\operatorname{span}\left\{\begin{bmatrix} 2 \\ 6 \\ 12 \end{bmatrix}, \begin{bmatrix} -2 \\ 3 \\ -12 \end{bmatrix}, \begin{bmatrix} 6 \\ 0 \\ 15 \end{bmatrix}\right\}$; 3 dimensions
b) $\operatorname{span}\left\{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right\}$; 3 dimensions
c) $\left\{\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}\right\}$; 0 dimensions
d) $\left\{\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}\right\}$; 0 dimensions

2.
a) $\operatorname{span}\left\{\begin{bmatrix} -3 \\ 6 \end{bmatrix}\right\}$; 1 dimension
b) $\operatorname{span}\left\{\begin{bmatrix} 1 \\ -\frac{2}{3} \\ -3 \end{bmatrix}\right\}$; 1 dimension
c) $\operatorname{span}\left\{\begin{bmatrix} \frac{2}{3} \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}\right\}$; 2 dimensions
d) $\operatorname{span}\left\{\begin{bmatrix} 2 \\ 1 \end{bmatrix}\right\}$; 1 dimension

3.
a) $\operatorname{span}\left\{\begin{bmatrix} 5 \\ -2 \\ 7 \end{bmatrix}, \begin{bmatrix} -3 \\ 6 \\ -1 \end{bmatrix}, \begin{bmatrix} 2 \\ -5 \\ 3 \end{bmatrix}\right\}$; 3 dimensions
b) $\operatorname{span}\left\{\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}\right\}$; 3 dimensions
c) $\operatorname{span}\left\{\begin{bmatrix} 0 \\ -1 \\ -1 \\ 1 \end{bmatrix}\right\}$; 1 dimension
d) $\left\{\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}\right\}$; 0 dimensions
 

Chapter 11
 5 1 1

− 42 9 21
1. Invertible; A−1
 5 1 2 
1 =  21 9 − 21 
2 1
7 0 − 21
2. Not invertible
 
2 3
3. Invertible; A−1
3 =
−1 −2
4. Not invertible

5.

A−1 (B − A)B −1 = A−1 B − A−1 A B −1




= A−1 BB −1 − A−1 AB −1
= A−1 I − IB −1
= A−1 − B −1

181
6. By definition of matrix inverse, we need to have

(AB)−1 AB = AB(AB)−1 = I.

We can check that


B −1 A−1 AB = B −1 IB = B −1 B = I
and
ABB −1 A−1 = AIA−1 = AA−1 = I.
So, (AB)−1 = B −1 A−1 .
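The inverse in solution 1 is easy to verify numerically. A sketch of my own (assuming $A_1$ is the same coefficient matrix as in exercise 1 of chapter 13, whose inverse matches the entries given here):

import numpy as np

A1 = np.array([[2.0, -2.0, 6.0],
               [6.0, 3.0, 0.0],
               [12.0, -12.0, 15.0]])
A1_inv = np.array([[-5/42, 1/9, 1/21],
                   [5/21, 1/9, -2/21],
                   [2/7, 0.0, -1/21]])

# The product should be the 3 x 3 identity matrix, up to rounding error
print(np.allclose(A1 @ A1_inv, np.eye(3)))  # True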

Chapter 12
1. $\det\left(A^3\right) = [\det(A)]^3$ (Use property 16)

2. $\det(5A) = 5^n \det(A)$ (Use property 2)

3. $\det\left(A^{-1}\right)\det(A) = 1 \Leftrightarrow \det\left(A^{-1}\right) = \dfrac{1}{\det(A)}$ (Use property 16)
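These three identities are easy to spot-check numerically. A small sketch (my own, not part of the solutions) for a random 3 × 3 matrix:

import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n))
d = np.linalg.det(A)

print(np.isclose(np.linalg.det(np.linalg.matrix_power(A, 3)), d ** 3))  # True
print(np.isclose(np.linalg.det(5 * A), 5 ** n * d))                     # True
print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / d))               # True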
4. 17

5. −10

6. −37

7. −300

8. 126

Chapter 13
   
1. $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}$

2. No solution

3. $\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \\ 1 \\ 0 \end{bmatrix} + w\begin{bmatrix} 0 \\ -1 \\ -1 \\ 1 \end{bmatrix}$

4. No solution
Acknowledgements

In this section, I would like to thank the people who helped me during the process of writing this book.

On Instagram, I have established an edit team consisting of a few members with various math backgrounds. I would like to thank those members for their helpful feedback and suggestions for the book. The following is the list of the members of the edit team, in no particular order:
• Keyvon Rashidi
• Shunhagorn Philipp Hoehn
• Niels Thijmen Keukens

• Manny Monter

I am also grateful to Andrzej Kukla, the designer of the cover of this book, for his creative ideas for the design. Andrzej Kukla also has an Instagram page, a Twitter account, and a YouTube channel about math called Mathinity.

Reviews

“This is a book for beginners in linear algebra. Duc Tran aims for simple statements of the ideas and facts that he needs. He organizes them in a coherent way: first the main facts about vectors and then about matrices. For a matrix, he discusses elimination to echelon form and then the rank and the four fundamental subspaces. The result is a short book that a student can read.”

Dr. Gilbert Strang


Professor of Mathematics, Massachusetts Institute of Technology (MIT),
USA;
Author of Linear Algebra for Everyone (2020)

“The book adopts a visual and largely informal approach to introduce linear algebra. It reverses the old-school order which started from linear equations or operation on matrices; instead, the author begins with the graphical representation of vectors, whilst the determinants and linear equations are postponed to the last two chapters. He introduces the notion of norms at an early stage, including the $\ell_1$ or 'taxicab' norm, which is not only interesting but also indispensable for advanced topics. The appendices contain rough sketches of advanced topics such as tensors.

This book will be a nice complement to the more comprehensive textbooks


on linear algebra. It will be welcomed by students who feel frustrated by the
orthodox pedagogy to this topic.”

Dr. Wen-Wei Li
Professor of Mathematical Sciences, Peking University, China

“This introductory-level textbook for basic linear algebra written by Duc Tran is for those who wish to study linear algebra for the first time. It will be especially suitable for high school students or first-year undergraduates when they have to understand basic linear algebra but do not want to go to the heavy mathematics course. The author provides plenty of graphs and graphical representations of matrices with examples, exercises, and their solutions. This book reminds the reviewer of some images like AP course, self-study, linear algebra for biology or social science majors, and linear algebra for programmers.”
Dr. Sang Geun Han
Professor of Mathematical Sciences, Korea Advanced Institute of Science and
Technology (KAIST), South Korea

