Duc Tran - Basic Linear Algebra - An Introduction With An Intuitive Approach (2022)
Duc Tran - Basic Linear Algebra - An Introduction With An Intuitive Approach (2022)
Preface 2
I Vectors 3
II Matrices 59
Appendices 160
Reviews 184
About the Author
When I was in 9th grade, I came to the USA and started my high school study at
Brentwood Christian School in Austin, Texas. The classes in my high school were
more relaxing and less stressful than the classes in my middle school, so I began
to understand and enjoy the things I studied at school. After a while, I realized
that I enjoyed math the most out of all the subjects. Also, there was a math team
in my high school, and participating in the math team made me love mathematics
even more. That is how I “fell in love” with mathematics.
When I was in 11th grade, I came across @daily math , a very popular math account
on Instagram. Inspired by the @daily math page, I also created a math account
on Instagram called @dvkt math with about 27,000 followers currently. Other than
sharing mathematics knowledge by the means of a math page on Instagram, I de-
cided to also write some math books. Before writing this book, I also wrote two
other books called An Introduction to Calculus: With Hyperbolic Functions, Limi-
ts, Derivatives, and More, which was published a little before my high school gradu-
ation, and Integrals and Sums Fiesta: An Integral Part of a Math Enthusiast’s L-
ife, which was published during my first year in college.
1
Preface
Linear Algebra is usually only taught in college and not covered sufficiently or not
covered at all in high school. However, Linear Algebra is actually not very compli-
cated, and high school students can start learning it as well. This book introduces
some important basic concepts of Linear Algebra with high school Algebra and
Geometry as the only prerequisites.
However, this book is by no means only restricted to high school students. Any
learners who want to get started on Linear Algebra in an intuitive manner are wel-
comed to read this book. Sometimes, in college, Linear Algebra can be introduced
in a rigorous manner without any intuition at all. For many students who are not
ready for the rigor, it would be good to have some intuition for the subject as a
base from which they can build up the rigor.
This book is divided into two parts: the first part is about vectors, and the second
part is about matrices. Part I starts by introducing what vectors are and then
goes into some important concepts related to vectors. Similarly, part II starts by
introducing what matrices are and then goes into discussing some important basic
concepts about matrices. At the end of the book, there are appendices about some
additional topics related to some topics discussed in part I or part II.
As mentioned before, this book introduces some basic concepts of Linear Alge-
bra in an intuitive manner. So, not all topics that would usually be covered in an
introductory college-level Linear Algebra course are covered in this book, and some
topics covered in this book are not dived as deep into as they would be in introduc-
tory college-level Linear Algebra courses. If you choose to continue learning more
Linear Algebra, you could learn more about the topics covered in this book as well
as more topics in Linear Algebra.
I hope you enjoy reading this book and learning Linear Algebra!
2
Part I
Vectors
3
Chapter 1
5
1.1 Introduction to Vectors
Let us start our journey of linear algebra with an introduction to vectors. What
are vectors? Basically, a vector is a list of numbers such as
1
, (1.1)
2
3
5 , (1.2)
6
etc. The vector in (1.1) has two elements 1 and 2, and the vector in (1.2) has three
elements 3, 5, and 6. A vector with one element, such as
8 ,
is simply a number, also called a scalar. There can also be vectors with more than
three elements. A vector is usually denoted as a letter with a small arrow above,
such as ~v . For example, for the vectors in (1.1) and (1.2), we can denote them as
1
~v =
2
and
3
~u = 5 .
6
is an arrow that goes v1 units in the horizontal direction (or in the x-direction of
the xy-coordinate plane) and v2 units in the vertical direction (or in the y-direction
of the xy-coordinate plane).
The arrow goes to the right if v1 is positive, and it goes to the left if v1 is negative.
For the vertical direction, The arrow goes up if v2 is positive, and it goes down if
v2 is negative.
6
v1
v2
|v2 |
|v1 |
v1
Figure 1.1: Graphical representation of if v1 > 0 and v2 > 0
v2
v1
v2
|v2 |
|v1 |
v
Figure 1.2: Graphical representation of 1 if v1 < 0 and v2 < 0
v2
v1
v2
|v2 |
|v1 |
v
Figure 1.3: Graphical representation of 1 if v1 > 0 and v2 < 0
v2
v1
v2
|v2 |
|v1 |
v
Figure 1.4: Graphical representation of 1 if v1 < 0 and v2 > 0
v2
7
An important point to note here is that direction matters. So, a vector is an object
with a direction and a magnitude. The magnitude (or length) of vectors is discussed
in chapter 4. If the arrow starts at the origin, i.e. the point (0,0), then the vector
v1
v2
represents the point (v1 , v2 ) in the xy-plane (or in the two-dimensional space), i.e
the point with x-coordinate v1 and y-coordinate v2 . We will talk about space in
more details later. For example, the vector
1
2
y-axis
(1, 2)
2
x-axis
1 2 3
1
Figure 1.5: Vector
2
z-axis
y-axis
x-axis
8
A vector
v1
v2
v3
represents the point (v1 , v2 , v3 ) in the three-dimensional space, i.e. the point with
x-coordinate v1 , y-coordinate v2 , and z-coordinate v3 .
9
Exercises
10
Chapter 2
11
2.1 Addition and Subtraction
Now that we learned what vectors are, let us learn how to do basic arithmetic
operations to vectors. When we add two vectors together, we add them element-
by-element. That is, we add the first element (from top to bottom) of the first
vector to the first element of the second vector, the second element of the first
vector to the second element of the second vector, and so on. Note that we can
only add two vectors with the same number of elements (in the same dimension).
Let us look at a few examples.
1 0
Example 2.1: + =?
4 3
1 0 1+0 1
+ = = .
4 3 4+3 7
1 5
Example 2.2: 2 + −8 =?
3 12
1 5 1+5 6
2 + −8 = 2 + (−8) = −6 .
3 12 3 + 12 15
−3 4
5 −9
1 + 10 =?
Example 2.3:
7 −7
−3 4 (−3) + 4 1
5 −9 5 + (−9) −4
+ = = .
1 10 1 + 10 11
7 −7 7 + (−7) 0
~u + ~v = ~v + ~u
and
~u + (~v + w)
~ = (~u + ~v ) + w.
~
We can prove these properties by doing some simple algebra and using the proper-
ties of addition of numbers. Let
u1
u2
~u = . ,
..
un
12
v1
v2
~v = . ,
..
vn
and
w1
w2
w
~ = . .
..
wn
Then, we have
u1 v1 u1 + v 1 v1 + u1 v1 u1
u2 v2 u2 + v2 v2 + u2 v2 u2
~u + ~v = . + . = = = .. + .. = ~v + ~u
.. ..
.. .. . . . .
un vn un + vn vn + un vn un
and
u1 v1 w1 u1 + (v1 + w1 )
u2 v2 w2 u2 + (v2 + w2 )
~u + (~v + w)
~ = . + . + . =
..
.. .. ..
.
un vn wn un + (vn + wn )
(u1 + v1 ) + w1 u1 v1 w1
(u2 + v2 ) + w2 u2 v2 w2
= = .. + .. + .. = (~u + ~v ) + w.
~
..
. . . .
(un + vn ) + wn un vn wn
13
2.2 Scalar Multiplication
−2 3 · (−2) −6
3· = = .
5 3·5 15
1
3
−2 =?
Example 2.7: −4 ·
5
0
1 (−4) · 1 −4
3 (−4) · 3 −12
−2 = (−4) · (−2) = 8 .
−4 ·
5 (−4) · 5 −20
0 (−4) · 0 0
a(~u + ~v ) = a · ~u + a · ~v .
and
v1
v2
~v = . .
..
vn
14
Then, we have
u1 v1 u1 + v 1 a(u1 + v1 )
u2 v2 u2 + v2 a(u2 + v2 )
a(~u + ~v ) = a . + . = a · =
.. ..
.. ..
. .
un vn un + vn a(un + vn )
a · u1 + a · v1 u1 v1
a · u2 + a · v2 u2 v2
= = a · .. + a · .. = a · ~u + a · ~v .
..
. . .
a · un + a · vn un vn
~v
~u + ~v
~u
and
v
~v = 1 ,
v2
we have
u1 + v1
~u + ~v = .
u2 + v2
Assuming u1 , u2 , v1 , and v2 are all positive, graphing the vectors ~u, ~v , and ~u + ~v
would give us the following picture.
15
~v v2
~u + ~v u2 + v2
u2 v1
~u
u1
u1 + v1
We can see that the components of ~u + ~v are indeed u1 + v1 and u2 + v2 . From the
graphical representation of vector addition, we can derive the graphical representa-
tion of vector subtraction ~u − ~v . Since ~v + (~u − ~v ) = ~u, the graphical representation
of vector subtraction ~u − ~v is as follows.
~u − ~v
~u
~v
a · ~v
~v
16
~v
a · ~v
then we have
a · v1
a · ~v = .
a · v2
Assuming v1 and v2 are positive and a > 1, graphing the vectors ~v and a · ~v would
give us the following picture.
a · ~v
a · v2
B
~v
v2
A
v1 C E
a · v1
We can see that 4ABC is similar to 4ADE. So, indeed to stretch a vector, we
need to stretch each component of the vector, v1 and v2 in this case.
If a < 0, the scalar multiplication a · ~v would flip the vector ~v to the opposite
direction and then stretch or shrink it depending on whether |a| > 1 or |a| < 1.
17
~v
a · ~v
18
Exercises
−4 −1
1. 0 + 5 =?
6 −9
−3 −5
5 4
9 − 10 =?
2.
−7 −2
8
3. 2 · 15 =?
−6
1
−3
4
4. −4 ·
0 =?
2
−2
1 0
5. 3 · +5· =?
2 1
5 15
6. 6 · 2 − 2 · 10 =?
0 −3
7. For any scalars a and b and vector ~v , show that (a + b) · ~v = a~v + b~v .
19
Chapter 3
20
3.1 Linear Combination and Span of Vectors
Let us talk about linear combination of vectors applying what we learned in the
last chapter. What is a linear combination of vectors? A linear combination of
vectors is a sum of scalar multiples of vectors. For example, let us look back at
questions 7 and 8 in the exercises of the last chapter:
1 0
3· +5·
2 1
1 0
is a linear combination of and , and
2 1
5 15
6 · 2 − 2 · 10
0 −3
5 15
is a linear combination of 2 and 10 because
0 −3
5 15 5 15
6 · 2 − 2 · 10 = 6 · 2 + (−2) · 10 .
0 −3 0 −3
for some scalars c1 , c2 , · · · , cn . If there is only one vector ~v , then a linear combi-
nation of ~v is just a scalar multiple of ~v : c~v .
is the set of all possible linear combinations of ~v1 , ~v2 , · · · , ~vn . If a vector ~u can be
written as a linear combination of ~v1 , ~v2 , · · · , ~vn , then we write
21
10 6 4 0
Example 3.2: Write 7 as a linear combination of 3, 5 , and 1.
−2 8 13 0
10 6 4 0
7 = 3 3 − 2 5 + 8 1 .
−2 8 13 0
12 3 2
Example 3.3: Verify that −5 ∈ span −4 , 1 .
9 0 3
12 3 2
−5 = 2 −4 + 3 1 .
9 0 3
12 3 2
That means −5 can be written as a linear combination of −4 and 1, so
9 0 3
12 3 2
−5 ∈ span −4 , 1 .
9 0 3
For example, three vectors ~v1 , ~v2 , and ~v3 are linearly independent if
22
1 0
Example 3.4: Are and linearly independent or dependent?
0 1
1 0 0
We can see that is not a scalar multiple of , which means that is
0 1 1
1
not a scalar multiple of , either. So,
0
1 0
6∈ span
0 1
and
0 1
6∈ span .
1 0
1 0
Therefore, and are linearly independent.
0 1
5 4 3
Example 3.5: Are , , and linearly independent or dependent?
−6 1 8
From example 3.1, we know that
5 4 3
∈ span , .
−6 1 8
5 4 3
Therefore, , , and are linearly dependent.
−6 1 8
6 4 0
Example 3.6: Are 3, 5 , and 1 linearly independent or dependent?
8 13 0
By doing some additions and scalar multiplications for those vectors, we can see
that none of them can be written as a linear combination of other two vectors. So,
6 4 0
3 6∈ span 5 , 1 ,
8 13 0
4 6 0
5 6∈ span 3 , 1 ,
13 8 0
and
0 6 4
1 6∈ span 3 , 5 .
0 8 13
6 4 0
Therefore, 3, 5 , and 1 are linearly independent.
8 13 0
23
12 3 2
Example 3.7: Are −5, −4, and 1 linearly independent or dependent?
9 0 3
From example 3.3, we know that
12 3 2
−5 ∈ span −4 , 1 .
9 0 3
12 3 2
Therefore, −5, −4, and 1 are linearly dependent.
9 0 3
Before going to the next section, let us talk about the formal definition of linear
independence: if the only solution to
Well, how is this definition related to the definition given at the beginning of this
section? If all of the c1 , c2 , · · · , cn are 0, then there is no way to rearrange the
expression to write one of the vectors as a linear combination of other vectors. If
at least one of the c1 , c2 , · · · , cn is non-zero, then we can rearrange the expression
to write one of the vectors as a linear combination of other vectors. For example,
assuming that c1 6= 0, we can rearrange the expression as
c2 c3 cn
~v1 = − ~v2 − ~v3 − · · · − ~vn .
c1 c1 c1
What is a vector space? For the introductory purpose of this book, we only need
to know that a vector space is a set of vectors closed under addition and scalar
multiplication. What does “closed” here mean? That means if ~u and ~v are in a
vector space, then ~u + ~v and c~u are also in the same vector space for any scalar c.
The word “closed” here means that addition and scalar multiplication do not make
the vectors go out of the vector space.
There are a few other axioms (or conditions) for a set of vectors to be a vector
space, but they are not really important for our introductory purpose now. For
example, one of the other axioms is that the vectors in a vector space must satisfy
the commutative property, ~u +~v = ~v + ~u, but we already know that vector addition
satisfies the commutative property from chapter 2. The other axioms are more
important when we consider vector spaces in a broader sense. However, in this
24
book we are only considering vector spaces of real geometric vectors with finitely
many elements which are real numbers.
An important point to note is that a vector space must have the 0 vector. A
0 vector is a vector whose elements are all 0. For example, the three-dimensional
0 vector is
0
0 .
0
When we do scalar multiplication, the scalar c can be 0. So, if ~v is in a vector
space, then
c~v = 0~v = 0
must also be in the same vector space.
Now why is the two-dimensional space a vector space? If it was not a vector
space, we would not call it “two-dimensional space.” Remember from chapter 1
that every point (x, y) can be represented by a vector
x
.
y
25
one-dimensional subspaces (lines in R3 ).
It should be easy to see that vectors lying on the same line are closed under addi-
tion and scalar multiplication if you think of the graphical representations of vector
addition and vector multiplication by a scalar. Similarly for planes in R3 , if we add
two vectors lying on the same plane together or multiply a vector lying on a plane
by a scalar, the result will be another vector lying on that same plane.
z-axis
0
0
0
y-axis
x-axis
Remember that a subspace is also a vector space. So, for the lines and planes to be
subspaces, they have to contain the origin, or the 0 vector. The 0 vector by itself
can also be a subspace or a vector space with zero dimension, but it is not a very
interesting vector space.
The vectors ~v1 , ~v2 , · · · , ~vn are called basis vectors of V . We will discuss why basis
vectors have to be linearly independent later in this section. Well, how does this
align with the definition of a vector space? How is the span of vectors a set of
vectors closed under addition and scalar multiplication? We know that the span of
~v1 , ~v2 , · · · , ~vn is the set of vectors of the form
26
and
~v = c01~v1 + c02~v2 + · · · + c0n~vn .
Then, we have
and
So, V = span {~v1 , ~v2 , · · · , ~vn } is indeed a vector space closed under addition and
scalar multiplication.
Next, let us discuss how a vector space can be written as the span of a certain
set of vectors. Let us consider the two-dimensional space R2 first. Note that any
two-dimensional vector
x
y
1 0
in R2 can be written as a linear combination of and :
0 1
x 1 0
=x +y .
y 0 1
So,
1 0
R2 = span , .
0 1
However,
1 0
,
0 1
is nottheonly set of basis vectors for R2 . For example, from
example
3.1, we know
5 4 3
that can be written as a linear combination of and . In fact, any
−6 1 8
4 3
two-dimensional vector can be written as a linear combination of and :
1 8
x 8x − 3y 4 4y − x 3
= + .
y 29 1 29 8
27
and
4 3
,
1 8
is another possible set of basis vectors for R2 . What is the common characteristic
of the two sets
1 0
,
0 1
and
4 3
, ?
1 8
Both of these sets have two linearly independent vectors. In general, any set of two
linearly independent two-dimensional vectors can be a set of basis vectors for the
two-dimensional space R2 .
Note that any vector in R3 can be written as a linear combination of these three
vectors:
x 1 0 0
y = x 0 + y 1 + z 0 .
z 0 0 1
It is important to note that there are only at most n linearly independent n-
dimensional vectors. Since any n-dimensional vector can be written as a linear
combination of n linearly independent n-dimensional vectors, if we have n + 1 or
more n-dimensional vectors, at least one of them must be able to be written as
a linear combination of the other n vectors, so they are linearly dependent. For
example, there are at most three linearly independent vectors in R3 ; there cannot
be 4 or 5 or more linearly independent vectors.
What about the basis vectors for subspaces? Let us consider the one-dimensional
subspaces in R2 , which are lines in R2 . Let us think of the graphical representa-
tion of scalar multiplication of vectors. When we multiply a vector by a scalar, we
stretch or shrink the vector and possibly flip it to the opposite direction. So, all
the scalar multiples of a vector lie on the same line with the vector. That means a
line in R2 is the span of a two-dimensional vector lying on that line.
28
y-axis
2
2
1 1
x-axis
−4 −3 −2 −1 1 2 3 4
−1
−2
For example,
2
span
1
Let us also consider the two-dimensional subspaces, or planes, in R3 . What are the
basis vectors of a two-dimensional subspace in R3 ? Let us think back to the basis
vectors of R2 for a moment because R2 is basically a plane in R3 at z = 0. A set of
basis vectors for R2 is any set of two linearly independent two-dimensional vectors.
Similarly, a plane in R3 , not necessarily the same as R2 , is the span of any two
linearly independent three-dimensional vectors lying on that plane. Note that the
basis vectors are three-dimensional because the subspace is in R3 . For example,
3 2
span −4 , 1
0 3
3 2
is the plane in R3 with the vectors −4 and 1 lying on it, which is a two-
0 3
dimensional subspace in R3 .
An important point about basis vectors is that basis vectors have to be linearly
independent. For example, as mentioned before, a plane, or a two-dimensional
29
subspace, in R3 is the span of any two linearly independent three-dimensional
vectors lying on that plane. So, why do the basis vectors have to be linearly inde-
pendent?
Let us consider the span of two linearly dependent vectors ~v1 and ~v2 . Since ~v1
and ~v2 are linearly dependent, let ~v2 = c~v1 for some scalar c. If ~u ∈ span {~v1 , ~v2 },
then ~u is a vector of the form
Thus, if ~v1 and ~v2 are linearly dependent, then span {~v1 , ~v2 } is the same as span {~v1 },
which is a one-dimensional vector space instead of a two-dimensional vector space.
That is why basis vectors have to be linearly independent.
If this section was too complicated and too wordy for you, the main points of
this section are summarized below. Although understanding more detailed expla-
nations is important, you should at least memorize these facts to familiarize yourself
with the topic.
30
Exercises
−5 1 4 0
1. Write 0 as a linear combination of 4, 6, and 0.
8 3 2 1
1
5 2
1
∈ span , 0 .
1
2. Verify that
0 −2 −1
−2 4 3
5 −2 6 −1
3. Are 1, 3 , 0, and −3 linearly independent or dependent?
4 8 9 7
−2 −3
6. Is , a set of basis vectors for R2 ? Why or why not?
6 9
31
8. What is the dimension of the vector space
1 −1 0
2 , , 2 ?
0
span 3 5 8
5 1 6
32
Chapter 4
Norms of Vectors
33
4.1 Definition and Properties of Norms
There are many vector norms. By definition, for p ≥ 1, the `p norm of a vector
v1
v2
~v = .
..
vn
is defined as p
p
||~v ||p = |v1 |p + |v2 |p + · · · + |vn |p .
For example, we have
3
Example 4.2: ~v = , ||~v ||2 =?
4
p
||~v ||2 = |3|2 + |4|2 = 5.
Let us look at the first property. Recall that a 0 vector is a vector whose ele-
ments are all 0. If at least one element, say vi , is non-zero, then the norm ||~v ||p is
larger than 0 because |vi | > 0 for any non-zero vi .
34
y-axis
3
x-axis
−3 −2 −1 1 2 3
−1
−2
−3
is when
|v1 |p = |v2 |p = · · · = |vn |p = 0,
which is when v1 = v2 = · · · = vn = 0. Remember that “if and only if” is a two-way
implication. “||~v || = 0 if and only if ~v = 0” means “||~v || = 0 if ~v = 0” and “~v = 0
if ||~v || = 0.”
Next, let us look at the second property. Since for any numbers a and b we have
|ab| = |a| · |b|,
p
||k~v ||p = p |kv1 |p + |kv2 |p + · · · + |kvn |p
p
= p |k|p |v1 |p + |k|p |v2 |p + · · · + |k|p |vn |p
p p
= p |k|p · p |v1 |p + |v2 |p + · · · + |vn |p
= |k| · ||~v ||p .
The third property is called the triangle inequality, and it is an extension of the
triangle inequality for numbers
35
Squaring the right hand side of (4.1) gives
2
(|x| + |y|) = |x|2 + |y|2 + 2|x||y| = x2 + y 2 + 2|x||y|.
When x and y have the same sign (both positive or both negative) or one of them
is 0,
2xy = 2|x||y|.
When x and y have different signs,
2xy < 2|x||y|.
Thus, we have
2xy ≤ 2|x||y|
⇔ x2 + y 2 + 2xy ≤ x2 + y 2 + 2|x||y|
2
⇔ |x + y|2 ≤ (|x| + |y|) .
y-axis
x-axis
−2 −1 1 2
By observing the graph of the function x2 , we can see that for two numbers a, b ≥ 0,
if a2 ≥ b2 , then a ≥ b. Since
|x + y| ≥ 0,
|x| + |y| ≥ 0,
and
2
|x + y|2 ≤ (|x| + |y|) ,
we can conclude that
|x + y| ≤ |x| + |y|.
1
The triangle inequality for ` norm follows directly from the triangle inequality for
numbers. Let
u1
u2
~u = . ,
..
un
36
then we have
||~u + ~v ||1 = |u1 + v1 | + |u2 + v2 | + · · · + |un + vn |
≤ |u1 | + |v1 | + |u2 | + |v2 | + · · · + |un | + |vn |
= ||~u||1 + ||~v ||1 .
The triangle inequality for `2 norm can be proven by using the triangle inequality
for numbers and an inequality called Cauchy-Schwarz inequality. Using the triangle
inequality for numbers, we get
p
||~u + ~v ||2 = |u1 + v1 |2 + |u2 + v2 |2 + · · · + |un + vn |2
q
2 2 2
≤ (|u1 | + |v1 |) + (|u2 | + |v2 |) + · · · + (|un | + |vn |) . (4.2)
√ √
The inequality in (4.2) follows
√ from the fact that a ≥ b if a ≥ b and a, b ≥ 0, as
observed in the graph of x.
y-axis
x-axis
1 2 3 4
√
Figure 4.3: Graph of function x
By definition of `2 norm,
p p
||~u||2 + ||~v ||2 = |u1 |2 + |u2 |2 + · · · + |un |2 + |v1 |2 + |v2 |2 + · · · + |vn |2 .
Squaring both sides,
p 2
2
p
(||~u||2 + ||~v ||2 ) = |u1 |2 + |u2 |2 + · · · + |un |2 + |v1 |2 + |v2 |2 + · · · + |vn |2
p 2 p 2
= |u1 |2 + |u2 |2 + · · · + |un |2 + |v1 |2 + |v2 |2 + · · · + |vn |2
p p
+ 2 |u1 |2 + |u2 |2 + · · · + |un |2 · |v1 |2 + |v2 |2 + · · · + |vn |2
37
The proof of Cauchy-Schwarz inequality is in an appendix for those who are inter-
ested. Using the Cauchy-Schwarz inequality, we have
2
≥ (|u1 ||v1 | + |u2 ||v2 | + · · · + |un ||vn |) .
because
|u1 ||v1 | + |u2 ||v2 | + · · · + |un ||vn | ≥ 0.
Remember that the square root √ always takes non-negative
p values. √
If a number a is
negative, then we would have a2 = |a|. For example, (−2)2 = 4 = 2 = | − 2|.
Substituting (4.4) into (4.3),
2
(||~u||2 + ||~v ||2 ) ≥|u1 |2 + |u2 |2 + · · · + |un |2 + |v1 |2 + |v2 |2 + · · · + |vn |2
+ 2|u1 ||v1 | + 2|u2 ||v2 | + · · · + 2|un ||vn |
2 2 2
= (|u1 | + |v1 |) + (|u2 | + |v2 |) + · · · + (|un | + |vn |) .
38
So, the distance between a point (x, y) and the origin (0, 0) is
p p
(x − 0)2 + (y − 0)2 = x2 + y 2 .
Hence, for a two-dimensional vector
v
~v = 1 ,
v2
the `2 norm p q
||~v ||2 = |v1 |2 + |v2 |2 = v12 + v22
is the distance between the point (v1 , v2 ) and the origin (0, 0). Recall from chapter
1 that the vector ~v is a vector going from (0, 0) to (v1 , v2 ). So, the distance between
(0, 0) and (v1 , v2 ) is the length of the vector ~v , and thus the `2 norm of a vector is
the length of that vector. This is also a direct consequence from the Pythagoras’
theorem.
(v1 , v2 )
p
||v||2 = v12 + v22
v2
(0, 0) v1
39
is the length of ~v in Rn .
Now let us take a look at the triangle inequality again. In geometry, we learned the
triangle inequality which states that the sum of lengths of two sides of a triangle
is greater than the length of the third side.
C
A
This is actually exactly the same as the triangle inequality for `2 norm. Recall
from chapter 2 that the graphical representation of vector addition is as follows.
~v
~u
~u + ~v
From what we learned about the graphical meaning of `2 norm, we know that ~u
has length ||~u||2 , ~v has length ||~v ||2 , and ~u + ~v has length ||~u + ~v ||2 . The triangle
inequality for `2 norm states that
40
which means all three vectors ~u, ~v , and ~u + ~v lie on the same line. When they do
not lie on the same line and form a triangle like in figure 4.6, the triangle inequality
for `2 norm would give us
What are unit vectors? Well, as the word “unit” suggests, a unit vector is a vector
with length one. A unit vector is usually denoted as a letter with a small hat above,
such as v̂. Using what we learned about `2 norm, if v̂ is a unit vector, then ||v̂||2 = 1.
Let us look at a few examples of unit vectors, especially the important ones re-
lated to basis of vector spaces. In the last chapter, we learned that a possible set
of basis vectors for R2 is
1 0
, .
0 1
There is something special about these vectors: they are unit vectors because
1
= 0 = 1.
0 1
2 2
These basis vectors also represent the points (1, 0) and (0, 1) on the x-axis and
the y-axis, respectively. These basis vectors are so special that they have special
notations to denote them:
1
ı̂ =
0
and
0
̂ = .
1
A set of basis vectors consisting of unit vectors representing points on the coordinate
axes like this is called the standard basis. Similarly to R2 , the standard basis of
R3 is
1 0 0
0 , 1 , 0 .
0 0 1
41
Note that these are unit vectors because
1 0 0
0 = 1 = 0 = 1.
0 0 1
2 2 2
We can factor out ||~v1||2 , which is a scalar, from the norm by using the second prop-
erty of norms. Let us look at a few examples.
1
Example 4.3: Normalize the vector ~u = .
−2
The length of ~u is p √
||~u||2 = 12 + (−2)2 = 5.
The normalized vector û is
√
~u 1 1 1/ √5
û = =√ = .
||~u||2 5 −2 −2/ 5
3
Example 4.4: Normalize the vector ~v = 0.
4
42
The length of ~v is p
||~v ||2 = 32 + 02 + 42 = 5.
The normalized vector v̂ is
3 3/5
~v 1
v̂ = = 0 = 0 .
||~v ||2 5
4 4/5
The `1 norm represents the distance in Taxicab geometry, one of the non-Euclidean
geometries. The distance in Taxicab geometry is called “L1 distance.” The distance
represented by the `2 norm, which we discussed in the last section, is the distance
in Euclidean geometry, which is the type of geometry that we learn in high school
geometry. Now what is the difference between L1 distance and Euclidean distance?
q(x2 , y2 )
p(x1 , y1 )
In Euclidean geometry, the Euclidean distance between two points p(x1 , y1 ) and
q(x2 , y2 ) is the length of the line segment connecting those two points, and we have
the familiar formula
q
2 2
d(p, q) = (x2 − x1 ) + (y2 − y1 ) ,
where d(p, q) denotes the Euclidean distance between p and q. In Euclidean geom-
etry, there is only one unique path for the distance between two points.
43
q(x2 , y2 )
p(x1 , y1 )
In Taxicab geometry, the L1 distance between two points p(x1 , y1 ) and q(x2 , y2 ) is
the shortest distance from p to q (or q to p) moving only horizontally and verti-
cally. If you play chess, this is like how the rooks in chess move. Unlike Euclidean
distance, L1 distance has many possible paths, as shown in figure 4.8.
q(x2 , y2 )
|y2 − y1 |
p(x1 , y1 )
|x2 − x1 |
Since we are moving horizontally and vertically, the formula for L1 distance be-
tween p(x1 , y1 ) and q(x2 , y2 ) can be obtained by adding the absolute differences of
their coordinates:
d1 (p, q) = |x2 − x1 | + |y2 − y1 |,
where d1 (p, q) denotes the L1 distance between p and q. Thus, for a vector
v
~v = 1 ,
v2
the `1 norm
||~v ||1 = |v1 | + |v2 |
is the L1 distance between the point (v1 , v2 ) and the origin (0, 0).
In higher dimensions, the L1 distance is the shortest distance when moving only
44
in directions along the coordinate axes. For example, in three dimensions, the L1
distance is the shortest distance when moving only in x-direction, y-direction, and
z-direction, and we have the formula
for the L1 distance between p(x1 , y1 , z1 ) and q(x2 , y2 , z2 ). So, for a three-dimensional
vector
v1
~v = v2 ,
v3
the `1 norm
||~v ||1 = |v1 | + |v2 | + |v3 |
1
is the L distance between the point (v1 , v2 , v3 ) and the origin (0, 0, 0).
There is another Taxicab geometry fun fact in an appendix. You should check
it out if you find Taxicab geometry interesting.
45
Exercises
2
1. ~u = −3. ||~u||1 =?
5
5
2. ~v = . ||~v ||2 =?
12
1
~ = −2.
3. Normalize the vector w
−1
1 2
4. Let ~u = −3 and ~v = 4.
0 1
a. Find ||~u||2 , ||~v ||2 , and ||~u + ~v ||2 .
5. Express the distance∗ of the point (12, −9) from the origin as a vector norm.
6. Express the distance of the point (7, −1, 10) from the origin as a vector norm.
7. Imagine a rook moving from the origin to the point (−15, 9) on the xy-plane
and assume that each grid is 1 cm by 1 cm.
∗ The word “distance” in this book is assumed to be Euclidean distance unless implied by
46
Chapter 5
47
5.1 Transpose of Vectors
Before discussing dot product, let us talk about the transpose of vectors briefly. It
is a very simple concept. When we take the transpose of a vector, we rotate the
vector so that it becomes horizontal: a vertical vector becomes a horizontal vector.
The transpose of a vector ~v is denoted as ~v T . Let
v1
v2
~v = . ,
..
vn
then we have
~v T = v1
v2 ··· vn .
Let us look at a few examples.
−2
Example 5.1: ~u = , ~u T =?
7
~u T = −2
7 .
5
Example 5.2: ~v = 0 , ~v T =?
−11
~v T = 5
0 −11 .
Transpose of vectors does not have a geometrical meaning in particular. A vector ~v
and its transpose ~v T both represent the same point; it is just that ~v T is a horizontal
vector (or a row vector) while ~v is a vertical vector (or a column vector).
48
1 0
Example 5.4: · =?
0 1
1 0
· = 1 · 0 + 0 · 1 = 0.
0 1
3 2
Example 5.5: −5 · 4 =?
6 3
3 2
−5 · 4 = 3 · 2 + (−5) · 4 + 6 · 3 = 4.
6 3
−1 −1
Example 5.6: 2 · −3 =?
1 5
−1 −1
2 · −3 = (−1) · (−1) + 2 · (−3) + 1 · 5 = 0.
1 5
The dot product of vectors can also be written as multiplication of a row vector
and a column vector:
~u · ~v = ~u T ~v .
For example,
2 −1 −1
· = 2 5 .
5 4 4
This fact is important when we learn the multiplication of matrices in part II.
Next, let us discuss the properties that the dot product satisfies. Like the mul-
tiplication of numbers, the dot product satisfies the commutative property,
~u · ~v = ~v · ~u,
~u · (~v + w)
~ = ~u · ~v + ~u · w.
~
Let
u1
u2
~u = . ,
..
un
v1
v2
~v = . ,
..
vn
49
and
w1
w2
~ = . .
w
..
wn
Then, using the commutative property and the distributive property of multiplica-
tion of numbers, we have
u1 v1
u2 v2
~u · ~v = . · . = u1 v1 + u2 v2 + · · · + un vn
.. ..
un vn
v1 u1
v2 u2
= v1 u1 + v2 u2 + · · · + vn un = . · . = ~v · ~u
.. ..
vn un
and
u1 v1 w1 u1 v1 + w1
u2 v2 w2 u2 v2 + w2
~ = . · . + . = . ·
~u · (~v + w)
..
.. .. .. ..
.
un vn wn un vn + wn
= u1 v1 + u1 w1 + u2 v2 + u2 w2 + · · · + un vn + un wn
u1 v1 u1 w1
u2 v2 u2 w2
= . · . + . · . = ~u · ~v + ~u · w.
~
.. .. .. ..
un vn un wn
The dot product also satisfies associative property with a scalar. For any vectors
~u and ~v and any scalar a,
(a~u) · ~v = a(~u · ~v ).
50
Using the distributive property of multiplication of numbers,
u1 v1 a u1 v1
u2 v2 a u2 v2
(a~u) · ~v = a . · . = . · . = a u1 v1 + a u2 v2 + · · · + a un vn
.. .. .. ..
un vn a un vn
u1 v1
u2 v2
= a (u1 v1 + u2 v2 + · · · + un vn ) = a . · . = a(~u · ~v ).
.. ..
un vn
There is an interesting relation between the `2 norm and the dot product:
2
~v · ~v = (||~v ||2 )
because
v1 v1
v 2 v2
~v · ~v = . · .
.. ..
vn vn
= v12
+ + · · · + vn2
v22
q 2
= 2 2 2
v1 + v2 + · · · + vn
2
= (||~v ||2 ) .
The dot product ~u · ~v is the length of the vector obtained after projecting ~u on
to ~v and then multiplying by the length of ~v . Also, we know that the dot product
is commutative, so the order does not matter. We can also project ~v on to ~u and
then multiply by the length of ~u.
Let us look at the projection part first. Remember that the vector ~u has length
||~u||2 . By using the trigonometric ratio in right triangles, we get that the length of
projection of ~u on to ~v is ||~u||2 cos θ, where θ is the smaller angle between ~u and ~v .
51
~u
||~u||2
θ
~v
||~u||2 cos θ
Then, after the projection, we need to multiply by the length of ~v , which is ||~v ||2 .
So, we have the formula
~u · ~v = ||~u||2 ||~v ||2 cos θ.
We can also use this formula to find the measure of the smaller angle between two
vectors. Rearranging the formula above, we have
~u · ~v
θ = cos−1 .
||~u||2 ||~v ||2
Let us look at a few examples.
−1 2
Example 5.7: Find the measure of the smaller angle between and .
3 4
First, we need to find the dot product of the two vectors and the length of each
vector:
−1 2
· = (−1) · 2 + 3 · 4 = 10,
3 4
√
−1 p
2 2
3 = (−1) + 3 = 10,
2
and
√
2 p
2 2
4 = 2 + 4 = 2 5.
2
Therefore, the measure of the smaller angle θ between those two vectors is
10 1
θ = cos−1 √ √ = cos−1 √ = 45◦ .
10 · 2 5 2
1 0
Example 5.8: Find the measure of the smaller angle between 2 and 1 .
−1 −1
First, we need to find the dot product of the two vectors and the length of each
vector:
1 0
2 · 1 = 1 · 0 + 2 · 1 + (−1) · (−1) = 3,
−1 −1
52
1
p √
2 = 12 + 22 + (−1)2 = 6,
−1
2
and
0
p √
1 = 02 + 12 + (−1)2 = 2.
−1
2
Therefore, the measure of the smaller angle θ between those two vectors is
√ !
3 3
θ = cos−1 √ √ = cos−1 = 30◦ .
6· 2 2
√
0 3
Example 5.9: Find the measure of the smaller angle between and .
2 1
First, we need to find the dot product of the two vectors and the length of each
vector:
√
√
0 3
· = 0 · 3 + 2 · 1 = 2,
2 1
0 p
2 2
2 = 0 + 2 = 2,
2
and
√ 2
√ q
3
=
1 3 + 12 = 2.
2
Therefore, the measure of the smaller angle θ between those two vectors is
2 1
θ = cos−1 = cos−1 = 60◦ .
2·2 2
Note that when ~u = ~v , the angle θ between them is 0 degree, and we have
2
~v · ~v = ||~v ||2 ||~v ||2 cos (0◦ ) = (||~v ||2 )
53
vectors here, the definition of orthogonality of vectors discussed here still applies
to non-geometric vectors in general.
We can determine whether two non-zero vectors are orthogonal by using the dot
product. If the dot product of two non-zero vectors is equal to 0, then those two
non-zero vectors are orthogonal. Let us see why this is true.
There are two ways to see why this is true. First, we can use the formula ob-
tained in the last section:
When two vectors ~u and ~v are orthogonal, the angle θ between them is 90◦ . So,
~u + ~v
~v
~u
Another way is by using the Pythagoras’ theorem. When two vectors ~u and ~v are
orthogonal, we have the graph for the vector addition ~u + ~v as in figure 5.2.
Recall that ~u has length ||~u||2 , ~v has length ||~v ||2 , and ~u + ~v has length ||~u + ~v ||2 .
By Pythagoras’ theorem, we have
2 2 2
(||~u||2 ) + (||~v ||2 ) = (||~u + ~v ||2 ) .
Using the fact that the dot product of a vector with itself is the square of the `2
2
norm of that vector, i.e. ~v · ~v = (||~v ||2 ) , we have
~u · ~u + ~v · ~v = (~u + ~v ) · (~u + ~v ).
Thus, we get
~u · ~u + ~v · ~v = ~u · ~u + ~v · ~v + 2~u · ~v .
54
Subtracting ~u · ~u + ~v · ~v from both sides, we obtain
2~u · ~v = 0
~u · ~v = 0
Let us look at a few examples of orthogonal vectors. From example 5.4, we know
that
1 0
· = 0,
0 1
1 0
so and are orthogonal. From example 5.6, we know that
0 1
−1 −1
2 · −3 = 0,
1 5
−1 −1
so 2 and −3 are orthogonal.
1 5
The concept of orthogonal vectors can be extended to more than two vectors.
When there are more than two vectors, the vectors are orthogonal if each vector is
orthogonal to each of the other vectors. For example, let us look at the standard
basis vectors for R3 :
1 0 0
0 , 1 , 0 .
0 0 1
In the last chapter, we learned that the standard basis vectors are unit vectors.
Here, we saw that the standard basis vectors are orthogonal as well. Vectors that
are both unit and orthogonal like these are called orthonormal vectors. If we have a
set of orthogonal vectors, we can always obtain orthonormal vectors by normalizing
each vector. For example, we know that
−1
2
1
and
−1
−3
5
55
are orthogonal, so we can obtain orthonormal vectors by normalizing
−1
2
1
and
−1
−3 .
5
56
Exercises
2
5 T
0 , ~v =?
1. ~v =
−8
1 9
3. −3 · 7 =?
−4 −6
8
4. −1 6 =?
3
1
5. 9 −2 4 −5 =?
−7
1 −1
6. Find the measure of the smaller angle between and .
3 7
−2 3
7. Find the measure of the smaller angle between 1 and −5.
5 4
√ √
1/√2 −1/√ 2
8. Are and orthogonal? If yes, are they orthonormal?
1/ 2 1/ 2
1 −1 1
9. Are 4, −2, and −1 orthogonal? If yes, are they orthonormal?
3 3 1
57
1 −1 1 1
−1 −1
, and −1 orthogonal? If yes, are they orthonor-
1
10. Are
, ,
−1
1 1 1
1 1 1 −1
mal?
58
Part II
Matrices
59
Chapter 6
61
6.1 Introduction to Matrices
In chapter 1, we learned that a vector is a list of numbers. A matrix is bigger; it is
a block of numbers such as
−1 2
, (6.1)
3 −4
−3 0 5
, (6.2)
7 −2 6
−9 8 −6
1 0 7 , (6.3)
−5 −1 10
etc. A matrix is usually denoted as an uppercase letter:
−1 2
A= .
3 −4
We can also think of a matrix as a list of column (vertical) vectors or a list of row
(horizontal) vectors. For example,
| | |
−3 0 5
= ~v1 ~v2 ~v3
7 −2 6
| | |
where
−3
~v1 = ,
7
0
~v2 = ,
−2
5
~v3 = ,
6
and
~u1T
|
−9 8 −6
1 0 7 = ~u2T
|
−5 −1 10
~u3T
|
where
−9
~u1 = 8 ,
−6
1
~u2 = 0 ,
7
−5
~u3 = −1 .
10
62
Similarly to vectors, matrices also have dimensions. We describe the dimensions of
a matrix like how we describe the dimensions of paper. A m × n matrix is a matrix
with m rows and n columns. For example, the matrix in (6.2) is a 2 × 3 matrix.
When m = n, a n × n matrix is a square matrix, obviously because it is a square.
For example, the matrix in (6.1) is a 2 × 2 square matrix, and the matrix in (6.3)
is a 3 × 3 square matrix.
Remember that we always put the row number before the column number, whether
they are dimensions of matrices or subscripts of matrix elements.
First, let us talk about the main diagonal of a square matrix. The main diago-
nal of a matrix is the diagonal segment from the top left corner to the bottom right
corner of the matrix. In other words, the main diagonal of a matrix A contains
the elements aij such that the row number i is the same as the column number j
(i = j). For example, the main diagonal of a 4 × 4 matrix
a11 a12 a13 a14
a21 a22 a23 a24
a31 a32 a33 a34
a41 a42 a43 a44
Now what is the trace of a matrix? The trace of a matrix A, denoted as Tr(A), is
the sum of all the elements on the main diagonal of A. For example,
a11 a12 a13
Tr a21 a22 a23 = a11 + a22 + a33 .
a31 a32 a33
63
In general, for a n × n square matrix
a11 a12 ··· a1n
a21 a22 ··· a2n
A= . .. ,
.. ..
.. . . .
an1 an2 ··· ann
the main diagonal of A contains the elements a11 , a22 , · · · , ann , and
Next, let us discuss the transpose of a matrix. This concept also applies to non-
square matrices. In chapter 5, we learned the transpose of vectors, which are
basically matrices with only one column. When we take the transpose of a matrix,
we take the transpose of each column of that matrix. So, column 1 of A becomes
row 1 of AT , column 2 of A becomes row 2 of AT , and so on. Also, note that
as columns of A become rows of AT , the rows of A also become columns of AT .
So, taking the transpose of a matrix is basically flipping columns and rows of that
matrix.
For example, if
a11 a12
a21 a22
A=
a31
,
a32
a41 a42
then
a11 a21 a31 a41
AT = .
a12 a22 a32 a42
If
b b12 b13
B = 11 ,
b21 b22 b23
then
b11 b21
B T = b12 b22 .
b13 b23
So, in general, the transpose of a m × n matrix is a n × m matrix. The matrix A
above is a 4 × 2 matrix, and AT is a 2 × 4 matrix; B is a 2 × 3 matrix, and B T is
a 3 × 2 matrix.
64
then
c11 c21 c31
C T = c12 c22 c32 .
c13 c23 c33
So, we can see that when we take the transpose of a square matrix, we are basically
flipping the elements across the main diagonal. Perhaps, it would be easier to see
with numbers than with symbols: if
1 −3 6
M = −2 5 0 ,
9 −1 7
then
1 −2 9
MT = −3 5 −1 .
6 0 7
Notice how the main diagonal stays the same and the other numbers are reflecting
across the main diagonal.
Triangular Matrices
There are two types of triangular matrices: lower-triangular matrices and upper-
triangular matrices.
and
−2 0 0 0
5 8 0 0
0 −1 0 0
7 0 9 −6
are lower-triangular matrices. The elements on the main diagonal and below are
not necessarily non-zero.
65
In general, a lower-triangular matrix is of the form
l11 0 ··· 0
l21 l22 ··· 0
L= . .
.. .. ..
.. . . .
ln1 ln2 ··· lnn
Note that lij = 0 whenever i < j.
Symmetric Matrices
A symmetric matrix is a matrix whose elements above the main diagonal are the
same as the elements below the main diagonal. So, the elements of the matrix are
symmetric across the main diagonal. For example,
1 −3
,
−3 5
4 3 −2
3 0 7 ,
−2 7 −1
and
0 1 2 3
1 4 5 6
2 5 7 8
3 6 8 9
66
are symmetric matrices. In general, if we denote the elements of a symmetric matrix
S as sij , then we have sij = sji . For example, for
s11 s12 s13 4 3 −2
S = s21 s22 s23 = 3 0 7 ,
s31 s32 s33 −2 7 −1
Note that when we take the transpose of a symmetric matrix, we end up with
the same matrix. In other words, if S is a symmetric matrix, then S T = S. For
example,
T
4 3 −2 4 3 −2
3 0 7 = 3 0 7 .
−2 7 −1 −2 7 −1
Skew-symmetric Matrices
and
0 −6 8
6 0 1
−8 −1 0
are skew-symmetric matrices. In general, if we denote the elements of a skew-
symmetric matrix A as aij , then we have aij = −aji . For example, for
a a12 0 3
A = 11 = ,
a21 a22 −3 0
Diagonal Matrices
67
A diagonal matrix is a matrix whose elements not on the main diagonal are all
0. For example,
1 0 0
0 2 0
0 0 3
and
−5 0 0 0
0 8 0 0
0 0 0 0
0 0 0 6
are diagonal matrices. The elements on the main diagonal are not necessarily non-
zero.
Identity Matrix
The identity matrix is a special case of diagonal matrices. It is the diagonal matrix
whose elements on the main diagonal are all 1:
1 0 ··· 0
0 1 · · · 0
I = . . . .
.. .. . . ...
0 0 ··· 1
For example, the 3 × 3 identity matrix is
1 0 0
I3 = 0 1 0 .
0 0 1
68
The identity matrix is the matrix version of number 1. We will see why when
learning about the matrix multiplication in the next chapter.
Orthogonal Matrices
Last but not least, orthogonal matrices are also an important type of matrices.
An orthogonal matrix is a matrix whose columns are orthonormal. That means
the column vectors must be orthogonal to each other and have unit length. The
name “orthogonal” is somewhat misleading because the columns are not just or-
thogonal; they are orthonormal.
For example,
1 0 0
0 1 0
0 0 1
is an orthogonal matrix because the column vectors
1 0 0
0 , 1 , 0
0 0 1
are orthonormal.
69
Exercises
3 0 −6 7
1. What are the dimensions of the matrix 5 −2 11 9 .
8 12 −1 10
3. Let
3 −7 8
A=
−1 0 5
4. Let
2 −3 6
B = −1 0 5 .
8 12 −9
a. From top to bottom, what are the elements on the main diagonal of B?
b. Tr(B) =?
3 0 −7
5. C = 8 −5 2 . C T =?
1 6 9
70
0 0 0 −2 0 0
1 2 1 10
A. B. 0 0 0 C. 7 8 0 D.
0 3 10 3
0 0 0 5 −3 1
1 2 0
8. Given −1, 1, and −1, check that these vectors are orthogonal, and
−1 1 1
use these three column vectors to construct a 3 × 3 orthogonal matrix.
71
Chapter 7
72
7.1 Addition and Subtraction
In this chapter, we will learn how to do arithmetic operations for matrices. First,
in this section, we will learn how to add or subtract matrices.
1 2 3 −3 0 9 1 + (−3) 2+0 3+9 −2 2 12
+ = = .
4 5 6 7 −6 1 4+7 5 + (−6) 6+1 11 −1 7
10 −3 2 −1
Example 7.2: + =?
6 0 −6 −7
10 −3 2 −1 10 + 2 −3 + (−1) 12 −4
+ = = .
6 0 −6 −7 6 + (−6) 0 + (−7) 0 −7
12 −11 −10 8
Example 7.3: −3 5 + −7 6 =?
9 2 −15 1
12 −11 −10 8 12 + (−10) −11 + 8 2 −3
−3 5 + −7 6 = −3 + (−7) 5 + 6 = −10 11 .
9 2 −15 1 9 + (−15) 2+1 −6 3
Like the addition of numbers and vectors, matrix addition satisfies the commutative
and associative properties. That is, for matrices A, B, and C with the same
dimensions,
A+B =B+A
and
A + (B + C) = (A + B) + C.
73
−5 3 2 5 −3
2
Example 7.4: 0 1 −6 4 =?
7 − 7
6 −8 9 −9 10
9
−5 3 2 2 5 −3 −5 − 2 3−5 2 − (−3)
0 1 7 − 7 −6 4 = 0 − 7 1 − (−6) 7−4
6 −8 9 9 −9 10 6 − 9 −8 − (−9) 9 − 10
−7 −2 5
= −7 7 3 .
−3 1 −1
3 6 9 0 5 11 −1 0
Example 7.5: − =?
7 −2 −8 12 10 5 −7 8
3 6 9 0 5 11 −1 0 3 − 5 6 − 11 9 − (−1) 0−0
− =
7 −2 −8 12 10 5 −7 8 7 − 10 −2 − 5 −8 − (−7) 12 − 8
−2 −5 10 0
= .
−3 −7 −1 4
As mentioned before in the last chapter, matrix multiplication with scalars is the
same as vector multiplication with scalars: we multiply the scalar to each element
of the matrix. Let us look at a few examples.
−1 3
Example 7.6: 2 =?
5 2
−1 3 2 · (−1) 2 · 3 −2 6
2 = = .
5 2 2·5 2·2 10 4
3 0 −2 1
Example 7.7: −5 −1 6 4 −5 =?
7 −3 0 2
3 0 −2 1 −5 · 3 −5 · 0 −5 · (−2) −5 · 1
−5 −1 6 4 −5 = −5 · (−1) −5 · 6 −5 · 4 −5 · (−5)
7 −3 0 2 −5 · 7 −5 · (−3) −5 · 0 −5 · 2
−15 0 10 −5
= 5 −30 −20 25 .
−35 15 0 −10
74
3 −5
Example 7.8: − 0 2 =?
−6 8
3 −5 −3 −(−5) −3 5
− 0 2 = −0 −2 = 0 −2 .
−6 8 −(−6) −8 6 −8
Matrix multiplication with vectors is a little bit more complicated. First, let us
learn how to multiply a matrix with a column vector. When we multiply a matrix
with a column vector, we multiply the first element (from top to bottom) of the
vector to the first column (from left to right) of the matrix, the second element of
the vector to the second column of the matrix, and so on, then we add them all up
together:
v1
| | | v2
~a1 ~a2 · · · ~an
.. = v1~a1 + v2~a2 + · · · + vn~an .
| | | .
vn
It is important to note that we can only multiply a matrix with a column vector
whose dimension is the same as the number of columns of the matrix. Let us look
at a few examples.
3 −5 6
Example 7.9: =?
−2 6 2
3 −5 6 3 −5 18 −10 8
=6 +2 = + = .
−2 6 2 −2 6 −12 12 0
5
1 7 3
Example 7.10: −1 =?
0 −2 4
2
5
1 7 3 1 7 3 5 −7 6 4
−1 = 5 + (−1) +2 = + + = .
0 −2 4 0 −2 4 0 2 8 10
2
−1 −5
2 −3 3 =?
Example 7.11:
8 6 −2
7 9
−1 −5 −1 −5 −3 10 7
2 −3 3 2 −3 6 6 12
8 + (−2) 6 = 24 + −12 = 12 .
= 3
8 6 −2
7 9 7 9 21 −18 3
75
From the examples, we see that the column vectors are always put to the right of
the matrices. We cannot change the order and put the column vectors to the left
because matrix multiplication is not commutative.
Also, note that when we multiply a matrix with a column vector, we are adding
scalar multiples of each column of the matrix. So, multiplying a matrix with a
column vector gives a linear combination of the columns of the matrix.
Next, let us discuss matrix multiplication with row vectors. It is very similar
to matrix multiplication with column vectors. We multiply each element of the
row vector to each row of the matrix, and then add them all up together:
~a1T
|
|
~a2T
|
Addition and scalar multiplication of row vectors are exactly the same as addition
and scalar multiplication of column vectors. It is important to note that we can
only multiply a matrix with a row vector whose dimension is the same as the num-
ber of rows of the matrix. Let us take a look at some examples.
2 4
Example 7.12: 3 −5 =?
1 3
2 4
3 −5 = 3 2 4 + (−5) 1 3 = 6 12 + −5 −15 = 1 −3 .
1 3
6 0
Example 7.13: 1 2 3 −3 8 =?
7 −5
6 0
1 2 3 −3 8 = 1 6 0 + 2 −3 8 + 3 7 −5
7 −5
= 6 0 + −6 16 + 21 −15
= 21 1 .
1 0 5
Example 7.14: 3 5 =?
0 1 −3
1 0 5
3 5 = 3 1 0 5 + 5 0 1 −3
0 1 −3
= 3 0 15 + 0 5 −15
= 3 5 0 .
76
From the examples, we see that the row vectors are always put to the left of the
matrices. Also, note that multiplying a matrix with a row vector gives a linear
combination of the rows of the matrix.
First, we need to know the condition that the dimensions of two matrices A and
B must satisfy to be able to do the matrix multiplication AB. To be able to do
the matrix multiplication AB, the number of columns of A must be the same as
the number of rows of B. That is, we can only multiply a m × n matrix with a
n × p matrix. Also, you should notice from the examples below that the product
of a m × n matrix with a n × p matrix is a m × p matrix.
Second, you should remember that matrix multiplication is not commutative. That
is, generally
AB 6= BA.
If A is m × n and B is n × p, then we can do the multiplication AB. However, if we
change the order, the multiplication BA is impossible to do because the number of
columns of B, p, is not the same as the number of rows of A, m. If A is a m × n
matrix and B is a n × m matrix, then we can still do the multiplication BA, but
the result will be different from AB.
Third, as mentioned in the last chapter, the identity matrix is the matrix ver-
sion of number 1. In high school algebra, we learned that any number times 1 is
the number itself. Similarly, any m × n matrix multiplied by the n × n identity
matrix is the same m × n matrix itself. That is,
AIn = A.
You should observe these important points while looking at the examples below.
There are three ways to multiply two matrices, and you can choose whichever way
you like the best although you should be familiar with all three. All three methods
produce the same final results; they are just three different ways to carry out the
multiplication.
AB = C
77
is to multiply A with each column of B. The first column of C is A times the first
column of B; the second column of C is A times the second column of B; and so
on:
| | | | | |
A ~b1 ~b2 · · · ~bp = A~b1 A~b2 · · · A~bp .
| | | | | |
Let us do a few examples.
2 3 5 −4
Example 7.15: =?
0 −1 −2 1
First, we find the first column of the product:
2 3 5 2 3 4
=5 + (−2) = .
0 −1 −2 0 −1 2
Therefore,
2 3 5 −4 4 −5
= .
0 −1 −2 1 2 −1
5 −4 2 3
Example 7.16: =?
−2 1 0 −1
First, we find the first column of the product:
5 −4 2 5 −4 10
=2 +0 = .
−2 1 0 −2 1 −4
Therefore,
5 −4 2 3 10 19
= .
−2 1 0 −1 −4 −7
3 1 4 3 −5
Example 7.17: 2 7 2 −2 1 =?
0 5 8 2 −4
First, we find the first column of the product:
3 1 4 3 3 1 4 15
2 7 2 −2 = 3 2 + (−2) 7 + 2 2 = −4 .
0 5 8 2 0 5 8 6
78
Then, we find the second column of the product:
3 1 4 −5 3 1 4 −30
2 7 2 1 = −5 2 + 1 7 + (−4) 2 = −11 .
0 5 8 −4 0 5 8 −27
Therefore,
3 1 4 3 −5 15 −30
2 7 2 −2 1 = −4 −11 .
0 5 8 2 −4 6 −27
1 2
1 −2 0 2
Example 7.18: −3 −5 =?
3 1 −1 0
−1 6
First, we find the first column of the product:
1 2 1 2 7
−3 −5 1
= 1 −3 + 3 −5 = −18 .
3
−1 6 −1 6 17
Then, we find the second column of the product:
1 2 1 2 0
−3 −5 −2
= (−2) −3 + 1 −5 = 1 .
1
−1 6 −1 6 8
Next, we find the third column of the product:
1 2 1 2 −2
−3 −5 0 = 0 −3 + (−1) −5 = 5 .
−1
−1 6 −1 6 −6
Lastly, we find the fourth column of the product:
1 2 1 2 2
−3 −5 2 = 2 −3 + 0 −5 = −6 .
0
−1 6 −1 6 −2
Therefore,
1 2 7 0 −2 2
−3 1 −2 0 2
−5 = −18 1 5 −6 .
3 1 −1 0
−1 6 17 8 −6 −2
1 0 0
−2 3 0
Example 7.19: 0 1 0 =?
1 5 −8
0 0 1
First, we find the first column of the product:
1
−2 3 0 −2 3 0 −2
0 =1 +0 +0 = .
1 5 −8 1 5 −8 1
0
79
Then, we find the second column of the product:
0
−2 3 0 −2 3 0 3
1 =0 +1 +0 = .
1 5 −8 1 5 −8 5
0
Next, we find the third column of the product:
0
−2 3 0 −2 3 0 0
0 =0 +0 +1 = .
1 5 −8 1 5 −8 −8
1
Therefore,
1 0 0
−2 3 0 −2 3 0
0 1 0 =
.
1 5 −8 1 5 −8
0 0 1
1
|
T
~a2 ~a2T B
|
B =
.. .
..
. .
T
~am ~a T B
|
80
Next, we find the third row of the product:
3 2 1
0 5 0 0 −1 0 = 0 3 2 1 + 5 0 −1 0 +0 0 3 −5
0 3 −5
= 0 −5 0 .
Therefore,
2 −1 0 3 2 1 6 5 2
1 3 −2 0 −1 0 = 3 −7 11 .
0 5 0 0 3 −5 0 −5 0
−1 6 12 −10 0
Example 7.21: =?
2 −5 5 −2 −3
First, we find the first row of the product:
12 −10 0
−1 6 = (−1) 12 −10 0 + 6 5 −2 −3
5 −2 −3
= 18 −2 −18 .
Therefore,
−1 6 12 −10 0 18 −2 −18
= .
2 −5 5 −2 −3 −1 −10 15
−3 5
1 −1 2
Example 7.22: 7 −2 =?
3 4 −2
1 0
First, we find the first row of the product:
−3 5
1 −1 2 7 −2 = 1 −3 5 + (−1) 7 −2 + 2 1 0
1 0
= −8 7 .
81
Therefore,
−3 5
1 −1 2 −8 7
7 −2 =
.
3 4 −2 17 7
1 0
−3 5
1 −1 2
Example 7.23: 7 −2 =?
3 4 −2
1 0
First, we find the first row of the product:
1 −1 2
−3 5 = (−3) 1 −1 2 + 5 3 4 −2
3 4 −2
= 12 23 −16 .
Then, we find the second row of the product:
1 −1 2
7 −2 = 7 1 −1 2 + (−2) 3 4 −2
3 4 −2
= 1 −15 18 .
Next, we find the third row of the product:
1 −1 2
1 0 = 1 1 −1 2 + 0 3 4 −2
3 4 −2
= 1 −1 2 .
Therefore,
−3 5 12 23 −16
7 −2 1 −1 2
=1 −15 18 .
3 4 −2
1 0 1 −1 2
1 0 0 0 1 2
0 1 0 0 −3 4
Example 7.24:
0 0 1 0 5 −6 =?
0 0 0 1 −7 −8
First, we find the first row of the product:
1 2
−3 4
1 0 0 0 5 −6 = 1 1 2 + 0 −3 4 +0 5 −6 + 0 −7 −8
−7 −8
= 1 2 .
Then, we find the second row of the product:
1 2
−3 4
0 1 0 0 5 −6 = 0 1 2 + 1 −3 4 +0 5 −6 + 0 −7 −8
−7 −8
= −3 4 .
82
Next, we find the third row of the product:
1 2
−3 4
0 0 1 0 = 0 1 2 + 0 −3 4 +1 5 −6 + 0 −7 −8
5 −6
−7 −8
= 5 −6 .
Therefore,
1 0 0 0 1 2 1 2
0 1 0 0 −3 4 −3 4
= .
0 0 1 0 5 −6 5 −6
0 0 0 1 −7 −8 −7 −8
AB = C
| | |
~a2T ~a T ~b1 ~a2T ~b2 ··· ~a2T ~bp
|
In other words, the row i column j element of C is i-th row of A times j-th column
of B. Also, recall from chapter 5 that
~u T ~v = ~u · ~v .
83
Let
−1 2 1 0
= C.
2 5 0 1
For the row 1 column 1 element, we have
1 −1 1
c11 = −1 2 = · = (−1) · 1 + 2 · 0 = −1.
0 2 0
Therefore,
−1 2 1 0 c11 c12 −1 2
= = .
2 5 0 1 c21 c22 2 5
−2 3 6 −1 2
Example 7.26: =?
5 −4 3 1 −4
Let
−2 3 6 −1 2
= C.
5 −4 3 1 −4
For the row 1 column 1 element, we have
6 −2 6
c11 = −2 3 = · = (−2) · 6 + 3 · 3 = −3.
3 3 3
84
For the row 2 column 2 element, we have
−1 5 −1
c22 = 5 −4 = · = 5 · (−1) + (−4) · 1 = −9.
1 −4 1
For the row 2 column 3 element, we have
2 5 2
c23 = 5 −4 = · = 5 · 2 + (−4) · (−4) = 26.
−4 −4 −4
Therefore,
−2 3 6 −1 2 c c12 c13 −3 5 −16
= 11 = .
5 −4 3 1 −4 c21 c22 c23 18 −9 26
A(B + C) = AB + AC
and
(AB)C = A(BC).
In this section, we will prove these properties of matrix multiplication.
Distributive Property
Let
| | |
A = ~a1 ~a2 · · · ~an ,
| | |
u1
u2
~u = . ,
..
un
and
v1
v2
~v = . .
..
vn
85
Then, we have
u1 v1
| | | u2 v2
A(~u + ~v ) = ~a1 ~a2 ··· ~an . + .
.. ..
| | |
un vn
u1 + v1
| | | u2 + v2
= ~a1 ~a2 ··· ~an
..
| | | .
u n + vn
u1 v1
| | | u2 | | | v2
= ~a1 ~a2 ··· ~an . + ~a1 ~a2 ··· ~an .
. .
| | | . | | | .
un vn
= A~u + A~v .
Next, we can use this result to prove the more general distributive property:
A(B + C) = AB + AC
for any m × n matrix A and n × p matrices B and C. In this proof, we will use
method 1 of matrix multiplication, which is multiplying the first matrix to each
column of the second matrix to obtain each column of the product.
Let
| | |
B = ~b1 ~b2 ··· ~bp
| | |
and
| | |
C = ~c1 ~c2 ··· ~cp .
| | |
86
Then, we have
| | | | | |
A(B + C) = A ~b1 ~b2 · · · ~bp + ~c1 ~c2 · · · ~cp
| | | | | |
| | |
~ ~ ~
= A b1 + ~c1 b2 + ~c2 · · · bp + ~cp
| | |
| | |
= A(~b1 + ~c1 ) A(~b2 + ~c2 ) · · · A(~bp + ~cp )
| | |
| | |
= A~b1 + A~c1 A~b2 + A~c2 · · · A~bp + A~cp
| | |
| | | | | |
= A~b1 A~b2 · · · A~bp + A~c1 A~c2 · · · A~cp
| | | | | |
| | | | | |
= A ~b1 ~b2 · · · ~bp + A ~c1 ~c2 · · · ~cp
| | | | | |
= AB + AC.
(A + B)C = AC + BC.
Associative Property
Now let us prove the associative property of matrix multiplication. Before proving
the associative property of matrix multiplication in general, let us first prove that
(AB)~v = A(B~v )
Let
| | |
B = ~b1 ~b2 ··· ~bp
| | |
87
and
v1
v2
~v = . .
..
vp
Then, we have
v1
| | | v2
(AB)~v = A ~b1 ~b2 ··· ~bp
..
.
| | |
vp
v1
| | | v2
~
= Ab1 ~
Ab2 ··· ~
Abp ..
.
| | |
vp
v1
| | | v2
= A ~b1 ~b2 ··· ~bp
..
.
| | |
vp
= A(B~v ).
Note that the fourth step, where we factored out A, follows from the distributive
property of matrix multiplication. Next, we can use this result to prove the more
general associative property:
(AB)C = A(BC)
Let
| | |
C = ~c1 ~c2 ··· ~cr .
| | |
88
Then, we have
| | |
(AB)C = (AB) ~c1 ~c2 · · · ~cr
| | |
| | |
= (AB)~c1 (AB)~c2 · · · (AB)~cr
| | |
| | |
= A(B~c1 ) A(B~c2 ) · · · A(B~cr )
| | |
| | |
= A B~c1 B~c2 · · · B~cr
| | |
| | |
= A B ~c1 ~c2 · · · ~cr
| | |
= A(BC).
(AB)T = B T AT ,
Let
| | |
A = ~a1 ~a2 ··· ~an
| | |
and
v1
v2
~v = . .
..
vn
89
Then, we have
T
v1
| | | v2
(A~v )T = ~a1 ~a2 ··· ~an .
..
| | |
vn
T
= (v1~a1 + v2~a2 + · · · + vn~an ) .
|
~a2T
|
= v1 v2 · · · vn .. .
.
~anT
|
|
Note that
T
v1
v2
~v T
= . = v1 v2 ··· vn
..
vn
and
~a1T
|
T
| | |
~a2T
|
AT = ~a1 ~a2
··· ~an =
..
| | | .
~anT
|
because the transpose turns columns into rows. Therefore, we proved that
(A~v )T = ~v T AT .
(AB)T = B T AT .
Let
| | |
B = ~b1 ~b2 ··· ~bp .
| | |
90
Then, we have
T
| | |
(AB)T = A ~b1 ~b2 · · · ~bp
| | |
T
| | |
= A~b1 A~b2 · · · A~bp (7.1)
| | |
(A~b1 ) T
|
(A~b2 ) T
|
= ..
.
~
(Abp ) T
|
|
~b T AT
|
|
1
~T T
b2 A
|
|
= ..
.
~b T AT
|
p
~b T
|
1
~T
b2
|
T
= .. A
(7.2)
.
~b T
|
p
T T
=B A .
91
Exercises
1 −2 −1 4
1. 3 +5 =?
3 0 2 5
2 −3 3 −4
2. 10 3 1 − 7 5 0 =?
6 −7 8 1
3
10 −9 7
3. 2 =?
−5 6 3
1
5 3 1
4. −2 4 =?
2 4 6
1 −3 7 11
5. −5 0 6 5 =?
2 1 8 −3
92
10. Show that the product of two diagonal matrices is a diagonal matrix:
c1 0 · · · 0 d1 0 · · · 0 c1 d1 0 ··· 0
0 c2 · · · 0 0 d 2 · · · 0 0 c2 d 2 · · · 0
.. = .. .. .
.. .. . . .. .. .. . . .. . .
. . . . . . . . . . . .
0 0 · · · cn 0 0 · · · dn 0 0 · · · cn dn
11. For any matrix A, show that AT A is a symmetric matrix. (Hint: a symmetric
matrix S is a matrix such that S T = ?)
93
Chapter 8
94
8.1 Three Types of Row Operations
In this chapter, we will learn about the row operations, and then we will learn an
application of it in the next chapter. Simply speaking, row operations are some-
thing we do to the rows of a matrix. There are three types of row operations.
Switching Rows
When doing row operations on a matrix, we can switch two rows with each other.
Let us look at some examples.
1 5
Example 8.1: Switch row 1 and row 2 of .
2 6
5 1 2 6
→ .
6 2 1 5
3 −1 0
Example 8.2: Switch row 1 and row 3 of 2 −2 8 .
1 5 10
3 −1 0 1 5 10
2 −2 8 → 2 −2 8 .
1 5 10 3 −1 0
1 2 3 −5
Example 8.3: Switch row 2 and row 3 of −2 1 0 6 .
5 7 8 −9
1 2 3 −5 1 2 3 −5
−2 1 0 6 → 5 7 8 −9 .
5 7 8 −9 −2 1 0 6
The second type of row operations is multiplying a row by a non-zero scalar. When
we multiply row i1 by k, we replace the original row i1 with a new row vector
obtained by multiplying the original row i1 by k. Let us do some examples.
1 2
Example 8.4: Multiply row 1 by 3 for the matrix .
−5 3
We replace row 1 with 3 times row 1:
1 2 →3 1 2 = 3 6 .
So,
1 2 3 6
→ .
−5 3 −5 3
95
3 6 −9
Example 8.5: Multiply row 2 by −5 for the matrix .
1 −2 2
We replace row 2 with −5 times row 2:
1 −2 2 → (−5) 1 −2 2 = −5 10 −10 .
So,
3 6 −9 3 −9
6
→ .
1 −2 2 −5 −10
10
1 5 12
Example 8.6: Multiply row 3 by 2 for the matrix −9 −3 10 .
6 3 −2
We replace row 3 with 2 times row 3:
6 3 −2 → 2 6 3 −2 = 12 6 −4 .
So,
1 5 12 1 5 12
−9 −3 10 → −9 −3 10 .
6 3 −2 12 6 −4
96
1 −1 5
Example 8.9: Add row 1 to row 3 for the matrix 3 7 −8.
2 10 6
Adding row 1 to row 3 gives
1 −1 5 + 2 10 6 = 3 9 11 .
So, we will replace row 3 with this new row vector:
1 −1 5 1 −1 5
3 7 −8 → 3 7 −8 .
2 10 6 3 9 11
Switching Rows
Let us look back at an example we did in the last section. In example 8.1, we
saw that switching row 1 and row 2 of
1 5
(8.1)
2 6
gives
2 6
.
1 5
Now consider the matrix multiplication
0 1 1 5
.
1 0 2 6
The first row of the product is
1 5
0 1 =0 1 5 +1 2 6 = 2 6 ,
2 6
so we can see that the original row 2 is now moved to the position of row 1. The
second row of the product is
1 5
1 0 =1 1 5 +0 2 6 = 1 5 ,
2 6
so the original row 1 is now moved to the position of row 2. Thus,
0 1 1 5 2 6
= ,
1 0 2 6 1 5
97
which is the same as the matrix obtained after switching row 1 and row 2 of the
original matrix in (8.1). Do you notice anything particular about the matrix
0 1
1 0
which we are multiplying at the left? It is the 2 × 2 identity matrix with row 1 and
row 2 switched:
1 0 0 1
→ .
0 1 1 0
Next, let us look at another example we did in the last section. In example 8.3, we
saw that switching row 2 and row 3 of
1 2 3 −5
−2 1 0 6 (8.2)
5 7 8 −9
gives
1 2 3 −5
5 7 8 −9 .
−2 1 0 6
Now consider the matrix multiplication
1 0 0 1 2 3 −5
0 0 1 −2 1 0 6 .
0 1 0 5 7 8 −9
The first row of the product is
1 2 3 −5
1 0 0 −2 1 0 6
5 7 8 −9
=1 1 2 3 −5 + 0 −2 1 0 6 +0 5 7 8 −9
= 1 2 3 −5 ,
so row 1 stays the same. The second row of the product is
1 2 3 −5
0 0 1 −2 1 0 6
5 7 8 −9
=0 1 2 3 −5 + 0 −2 1 0 6 + 1 5 7 8 −9
= 5 7 8 −9 ,
so the original row 3 is now moved to the position of row 2. The third row of the
product is
1 2 3 −5
0 1 0 −2 1 0 6
5 7 8 −9
=0 1 2 3 −5 + 1 −2 1 0 6 + 0 5 7 8 −9
= −2 1 0 6 ,
98
so the original row 2 is now moved to the position of row 3. Thus,
1 0 0 1 2 3 −5 1 2 3 −5
0 0 1 −2 1 0 6 = 5 7 8 −9 ,
0 1 0 5 7 8 −9 −2 1 0 6
which is the same as the matrix obtained after switching row 2 and row 3 of the
original matrix in (8.2). Again, do you notice anything particular about the matrix
1 0 0
0 0 1
0 1 0
which we are multiplying at the left? It is the 3 × 3 identity matrix with row 2 and
row 3 switched:
1 0 0 1 0 0
0 1 0 → 0 0 1 .
0 0 1 0 1 0
So, here is the general rule: when we switch row i1 and row i2 of a m × n matrix
A, it is the same as multiplying a m × m matrix E1 to the left of A, where E1 is
the m × m identity matrix with row i1 and row i2 switched.
0 1 0 0
to the left of the 4 × 3 matrix.
Next, let us see how this type of row operations can be expressed as a matrix
multiplication. In example 8.5, we saw that multiplying row 2 by −5 for the ma-
trix
3 6 −9
(8.3)
1 −2 2
gives
3 6 −9
.
−5 10 −10
Now consider the matrix multiplication
1 0 3 6 −9
.
0 −5 1 −2 2
The first row of the product is
3 6 −9
1 0 =1 3 6 −9 + 0 1 −2 2 = 3 6 −9 ,
1 −2 2
99
so row 1 stays the same. The second row of the product is
3 6 −9
0 −5 = 0 3 6 −9 + (−5) 1 −2 2 = −5 10 −10 ,
1 −2 2
so row 2 is multiplied by −5. Thus,
1 0 3 6 −9 3 6 −9
= ,
0 −5 1 −2 2 −5 10 −10
which is the same as the matrix obtained after multiplying row 2 by −5 for the
original matrix in (8.3). Do you notice anything particular about the matrix
1 0
0 −5
which we are multiplying at the left? It is the 2 × 2 identity matrix with row 2
multiplied by −5:
1 0 1 0
→ .
0 1 0 −5
Next, let us look at another example from the last section. In example 8.6, we saw
that multiplying row 3 by 2 for the matrix
1 5 12
−9 −3 10 (8.4)
6 3 −2
gives
1 5 12
−9 −3 10 .
12 6 −4
Now consider the matrix multiplication
1 0 0 1 5 12
0 1 0 −9 −3 10 .
0 0 2 6 3 −2
The first row of the product is
1 5 12
1 0 0 −9 −3 10 = 1 1 5 12 + 0 −9 −3 10 + 0 6 3 −2
6 3 −2
= 1 5 12 ,
100
so row 2 stays the same. The third row of the product is
1 5 12
0 0 2 −9 −3 10 = 0 1 5 12 + 0 −9 −3 10 + 2 6 3 −2
6 3 −2
= 12 6 −4 ,
So, we have the following general rule: when we multiply row i1 by k for a m × n
matrix A, it is the same as multiplying a m × m matrix E2 to the left of A, where
E2 is the m × m identity matrix with row i1 multiplied by k.
0 0 0 1
to the left of the 4 × 2 matrix.
Now let us see how to express this type of row operations as a matrix multipli-
cation. In example 8.7, we saw that adding 2 times row 1 to row 2 for the matrix
−1 2
(8.5)
3 −5
gives
−1 2
.
1 −1
101
Row 1 stays the same, so we have
−1 2
1 −1 2 +0 3 −5 = 1 0 .
3 −5
Row 2 is replaced with 2 times the original row 1 added to the original row 2, so
we have
−1 2
2 −1 2 + 1 3 −5 = 2 1 .
3 −5
Thus, we have
1 0 −1 2 −1 2
= .
2 1 3 −5 1 −1
So, multiplying
1 0
2 1
to the left of the original matrix in (8.5) is the same as adding 2 times row 1 to
row 2 for that matrix. Do you notice anything particular about the matrix which
we are multiplying at the left? It is the 2 × 2 identity matrix with 2 times row 1
added to row 2:
1 0 1 0
→ .
0 1 2 1
Next, let us look at example 8.8. In that example, we saw that adding −3 times
row 2 to row 3 for the matrix
−2 3
0 5 (8.6)
7 10
gives
−2 3
0 5 .
7 −5
Row 1 stays the same, so we have
−2 3
1 −2 3 +0 0 5 +0 7 10 = 1 0 0 0 5 .
7 10
Row 3 is replaced with −3 times the original row 2 added to the original row 3, so
we have
−2 3
0 −2 3 + (−3) 0 5 + 1 7 10 = 0 −3 1 0 5 .
7 10
102
Thus, we have
1 0 0 −2 3 −2 3
0 1 0 0 5= 0 5 .
0 −3 1 7 10 7 −5
So, multiplying
1 0 0
0 1 0
0 −3 1
to the left of the original matrix in (8.6) is the same as adding −3 times row 2 to
row 3 for that matrix. Again, do you notice anything particular about the matrix
which we are multiplying at the left? It is the 3 × 3 identity matrix with −3 times
row 2 added to row 3:
1 0 0 1 0 0
0 1 0 → 0 1 0 .
0 0 1 0 −3 1
Here is the general rule: when we add k times row i1 to row i2 for a m × n matrix
A, it is the same as multiplying a m × m matrix E3 to the left of A, where E3 is
the m × m identity matrix with k times row i1 added to row i2 .
As you have seen, all of the matrices which we multiply to the left of other matrices
to perform row operations on those matrices are identity matrices being done the
same row operations on. These matrices which perform row operations on other
matrices when multiplied to the left are called elementary matrices.
For example, if we switch row 2 and row 3 and then multiply row 2 by 5 for a
3 × 2 matrix B, it is the same as multiplying
1 0 0
0 0 1 ,
0 1 0
103
which switches row 2 and row 3, and then
1 0 0
0 5 0 ,
0 0 1
104
Exercises
−1 5 2
3 0 6
1. Let A = .
7 −9 10
−2 8 −3
a) What is the resulting matrix after switching row 2 and row 4?
3 −2 1 0
2. Let B = .
5 −3 6 2
a) What is the resulting matrix after multiplying row 1 by 2?
6 11 −3
3. Let C = 3 −5 6 .
1 2 −1
a) What is the resulting matrix after adding −3 times row 3 to row 1?
−2 7
4. Let D = 3 −1.
6 −8
a) What is the resulting matrix after switching row 1 and row 3 and then
adding row 3 to row 2?
105
Chapter 9
106
9.1 Reduced Row Echelon Form
Any matrix can be reduced to the reduced row echelon form, and the reduced row
echelon form of a matrix A is denoted as rref(A). In this section, we will learn
how to reduce a matrix to reduced row echelon form by applying row operations
we learned in the last chapter.
1. Start with the non-zero number at the top left corner (row 1 column 1 posi-
tion). If the number at the top left corner is 0, then switch rows so that the
number at the top left corner is a non-zero number.
2. Then, do row operations on the matrix so that all elements below on the same
column become 0 and the number we started with becomes 1.
3. Go down to the next row and start with the first non-zero number (from left
to right) on that row. If all elements on that row are 0, go down one more
row to see if there is a non-zero number and start with that number.
4. Then, do row operations on the matrix so that all elements above and below
on the same column become 0 and the new number we started with becomes
1.
5. Repeat step 3-4 until we reach the last row. Then, if there are any rows with
all 0’s, switch rows so that the rows with all 0’s come to the bottom.
6. The resulting matrix is reduced row echelon form of A. The columns with 1
and 0’s at other places are called pivot columns, and the 1’s on pivot columns
are pivots.
1 12 1
1
→ 2 .
6 −3 0 −6
Next, we go down to row 2 and start with the first non-zero number on row 2,
which is −6. Then, we need to do row operations so that that −6 becomes 1 and
107
1
the 2 above becomes 0.
Multiplying row 2 by − 16 ,
1 1
1 1
2 → 2 .
0 −6 0 1
Adding − 12 times row 2 to row 1,
1
1 1 0
2 → .
0 1 0 1
Therefore,
1 0
rref(A) = .
0 1
Column 1 and column 2 are pivot columns.
−2 3 −6
Example 9.2: Let B = 10 −9 18 , rref(B) =?
8 −6 12
First, we start with the number at the top left corner, which is −2. Then, we
need to do row operations so that that −2 becomes 1 and the 10 and 8 below
become 0.
Multiplying row 1 by − 21 ,
− 32
−2 3 −6 1 3
10 −9 18 → 10 −9 18 .
8 −6 12 8 −6 12
− 32 − 32
1 3 1 3
10 −9 18 → 0 6 −12 .
8 −6 12 8 −6 12
1 − 32 − 32
3 1 3
0 6 −12 → 0 6 −12 .
8 −6 12 0 6 −12
Next, we go down to row 2 and start with the first non-zero number on row 2,
which is 6. Then, we need to do row operations so that that 6 becomes 1 and the
− 23 above and the 6 below become 0.
Multiplying row 2 by 61 ,
− 32 − 32
1 3 1 3
0 6 −12 → 0 1 −2 .
0 6 −12 0 6 −12
108
3
Adding 2 times row 2 to row 1,
− 32
1 3 1 0 0
0 1 −2 → 0 1 −2 .
0 6 −12 0 6 −12
Next, we go down to row 3. Since row 3, the last row, is all 0’s, there is nothing
more to do. Therefore,
1 0 0
rref(B) = 0 1 −2 .
0 0 0
Column 1 and column 2 are pivot columns.
1 −1
Example 9.3: Let C = 3 5 , rref(C) =?
−2 6
First, we start with the number at the top left corner, which is 1. Then, we need
to do row operations so that that 1 becomes 1 and the 3 and −2 below become 0.
(Obviously, since the 1 at the top left corner is already 1, we do not need to do any
row operations for that part.)
Next, we go down to row 2 and start with the first non-zero number on row 2,
which is 8. Then, we need to do row operations so that that 8 becomes 1 and the
−1 above and the 4 below become 0.
Multiplying row 2 by 18 ,
1 −1 1 −1
0 8 → 0 1 .
0 4 0 4
109
Adding row 2 to row 1,
1 −1 1 0
0 1 → 0 1 .
0 4 0 4
Adding −4 times row 2 to row 3,
1 0 1 0
0 1 → 0 1 .
0 4 0 0
Next, we go down to row 3. Since row 3, the last row, is all 0’s, there is nothing
more to do. Therefore,
1 0
rref(C) = 0 1 .
0 0
Column 1 and column 2 are pivot columns.
3 9 6
Example 9.4: Let D = , rref(D) =?
5 15 7
First, we start with the number at the top left corner, which is 3. Then, we
need to do row operations so that that 3 becomes 1 and the 5 below becomes 0.
Multiplying row 1 by 13 ,
3 9 6 1 3 2
→ .
5 15 7 5 15 7
Next, we go down to row 2 and start with the first non-zero number on row 2,
which is −3. Then, we need to do row operations so that that −3 becomes 1 and
the 2 above becomes 0.
Multiplying row 2 by − 13 ,
1 3 2 1 3 2
→ .
0 0 −3 0 0 1
Therefore,
1 3 0
rref(D) = .
0 0 1
110
Column 1 and column 3 are pivot columns.
1 5 3 6
Example 9.5: Let E = 2 10 6 12, rref(E) =?
5 25 8 9
First, we start with the number at the top left corner, which is 1. Then, we
need to do row operations so that that 1 becomes 1 and the 2 and 5 below become
0. (Obviously, since the 1 at the top left corner is already 1, we do not need to do
any row operations for that part.)
Next, we go down to row 2. Since row 2 is all 0’s, we go down to row 3 and start
with the first non-zero number on row 3, which is −7. Then, we need to do row
operations so that that −7 becomes 1 and the 3 above becomes 0.
Multiplying row 3 by − 17 ,
1 5 3 6 1 5 3 6
0 0 0 0 → 0 0 0 0 .
0 0 −7 −21 0 0 1 3
Since row 2 in the middle is a 0 row, we switch row 2 and row 3 so that the 0 row
comes to the bottom:
1 5 0 −3 1 5 0 −3
0 0 0 0 → 0 0 1 3 .
0 0 1 3 0 0 0 0
Therefore,
1 5 0 −3
rref(E) = 0 0 1 3 .
0 0 0 0
Column 1 and column 3 are pivot columns.
111
9.2 Rank of Matrices
Notice that when we reduce a matrix to reduced row echelon form, each pivot is
in a different column and a different row from other pivots. In other words, each
row and each column only have at most one pivot. Also, note that the rows that
do not have pivots are 0 rows.
The number and positions of the pivots of reduced row echelon form of a ma-
trix play important roles in linear algebra. In this section, we will focus on the
meaning of the number of pivots, which is the rank of the matrix. For example, if
the reduced row echelon form of a matrix A has two pivots, then we say that the
matrix A has rank 2.
The rank of a matrix is the number of linearly independent rows of that matrix.
For example, let us say we have a matrix of five rows. If row 1, row 2, and row 4 are
linearly independent, and row 3 and row 5 can be written as linear combinations
of row 1, row 2, and row 4, then that matrix has three linearly independent rows,
and its rank is 3.
It should not be too hard to understand why the number of pivots of reduced
row echelon form is equal to the number of linearly independent rows. Let us think
of an example. Say we have a matrix of three rows with row 1 and row 2 being
linearly independent and row 3 being linearly dependent with respect to row 1 and
row 2, specifically row 3 equals row 1 plus row 2. Recall that we obtain reduced row
echelon form by doing row operations on the matrix. By adding −1 times row 1 to
row 3 and then adding −1 times row 2 to row 3, row 3 in the reduced row echelon
form would be a 0 row, and only row 1 and row 2 of the reduced row echelon form
would have pivots. In general, the linearly dependent rows will be cancelled out
when doing row operations, and the linearly independent rows will remain non-zero
and contain pivots.
There is also a nice fact that the number of linearly independent rows of a matrix
is equal to the number of linearly independent columns of that matrix. However,
the proof of that is slightly complicated, so we will not prove it here.
1 0
rref(A) = .
0 1
112
Since there are two pivots, the rank of A is 2.
−2 3 −6
Example 9.7: Let B = 10 −9 18 . How many linearly independent rows
8 −6 12
are there in B?
Since there are two pivots, B has two linearly independent rows.
1 −1
Example 9.8: Let C = 3 5 . How many linearly independent columns
−2 6
are there in C?
Since there are two pivots, C has two linearly independent columns.
3 9 6
Example 9.9: Let D = . What is the rank of D?
5 15 7
In example 9.4, we found that
1 3 0
rref(D) = .
0 0 1
Since there are two pivots, E has two linearly independent columns.
113
Exercises
2 −2 6
1. Let A1 = 6 3 0 .
12 −12 15
a) rref(A1 ) =?
−3 2 9
2. Let A2 = .
6 −4 −18
a) rref(A2 ) =?
5 −3 2 −1
3. Let A3 = −2 6 −5 1 .
7 −1 3 2
a) rref(A3 ) =?
114
d) How many linearly independent rows are there in A3 ?
−6 3
4. Let A4 = 1 −2.
5 1
a) rref(A4 ) =?
115
Chapter 10
116
10.1 What are Four Fundamental Subspaces?
The four fundamental subspaces are some vector spaces related to a matrix which
are subspaces of some n-dimensional space Rn . The four fundamental subspaces
of a matrix are column space, row space, null space, and left null space. Given an
m × n matrix A,
• The column space is the vector space of all linear combinations of columns of
A and is a subspace of Rm ;
• The row space is the vector space of all linear combinations of rows of A and
is a subspace of Rn ;
• The null space is the vector space of all n-dimensional vectors ~v such that
A~v = 0 and is a subspace of Rn ;
• The left null space is the vector space of all m-dimensional vectors ~u such
that ~u T A = 0 and is a subspace of Rm .
Now let us learn about these four spaces in more details. If you are not yet very
familiar with the concept of vector spaces, I recommend that you review chapter 3
before moving on with this chapter.
In the last chapter, we learned how to determine how many linearly independent
columns there are in a matrix. In this section, we will learn how to determine
which columns are linearly independent so that we can find the basis vectors for
the column space of that matrix.
The proof that the positions of pivot columns of rref(A) are the same as the posi-
tions of linearly independent columns of A is slightly complicated, so we will not
prove it here. Instead, think of it intuitively like this: because the pivot columns of
rref(A) are linearly independent, the columns of A at the same positions are also
linearly independent. It should not be too difficult to see why the pivot columns
are linearly independent. Each pivot column has 0’s and one 1, and the 1’s of
117
different pivot columns are at different places. For example, there is only one pivot
column with 1 as its first element, and there is only at most one pivot column with
1 as its second element, and so on. As an example, let us consider the following
vectors which are some possible pivot columns of the reduced row echelon form of
a matrix:
1 0 0
0 1 0
, , .
0 0 1
0 0 0
Is there any non-zero linear combination of these vectors which gives the 0 vector?
In other words, is it possible to have
1 0 0 0
0 1 0 0
0 + c2 0 + c3 1 = 0
c1
0 0 0 0
Now let us do some examples of finding the column space of a matrix as the span
of linearly independent columns.
2 1
Example 10.1: Let A = . Find C(A).
6 −3
In example 9.1, we found that
1 0
rref(A) = .
0 1
Since column 1 and column 2 of rref(A) are the pivot columns, column 1 and column
2 of A are linearly independent, and thus
2 1
C(A) = span , .
6 −3
118
Since column 1 and column 2 of rref(B) are the pivot columns, column 1 and
column 2 of B are linearly independent, and thus
−2 3
C(B) = span 10 , −9 .
8 −6
Since column 1 and column 2 of rref(C) are the pivot columns, column 1 and
column 2 of C are linearly independent, and thus
1 −1
C(C) = span 3 , 5 .
−2 6
Since column 1 and column 3 of rref(D) are the pivot columns, column 1 and
column 3 of D are linearly independent, and thus
3 6
C(D) = span , .
5 7
119
In example 9.5, we found that
1 5 0 −3
rref(E) = 0 0 1 3 .
0 0 0 0
Since column 1 and column 3 of rref(E) are the pivot columns, column 1 and
column 3 of E are linearly independent, and thus
1 3
C(E) = span 2 , 6 .
5 8
There are a few more things to note about the column space before we move on to
the row space.
First, the column space of a m × n matrix is a subspace of Rm . Note that each col-
umn of a m × n matrix has m elements, so it is a m-dimensional vector. Remember
from chapter 3 that the span of m-dimensional linearly independent vectors is a
subspace of Rm . So, the column space of a m × n matrix is a subspace of Rm . For
example, the column space of a 3 × 2 matrix is a subspace of R3 .
Second, the column space of A is the same as the row space of AT . As we will
learn in the next section, the row space of a matrix is the vector space of all linear
combinations of rows of that matrix. When we take the transpose of a matrix, the
columns of A become rows of AT . So, linear combinations of columns of A are the
same as linear combinations of rows of AT .
However, for R(A), we do not need to use linearly independent rows of A as the
basis vectors although we can. Instead, we can write R(A) as the span of non-zero
rows of rref(A).
120
A, and the non-zero rows of rref(A) are linearly independent because they would
have been cancelled out to become 0 rows if they were linearly dependent instead.
If we have a set of linearly independent vectors and another set of other linearly
independent vectors which can be written as linear combinations of the vectors of
the first set, then the span of the second set is the same as the span of the first set.
Let us see why this is true with an example with two vectors. Say we have a set of
two linearly independent vectors ~v1 and ~v2 and another set of linearly independent
vectors ~u1 and ~u2 , where ~u1 and ~u2 can be written as some linear combinations of
~v1 and ~v2 , i.e.
~u1 = c1~v1 + c2~v2
and
~u2 = c3~v1 + c4~v2
for some scalars c1 , c2 , c3 , and c4 . Then, for some scalars a and b, a vector w
~ in
the span of ~u1 and ~u2 is a vector of the form
w
~ = a~u1 + b~u2
= a(c1~v1 + c2~v2 ) + b(c3~v1 + c4~v2 )
= ac1~v1 + ac2~v2 + bc3~v1 + bc4~v2
= (ac1 + bc3 )~v1 + (ac2 + bc4 )~v2 ,
which is a linear combination of ~v1 and ~v2 in the span of ~v1 and ~v2 as well.
Thus, since the non-zero rows of rref(A) are linearly independent and are some
linear combinations of the linearly independent rows of A, the span of non-zero
rows of rref(A) is the same as the span of linearly independent rows of A, and we
can write R(A) as the span of non-zero rows of rref(A).
121
In example 9.2, we found that
1 0 0
rref(B) = 0 1 −2 .
0 0 0
122
1 5 3 6
Example 10.10: Let E = 2 10 6 12. Find R(E).
5 25 8 9
In example 9.5, we found that
1 5 0 −3
rref(E) = 0 0 1 3 .
0 0 0 0
There are a few more things to note about the row space before we move on to the
null and left null spaces.
First, the row space of a m × n matrix is a subspace of Rn . Note that each row of
a m × n matrix has n elements, so it is a n-dimensional vector. Remember from
chapter 3 that the span of n-dimensional linearly independent vectors is a subspace
of Rn . So, the row space of a m × n matrix is a subspace of Rn . For example, the
row space of a 4 × 5 matrix is a subspace of R5 .
Second, the row space of A is the same as the column space of AT . As we learned
in the last section, the column space of a matrix is the vector space of all linear
combinations of columns of that matrix. When we take the transpose of a matrix,
the rows of A become columns of AT . So, linear combinations of rows of A are the
same as linear combinations of columns of AT .
We know that any matrix times the 0 vector is the 0 vector, so ~v = 0 is in the
null space of any matrix A. The more interesting question is when there can be
non-zero vectors ~v such that A~v = 0. There can be non-zero vectors ~v such that
A~v = 0 when not all columns of A are linearly independent, i.e. when A has some
linearly dependent column(s).
123
Let
| | |
A = ~a1 ~a2 ··· ~an
| | |
and
v1
v2
~v = . .
..
vn
Then,
v1
| | | v2
~a1 ~a2 ··· ~an . = v1~a1 + v2~a2 + · · · + vn~an ,
.
| | | .
vn
so the equation A~v = 0 becomes
From the definition of linear independence, we know that the only solution to this
is v1 = v2 = · · · = vn = 0 if the column vectors ~a1 , ~a2 , · · · , ~an are all linearly in-
dependent. So, if all columns of A are linearly independent, then the only solution
to A~v = 0 is ~v = 0. Otherwise, if not all columns of A are linearly independent,
then not all of v1 , v2 , · · · , vn are 0, so there can be non-zero vectors ~v such that
A~v = 0.
To find the vectors ~v in N (A), we need to solve the equation A~v = 0. How-
ever, solving A~v = 0 directly could be complicated, so we can solve the equation
rref(A)~v = 0 instead because these two equations are equivalent.
Reduced row echelon form rref(A) is obtained by doing row operations to A, and
recall from chapter 8 that doing row operations to A is the same as multiplying
some elementary matrices to A. In high school algebra, we learned that to keep
an equation the same, we need to multiply both sides of the equation by the same
thing. For the equation A~v = 0, multiplying the elementary matrices to both sides
gives rref(A)~v = 0 because the elementary matrices times A is rref(A), and the
elementary matrices times 0 is still 0 because any matrix times the 0 vector is the
0 vector.
124
Let
v
~v = 1 ,
v2
and we need to solve the equation rref(A)~v = 0:
1 0 v1 0
=
0 1 v2 0
1 0 0
v1 + v2 =
0 1 0
v1 0
= .
v2 0
So, we have v1 = 0 and v2 = 0:
v 0
~v = 1 = .
v2 0
Thus, N (A) contains only the 0 vector:
0
N (A) = .
0
We can see that N (A) contains only the 0 vector because all columns of A are lin-
early independent. Since it contains only the 0 vector, it is a 0-dimensional vector
space.
−2 3 −6
Example 10.12: Let B = 10 −9 18 . Find N (B).
8 −6 12
In example 9.2, we found that
1 0 0
rref(B) = 0 1 −2 .
0 0 0
Let
v1
~v = v2 ,
v3
and we need to solve the equation rref(B)~v = 0:
1 0 0 v1 0
0 1 −2 v2 = 0
0 0 0 v3 0
1 0 0 0
v1 0 + v2 1 + v3 −2 = 0
0 0 0 0
v1 0
v2 − 2v3 = 0 .
0 0
125
So, we have v1 = 0 and v2 = 2v3 :
v1 0 0
~v = v2 = 2v3 = v3 2 .
v3 v3 1
0
Thus, N (B) contains any vectors ~v that are linear combinations of 2:
1
0
N (B) = span 2 .
1
126
We can see that N (C) contains only the 0 vector because all columns of C are lin-
early independent. Since it contains only the 0 vector, it is a 0-dimensional vector
space.
3 9 6
Example 10.14: Let D = . Find N (D).
5 15 7
In example 9.4, we found that
1 3 0
rref(D) = .
0 0 1
Let
v1
~v = v2 ,
v3
and we need to solve the equation rref(D)~v = 0:
v1
1 3 0 0
v2 =
0 0 1 0
v3
1 3 0 0
v1 + v2 + v3 =
0 0 1 0
v1 + 3v2 0
= .
v3 0
So, we have v1 = −3v2 and v3 = 0:
v1 −3v2 −3
~v = v2 = v2 = v2 1 .
v3 0 0
−3
Thus, N (D) contains any vectors ~v that are linear combinations of 1 :
0
−3
N (D) = span 1 .
0
127
Let
v1
v2
v3 ,
~v =
v4
and we need to solve the equation rref(E)~v = 0:
v
1 5 0 −3 1
0
0 0 1 3 v2 = 0
v3
0 0 0 0 0
v4
1 5 0 −3 0
v1 0 + v2 0 + v3 1 + v4 3 = 0
0 0 0 0 0
v1 + 5v2 − 3v4 0
v3 + 3v4 = 0 .
0 0
v4 v4 0 1
−5
1
0 and
Thus, N (E) contains any vectors ~v that are linear combinations of
0
3
0
:
−3
1
−5
3
1 0
N (E) = span , .
0
−3
0 1
The left null space of a m × n matrix A is the vector space of all m-dimensional
vectors ~u such that ~u T A = 0. The vectors ~u have to be m-dimensional because
a m × n matrix can only be multiplied by a m-dimensional row vector. Because
the left null space is a vector space of m-dimensional vectors, it is a subspace of Rm .
128
The left null space is actually nothing really new. The left null space of A is
the null space of AT , so we just need to find the null space of AT if we want to find
the left null space of A.
AT ~u = 0
gives
T
AT ~u = 0
because the transpose of 0 column vector is just the 0 row vector. Then, using the
formula proven in section 7.5 of chapter 7,
T T
AT ~u = ~u T AT .
T
From question 6 in the exercises of chapter 6, we know that AT = A, so
T
AT ~u = ~u T A.
129
Exercises
2 −2 6
1. Let A1 = 6 3 0 .
12 −12 15
a) Find the column space of A1 . How many dimensions does it have?
d) Find the left null space of A1 . How many dimensions does it have?
−3 2 9
2. Let A2 = .
6 −4 −18
a) Find the column space of A2 . How many dimensions does it have?
d) Find the left null space of A2 . How many dimensions does it have?
5 −3 2 −1
3. Let A3 = −2 6 −5 1 .
7 −1 3 2
a) Find the column space of A3 . How many dimensions does it have?
d) Find the left null space of A3 . How many dimensions does it have?
130
Chapter 11
Inverse of Matrices
131
11.1 Invertible and Singular Matrices
A n × n square matrix A can have the inverse A−1 such that
AA−1 = A−1 A = In
A~v = ~u,
However, not all square matrices have inverses. Matrices that have inverses are
called invertible matrices, and matrices that do not have inverses are called singu-
lar matrices.
So, when is a matrix invertible? When does the inverse of a matrix exist?
Let us take an example where the function does not have an inverse: f (x) = x2 .
We can see that any horizontal line in the upper half of the xy-plane intersects the
graph y = x2 twice. So, the graph does not pass the horizontal line test, and the
function does not have an inverse. This happens because for each positive y-value,
there are two x-values such that f (x) = y. For example, (−3)2 = 32 = 9. So, f is
taking two x-values to the same y-value. The inverse f −1 is supposed to take a y-
value back to a x-value, but in this case f −1 does not know which x-value to take the
y-value back to because there are two options. For example, f −1 would not know
whether to take the y-value 9 back to the x-value 3 or −3. So, f −1 does not exist.
Remember that a function can only take one value to one value, not more than one.
We can think of a matrix A and its inverse A−1 in a similar way. As explained
earlier, if
A~v = ~u,
then
A−1 ~u = ~v
when A−1 exists. We can think of this as A is taking a vector ~v to a vector ~u, and
and A−1 is taking the vector ~u back to the vector ~v .
132
When not all columns of A are linearly independent, there can be two or more
vectors ~v such that
A~v = ~u1
for some vector ~u1 . Suppose there is a vector ~v1 such that
A~v1 = ~u1 .
As we learned in the last chapter, if not all columns of A are linearly independent,
then there are non-zero vectors ~vn in the null space of A. Let ~v10 = ~v1 + ~vn , then
So, A takes different vectors ~v1 and ~v10 to the same vector ~u1 . Then, A−1 would
not know whether to take the vector ~u1 back to the vector ~v1 or ~v10 , so A−1 does
not exist.
Thus, when not all columns of A are linearly independent, A−1 does not exist,
and A is a singular matrix. Otherwise, if all columns of A are linearly independent,
then A−1 exists, and A is an invertible matrix.
That means we can do some row operations to A to get the identity matrix. Re-
member that doing row operations to A is the same as multiplying some elementary
matrices to A. Since multiplying the product of those elementary matrices to A
gives the identity matrix, the product of those elementary matrices is A−1 . So, to
find A−1 , we need to know what the product of those elementary matrices is.
Now how do we know what the product of those elementary matrices is? Well,
we can rewrite each row operation as a matrix multiplication, keep track of each
elementary matrix, and then multiply all of the elementary matrices at the end.
However, there is a better way. We can attach the identity matrix I to the right
of A and do the same row operations to both A and I at the same time. By doing
this, we are multiplying the same elementary matrices to I, and any matrix times I
is the matrix itself, so we will know what the product of those elementary matrices
is. Let us look at some examples.
133
2 1
Example 11.1: Let A = . Is A invertible? If so, find A−1 .
6 −3
By reducing A to rref(A), we can see that all columns are linearly independent, so
A is invertible. To find A−1 , we first attach I2 to the right of A:
2 1 1 0
.
6 −3 0 1
Next, we do some row operations to reduce A to rref(A). Multiplying first row by
1
2,
1 21 1
2 1 1 0 0
→ 2 .
6 −3 0 1 6 −3 0 1
Adding −6 times row 1 to row 2,
1 12 1 1 1
0 1 0
2 → 2 2 .
6 −3 0 1 0 −6 −3 1
Multiplying row 2 by − 16 ,
1 1
1 21 1
0 1 0
2 → 2 2
1 .
0 −6 −3 1 0 1 2 − 61
1 12 12 1 1
0 1 0
→ 4 12 .
0 1 12 − 16 0 1 1
2 − 16
Therefore, 1 1
−1 4 12
A = 1 .
2 − 16
We can check that 1 1
4 12 2 1 1 0
1 = .
2 − 16 6 −3 0 1
1 3
Example 11.2: Let B = . Is B invertible? If so, find B −1 .
5 15
We can see that the second column is 3 times the first column, so not all columns
of B are linearly independent. So, B is not invertible. We can use reduced row
echelon form to check the linear dependence of columns, but it is not necessary in
this case where we can see quickly like this.
1 6 −3
Example 11.3: Let C = −2 0 −6. Is C invertible? If so, find C −1 .
3 9 1
By reducing C to rref(C), we can see that all columns are linearly independent, so
C is invertible. To find C −1 , we first attach I3 to the right of C:
1 6 −3 1 0 0
−2 0 −6 0 1 0 .
3 9 1 0 0 1
134
Next, we do some row operations to reduce C to rref(C). Adding 2 times row 1 to
row 2,
1 6 −3 1 0 0 1 6 −3 1 0 0
−2 0 −6 0 1 0 → 0 12 −12 2 1 0 .
3 9 1 0 0 1 3 9 1 0 0 1
Adding −3 times row 1 to row 3,
1 6 −3 1 0 0 1 6 −3 1 0 0
0 12 −12 2 1 0 → 0 12 −12 2 1 0 .
3 9 1 0 0 1 0 −9 10 −3 0 1
1
Multiplying row 2 by 12 ,
1 6 −3 1 0 0 1 6 −3 1 0 0
0 12 −12 2 1 1
1 0 → 0 1 −1 6 12 0 .
0 −9 10 −3 0 1 0 −9 10 −3 0 1
− 21
1 6 −3 1 0 0 1 0 3 0 0
0 1 −1 1 1
0 → 0 1 −1 1 1
0 .
6 12 6 12
0 −9 10 −3 0 1 0 −9 10 −3 0 1
0 − 12 − 12
1 0 3 0 1 0 3 0 0
0 1 −1 1 1 1 1
6 12 0 → 0 1 −1 6 12 0 .
0 −9 10 −3 0 1 0 0 1 − 32 3
4 1
− 12 − 12
1 0 3 0 0 1 0 3 0 0
1 1
0 1 −1
6 12 0 → 0 1 0 − 34 5
6 1 .
0 0 1 − 32 3
4 1 0 0 1 − 23 3
4 1
− 21 9
− 11
1 0 3 0 0 1 0 0 2 4 −3
0 1 0 − 43 5
6 1 → 0 1 0 − 43 5
6 1 .
0 0 1 − 32 3
4 1 0 0 1 − 32 3
4 1
Therefore,
9
− 11
2 4 −3
C −1 = − 4
3
5
6 1 .
− 23 3
4 1
We can check that
9
− 11
2 4 −3 1 6 −3 1 0 0
− 4 5
1 −2 0 −6 = 0 1 0 .
3 6
− 32 3
4 1 3 9 1 0 0 1
135
−2 3 −6
Example 11.4: Let D = 10 −9 18 . Is D invertible? If so, find D−1 .
8 −6 12
In example 9.2, we found that
1 0 0
rref(D) = 0 1 −2 .
0 0 0
Since there are only two pivot columns, not all columns of D are linearly indepen-
dent, so D is not invertible.
1 0 0 0
0 2 0 0
Example 11.5: Let E = . Is E invertible? If so, find E −1 .
0 0 3 0
0 0 0 5
By reducing E to rref(E), we can see that all columns are linearly independent, so
E is invertible. To find E −1 , we first attach I4 to the right of E:
1 0 0 0 1 0 0 0
0 2 0 0 0 1 0 0
0 0 3 0 0 0 1 0 .
0 0 0 5 0 0 0 1
Next, we do some row operations to reduce E to rref(E). Multiplying row 2 by 21 ,
1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
0
2 0 0 0 1 0 0 →
0 1 0 0 0 12 0 0 .
0 0 3 0 0 0 1 0 0 0 3 0 0 0 1 0
0 0 0 5 0 0 0 1 0 0 0 5 0 0 0 1
Multiplying row 3 by 31 ,
1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
1 1
0
1 0 0 0 2 0 0
→ 0 1 0 0 0 2 0 0
.
0 1
0 3 0 0 0 1 0 0 0 1 0 0 0 3 0
0 0 0 5 0 0 0 1 0 0 0 5 0 0 0 1
Multiplying row 4 by 51 ,
1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
1 0 1 0 0 1
0
1 0 0 0 2 0 0
→ 0 2 0 0
.
1 0 0 1 0 1
0 0 1 0 0 0 3 0 0 0 3 0
1
0 0 0 5 0 0 0 1 0 0 0 1 0 0 0 5
Therefore,
1 0 0 0
0 1 0 0
E −1 = 2
0 0 1 0 .
3
0 0 0 15
We can see that finding the inverse of an invertible diagonal matrix is very simple:
we just need to take the reciprocal of each main diagonal element.
136
Exercises
2 −2 6
1. Let A1 = 6 3 0 . Is A1 invertible? If so, find A−1
1 .
12 −12 15
−5 7
2. Let A2 = . Is A2 invertible? If so, find A−1
2 .
15 −21
2 3
3. Let A3 = . Is A3 invertible? If so, find A−1
3 .
−1 −2
−3 2 1
4. Let A4 = 5 −1 3 . Is A4 invertible? If so, find A−1
4 .
6 −4 −2
5. For any invertible matrices A and B, show that A−1 (B − A)B −1 = A−1 − B −1 .
137
Chapter 12
Determinant of Matrices
138
12.1 Properties of Determinant
In this chapter, we will learn about the determinant of a square matrix. The de-
terminant of a square matrix A is denoted as det(A). The determinant is, in some
sense, a function that assigns a number to each matrix, and it is uniquely defined by
some properties. First, we will learn those properties of determinant in this section.
For example,
1 0 0
1 0
det = det 0 1 0 = 1.
0 1
0 0 1
f (x + x0 ) = f (x) + f (x0 )
and
f (nx) = nf (x)
for any number n. In high school algebra, we learned about the linear functions of
the form
f (x) = mx + b.
We can check that mx + b satisfies the conditions f (x + x0 ) = f (x) + f (x0 ) and
f (nx) = nf (x). For the determinant of a matrix, this linearity occurs in each row.
For example, we have
a + a0 b + b0
0 0
a b a b
det = det + det
c d c d c d
and
a b a b
det = n det .
nc nd c d
Switching two rows of a matrix once changes the sign of the determinant of the
matrix. So, switching rows an even number of times would keep the determinant
the same because (−1)n = 1 if n is even. For example, we have
a b c d
det = −det .
c d a b
These three properties define what a determinant is. However, from these proper-
ties, we can derive more important properties of determinant.
139
Property 4: Determinant of a diagonal matrix is the product of all elements
on the main diagonal.
Let
d1 0 ··· 0
0 d2 ··· 0
D=. .. .
.. ..
.. . . .
0 0 ··· dn
By property 2, the determinant is linear in each row, so we can factor out d1 from
the first row, d2 from the second row,..., and dn from the n-th row:
d1 0 · · · 0 1 0 ··· 0
0 d2 · · · 0 0 1 · · · 0
det . = d1 d2 · · · dn det . . .
. . . . . ...
.. .. .. .. .. ..
0 0 · · · dn 0 0 ··· 1
= d1 d2 · · · dn det(In ).
Property 5: If a matrix has two or more same rows, then the determinant is 0.
If a matrix A has two same rows, then switching those two rows would still give
the same matrix A. By property 3, we know that switching two rows changes the
sign of the determinant, so we have det(A) = −det(A), which means det(A) = 0.
Property 6: Adding a multiple of another row to a row does not change the
determinant.
A= ~a2T .
|
~a3T
|
~a2T .
|
k~a2T + ~a3T
|
140
Applying linearity in the third row, we have
~a1T ~a1T ~a1T
|
det ~a2T = det ~a2T + det ~a2T
|
T T
k~a2 + ~a3 k~a2T ~a3T
|
|
~a1T ~a1T
|
= k det ~a2T + det ~a2T .
|
~a2T ~a3T
|
By property 5, we know that
~a1T
|
|
det ~a2T = 0
|
|
~a2T
|
|
because there are two same rows, so
~a1T ~a1T
|
|
det ~a2T = det ~a2T .
|
|
So, we can see how adding a multiple of another row to a row does not change the
determinant.
Property 7: If there is at least one 0 row in the matrix, then the determinant is 0.
Since 0 = n · 0 for any number n, applying linearity in the first row gives
0 0 n·0 n·0
det = det
c d c d
0 0
= n det .
c d
Property 8: If not all rows are linearly independent, then the determinant is
0.
As explained in section 9.2 of chapter 9, if there are some linearly dependent rows,
then they will be cancelled out and become 0 rows after doing some row opera-
tions, specifically by adding multiples of some rows to other rows. By property
141
6, adding a multiple of another row to a row does not change the determinant.
So, the determinant of a matrix with linearly dependent rows is the same as the
determinant of a matrix with 0 rows. By property 7, the determinant of a matrix
with 0 rows is 0. So, the determinant of a matrix with linearly dependent rows is 0.
Property 9: For a square matrix A, det AT = det(A).
We can see why det AT = 0 when det (A) = 0. By property 8, when not all
rows of A are linearly independent, det (A) = 0. A square matrix A has the same
number of rows and columns, and the number of linearly independent rows is the
same as the number of linearly independent columns. So, if not all rows of A are
linearly independent, then not all columns of A are linearly independent. Since
columns of A are rows of AT , that means not all rows of AT are linearly indepen-
dent. So, det AT = 0 by property 8.
It is harder to see why det AT = det(A) when det(A) is non-zero. We can easily
see that the formula is true when A is a diagonal matrix. A diagonal matrix is also
symmetric, so AT = A, and det AT = det(A). If all columns (or rows) of A are
linearly independent, then A can be reduced to a diagonal matrix by doing some
row operations, and there are determinant properties corresponding to the
row op-
erations. So, that gives an intuitive sense for why the formula det AT = det(A)
should be true. However, the precise proof of the formula is quite complicated, so
we will not prove it here.
Now note that by applying the previous properties to the rows of AT , we get
similar properties for columns of A because rows of AT are columns of A.
Property 11: Switching two columns changes the sign of the determinant.
Property 12: If a matrix has two or more same columns, then the determinant is 0.
Property 13: Adding a multiple of another column to a column does not change
the determinant.
Property 14: If there is at least one 0 column in the matrix, then the deter-
minant is 0.
Property 15: If not all columns are linearly independent, then the determinant
is 0.
In section 11.1 of chapter 11, we learned that A is not invertible when not all
columns of A are linearly independent. Here, we know that det(A) = 0 when not
all columns of A are linearly independent. So, A is a singular matrix if det(A) = 0.
142
We can check that this is true when det(A) = 0 or det(B) = 0, which is when
det(A)det(B) = 0. Say det(A) = 0. By property 15, det(A) = 0 when not all
columns of A are linearly independent. By using method 1 of matrix multiplica-
tion, which is multiplying A to each column of B to obtain each column of the
product AB, we can see that the columns of AB are some linear combinations of
the columns of A. So, if not all columns of A are linearly independent, then not all
columns of AB are linearly independent. By property 15, det(AB) = 0. Similarly,
we can check that det(AB) = 0 when det(B) = 0.
When det(A) 6= 0 and det(B) 6= 0, we can have an intuitive sense for why the
formula det(AB) = det(A)det(B) should
be true just like how we had an intuitive
sense for why the formula det AT = det(A) should be true. First, we can check
that the formula det(AB) = det(A)det(B) is true when A is a diagonal matrix.
Let us use 3 × 3 matrices as an example: let
d1 0 0
A = 0 d2 0
0 0 d3
and
~b T
|
1
B=
~b T .
|
2
~b T
|
AB = 0 d2 0 ~b2T = d2~b2T ,
|
0 0 d3 ~b T d3~b T
|
3 3
so
d1~b1T
|
2
~
d3 b3T
|
Since the determinant is linear in each row, we can factor out d1 from the first row,
d2 from the second row, and d3 from the third row:
d1~b1T ~b T
|
1
det d2~b2T = d1 d2 d3 det ~b2T = d1 d2 d3 det(B).
|
d3~b T ~b T
|
3 3
Since
d1 0 0
det(A) = det 0 d2 0 = d1 d2 d3
0 0 d3
143
by property 4, we indeed have det(AB) = det(A)det(B). In general,
~b T
d1 0 · · · 0
|
1
~ T
0 d2 · · · 0 b
|
2
det
.. .. . . .. .
. . . .
..
0 0 · · · dn ~b T
|
n
d1~b1 T
|
d2~b2T
|
|
=det
..
.
~
dn bn T
|
|
~b T
|
|
1
~ T
b2
|
|
=d1 d2 · · · dn det
..
.
~b T
|
n
~b T
d1 0 · · · 0 |
|
1
0 d2 · · · 0 ~b T
|
|
2
=det . .. det .
.. . . ..
.. . . .
.
0 0 · · · dn ~b T
|
144
and
0 a12 0 a12 0 a12
det = det + det .
a21 a22 a21 0 0 a22
By property 14,
a11 0 0 a12
det = det =0
a21 0 0 a22
because there is a 0 column, so
a11 0 a11 0
det = det , (12.2)
a21 a22 0 a22
and
0 a12 0 a12
det = det . (12.3)
a21 a22 a21 0
Substituting (12.2) and (12.3) into (12.1),
a11 a12 a11 0 0 a12
det = det + det .
a21 a22 0 a22 a21 0
By property 4,
a11 0
det = a11 a22 .
0 a22
By property 3 and property 4,
0 a12 a21 0
det = −det = −a21 a12 .
a21 0 0 a12
Therefore,
a11 a12
det = a11 a22 − a21 a12 .
a21 a22
Note that when we decompose a determinant into smaller determinant matrices
using linearity in each row, non-zero determinants occurred when each row and
each column only has one element of the original matrix, such as
a11 0
det
0 a22
and
0 a12
det .
a21 0
When some column or row has more than one element from the original matrix,
such as
a11 0
det
a21 0
and
0 a12
det ,
0 a22
there will be some 0 column or row, which makes the determinant 0. In the next
section, we will derive the determinant formula for 3 × 3 matrices in a similar
manner.
145
12.3 Determinant Formula for 3 x 3 Matrices
Remember that when we decompose a determinant into smaller determinant ma-
trices using linearity in each row, non-zero determinants occur only when each row
and each column only has one element of the original matrix. So, we just need to
care about the determinant matrices whose columns and rows each has only one
element:
a11 a12 a13
det a21 a22 a23
a31 a32 a33
a11 0 0 a11 0 0
=det 0 a22 0 + det 0 0 a23
0 0 a33 0 a32 0
0 a12 0 0 a12 0
+ det a21 0 0 + det 0 0 a23
0 0 a33 a31 0 0
0 0 a13 0 0 a13
+ det 0 a22 0 + det a21 0 0 . (12.4)
a31 0 0 0 a32 0
By property 4,
a11 0 0
det 0 a22 0 = a11 a22 a33 . (12.5)
0 0 a33
By property 3 and property 4, switching row 2 and row 3 gives
a11 0 0 a11 0 0
det 0 0 a23 = −det 0 a32 0 = −a11 a23 a32 . (12.6)
0 a32 0 0 0 a23
By property 3 and property 4, switching row 1 and row 2 and then switching row
1 and row 3 give
0 a12 0 a31 0 0
det 0 0 a23 = det 0 a12 0 = a12 a23 a31 . (12.8)
a31 0 0 0 0 a23
146
By property 3 and property 4, switching row 1 and row 3 gives
0 0 a13 a31 0 0
det 0 a22 0 = −det 0 a22 0 = −a13 a22 a31 . (12.9)
a31 0 0 0 0 a13
By property 3 and property 4, switching row 2 and row 3 and then switching row
1 and row 3 give
0 0 a13 a21 0 0
det a21 0 0 = det 0 a32 0 = a13 a21 a32 . (12.10)
0 a32 0 0 0 a13
Substituting (12.5), (12.6), (12.7), (12.8), (12.9), and (12.10) into (12.4),
a11 a12 a13
det a21 a22 a23 =a11 a22 a33 + a12 a23 a31 + a13 a21 a32
a31 a32 a33
− a13 a22 a31 − a12 a21 a33 − a11 a23 a32 .
For higher order matrices such as 4 × 4 matrices or 5 × 5 matrices, we can find the
determinants in a similar manner by breaking them down into smaller determinant
matrices using linearity of determinant in each row, or we can also do some row
operations to make the matrix simpler using the determinant properties related to
row operations.
147
Exercises
1. For a square matrix A, what is det A3 in terms of det(A)?
1 5
4. det =?
−2 7
8 3
5. det =?
6 1
2 −3 1
6. det 0 5 −6 =?
−1 −2 0
2 0 0 0
6 −5 0 0
7. det
−8
=? (Hint: do row operations)
1 3 0
9 20 −12 10
−1 −3 2 8
0 6 5 −3
8. det =? (Hint: do row operations)
0 0 −7 −2
0 0 0 3
148
Chapter 13
149
13.1 Systems of Linear Equations as Matrix Lin-
ear Equations
3x + 5y − 2z
=8
x − 6y + 2z = −5 . (13.1)
7x + y + 9z = 11
The system of linear equations in (13.1) has three equations and three unknown
variables. In high school algebra, we mostly learned about systems where the
number of equations is the same as the number of unknowns. However, this is not
always necessarily the case. For example, we can have a system of equations like
5x − 8y
=3
x + 2y = −3 . (13.2)
7x − 5y = −2
Any system of linear equations can be written as a matrix linear equation of the
form
A~x = ~b,
where A is the matrix containing all the coefficients on the left-hand side of the
system, ~x is the column vector containing all the unknown variables, and ~b is the
column vector containing all the numbers on the right-hand side of the system. For
example, the system of equations in (13.1) can be written as
3x + 5y − 2z 8
x − 6y + 2z = −5
7x + y + 9z 11
3x 5y −2z 8
x + −6y + 2z = −5
7x y 9z 11
3 5 −2 8
x 1 + y −6 + z 2 = −5
7 1 9 11
3 5 −2 x 8
1 −6 2 y = −5 ,
7 1 9 z 11
150
and the system of equations in (13.2) can be written as
5x − 8y 3
x + 2y = −3
7x − 5y −2
5x −8y 3
x + 2y = −3
7x −5y −2
5 −8 3
x 1 + y 2 = −3
7 −5 −2
5 −8 3
1 2 x = −3 .
y
7 −5 −2
In this chapter, we will learn about systems of linear equations more generally as
matrix linear equations.
A~x = ~b (13.3)
has or does not have a solution. Remember from chapter 7 that multiplying a ma-
trix A by a column vector gives some linear combination of columns of A. So, the
equation (13.3) has at least one solution when ~b is a linear combination of columns
of A. Then, remember from chapter 10 that the column space of A is the space of
all linear combinations of columns of A. So, the equation (13.3) has at least one
solution when ~b is in the column space of A.
Now what condition would guarantee that the vector ~b would always be in the
column space of A? That is, what condition would guarantee that the equation
(13.3) would definitely have at least one solution? The equation (13.3) would defi-
nitely have at least one solution when the rank of A is equal to the number of rows
of A.
151
is a m-dimensional vector space. Thus, if the rank of A is equal to the number of
rows of A, then the column space of A would be a m-dimensional subspace of Rm ,
which is the whole Rm space itself. Since the column space of A is the whole Rm
space, any m-dimensional vector ~b would always be in the column space of A, and
the equation (13.3) would definitely have at least one solution.
On the other hand, if the rank of A is less than the number of rows of A, then the
equation (13.3) might or might not have solution depending on whether ~b is in the
column space of A.
When the equation (13.3) has solution, it could have a unique solution or infinitely
many solutions. Let us discuss when it has a unique solution or infinitely many
solutions.
The equation (13.3) can have infinitely many solutions when there are non-zero
vectors in the null space of A. Say there is a vector ~x = ~vp that satisfies equation
(13.3):
A~vp = ~b.
Let ~vn be a non-zero vector in the null space of A. Then, ~x = ~vp + ~vn is another
vector which satisfies equation (13.3) because
As we saw in chapter 10, if there are some non-zero vectors in the null space of A,
then their span is also in the null space of A. So, there would be infinitely many
non-zero vectors in the null space of A, and we can have infinitely many solutions
for the equation (13.3).
As we learned in chapter 10, there are non-zero vectors in the null space of A
when not all columns of A are linearly independent, that is when the rank of A is
less than the number of columns of A. So, the equation (13.3) can have infinitely
many solutions if the rank of A is less than the number of columns of A. On the
other hand, the equation (13.3) can only have at most one solution if the rank of
A is equal to the number of columns of A.
A~x = ~b
152
by using rref(A). We can obtain rref(A) by doing some row operations to A, which
is the same as multiplying some elementary matrices to the left of A. Then, to keep
the equation the same, we need to multiply the same elementary matrices to the
left of ~b on the right-hand side as well, so we need to do the same row operations
to ~b. (We can think of a m-dimensional vector as a m × 1 matrix.)
So, to solve a matrix linear equation, we first attach the vector ~b to the right
of A, and then we do some row operations to reduce A to rref(A). At the same
time, the vector ~b will become some new vector ~b0 . Then, we will solve the equation
rref(A)~x = ~b0 ,
1 12 7
2 1 7
→ 2 .
6 −3 3 6 −3 3
1 12 7 1 7
1
2 → 2 2 .
6 −3 3 0 −6 −18
Multiplying row 2 by − 16 ,
1 7 1 7
1 1
2 2 → 2 2 .
0 −6 −18 0 1 3
153
Adding − 21 times row 2 to row 1,
1 7
1 1 0 2
2 2 → .
0 1 3 0 1 3
which gives
x 2
=
y 3
when we do the multiplication on the left-hand side. This equation has a unique
solution.
−2 3 −6 x 8
Example 13.2: Solve 10 −9 18 y = −2.
8 −6 12 z 5
First, we attach the vector on the right-hand side to the matrix on the left-hand
side:
−2 3 −6 8
10 −9 18 −2 .
8 −6 12 5
Then, we do some row operations to reduce the matrix to its reduced row echelon
form. Multiplying row 1 by − 12 ,
1 − 32
−2 3 −6 8 3 −4
10 −9 18 −2 → 10 −9
18 −2 .
8 −6 12 5 8 −6 12 5
1 − 32 − 32
3 −4 1 3 −4
10 −9 18 −2 → 0 6 −12 38 .
8 −6 12 5 8 −6 12 5
1 − 32 − 32
3 −4 1 3 −4
0 6 −12 38 → 0 6 −12 38 .
8 −6 12 5 0 6 −12 37
Multiplying row 2 by 61 ,
− 32 − 32
1 3 −4 1 3 −4
19
0 6 −12 38 → 0 1 −2 3
.
0 6 −12 37 0 6 −12 37
154
3
Adding 2 times row 2 to row 1,
− 32 11
1 3 −4 1 0 0 2
0 1 −2 193
→ 0 1 −2 19
3
.
0 6 −12 37 0 6 −12 37
Adding −6 times row 2 to row 3,
11 11
1 0 0 2 1 0 0 2
19 19
0 1 −2 3
→ 0 1 −2 3
.
0 6 −12 37 0 0 0 −1
So, the original equation is equivalent to
11
1 0 0 x 2
0 1 −2 y = 19 ,
3
0 0 0 z −1
which gives 11
x 2
y − 2z = 19
3
0 −1
when we do the multiplication on the left-hand side. Therefore, this equation has
no solution because 0 6= −1.
x
3 9 6 9
Example 13.3: Solve y = .
5 15 7 12
z
First, we attach the vector on the right-hand side to the matrix on the left-hand
side:
3 9 6 9
.
5 15 7 12
Then, we do some row operations to reduce the matrix to its reduced row echelon
form. Multiplying row 1 by 13 ,
3 9 6 9 1 3 2 3
→ .
5 15 7 12 5 15 7 12
Adding −5 times row 1 to row 2,
1 3 2 3 1 3 2 3
→ .
5 15 7 12 0 0 −3 −3
Multiplying row 2 by − 13 ,
1 3 2 3 1 3 2 3
→ .
0 0 −3 −3 0 0 1 1
Adding −2 times row 2 to row 1,
1 3 2 3 1 3 0 1
→ .
0 0 1 1 0 0 1 1
155
So, the original equation is equivalent to
x
1 3 0 1
y = ,
0 0 1 1
z
which gives
x + 3y 1
=
z 1
when we do the multiplication on the left-hand side. So, we have
x + 3y = 1 ⇔ x = 1 − 3y
and
z = 1.
Therefore, the solutions are
x 1 − 3y 1 −3
y = y = 0 + y 1
z 1 1 0
for any real numbers y. This equation has infinitely many solutions.
1 −1 6
x
Example 13.4: Solve 3 5 = 10 .
y
−2 6 −16
First, we attach the vector on the right-hand side to the matrix on the left-hand
side:
1 −1 6
3 5 10 .
−2 6 −16
Then, we do some row operations to reduce the matrix to its reduced row echelon
form. Adding −3 times row 1 to row 2,
1 −1 6 1 −1 6
3 5 10 → 0 8 −8 .
−2 6 −16 −2 6 −16
Multiplying row 2 by 81 ,
1 −1 6 1 −1 6
0 8 −8 → 0 1 −1 .
0 4 −4 0 4 −4
156
Adding row 2 to row 1,
1 −1 6 1 0 5
0 1 −1 → 0 1 −1 .
0 4 −4 0 4 −4
which gives
x 5
y = −1
0 0
when we do the multiplication on the left-hand side. Therefore, the solution is
x 5
= .
y −1
157
Multiplying row 3 by − 17 ,
1 5 3 6 6 1 5 3 6 6
0 0 0 0 0 → 0 0 0 0 0 .
0 0 −7 −21 −7 0 0 1 3 1
which gives
x + 5y − 3w 3
z + 3w = 1
0 0
when we do the multiplication on the left-hand side. So, we have
x + 5y − 3w = 3 ⇔ x = 3 − 5y + 3w
and
z + 3w = 1 ⇔ z = 1 − 3w.
Therefore, the solutions are
x 3 − 5y + 3w 3 −5 3
y y 0 1 0
=
z 1 − 3w = 1 + y 0 + w −3
w w 0 0 1
for any real numbers y and w. This equation has infinitely many solutions.
158
Exercises
2 −2 6 x −8
1. Solve 6 3 0 y = 12 .
12 −12 15 z −27
x
−3 2 9 1
2. Solve y = .
6 −4 −18 2
z
x
5 −3 2 −1 −7
y
3. Solve −2 6 −5 1
z
= 13 .
7 −1 3 2 0
w
−6 3 5
x
4. Solve 1 −2 = −3.
y
5 1 −2
159
Appendices
160
Appendix A
Cauchy-Schwarz Inequality
This should be intuitive because the sum of larger numbers must be larger than
the sum of smaller numbers.
(x − y)2 ≥ 0
x2 + y 2 − 2xy ≥ 0
x2 + y 2 ≥ 2xy. (A.1)
Let
a1
x1 = p
a21 + a22
and
b1
y1 = p
b21 + b22
162
for the inequality (A.1), we get
a21 b2 2a1 b1
+ 2 1 2 ≥p 2 . (A.2)
a21 + a22 b1 + b2 (a1 + a22 ) (b21 + b22 )
Let
a2
x2 = p
a21 + a22
and
b2
y2 = p
b21 + b22
a22 b2 2a2 b2
+ 2 2 2 ≥p 2 . (A.3)
a21 + a22 b1 + b2 (a1 + a22 ) (b21 + b22 )
2
a21 + a22 b21 + b22 ≥ (a1 b1 + a2 b2 ) .
Similarly, the general case for any natural number n can be proven by letting
a1
x1 = p 2 ,
a1 + a22 + · · · + a2n
a2
x2 = p 2 2
,
a1 + a2 + · · · + a2n
..
.
an
xn = p 2 2
,
a1 + a2 + · · · + a2n
163
and
b1
y1 = p ,
b21 + b22 + · · · + b2n
b2
y2 = p ,
b21 + b22 + · · · + b2n
..
.
bn
yn = p .
b21 + b22 + · · · + b2n
There is also a simpler proof using dot product. In chapter 5, we learned that
Because cos2 θ ≤ 1,
2 2 2
(~u · ~v ) ≤ (||~u||2 ) (||~v ||2 ) .
Let
a1
a2
~u = .
..
an
and
b1
b2
~v = . ,
..
bn
we obtain the Cauchy-Schwarz inequality
2
(a1 b1 + a2 b2 + · · · + an bn ) ≤ a21 + a22 + · · · + a2n b21 + b22 + · · · + b2n .
164
Appendix B
Before discussing the circle in Taxicab geometry, let us talk about the circle in
Euclidean geometry first. We know how a circle looks.
However, a circle only looks this way in Euclidean geometry. How a circle looks
differs depending on which type of geometry it is in, but the definition of a circle
is always the same.
Definition of circle: A circle is a set of all points equidistant from the center.
In figure B.1, we can see that the Euclidean distances between the center and
all points on the circle are the same since the Euclidean distance between two
points is defined to be the length of the line segment connecting those points.
r r
r
165
However, in Taxicab geometry, the definition of distance changes, so the shape of a
circle changes. In Taxicab geometry, a circle is a set of all points having the same
L1 distance from the center. That is why a circle in Taxicab geometry has the
shape shown in figure B.2.
We can think about this graphically as well. Let us set the center at (0, 0). The
Euclidean distance between a point (x, y) and the origin is
p
x2 + y 2 .
A circle with radius r in Euclidean geometry is a set of all points (x, y) whose
Euclidean distance from the center (0, 0) is r.
p So, to graph the circle with radius r
in Euclidean geometry, we graph the curve x2 + y 2 = r or x2 + y 2 = r2 .
y
r
x
−r r
−r
In Taxicab geometry, the L1 distance between a point (x, y) and the origin is
|x| + |y|.
A circle with radius r in Taxicab geometry is a set of all points (x, y) whose L1
distance from the center (0, 0) is r. So, to graph the circle with radius r in Taxicab
geometry, we graph the curve |x| + |y| = r.
y
r
x
−r r
−r
166
Appendix C
In chapter 1, we learned that vectors are arrows with directions. That means vec-
tors are one-dimensional oriented objects. They are one-dimensional because they
are arrows; they are oriented because they have directions (or orientations).
~u ∧ ~v ~v
~u
We can have an oriented parallelogram as in figure C.2, and the arc with an arrow
at the end indicates the orientation of that parallelogram. Such two-dimensional
oriented object is called a 2-blade or a bivector. (A vector is a 1-blade because it
is a one-dimensional oriented object.)
167
The symbol “∧” is the wedge product, which is a type of product used in an
advanced field of algebra in mathematics called exterior algebra. It is just like how
in linear algebra we learned a type of product called dot product (in chapter 5).
The wedge product of any two vectors gives a 2-blade.
~u
~v ~v ∧ ~u
Other than viewing a vector as an arrow, we can also view a vector as a list of
numbers such as
3
5 .
−10
Then, in chapter 6, we learned that a matrix is a block of numbers with two di-
mensions: rows and columns.
So, we can think of vectors as one-dimensional collections of numbers and matrices
as two-dimensional collections of numbers. Then, do we have three-dimensional or
any n-dimensional collections of numbers? The answer is, again, yes! For example,
we can have a box of numbers with three dimensions.
Such a box of numbers with three dimensions is called a rank-3 tensor. In this sense, a vector is a rank-1 tensor, and a matrix is a rank-2 tensor. In general, an $n$-dimensional collection of numbers is a rank-$n$ tensor.
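As a small illustration (using index notation, which is just one common way to write such objects), a $2 \times 2 \times 2$ box of numbers $T$ has entries $T_{ijk}$ with three indices, and it can be displayed as two stacked $2 \times 2$ slices:
$$T_{\,:\,:\,1} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad T_{\,:\,:\,2} = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}.$$
A vector needs one index ($v_i$), a matrix needs two ($A_{ij}$), and this box needs three ($T_{ijk}$).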
Blades and tensors are really advanced concepts in the field of algebra in mathematics, so of course they are way beyond the scope of this book. The purpose of this appendix is to give the readers a general idea that the concepts in linear algebra we learn in this book can be extended and that algebra in mathematics is much more than just high school algebra.
Solutions to Exercise Problems
Chapter 1
1. Vectors: A, B, D, F; Scalars: C, E
2.
[Graph: $(2, -1)$ plotted in the coordinate plane.]
3.
[Graph: $(-3, 4)$ plotted in the coordinate plane.]
4.
[Graph: $(-2, -3)$ plotted in the coordinate plane.]
8. 5 dimensions
Chapter 2
1. $\begin{bmatrix} -5 \\ 5 \\ -3 \end{bmatrix}$
2. $\begin{bmatrix} 2 \\ 1 \\ -1 \\ -5 \end{bmatrix}$
3. $\begin{bmatrix} 16 \\ 30 \\ -12 \end{bmatrix}$
4. $\begin{bmatrix} -4 \\ 12 \\ -16 \\ 0 \\ -8 \end{bmatrix}$
5. $\begin{bmatrix} 8 \\ 3 \\ 11 \end{bmatrix}$
6. $\begin{bmatrix} 0 \\ -8 \\ 6 \end{bmatrix}$
7. Let
$$\vec v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.$$
Then,
$$(a + b) \cdot \vec v = (a + b) \cdot \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} (a + b) \cdot v_1 \\ (a + b) \cdot v_2 \\ \vdots \\ (a + b) \cdot v_n \end{bmatrix} = \begin{bmatrix} a \cdot v_1 + b \cdot v_1 \\ a \cdot v_2 + b \cdot v_2 \\ \vdots \\ a \cdot v_n + b \cdot v_n \end{bmatrix}$$
$$= \begin{bmatrix} a \cdot v_1 \\ a \cdot v_2 \\ \vdots \\ a \cdot v_n \end{bmatrix} + \begin{bmatrix} b \cdot v_1 \\ b \cdot v_2 \\ \vdots \\ b \cdot v_n \end{bmatrix} = a \cdot \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} + b \cdot \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = a\vec v + b\vec v.$$
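For instance, with $a = 2$, $b = 3$, and $\vec v = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$, both sides give the same vector:
$$(2 + 3) \cdot \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 5 \\ -5 \end{bmatrix} = \begin{bmatrix} 2 \\ -2 \end{bmatrix} + \begin{bmatrix} 3 \\ -3 \end{bmatrix} = 2 \cdot \begin{bmatrix} 1 \\ -1 \end{bmatrix} + 3 \cdot \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$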
Chapter 3
1. $$\begin{bmatrix} -5 \\ 0 \\ 8 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 4 \\ 3 \end{bmatrix} - 2 \begin{bmatrix} 4 \\ 6 \\ 2 \end{bmatrix} + 3 \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
Also,
$$\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \\ 1 \end{bmatrix} - 2 \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \text{ so } \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \in \operatorname{span}\left( \begin{bmatrix} 5 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} \right).$$
2. $\begin{bmatrix} 0 \\ -2 \end{bmatrix}, \begin{bmatrix} -2 \\ 4 \end{bmatrix}, \begin{bmatrix} -1 \\ 3 \end{bmatrix}$
3. There can only be at most three linearly independent three-dimensional vectors, so
$$\begin{bmatrix} 5 \\ 1 \\ 4 \end{bmatrix}, \begin{bmatrix} -2 \\ 3 \\ 8 \end{bmatrix}, \begin{bmatrix} 6 \\ 0 \\ 9 \end{bmatrix}, \text{ and } \begin{bmatrix} -1 \\ -3 \\ 7 \end{bmatrix}$$
are linearly dependent.
5. 5
6. It is not a set of basis vectors for $\mathbb{R}^2$. Basis vectors have to be linearly independent, but
$$\begin{bmatrix} -3 \\ 9 \end{bmatrix} = \frac{3}{2} \begin{bmatrix} -2 \\ 6 \end{bmatrix}.$$
8. The first two vectors are linearly independent, and the third vector is a linear combination of the first two. The set is the span of two linearly independent vectors, so the dimension of the vector space is two. It is the span of four-dimensional vectors, so it is a subspace of $\mathbb{R}^4$.
Chapter 4
1. $||\vec u||_1 = 10$
2. $||\vec v||_2 = 13$
3. $\hat{w} = \dfrac{\vec w}{||\vec w||_2} = \dfrac{1}{\sqrt{6}} \vec w = \begin{bmatrix} 1/\sqrt{6} \\ -2/\sqrt{6} \\ -1/\sqrt{6} \end{bmatrix}$
4.
a. $||\vec u||_2 = \sqrt{10}$, $||\vec v||_2 = \sqrt{21}$, $||\vec u + \vec v||_2 = \sqrt{11}$
b. $\sqrt{11} < \sqrt{10} + \sqrt{21}$
5. $\begin{bmatrix} 12 \\ -9 \end{bmatrix}$
6. $\begin{bmatrix} 7 \\ -1 \\ 10 \end{bmatrix}$
7.
a. $|-15| + |9| = 24$ cm
b. $\begin{bmatrix} -15 \\ 9 \end{bmatrix}$
Chapter 5
1. $\vec v^{\,T} = \begin{bmatrix} 2 & 5 & 0 & -8 \end{bmatrix}$
2. Let
$$\vec u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad \text{and} \quad \vec v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.$$
Then,
$$(\vec u + \vec v)^T = \left( \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \right)^{T} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix}^{T} = \begin{bmatrix} u_1 + v_1 & u_2 + v_2 & \cdots & u_n + v_n \end{bmatrix}$$
$$= \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} + \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} = \vec u^{\,T} + \vec v^{\,T}.$$
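For instance,
$$\left( \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 3 \\ 4 \end{bmatrix} \right)^{T} = \begin{bmatrix} 4 \\ 6 \end{bmatrix}^{T} = \begin{bmatrix} 4 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 2 \end{bmatrix} + \begin{bmatrix} 3 & 4 \end{bmatrix}.$$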
3. 12
4. 10
5. −9
6. $\arccos \dfrac{2}{\sqrt{5}} \approx 26.57^\circ$
7. $\arccos \dfrac{9}{10\sqrt{15}} \approx 76.56^\circ$
8. They are orthogonal and orthonormal.
Chapter 6
1. 3 × 4
2. C.
4.
a. The elements on the main diagonal are 2, 0, −9.
b. Tr(B) = −7
5. $C^T = \begin{bmatrix} 3 & 8 & 1 \\ 0 & -5 & 6 \\ -7 & 2 & 9 \end{bmatrix}$
6. When we take the transpose, the columns become rows. Then, when we take
the transpose again, the rows become columns. So, $(A^T)^T = A$.
7.
A. Upper-triangular matrix
C. Lower-triangular matrix
D. Symmetric matrix
8. Since
$$\begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} = 0,$$
the three vectors are orthogonal to each other. An orthogonal matrix is a matrix whose columns are orthonormal vectors, so we need to normalize the vectors:
$$\begin{bmatrix} 1/\sqrt{3} \\ -1/\sqrt{3} \\ -1/\sqrt{3} \end{bmatrix}, \quad \begin{bmatrix} 2/\sqrt{6} \\ 1/\sqrt{6} \\ 1/\sqrt{6} \end{bmatrix}, \quad \begin{bmatrix} 0 \\ -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}.$$
Using these three orthonormal column vectors, we can have a $3 \times 3$ orthogonal matrix such as
$$\begin{bmatrix} 1/\sqrt{3} & 2/\sqrt{6} & 0 \\ -1/\sqrt{3} & 1/\sqrt{6} & -1/\sqrt{2} \\ -1/\sqrt{3} & 1/\sqrt{6} & 1/\sqrt{2} \end{bmatrix}.$$
The order of the column vectors does not matter, so we can have many more orthogonal matrices from these column vectors as well.
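Calling the matrix above $Q$, one way to double-check the answer is to verify that $Q^T Q = I$: for example, the $(1,1)$ entry of $Q^T Q$ is $\frac{1}{3} + \frac{1}{3} + \frac{1}{3} = 1$, and the $(1,2)$ entry is $\frac{2}{\sqrt{18}} - \frac{1}{\sqrt{18}} - \frac{1}{\sqrt{18}} = 0$.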
Chapter 7
1. $\begin{bmatrix} -2 & 14 \\ 19 & 25 \end{bmatrix}$
2. $\begin{bmatrix} -1 & -2 \\ -5 & 10 \\ 4 & -77 \end{bmatrix}$
3. $\begin{bmatrix} 19 \\ 0 \end{bmatrix}$
4. $\begin{bmatrix} -2 & 10 & 22 \end{bmatrix}$
5. $\begin{bmatrix} -25 \\ -73 \\ 3 \end{bmatrix}$
6. $$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = 0 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + 0 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + 0 \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
7. $\begin{bmatrix} 4 & -6 & -2 & 6 \\ -2 & 18 & -4 & 52 \\ 4 & -30 & 6 & -82 \end{bmatrix}$
8. $\begin{bmatrix} 12 & -2 & 6 \\ 6 & -10 & 0 \end{bmatrix}$
9. $\begin{bmatrix} 8 & 8 & -4 \\ -1 & -3 & 2 \\ 3 & -16 & -6 \end{bmatrix}$
A similar pattern continues, and the $n$-th column of the product is
$$\begin{bmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ d_n \end{bmatrix} = 0 \begin{bmatrix} c_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + 0 \begin{bmatrix} 0 \\ c_2 \\ \vdots \\ 0 \end{bmatrix} + \cdots + d_n \begin{bmatrix} 0 \\ 0 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ c_n d_n \end{bmatrix}.$$
Thus,
$$\begin{bmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n \end{bmatrix} \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix} = \begin{bmatrix} c_1 d_1 & 0 & \cdots & 0 \\ 0 & c_2 d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n d_n \end{bmatrix}.$$
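For instance,
$$\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 5 & 0 \\ 0 & 7 \end{bmatrix} = \begin{bmatrix} 10 & 0 \\ 0 & 21 \end{bmatrix}.$$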
Chapter 8
1.
a) $\begin{bmatrix} -1 & 5 & 2 \\ -2 & 8 & -3 \\ 7 & -9 & 10 \\ 3 & 0 & 6 \end{bmatrix}$
b) $E_A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$
2.
a) $\begin{bmatrix} 6 & -4 & 2 & 0 \\ 5 & -3 & 6 & 2 \end{bmatrix}$
b) $E_B = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$
3.
a) $\begin{bmatrix} 3 & 5 & 0 \\ 3 & -5 & 6 \\ 1 & 2 & -1 \end{bmatrix}$
b) $E_C = \begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
4.
a) $\begin{bmatrix} 6 & -8 \\ 1 & 6 \\ -2 & 7 \end{bmatrix}$
b) $E_D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$
Chapter 9
1.
a) $\operatorname{rref}(A_1) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
b) Column 1, column 2, and column 3
c) 3
d) 3
e) 3
2.
a) $\operatorname{rref}(A_2) = \begin{bmatrix} 1 & -\frac{2}{3} & -3 \\ 0 & 0 & 0 \end{bmatrix}$
b) Column 1
c) 1
d) 1
e) 1
3.
a) $\operatorname{rref}(A_3) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}$
b) Column 1, column 2, and column 3
c) 3
d) 3
e) 3
4.
a) $\operatorname{rref}(A_4) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$
b) Column 1 and column 2
c) 2
d) 2
e) 2
Chapter 10
1.
a) $\operatorname{span}\left( \begin{bmatrix} 2 \\ 6 \\ 12 \end{bmatrix}, \begin{bmatrix} -2 \\ 3 \\ -12 \end{bmatrix}, \begin{bmatrix} 6 \\ 0 \\ 15 \end{bmatrix} \right)$; 3 dimensions
b) $\operatorname{span}\left( \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right)$; 3 dimensions
c) $\left\{ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right\}$; 0 dimensions
d) $\left\{ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right\}$; 0 dimensions
2.
a) $\operatorname{span}\left( \begin{bmatrix} -3 \\ 6 \end{bmatrix} \right)$; 1 dimension
b) $\operatorname{span}\left( \begin{bmatrix} 1 \\ -\frac{2}{3} \\ -3 \end{bmatrix} \right)$; 1 dimension
c) $\operatorname{span}\left( \begin{bmatrix} \frac{2}{3} \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix} \right)$; 2 dimensions
d) $\operatorname{span}\left( \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right)$; 1 dimension
3.
a) $\operatorname{span}\left( \begin{bmatrix} 5 \\ -2 \\ 7 \end{bmatrix}, \begin{bmatrix} -3 \\ 6 \\ -1 \end{bmatrix}, \begin{bmatrix} 2 \\ -5 \\ 3 \end{bmatrix} \right)$; 3 dimensions
b) $\operatorname{span}\left( \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} \right)$; 3 dimensions
c) $\operatorname{span}\left( \begin{bmatrix} 0 \\ -1 \\ -1 \\ 1 \end{bmatrix} \right)$; 1 dimension
d) $\left\{ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right\}$; 0 dimensions
Chapter 11
1. Invertible; $A_1^{-1} = \begin{bmatrix} -\frac{5}{42} & \frac{1}{9} & \frac{1}{21} \\ \frac{5}{21} & \frac{1}{9} & -\frac{2}{21} \\ \frac{2}{7} & 0 & -\frac{1}{21} \end{bmatrix}$
2. Not invertible
3. Invertible; $A_3^{-1} = \begin{bmatrix} 2 & 3 \\ -1 & -2 \end{bmatrix}$
4. Not invertible
5.
$$A^{-1}(B - A)B^{-1} = A^{-1}BB^{-1} - A^{-1}AB^{-1} = A^{-1}I - IB^{-1} = A^{-1} - B^{-1}$$
6. By definition of matrix inverse, we need to have
$$(AB)^{-1}(AB) = (AB)(AB)^{-1} = I.$$
Since $(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}B = I$ and $(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AA^{-1} = I$, the matrix $B^{-1}A^{-1}$ satisfies this definition, so $(AB)^{-1} = B^{-1}A^{-1}$.
Chapter 12
1. $\det\left(A^3\right) = \left[\det(A)\right]^3$ (Use property 16)
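For instance, if $A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$, then $\det(A) = 6$ and $A^3 = \begin{bmatrix} 8 & 0 \\ 0 & 27 \end{bmatrix}$, so $\det\left(A^3\right) = 216 = 6^3$.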
5. −10
6. −37
7. −300
8. 126
Chapter 13
1. $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}$
2. No solution
3. $\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \\ 1 \\ 0 \end{bmatrix} + w \begin{bmatrix} 0 \\ -1 \\ -1 \\ 1 \end{bmatrix}$
4. No solution
Acknowledgements
In this section, I would like to thank some people for their help during the process of writing this book.
• Manny Monter
I am also grateful to Andrzej Kukla, the designer of this book's cover, for his creative ideas for the design. Andrzej Kukla also has an Instagram page, a Twitter account, and a YouTube channel about math called Mathinity.
Reviews
“This is a book for beginners in linear algebra. Duc Tran aims for simple
statements of the ideas and facts that he needs. He organizes them in a co-
herent way: first the main facts about vectors and then about matrices. For
a matrix, he discusses elimination to echelon form and then the rank and the
four fundamental subspaces. The result is a short book that a student can
read.”
“The book adopts a visual and largely informal approach to introduce linear algebra. It reverses the old-school order, which started from linear equations or operations on matrices; instead, the author begins with the graphical representation of vectors, whilst determinants and linear equations are postponed to the last two chapters. He introduces the notion of norms at an early stage, including the $\ell_1$ or 'taxicab' norm, which is not only interesting but also indispensable for advanced topics. The appendices contain rough sketches of advanced topics such as tensors.”
Dr. Wen-Wei Li
Professor of Mathematical Sciences, Peking University, China
“This introductory level textbook for basic linear algebra written by Duc Tran is for those who wish to study linear algebra for the first time. It will be especially suitable for high school students or first-year undergraduates who need to understand basic linear algebra but do not want to take a heavy mathematics course. The author provides plenty of graphs and graphical representations of matrices, along with examples, exercises, and their solutions. This book reminds the reviewer of formats like an AP course, self-study, linear algebra for biology or social science majors, and linear algebra for programmers.”
Dr. Sang Geun Han
Professor of Mathematical Sciences, Korea Advanced Institute of Science and
Technology (KAIST), South Korea