Linear Algebra
An open text by Peter Selinger
Based on the original text by Lyryx Learning and Ken Kuttler
CONTRIBUTIONS
Ken Kuttler, Brigham Young University
LICENSE
Creative Commons License (CC BY): This text, including the art and illustrations, are available under
the Creative Commons license (CC BY), allowing anyone to reuse, revise, remix and redistribute the text.
To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/
Revision history
Current revision: Dal 2018 A
This printing: version 487897b of January 10, 2022
Extensive edits, additions, and revisions have been completed by Peter Selinger and other contributors.
Prior extensive edits, additions, and revisions were made by the editorial staff at Lyryx Learning.
All new content (text and images) is released under the same license as noted above.
Dal 2017 B
• P. Selinger, M.B. Langlois: Re-ordered chapters. Extensive revisions to Chapters 1–8. Added new sections on fields, cryptography, geometric interpretation of linear transformations, recurrences, systems of linear differential equations.

2017 A
• Lyryx: Front matter has been updated including cover, copyright, and revision pages.
• I. Farah: contributed edits and revisions, particularly the proofs in the Properties of Determinants II: Some Important Proofs section.

2016 B
• Lyryx: The text has been updated with the addition of subsections on Resistor Networks and the Matrix Exponential based on original material by K. Kuttler.
• Lyryx: New example on Random Walks developed.

2016 A
• Lyryx: The layout and appearance of the text has been updated, including the title page and newly designed back cover.

2015 A
• Lyryx: The content was modified and adapted with the addition of new material and several images throughout.
• Lyryx: Additional examples and proofs were added to existing material throughout.

2012 A
• Original text by K. Kuttler of Brigham Young University. That version is used under Creative Commons license CC BY (https://creativecommons.org/licenses/by/3.0/) made possible by funding from The Saylor Foundation’s Open Textbook Challenge. See Elementary Linear Algebra for more information and the original version.
Contents
Preface 1
2 Vectors in Rn 61
2.1 Points and vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.2 Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.3 Scalar multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.4 Linear combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.5 Length of a vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.6 The dot product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
2.6.1 Definition and properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.6.2 The Cauchy-Schwarz and triangle inequalities . . . . . . . . . . . . . . . . . . . . 81
2.6.3 The geometric significance of the dot product . . . . . . . . . . . . . . . . . . . . 82
2.6.4 Orthogonal vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
2.6.5 Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
2.7 The cross product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
2.7.1 Right-handed systems of vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
2.7.2 Geometric description of the cross product . . . . . . . . . . . . . . . . . . . . . 91
2.7.3 Algebraic definition of the cross product . . . . . . . . . . . . . . . . . . . . . . . 92
2.7.4 The box product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4 Matrices 121
4.1 Definition and equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.2 Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.3 Scalar multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.4 Matrix multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.4.1 Multiplying a matrix and a vector . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.4.2 Matrix multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.4.3 Properties of matrix multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . 137
4.5 Matrix inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4.5.1 Definition and uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4.5.2 Computing inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.5.3 Using the inverse to solve a system of equations . . . . . . . . . . . . . . . . . . . 147
4.5.4 Properties of the inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
4.5.5 Right and left inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
4.6 Elementary matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
4.6.1 Elementary matrices and row operations . . . . . . . . . . . . . . . . . . . . . . . 153
4.6.2 Inverses of elementary matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
4.6.3 Elementary matrices and reduced echelon forms . . . . . . . . . . . . . . . . . . 156
4.6.4 Writing an invertible matrix as a product of elementary matrices . . . . . . . . . . 158
4.6.5 More properties of inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
4.7 The transpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
4.8 Matrix arithmetic modulo p . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
4.9 Application: Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
7 Determinants 241
7.1 Determinants of 2 × 2- and 3 × 3-matrices . . . . . . . . . . . . . . . . . . . . . . . . . 241
7.2 Minors and cofactors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
7.3 The determinant of a triangular matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
7.4 Determinants and row operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
7.5 Properties of determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
7.6 Application: A formula for the inverse of a matrix . . . . . . . . . . . . . . . . . . . . . 258
7.7 Application: Cramer’s rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
Index 541
Preface
Matrix Theory and Linear Algebra is an introduction to linear algebra for students in the first or second
year of university. The book contains enough material for a 2-semester course. Major topics of linear
algebra are presented in detail, and many applications are given. Although it is not a proof-oriented book,
proofs of most important theorems are provided.
Each section begins with a list of desired outcomes which a student should be able to achieve upon
completing the chapter. Throughout the text, examples and diagrams are given to reinforce ideas and
provide guidance on how to approach various problems. Students are encouraged to work through the
suggested exercises provided at the end of each section. Selected solutions to these exercises are given at
the end of the text.
Open text
This is an open text, licensed under the Creative Commons “CC BY 4.0” license. This means, among
other things, that you are permitted to copy and redistribute this textbook in any medium or format. For
example, you can download this textbook for free, print copies for yourself or others, or share it on the
internet.
The license also permits making changes. This is ideal for instructors who would like to add their
own material, change notations, or add more examples or exercises. If you make revisions, please
send them to me so that I can consider incorporating them in future versions of this book. Please see
https://creativecommons.org/licenses/by/4.0/ for details of the licensing terms.
This textbook has a website at https://www.mathstat.dal.ca/~selinger/linear-algebra/.
There, you can find the most up-to-date version. The website also contains supplementary material, a link
to the source code and license, options for purchasing a printed version of this book, and more.
Reporting typos
Like all books, this book likely contains some typos and other errors. However, since it is an open text,
typos can easily be fixed and an updated version posted online. It is my intention to fix all typos. If
you find a typo (no matter how small), please report it to me at [email protected]. Thanks to the
following people who have already reported typos: Yaser Alkayale, Hassaan Asif, Courtney Baumgartner,
Kieran Bhaskara, Serena Drouillard, Robert Earle, Warren Fisher, Esa Hannila, Melissa Huggan, Xiaoyu
Jia, Arman Kerimbek, Peter Lake, Marie-Andrée Langlois, Brenda Le, Sarah Li, Ian MacIntosh, Li Wei
Men, Deklan Mengering, Dallas Sawtell, Alain Schaerer, Yi Shu, Bruce Smith, Asmita Sodhi, Michael St
Denis, Daniele Turchetti, Liu Yuhao, and Ziqi Zhang.
1. Systems of linear equations
1.1. Geometric view of systems of equations

Outcomes
A. Relate the types of solution sets of a system of two (three) variables to the intersections of
lines in a plane (the intersection of planes in 3-dimensional space)
As you may remember, linear equations like 2x + 3y = 6 can be graphed as straight lines in the coordinate
plane. We say that this equation is in two variables, in this case x and y. Suppose you have two such
equations, each of which can be graphed as a straight line, and consider the resulting graph of two lines.
What would it mean if there exists a point of intersection between the two lines? This point, which lies
on both graphs, gives x and y values for which both equations are true. In other words, this point gives the
ordered pair (x, y) that satisfies both equations. If the point (x, y) is a point of intersection, we say that (x, y)
is a solution to the two equations. In linear algebra, we often are concerned with finding the solution(s)
to a system of equations, if such solutions exist. First, we consider graphical representations of solutions
and later we will consider the algebraic methods for finding solutions.
When looking for the intersection of two lines in the plane, several situations may arise. The follow-
ing picture demonstrates the possible situations when considering two equations (two lines in the plane)
involving two variables.
[Three pictures of two lines in the xy-plane: the lines intersect in a single point (one solution); the lines are parallel (no solutions); the lines coincide (infinitely many solutions).]
In the first diagram, there is a unique point of intersection, which means that there is only one (unique)
solution to the two equations. In the second, there are no points of intersection and no solution. There is
no solution because the two lines are parallel and they never intersect. The third situation that can occur,
as demonstrated in diagram three, is that the two lines are really the same line. For example, x + y = 1
and 2x + 2y = 2 are two equations that yield the same line when graphed. In this case there are infinitely
many points that are solutions of these two equations, as every ordered pair which is on the graph of the
line satisfies both equations.
When considering linear systems of equations, there are always three possibilities for the number of
solutions: there is exactly one solution, there are infinitely many solutions, or there is no solution. When
we speak of solving a system of equations, we usually mean finding all of its solutions. This can mean
finding one solution (if the solution is unique), finding infinitely many solutions, or finding that there is no
solution.
x+y = 3
y − x = 5.
Solution. Through graphing the above equations and identifying the point of intersection, we can find the
solution(s). Remember that we must have either one solution, infinitely many, or no solutions at all. The
following graph shows the two equations, as well as the intersection. Remember, the point of intersection
represents the solution of the two equations, or the (x, y) which satisfy both equations. In this case, there
is one point of intersection at (−1, 4) which means we have one unique solution, x = −1, y = 4.
[Graph: the lines y − x = 5 and x + y = 3 in the xy-plane, intersecting at the point (x, y) = (−1, 4).]
♠
In the above example, we investigated the intersection point of two equations in two variables, x and y.
Now we will consider the graphical solutions of three equations in two variables.
Consider a system of three equations in two variables. Again, these equations can be graphed as
straight lines in the plane, so that the resulting graph contains three straight lines. Recall the three possi-
bilities for the number of solutions: no solution, one solution, and infinitely many solutions. With three
lines, there are more complex ways of achieving these situations. For example, you can imagine the case
of three intersecting lines having no common point of intersection. Perhaps you can also imagine three
intersecting lines which do intersect at a single point. These two situations are illustrated below.
[Two pictures of three lines in the xy-plane: three lines with no common point of intersection (no solution), and three lines meeting in a single point (one solution).]
Consider the first picture above. While all three lines intersect with one another, there is no common
point of intersection where all three lines meet at one point. Hence, there is no solution to the three
equations. Remember, a solution is a point (x, y) which satisfies all three equations. In the case of the
second picture, the lines intersect at a common point. This means that there is one solution to the three
equations whose graphs are the given lines. You should take a moment now to draw the graph of a system
which results in three parallel lines. Next, try the graph of three identical lines. Which type of solution is
represented in each of these graphs?
We have now considered the graphical solutions of systems of two equations in two variables, as well
as three equations in two variables. However, there is no reason to limit our investigation to equations in
two variables. We will now consider equations in three variables.
You may recall that equations in three variables, such as 2x + 4y − 5z = 8, form a plane. Above, we
were looking for intersections of lines in order to identify any possible solutions. When graphically solving
systems of equations in three variables, we look for intersections of planes. These points of intersection
give the (x, y, z) that satisfy all the equations in the system. What types of solutions are possible when
working with three variables? Consider the following picture involving two planes, which are given by
two equations in three variables.
Notice how these two planes intersect in a line. This means that the points (x, y, z) on this line satisfy
both equations in the system. Since the line contains infinitely many points, this system has infinitely
many solutions.
It could also happen that the two planes fail to intersect. However, is it possible to have two planes
intersect at a single point? Take a moment to attempt drawing this situation, and convince yourself that it
is not possible! This means that when we have only two equations in three variables, there is no way to
have a unique solution! Hence, the only possibilities for the number of solutions of two equations in three
variables are no solution or infinitely many solutions.
Now imagine adding a third plane. In other words, consider three equations in three variables. What
types of solutions are now possible? Consider the following diagram.
In this diagram, there is no point which lies in all three planes. There is no intersection between all
three planes so there is no solution. The picture illustrates the situation in which the line of intersection
of the new plane with one of the original planes forms a line parallel to the line of intersection of the first
two planes. However, in three dimensions, it is possible for two lines to fail to intersect even though they
are not parallel. Such lines are called skew lines.
Recall that when working with two equations in three variables, it was not possible to have a unique
solution. Is it possible when considering three equations in three variables? In fact, it is possible, and we
demonstrate this situation in the following picture.
In this case, the three planes have a single point of intersection. Can you think of other possibilities?
Another is that the three planes could intersect in a line, resulting in infinitely many solutions, as in the
following diagram.
We have now seen how three equations in three variables can have no solution, a unique solution, or
intersect in a line resulting in infinitely many solutions. It is also possible that all three equations describe
the same plane, which also leads to infinitely many solutions.
You can see that when working with equations in three variables, there are many more possibilities for
achieving solutions (or no solutions) than when working with two variables. It may prove enlightening to
spend time imagining (and drawing) many possible scenarios, and you should take some time to try a few.
You should also take some time to imagine (and draw) graphs of systems in more than three variables.
Equations like x + y − 2z + 4w = 8 with more than three variables are often called hyperplanes. You
may soon realize that it is tricky to draw the graphs of hyperplanes! In fact, most people cannot visualize
more than three dimensions. Fortunately, through the tools of linear algebra, we can examine systems
of equations in four variables, five variables, or even hundreds or thousands of variables, without ever
needing to graph them. Instead we will use algebra to manipulate and solve these systems of equations.
We will introduce these algebraic tools in the following sections.
Exercises
Exercise 1.1.1 Graphically, find the point (x, y) which lies on both of the lines x + 3y = 1 and 4x − y = 3.
That is, graph each line and see where they intersect.
Exercise 1.1.2 Graphically, find the point of intersection of the two lines 3x + y = 3 and x + 2y = 1. That
is, graph each line and see where they intersect.
Exercise 1.1.3 You have a system of k equations in two variables, k ≥ 2. Explain the geometric signifi-
cance of
(a) No solution.
(b) A unique solution.
(c) An infinite number of solutions.
Exercise 1.1.4 Draw a picture of three planes such that no two of the planes are parallel, but the three
planes have no common intersection.
1.2. Algebraic view of systems of equations

Outcomes
A. Recognize the difference between a linear equation and a non-linear equation.
B. Determine whether a tuple of real numbers is a solution for a system of linear equations.
We have taken an in-depth look at graphical representations of systems of equations, as well as how to
find possible solutions graphically. Our attention now turns to working with systems algebraically.
A linear equation in the variables x1 , . . . , xn is an equation of the form

a1 x1 + a2 x2 + . . . + an xn = b.
Here, a1 , . . . , an are real numbers called the coefficients of the equation, b is a real number called
the constant term of the equation, and x1 , . . . , xn are variables.
Real numbers, such as the coefficients a1 , . . . , an , the constant term b, or the values of the variables
x1 , . . . , xn , will also be called scalars. For now, the word “scalar” is just a synonym for “real number”.
Later, in Section 1.8, we will discover other kinds of scalars.
Which of the following equations are linear?

2x + 3y = 5
2x² + 3y = 5
2√x + 3y = 5
(√2)x + 3y = 5²

Solution. The equation 2x + 3y = 5 is linear. The equation 2x² + 3y = 5 is not linear, because it contains the square of a variable instead of a variable. The equation 2√x + 3y = 5 is also not linear, because the square root is applied to one of the variables. On the other hand, the equation (√2)x + 3y = 5² is linear, because √2 and 5² are real numbers, and can therefore be used as coefficients and constant terms. ♠
We also permit minor notational variants of linear equations. The equation 2x − 3y = 5 is linear although
Definition 1.2 does not mention subtraction, because it can be regarded as just another notation for 2x +
(−3)y = 5. Similarly, the equation 2x = 5 + 3y can be regarded as linear, because it can be easily rewritten
as 2x − 3y = 5 by bringing all the variables (and their coefficients) to the left-hand side. When we need to
emphasize that some linear equation is literally of the form a1 x1 + a2 x2 + . . . + an xn = b, we say that the
equation is in standard form. Thus, the standard form of the equation 2x = 5 + 3y is 2x + (−3)y = 5.
A solution to a linear equation is an assignment of real numbers to the variables, making the equation
true. More precisely, if r1 , . . . , rn are real numbers, the assignment x1 = r1 , . . . , xn = rn is a solution to the
equation in Definition 1.2 if the real number a1 r1 + a2 r2 + . . . + an rn is equal to the real number b. To save
space, we often write solutions in tuple notation1 as (x1 , . . . , xn ) = (r1 , . . . , rn ). When there is no doubt
about the order of the variables, we also often simply write the solution as (r1 , . . . , rn ).
Which of the following assignments are solutions to the equation 2x + 3y − 4z = 5? (a) (x, y, z) = (1, 1, 0), (b) (x, y, z) = (0, 3, 1), (c) (x, y, z) = (1, 1, 1).

Solution. The assignment (x, y, z) = (1, 1, 0) is a solution because 2(1) + 3(1) − 4(0) = 5. The assignment (x, y, z) = (0, 3, 1) is also a solution, because 2(0) + 3(3) − 4(1) = 5. On the other hand, (x, y, z) = (1, 1, 1) is not a solution, because 2(1) + 3(1) − 4(1) = 1 ≠ 5. ♠
1 The terminology “tuple” arose as follows. A collection of two items is called a “pair”, a collection of three items is called
a “triple”, followed by “quadruple”, “quintuple”, “sextuple”, and so on. You have to know Latin to know what the next ones
are called. To avoid these Latin terms, mathematicians started saying 4-tuple, 5-tuple, 6-tuple and so on, and more generally,
n-tuple for an ordered collection of n items. When n doesn’t matter or is clear from the context, we often just say “tuple”.
A system of linear equations is a list of one or more linear equations,

a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
...
am1 x1 + am2 x2 + . . . + amn xn = bm,

where aij and bi are scalars (i.e., real numbers). The above is a system of m equations in the n variables x1 , x2 , . . . , xn . As before, the numbers aij are called the coefficients and the numbers bi are called the constant terms of the system of equations.
The relative size of m and n is not important here. We may have more variables than equations, more
equations than variables, or an equal number of equations and variables.
A solution to a system of linear equations is an assignment of real numbers to the variables that is a
solution to all of the equations in the system.
2x + 3y − 4z = 5
−2x + y + 2z = −1.
Which of the following are solutions of the system? (a) (x, y, z) = (1, 1, 0), (b) (x, y, z) = (6, 3, 4), (c)
(x, y, z) = (0, 3, 1).
Solution. The assignment (x, y, z) = (1, 1, 0) is a solution of this system of equations, because it is a
solution to the first equation and the second equation. Also, (x, y, z) = (6, 3, 4) is another solution of this
system of equations (check this!). On the other hand, (x, y, z) = (0, 3, 1) is not a solution of the system,
because although it is a solution to the first equation, it is not a solution to the second equation. ♠
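For readers who like to check such computations by machine, the following short Python sketch (an optional aside, not part of the exposition) substitutes each candidate tuple into both equations of the system above and reports which equations it satisfies.

# Each equation is described by its coefficients and constant term.
system = [
    ([2, 3, -4], 5),     # 2x + 3y - 4z = 5
    ([-2, 1, 2], -1),    # -2x + y + 2z = -1
]

candidates = [(1, 1, 0), (6, 3, 4), (0, 3, 1)]

def satisfies(coeffs, b, point):
    # The point satisfies the equation when the left-hand side equals the constant term.
    return sum(a * x for a, x in zip(coeffs, point)) == b

for point in candidates:
    results = [satisfies(coeffs, b, point) for coeffs, b in system]
    verdict = "a solution of the system" if all(results) else "not a solution of the system"
    print(point, results, "->", verdict)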
Recall from Section 1.1 that a system of equations either has a unique solution, infinitely many solutions,
or no solution. It is very important to us whether a system of equations has solutions or not. For this
reason, we introduce the following terminology: a system of equations is called consistent if it has at least one solution, and inconsistent if it has no solution.
If we think of each equation as a condition that must be satisfied by the variables, consistent means that
there is some choice of values for the variables which can satisfy all of the conditions. Inconsistent means
that there is no such choice of values for the variables. In the following sections, you will learn a method
for determining whether a system of equations is consistent or not, and in case it is consistent, to find all
of its solutions.
Exercises
(a) 2x − 3y + 4z = −10

(c) x² + y² + z² = 1

(d) (1/√2)x + (4/3)y = sin(π/3)

(e) x + yz = 3
x + 2y + 3z + 4w = 4
x + y + z + w = 2
x + 2y + 2z + w = 2.
For each of the following tuples (x, y, z, w) of real numbers, determine whether it is a solution of the first
equation, second equation, and/or third equation. Which ones are solutions to the system of equations?
(a) (2, 0, −2, 2) (b) (2, 2, −2, 0) (c) (1, 1, −1, 1) (d) (3, 0, −1, 1) (e) (2, −2, 2, 0)
1.3. Elementary operations

Outcomes
A. Use elementary operations to simplify a system of equations.
Our strategy for solving systems of linear equations is to successively transform a difficult system of
equations into a simpler equivalent system. Here, by an “equivalent” system of equations we mean one
that has the same solutions as the original one. We will perform the process of simplifying a system of
equations by applying certain basic steps called “elementary operations”.
How can we know whether two systems of equations are equivalent? It turns out that the following basic
operations always transform a system of equations into an equivalent system. In fact, these operations are
the key tool we use in linear algebra to solve systems of equations. The following three operations on a system of equations are called elementary operations:

1. Interchange two equations.

2. Multiply an equation by a non-zero scalar.

3. Add a multiple of one equation to another equation.
The most important property of the elementary operations is that they do not change the solutions to the
system of equations. Before proving that this is true in general, we will first verify it in an example.
Solution. We can see that the second system is obtained from the first one by applying an elementary
operation, namely, adding 2 times the first equation to the second equation:
By simplifying, we obtain 4y = 8.
To verify that the two systems are indeed equivalent, let us first solve the first system. From the
second equation, we see that x = 3. Substituting x = 3 into the first equation, the equation becomes
3 + 2y = 7, which we can solve to find y = 2. Therefore, the only solution to the first system of equations
is (x, y) = (3, 2).
Now let us solve the second system. From the second equation, we find that y = 2. Substituting y = 2
into the first equation, we get x + 4 = 7, which we can solve to find x = 3. Therefore, the only solution to
the second system of equations is (x, y) = (3, 2). Since the two systems have the same solutions, they are
equivalent. ♠
This example illustrates how an elementary operation applied to a system of two equations in two variables
does not affect the set of solutions. The same is true for any size of system in any number of variables.
In the following theorem, we use the notation Ei to represent the left-hand side of an equation, while bi
denotes a constant term.
Theorem 1.11: Elementary operations and solutions

Suppose you have a system of two linear equations

E1 = b1
E2 = b2.     (1.1)

Then the following systems have the same solution set as (1.1):

1. The system obtained by interchanging the two equations,

E2 = b2
E1 = b1.     (1.2)

2. The system obtained by multiplying one of the equations by a non-zero scalar,

E1 = b1
kE2 = kb2,     (1.3)

for any scalar k, provided k ≠ 0.

3. The system obtained by adding a multiple of one equation to another equation,

E1 = b1
E2 + kE1 = b2 + kb1,     (1.4)

for any scalar k (including k = 0).
Proof.

1. The systems (1.1) and (1.2) contain exactly the same two equations, just written in a different order. Therefore, they have the same solutions.
2. To prove that the systems (1.1) and (1.3) have the same solution set, let (x1 , . . . , xn ) be any solution
of (1.1). Then E1 = b1 and E2 = b2 are both true. Multiplying both sides of the last equation by k,
we know that kE2 = kb2 is true, and so (x1 , . . . , xn ) is a solution of (1.3). Conversely, let (x1 , . . . , xn )
be any solution of (1.3). Then E1 = b1 and kE2 = kb2 are true. Because k ≠ 0, we are allowed to
divide both sides of the last equation by k, and therefore E2 = b2 is true. Hence, (x1 , . . . , xn ) is also
a solution of (1.1). Since we have shown that every solution of (1.1) is a solution of (1.3) and vice
versa, the two systems are equivalent.
3. To prove that the systems (1.1) and (1.4) have the same solution set, let (x1 , . . . , xn ) be any solution
of (1.1). Then E1 = b1 and E2 = b2 are both true. We multiply both sides of the first equation by k
to obtain kE1 = kb1 . Then kE1 + E2 = kb1 + b2 , and hence (x1 , . . . , xn ) is a solution of (1.4). For the
converse direction, assume (x1 , . . . , xn ) is a solution of E1 = b1 and kE1 + E2 = kb1 + b2 . From the
first equation, we have kE1 = kb1 , and subtracting this from the second equation, we get E2 = b2 ,
hence (x1 , . . . , xn ) is a solution of (1.1). Note that unlike in case 2., there was no need to divide by k,
and therefore it was not necessary to require k ≠ 0.
♠
We will now use elementary operations to solve a system of three equations and three variables.
Solution. By Theorem 1.11, we can do elementary operations on this system without changing the solution
set. We will therefore use elementary operations to try to simplify the system of equations. First, we add
(−2) times the first equation to the second equation. This yields the system
x + 3y + 6z = 25
y + 2z = 8
2y + 5z = 19.
Next, we add (−2) times the second equation to the third equation. This yields the system
x + 3y + 6z = 25
y + 2z = 8 (1.5)
z = 3.
At this point, it is easy to find the solution. The last equation tells us that z = 3. We can substitute this
value of z back into the second equation to get
y + 2(3) = 8,
which we can simplify and solve for y to find that y = 2. Finally, we can substitute the values z = 3 and
y = 2 back into the first equation to get

x + 3(2) + 6(3) = 25.

Simplifying and solving for x, we find that x = 1. Hence, the solution to the system is (x, y, z) = (1, 2, 3).
The process we followed for solving (1.5) by first computing z, then y, then x is called back substitu-
tion. Alternatively, we could have continued from (1.5) with more elementary operations as follows. Add
(−2) times the third equation to the second and then add (−6) times the third to the first. This yields
x + 3y = 7
y = 2
z = 3.
Now add (−3) times the second to the first. This yields
x = 1
y = 2
z = 3,
a system which has the same solution set as the original system. This second method avoided back substi-
tution and led to the same solution set. It is your decision which you prefer to use, as both methods lead
to the correct solution, (x, y, z) = (1, 2, 3). ♠
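Back substitution is easy to mechanize. As an optional illustration, the following Python sketch solves the triangular system (1.5) from the bottom up, exactly as described above; the variable names simply mirror the ones in the example.

# Triangular system (1.5):
#   x + 3y + 6z = 25
#       y + 2z =  8
#            z =  3
z = 3
y = 8 - 2 * z            # from the second equation: y + 2z = 8
x = 25 - 3 * y - 6 * z   # from the first equation: x + 3y + 6z = 25
print((x, y, z))         # prints (1, 2, 3)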
Note how we have written each system of equations so that “like” variables line up on columns: one
column for x, one column for y, and one column for z. This makes it easier to perform elementary
operations. It is often useful to simplify the notation further, writing systems of equations in augmented
matrix notation. Recall the system of equations from Example 1.12:
x + 3y + 6z = 25
2x + 7y + 14z = 58
2y + 5z = 19.
This system can be written as the following augmented matrix:

[ 1 3 6 | 25 ]
[ 2 7 14 | 58 ]
[ 0 2 5 | 19 ]

A matrix is just a 2-dimensional array of numbers. An augmented matrix has two parts separated by a vertical line. Notice that the augmented matrix notation has exactly the same information as the original system of equations. All the coefficients are written on the left side of the vertical line, and all the constant terms are written on the right side of the vertical line. These two parts of the augmented matrix are also called the coefficient matrix and the constant matrix. Each row of the augmented matrix corresponds to one linear equation. For example, the top row [ 1 3 6 | 25 ] corresponds to the equation

x + 3y + 6z = 25.
The augmented matrix of the system of linear equations

a11 x1 + . . . + a1n xn = b1
...
am1 x1 + . . . + amn xn = bm

is

[ a11 · · · a1n | b1 ]
[ ..          .. | .. ]
[ am1 · · · amn | bm ]
We can consider elementary operations in the context of the augmented matrix. The elementary operations
can be used on the rows of an augmented matrix just as we used them on equations previously. For
example, instead of adding a multiple of one equation to another, we will now be adding a multiple of one
row to another. Note that Theorem 1.11 implies that any elementary row operation used on an augmented
matrix will not change the solutions to the corresponding system of equations. For reference, here are the
three kinds of elementary row operations, along with a shorthand notation we are going to use for them.
1. Switch two rows. (Notation: Ri ↔ Rj to switch rows i and j.)

2. Multiply a row by a non-zero scalar k. (Notation: Ri ← kRi to multiply row i by k.)

3. Add a multiple of one row to another row. (Notation: Ri ← Ri + kRj to add k times row j to row i.)
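As an optional aside, the three row operations translate directly into code. The following sketch implements them in Python for an augmented matrix stored as a list of rows; the function names are chosen for this illustration only and are not used elsewhere in the text.

def swap_rows(M, i, j):
    # Ri <-> Rj: switch rows i and j.
    M[i], M[j] = M[j], M[i]

def scale_row(M, i, k):
    # Ri <- k*Ri: multiply row i by the non-zero scalar k.
    assert k != 0
    M[i] = [k * entry for entry in M[i]]

def add_multiple(M, i, j, k):
    # Ri <- Ri + k*Rj: add k times row j to row i.
    M[i] = [a + k * b for a, b in zip(M[i], M[j])]

# Augmented matrix of the system from Example 1.12 (constants in the last column).
A = [[1, 3, 6, 25],
     [2, 7, 14, 58],
     [0, 2, 5, 19]]

add_multiple(A, 1, 0, -2)   # R2 <- R2 - 2*R1
add_multiple(A, 2, 1, -2)   # R3 <- R3 - 2*R2
for row in A:
    print(row)              # echelon form: [1,3,6,25], [0,1,2,8], [0,0,1,3]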
We write “≃” to indicate that two augmented matrices are equivalent, i.e., that the corresponding systems
of equations have the same set of solutions.
Solution. We have:
[ 1 3 6 | 25 ]
[ 2 7 14 | 58 ]
[ 0 2 5 | 19 ]

≃ (R2 ← R2 − 2R1)

[ 1 3 6 | 25 ]
[ 0 1 2 | 8 ]
[ 0 2 5 | 19 ]

≃ (R3 ← R3 − 2R2)

[ 1 3 6 | 25 ]
[ 0 1 2 | 8 ]
[ 0 0 1 | 3 ].
The final augmented matrix corresponds to the system
x + 3y + 6z = 25
y + 2z = 8
z = 3,
which is the same as (1.5). We can solve it by back substitution to obtain the solution x = 1, y = 2, and
z = 3.
Alternatively, we can continue with additional row operations:
[ 1 3 6 | 25 ]
[ 0 1 2 | 8 ]
[ 0 0 1 | 3 ]

≃ (R1 ← R1 − 6R3, R2 ← R2 − 2R3)

[ 1 3 0 | 7 ]
[ 0 1 0 | 2 ]
[ 0 0 1 | 3 ]

≃ (R1 ← R1 − 3R2)

[ 1 0 0 | 1 ]
[ 0 1 0 | 2 ]
[ 0 0 1 | 3 ].
Notice how this notation is much more succinct than what we used in Example 1.12. ♠
We end this section with a final word of caution: logically, you can only perform one elementary row
operation at a time. For example, it would not be correct to simultaneously add R1 to R2 and add R2 to R1 .
What is permitted is to first add R1 to R2 , then add the new R2 to R1 . Although we may sometimes try
to save space by skipping an intermediate step, as in the last example where we applied the row operations
R1 ← R1 −6R3 and R2 ← R2 −2R3 in one step, it is important to realize that logically, each row operation
must be performed separately before the next one can be done. When in doubt, the only safe course of
action is not to skip any steps.
Exercises
3x + y = 3
x + 2y = 1.
Exercise 1.3.2 Use elementary operations to find the point (x, y) that lies on both lines x + 3y = 1 and
4x − y = 3.
Exercise 1.3.3 Use elementary operations to determine whether the three lines x + 2y = 1, 2x − y = 1,
and 4x + 3y = 3 have a common point of intersection. If so, find the point, and if not, tell why they don’t
have such a common point of intersection.
x + 3y − 2z = 5
y + 3z = 4
z = 1.
Exercise 1.3.6 Write the following system of linear equations as an augmented matrix. Caution: you first
have to simplify and rearrange the equations so that “like” variables are lined up in columns. Write the
Exercise 1.3.7 Four times the weight of Gaston is 150kg more than the weight of Ichabod. Four times
the weight of Ichabod is 660kg less than seventeen times the weight of Gaston. Four times the weight of
Gaston plus the weight of Siegfried equals 290kg. Brunhilde would balance all three of the others. Find
the weights of the four people.
1.4. Gaussian elimination

Outcomes
A. Find the echelon form of a matrix.
C. Solve a system of linear equations using Gaussian elimination and back substitution.
E. Determine whether a consistent system of linear equations has a unique solution or an infinite
number of solutions from its rank.
In the previous section, we saw examples of how to solve a system of equations using elementary row
operations (and sometimes back substitution). But it is not clear whether every system of equations can be
solved this way. How do we know which elementary row operation to apply next? In this section, you will
learn a procedure called Gaussian elimination by which every system of linear equations can be solved
systematically.
Before we start, let’s figure out what it means to be “done”. At what point should we stop performing
row operations? The answer is that we will stop performing row operations when the system of equations
is in a special form called echelon form, which we now define.
A matrix is in echelon form if it satisfies the following conditions:

1. All rows consisting entirely of zeros are at the bottom.

2. Each leading entry of a row is in a column to the right of the leading entry of any row above it.
The word echelon comes from French échelle, which means ladder. This is because an echelon form looks
a bit like a ladder or staircase. Here are some examples of echelon forms.
An augmented matrix can always be converted to echelon form by using elementary row operations. The
following algorithm shows how to do this.
1. Starting from the left, find the first non-zero column. This is the first pivot column, and the
position at the top of this column will be the position of the first pivot entry. Switch rows if
necessary to place a non-zero number in the first pivot position.
2. Use row operations to make the entries below the first pivot entry (in the first pivot column)
equal to zero.
3. Ignoring the row containing the first pivot entry, repeat steps 1 and 2 with the remaining rows.
Repeat the process until there are no more non-zero rows left.
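For readers who want to experiment, here is one possible rendering of the algorithm above in Python, using exact fractions to avoid rounding errors. It is a bare-bones sketch of the idea rather than an optimized routine, and the function name is invented for this illustration.

from fractions import Fraction

def echelon_form(rows):
    # Carry an augmented matrix (given as a list of rows) to echelon form.
    M = [[Fraction(x) for x in row] for row in rows]
    nrows, ncols = len(M), len(M[0])
    pivot_row = 0
    for col in range(ncols):
        # Step 1: look for a non-zero entry in this column, at or below pivot_row.
        pivot = next((i for i in range(pivot_row, nrows) if M[i][col] != 0), None)
        if pivot is None:
            continue                                   # no pivot in this column
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]   # switch rows if necessary
        # Step 2: create zeros below the pivot entry.
        for i in range(pivot_row + 1, nrows):
            k = M[i][col] / M[pivot_row][col]
            M[i] = [a - k * b for a, b in zip(M[i], M[pivot_row])]
        pivot_row += 1                                 # Step 3: repeat with remaining rows
        if pivot_row == nrows:
            break
    return M

# The system of the first example below: x + 4y + 3z = 11, 2x + 10y + 7z = 27, x + y + 2z = 5.
for row in echelon_form([[1, 4, 3, 11], [2, 10, 7, 27], [1, 1, 2, 5]]):
    print([str(entry) for entry in row])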
Most often we will apply this algorithm in order to solve a system of linear equations. This works by first
converting the system to echelon form, then using back substitution to find the solutions. The next few
examples show how to do this.
x + 4y + 3z = 11
2x + 10y + 7z = 27
x + y + 2z = 5.
y + 2z = 2
2x + y − 2z = 3
4x − y − 10z = 4.
This finishes the first column. The second pivot column will be column two, with the 1 in the second row
and column as the pivot entry. We add 3 times the second row to the third row to create a zero below the
pivot:
2 1 −2 3
R3 ← R3 +3R2
≃ 0 1 2 2 .
0 0 0 4
This matrix is in echelon form. Note that the final pivot entry is on the right-hand side. The last row
corresponds to the equation
0x + 0y + 0z = 4.
This equation has no solution, because for all x, y, z, the left-hand side will equal 0 and not 4. Therefore,
there is no solution to the given system of equations. In other words, the system is inconsistent. ♠
3x − y + 5z = 8
y − 10z = 1 (1.7)
6x − y = 17.
We use Gaussian elimination to carry the augmented matrix to echelon form. The first column is the first
pivot column, and 3 is the pivot entry. We use row operations to create zeros beneath the pivot entry. We
subtract 2 times the first row from the third row and get:

[ 3 −1 5 | 8 ]
[ 0 1 −10 | 1 ]
[ 6 −1 0 | 17 ]

≃ (R3 ← R3 − 2R1)

[ 3 −1 5 | 8 ]
[ 0 1 −10 | 1 ]
[ 0 1 −10 | 1 ]
Now, we have created zeros beneath the pivot entry in the first column, so we move on to the second pivot
column (which is the second column) and repeat the procedure. Subtracting the second row from the third
row, we get:
≃ (R3 ← R3 − R2)

[ 3 −1 5 | 8 ]
[ 0 1 −10 | 1 ]
[ 0 0 0 | 0 ]
This matrix is now in echelon form. Observe that the first two columns are pivot columns, and the third
column is not. We call the corresponding variables x and y pivot variables, and the variable z is a free
variable. The equations corresponding to this echelon form are
3x − y + 5z = 8
y − 10z = 1.
Observe that the free variable z is not constrained by any equation. In fact, z can equal any number. We
choose t to be any number and let z = t. In this context t is called a parameter. We then use back
substitution to solve for the pivot variables y and x. From the second equation, we have y = 1 + 10z =
1 + 10t. From the first equation, we have 3x = 8 + y − 5z = 8 + (1 + 10t) − 5t = 9 + 5t, and therefore
x = 3 + (5/3)t. Therefore, the general solution of this system is

x = 3 + (5/3)t
y = 1 + 10t
z = t,
where t is arbitrary. The system has an infinite set of solutions which are given by these equations. For
any value of the parameter t we select, x, y, and z will be given by the above equations. For example, if
we choose t = 4 then the corresponding solution would be
x = 3 + (5/3)(4) = 29/3
y = 1 + 10(4) = 41
z = 4.
♠
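A quick way to gain confidence in a parametric solution like this one is to substitute several values of the parameter and check every equation. The short Python sketch below does this for the general solution of system (1.7); the particular test values of t are arbitrary.

from fractions import Fraction

def general_solution(t):
    # General solution of system (1.7): x = 3 + (5/3)t, y = 1 + 10t, z = t.
    return (3 + Fraction(5, 3) * t, 1 + 10 * t, t)

for t in [0, 1, 4, -7]:
    x, y, z = general_solution(Fraction(t))
    checks = (3*x - y + 5*z == 8, y - 10*z == 1, 6*x - y == 17)
    print("t =", t, checks)   # every check should be True, for every t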
In Example 1.22 the solution involved a parameter. It may happen that the solution to a system involves
more than one parameter, as shown in the following example.
x + 2y − 2z + 2w = 3
x + 2y − z + 3w = 5
x + 2y − 3z + w = 1.
♠
In Examples 1.20–1.23, we have seen systems of equations with one solution, no solution, and infinitely
many solutions with one parameter as well as two parameters. Moreover, in each case, we have been
able to determine the number of solutions by looking at the echelon form of the augmented matrix. To
summarize, we have the following possibilities for a system of equations:
1. No solution: If the echelon form of the augmented matrix contains a row of the form [ 0 · · · 0 | c ], where c is non-zero, then the system is inconsistent and has no solution.

2. One solution: For a consistent system of equations: If every column of the coefficient matrix of the
echelon form is a pivot column, the system has exactly one solution. The following is an example
of an augmented matrix in echelon form for a system of equations with one solution.
[ 1 1 −2 | 5 ]
[ 0 2 3 | 0 ]
[ 0 0 1 | 2 ]
[ 0 0 0 | 0 ].
3. Infinitely many solutions: For a consistent system of equations: If not all columns of the coefficient
matrix of the echelon form are pivot columns, then the system has infinitely many solutions. In
this case, each variable corresponding to a non-pivot column is a free variable and can be assigned
a parameter. The remaining variables are pivot variables and can be expressed in terms of the
parameters. Therefore, the number of parameters in the general solution is equal to the number of
non-pivot columns. The following are examples of echelon forms for systems of equations with
infinitely many solutions.
[ 1 0 1 | 5 ]
[ 0 1 2 | −3 ]
[ 0 0 0 | 0 ]
[ 0 0 0 | 0 ]

or

[ 1 2 3 | 5 ]
[ 0 0 4 | 6 ].
There is a special name for the number of pivot variables in a system of equations. It is called the rank
of the system.
Solution. First, we need to find an echelon form of A. Through the usual algorithm, we find that this is
[ 1 0 −1 ]
[ 0 3 6 ]
[ 0 0 0 ].

This echelon form has two pivot entries, one in the first column and one in the second column. Therefore, the rank of A is 2.
Suppose we have a system of m equations in n variables, and suppose that n > m. Further assume that
the system is consistent. From our above discussion, we know that this system will have infinitely many
solutions. This is because there can be at most one pivot entry per row, and therefore at most m variables
can be pivot variables. It follows that there are at least n − m free variables. Therefore, the general solution
of this system has at least n − m parameters.
Notice that if n = m or n < m, it is possible for the system to have a unique solution or infinitely many
solutions. In all cases (n > m, n = m, or n < m), it is also possible for the system to be inconsistent (have
no solutions).
By refining the above argument, we get the following theorem:
Theorem 1.26: Rank and solutions

Suppose a system of equations in n variables is consistent and has rank r. Then:

1. If r = n, then the system has a unique solution.

2. If r < n, then the system has infinitely many solutions, with n − r parameters.
Here is a final summary of how the rank affects the number of solutions:
1. No solution. If the system of equations is inconsistent, then it has no solution, regardless of the rank.
2. Unique solution. For a consistent system, suppose r = n. Then there is a pivot position in every
column of the coefficient matrix of A. Hence, there is a unique solution.
3. Infinitely many solutions. For a consistent system, suppose r < n. Then there are fewer pivot positions
than columns in the coefficient matrix, meaning that not every column is a pivot column. The
columns which are not pivot columns correspond to parameters. In fact, in this case we have n − r
parameters. The system has infinitely many solutions.
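As an optional aside, this case analysis can be carried out mechanically. One standard way to detect inconsistency, used in the sketch below but not spelled out in this section, is that a system is inconsistent exactly when the rank of its augmented matrix is larger than the rank of its coefficient matrix. The sketch uses NumPy's matrix_rank and classifies the three systems solved earlier in this section (Examples 1.20–1.22).

import numpy as np

def classify(A, b):
    # Compare the rank of the coefficient matrix, the rank of the augmented matrix,
    # and the number of variables to decide which of the three cases applies.
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float).reshape(-1, 1)
    r = np.linalg.matrix_rank(A)
    r_augmented = np.linalg.matrix_rank(np.hstack([A, b]))
    n = A.shape[1]                       # number of variables
    if r_augmented > r:
        return "no solution"
    if r == n:
        return "unique solution"
    return "infinitely many solutions, with %d parameter(s)" % (n - r)

print(classify([[1, 4, 3], [2, 10, 7], [1, 1, 2]], [11, 27, 5]))     # Example 1.20
print(classify([[0, 1, 2], [2, 1, -2], [4, -1, -10]], [2, 3, 4]))    # Example 1.21
print(classify([[3, -1, 5], [0, 1, -10], [6, -1, 0]], [8, 1, 17]))   # Example 1.22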
Exercises
Exercise 1.4.1 Consider the following augmented matrix in which ∗ denotes an arbitrary number and
denotes a non-zero number. Determine whether the given augmented matrix is consistent. If consistent, is
the solution unique?
∗ ∗ ∗ ∗ ∗
0 ∗ ∗ 0 ∗
0 0 ∗ ∗ ∗
0 0 0 0 ∗
Exercise 1.4.2 Consider the following augmented matrix in which ∗ denotes an arbitrary number and
denotes a non-zero number. Determine whether the given augmented matrix is consistent. If consistent, is
the solution unique?
∗ ∗ ∗
0 ∗ ∗
0 0 ∗
Exercise 1.4.3 Consider the following augmented matrix in which ∗ denotes an arbitrary number and
denotes a non-zero number. Determine whether the given augmented matrix is consistent. If consistent, is
the solution unique?
∗ ∗ ∗ ∗ ∗
0 0 ∗ 0 ∗
0 0 0 ∗ ∗
0 0 0 0 ∗
Exercise 1.4.4 Consider the following augmented matrix in which ∗ denotes an arbitrary number and
denotes a non-zero number. Determine whether the given augmented matrix is consistent. If consistent, is
the solution unique?
∗ ∗ ∗ ∗ ∗
0 ∗ ∗ 0 ∗
0 0 0 0 0
0 0 0 0 ∗
Exercise 1.4.5 Suppose a system of equations has fewer equations than variables. Will such a system
necessarily be consistent? If so, explain why and if not, give an example which is not consistent.
Exercise 1.4.6 If a system of equations has more equations than variables, can it have a solution? If so,
give an example and if not, explain why not.
Exercise 1.4.10 Choose h and k such that the augmented matrix shown has each of the following:
Exercise 1.4.11 Choose h and k such that the augmented matrix shown has each of the following:
Exercise 1.4.12 Determine if the system is consistent. If so, is the solution unique?
x + 2y + z − w = 2
x−y+z+w = 1
2x + y − z = 1
4x + 2y + z = 5
Exercise 1.4.13 Determine if the system is consistent. If so, is the solution unique?
x + 2y + z − w = 2
x−y+z+w = 0
2x + y − z = 1
4x + 2y + z = 3
Exercise 1.4.15 Row reduce each of the following matrices to echelon form.
(a)
[ 2 −1 3 −1 ]
[ 1 0 2 1 ]
[ 1 −1 1 −2 ]

(b)
[ 0 0 −1 −1 ]
[ 1 1 1 0 ]
[ 1 1 0 −1 ]

(c)
[ 3 −6 −7 −8 ]
[ 1 −2 −2 −2 ]
[ 1 −2 −3 −4 ]

(d)
[ 2 4 5 15 ]
[ 1 2 3 9 ]
[ 1 2 2 6 ]

(e)
[ 4 −1 7 10 ]
[ 1 0 3 3 ]
[ 1 −1 −2 1 ]

(f)
[ 3 5 −4 2 ]
[ 1 2 −1 1 ]
[ 1 1 −2 0 ]

(g)
[ −2 3 −8 7 ]
[ 1 −2 5 −5 ]
[ 1 −3 7 −8 ]
Exercise 1.4.16 Find the general solution of the system whose augmented matrix is
(a)
[ 1 2 0 | 2 ]
[ 1 3 4 | 2 ]
[ 1 0 2 | 1 ]

(b)
[ 1 2 0 | 2 ]
[ 2 0 1 | 1 ]
[ 3 2 1 | 3 ]

(c)
[ 1 1 0 | 1 ]
[ 1 0 4 | 2 ]

(d)
[ 1 0 2 1 1 | 2 ]
[ 0 1 0 1 2 | 1 ]
[ 1 2 0 0 1 | 3 ]
[ 1 0 1 0 2 | 2 ]

(e)
[ 1 0 2 1 1 | 2 ]
[ 0 1 0 1 2 | 1 ]
[ 0 2 0 0 1 | 3 ]
[ 1 −1 2 2 2 | 0 ]
Exercise 1.4.17 Solve the system of equations 7x + 14y + 15z = 22, 2x + 4y + 3z = 5, and 3x + 6y + 10z =
13.
Exercise 1.4.19 Solve the system of equations 9x − 2y + 4z = −17, 13x − 3y + 6z = −25, and −2x − z = 3.
Exercise 1.4.20 Solve the system of equations 65x + 84y + 16z = 546, 81x + 105y + 20z = 682, and
84x + 110y + 21z = 713.
Exercise 1.4.21 Solve the system of equations 8x+2y+3z = −3, 8x+3y+3z = −1, and 4x+y+3z = −9.
Exercise 1.4.22 Suppose a system of equations has fewer equations than variables and you have found a
solution to this system of equations. Is it possible that your solution is the only one? Explain.
Exercise 1.4.23 Suppose a system of linear equations has an augmented matrix with 2 rows and 4 columns
and the last column is a pivot column. Could the system of linear equations be consistent? Explain.
Exercise 1.4.24 Suppose the coefficient matrix of a system of n equations with n variables has the property
that every column is a pivot column. Does it follow that the system of equations must have a solution? If
so, must the solution be unique? Explain.
Exercise 1.4.25 Suppose there is a unique solution to a system of linear equations. What must be true of
the pivot columns in the augmented matrix?
Exercise 1.4.26 The steady state temperature, u, of a plate solves Laplace’s equation, ∆u = 0. One way
to approximate the solution is to divide the plate into a square mesh and require the temperature at each
node to equal the average of the temperature at the four adjacent nodes. In the following picture, the
numbers represent the observed temperature at the indicated nodes. Find the temperature at the interior
nodes, indicated by x, y, z, and w. One of the equations is z = (1/4)(10 + 0 + w + x).
      20   20
10     x    y    30
10     z    w    30
       0    0
Exercise 1.4.28 Suppose A is an m × n-matrix. Explain why the rank of A is always no larger than
min(m, n).
Exercise 1.4.29 State whether each of the following sets of data is possible for a system of equations. If
possible, describe the solution set. That is, indicate whether there exists a unique solution, no solution or
infinitely many solutions. Here, A is the coefficient matrix, and [A | B] denotes the augmented matrix of the
system.
Exercise 1.4.30 Consider the system −5x + 2y − z = 0 and −5x − 2y − z = 0. Both equations equal zero
and so −5x + 2y − z = −5x − 2y − z which is equivalent to y = 0. Does it follow that x and z can equal
anything? Notice that when x = 1, z = −4, and y = 0 are plugged in to the equations, the equations do
not equal 0. Why?
1.5. Gauss-Jordan elimination

Outcomes
A. Find the reduced echelon form of a matrix.
In the previous section, we saw how to solve a system of equations by using Gaussian elimination and
back substitution. The back substitution step can be quite confusing and error prone, especially when
there are parameters. For example, in Example 1.23, we had to substitute y = s, z = 2 − t, and w = t into
the equation x = 3 − 2y + 2z − 2w, which required another simplification step.
In this section, you will learn an alternative procedure called Gauss-Jordan elimination which elimi-
nates the need for back substitution, at the expense of doing a few additional row operations. The key to
this technique is a special kind of echelon form called a reduced echelon form.
A matrix is in reduced echelon form if it satisfies the following conditions:

1. It is in echelon form.

2. Each pivot entry is equal to 1.

3. All entries above and below each pivot entry are equal to zero.
We can carry every augmented matrix to reduced echelon form by doing elementary row operations.
1. First, use Gaussian elimination (Algorithm 1.19) to reduce the matrix to echelon form.
2. Moving from right to left, consider each pivot entry. Without changing the row containing the
pivot entry, or any rows below it, use row operations to create zeros in the column above the
pivot entry. Finally, divide the row by its pivot entry, to make the pivot entry equal to 1.
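Computer algebra systems implement Gauss-Jordan elimination directly. As a hedged illustration, the following uses SymPy's rref() method, which returns the reduced echelon form of a matrix together with the indices of its pivot columns; the matrix shown is the augmented matrix of the system solved in the next example.

from sympy import Matrix

# Augmented matrix of the system x + 4y + 3z = 11, 2x + 10y + 7z = 27, x + y + 2z = 5.
A = Matrix([[1, 4, 3, 11],
            [2, 10, 7, 27],
            [1, 1, 2, 5]])

rref_matrix, pivot_columns = A.rref()
print(rref_matrix)     # rows (1, 0, 0, -2), (0, 1, 0, 1), (0, 0, 1, 3)
print(pivot_columns)   # (0, 1, 2): the first three columns are pivot columns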
x + 4y + 3z = 11
2x + 10y + 7z = 27
x + y + 2z = 5.
Solution. In Example 1.20, we had already reduced the system to echelon form:
[ 1 4 3 | 11 ]
[ 0 2 1 | 5 ]
[ 0 0 1 | 3 ].

We now use row operations to create zeros above each pivot entry, and divide each row by its pivot entry. This yields

[ 1 0 0 | −2 ]
[ 0 1 0 | 1 ]
[ 0 0 1 | 3 ].

The resulting matrix is in reduced echelon form. Note that the final system of equations is especially easy to solve, because the three equations are x = −2, y = 1, and z = 3. No back substitution is needed. ♠
x + 2y − 2z + 2w = 3
x + 2y − z + 3w = 5
x + 2y − 3z + 1w = 1.
One situation where Gauss-Jordan elimination excels is when you have to solve many systems of equations
that all have the same coefficient matrix.
x + z = 1                 x + z = 2
2x + y + 3z = 2           2x + y + 3z = 5
3x + 2y + 5z = 4          3x + 2y + 5z = 8
Solution. We could certainly solve each system of equations separately. But since the left-hand sides are
the same, we will perform exactly the same row operations on both systems. We can save some work by
solving both systems together. Instead of a usual augmented matrix with only one constant vector, we
create an augmented matrix containing both constant vectors at the same time.
[ 1 0 1 | 1 2 ]
[ 2 1 3 | 2 5 ]
[ 3 2 5 | 4 8 ]
Then we row-reduce the coefficient matrix to reduced echelon form as usual. (We do not need to bother
reducing the right-hand side to reduced echelon form).
[ 1 0 1 | 1 2 ]
[ 2 1 3 | 2 5 ]
[ 3 2 5 | 4 8 ]

≃ (R2 ← R2 − 2R1, R3 ← R3 − 3R1)

[ 1 0 1 | 1 2 ]
[ 0 1 1 | 0 1 ]
[ 0 2 2 | 1 2 ]

≃ (R3 ← R3 − 2R2)

[ 1 0 1 | 1 2 ]
[ 0 1 1 | 0 1 ]
[ 0 0 0 | 1 0 ].
We see that the first system is inconsistent, because it contains a row of the form [0 0 0 | 1]. The second
system is consistent, and we get the general solution z = t, y = 1 − t, x = 2 − t. ♠
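The same trick works in code: append both constant columns to the coefficient matrix and row-reduce once. The following SymPy sketch redoes the computation above; note that rref() also clears the entries above the pivot that appears in the first constant column, which does not affect the conclusions.

from sympy import Matrix

# Coefficient matrix followed by the two constant columns, one per system.
M = Matrix([[1, 0, 1, 1, 2],
            [2, 1, 3, 2, 5],
            [3, 2, 5, 4, 8]])

R, pivots = M.rref()
print(R)
# The row (0, 0, 0, 1, 0) confirms that the first system is inconsistent,
# while the remaining rows give x + z = 2 and y + z = 1 for the second system,
# i.e. the general solution x = 2 - t, y = 1 - t, z = t.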
Exercises
Exercise 1.5.2 Reduce each of the matrices from Exercise 1.4.15 to reduced echelon form.
Exercise 1.5.3 Use Gauss-Jordan elimination to solve the system of equations −8x + 2y + 5z = 18, −8x +
3y + 5z = 13, and −4x + y + 5z = 19.
Exercise 1.5.5 Use Gauss-Jordan elimination to solve the system of equations −9x + 15y = 66, −11x +
18y = 79, −x + y = 4, and z = 3.
Exercise 1.5.6 Use Gauss-Jordan elimination to solve the system of equations −19x+8y = −108, −71x+
30y = −404, −2x + y = −12, 4x + z = 14.
Exercise 1.5.7 Solve the following two systems of equations simultaneously, by using a single augmented
matrix with two constant vectors.
x + 2y − z = 0            x + 2y − z = 1
2x + 3y + z = 3           2x + 3y + z = 7
x − y + 2z = 3            x − y + 2z = 4
1.6. Homogeneous systems

Outcomes
A. Determine whether a homogeneous system of equations has non-trivial solutions from its
rank.
C. Understand the relationship between the general solution of a system of equations and that of
its associated homogeneous system.
There is a special type of system of linear equations that requires additional study. This type of system is
called a homogeneous2 system of equations. Our focus in this section is to consider what types of solutions
are possible for a homogeneous system of equations, and how the solutions of non-homogeneous systems
are related to those of their homogeneous counterparts.
2 The word “homogeneous” has 5 syllables. In scientific usage, it is not the same as the word “homogenous”.
A system of linear equations is called homogeneous if the constant term of every equation is equal to zero. The first thing we note is that a homogeneous system is always consistent. Indeed, it always has the solution x1 = 0, x2 = 0, . . ., xn = 0. This solution is called the trivial solution.
If the system has a solution in which not all of the x1 , . . . , xn are equal to zero, then we call this solution
non-trivial. When working with homogeneous systems of equations, since the trivial solution always
exists, we are usually interested in finding whether there are non-trivial solutions.
The following theorem is a special case of Theorem 1.26. Recall that the rank of a system is the
number of pivot variables in its echelon form.

Theorem: Rank and solutions of homogeneous systems

Suppose a homogeneous system of equations in n variables has rank r. If r < n, then the system has infinitely many solutions, and in particular it has non-trivial solutions. If r = n, then it has only the trivial solution.

Example: Homogeneous system with more variables than equations

True or false: If a homogeneous system of equations has more variables than equations, then it has infinitely many solutions.

Solution. This is true. If the system has m equations and n variables, then the rank can be at most m. Since
m < n, the system has infinitely many solutions. Note that it is not possible for a homogeneous system to
be inconsistent, since there is always the trivial solution. ♠
Example 1.37: Homogeneous system with an equal number of variables and equations
True or false: Suppose a homogeneous system has the same number of variables as equations. Then
the system has a unique solution.
Solution. This is false in general. While it is possible for such a system to have a unique solution, it is also
possible for it to have infinitely many. Let there be n equations and n variables. Then depending on the
echelon form, the rank r could be either equal to n, in which case there is a unique solution, or less than n,
in which case there are infinitely many. ♠
We now consider an example of solving a homogeneous system of equations.
Consider the homogeneous system of equations

2x + y + z + 4w = 0
x + 2y − z + 5w = 0.

Solution. Notice that this system has m = 2 equations and n = 4 variables, so n > m. Therefore by our
previous discussion, we expect this system to have infinitely many solutions. In particular, it will have
non-trivial solutions.
The process we use to find the solutions for a homogeneous system of equations is the same process
we used for non-homogeneous equations. We construct the augmented matrix and reduce it to reduced
echelon form.
[ 2 1 1 4 | 0 ]
[ 1 2 −1 5 | 0 ]

≃

[ 1 0 1 1 | 0 ]
[ 0 1 −1 2 | 0 ]
The corresponding system of equations is
x+z+w = 0
y − z + 2w = 0.
The free variables are z and w. We set them equal to parameters z = s and w = t. Then our general solution
has the form
x = −s − t
y = s − 2t
z=s
w = t.
Hence this system has infinitely many solutions, with two parameters s and t. ♠
Let us write the solution of the last example in another form. Specifically, it can be written as
[ x ]        [ −1 ]        [ −1 ]
[ y ]  =  s  [  1 ]  +  t  [ −2 ] .        (1.9)
[ z ]        [  1 ]        [  0 ]
[ w ]        [  0 ]        [  1 ]
Notice that we have constructed a column from the coefficients of s in each equation, and another column
from the coefficients of t. We will discuss this notation more in later chapters. For now, consider what
happens when we choose the parameters to be s = 1 and t = 0. In this case, we get the solution
[ −1 ]
[  1 ]
[  1 ] ,        (1.10)
[  0 ]
which is the same as the column of coefficients for s. This is called a basic solution of the homogeneous
system of equations. The other basic solution is obtained by setting s = 0 and t = 1. In this case,
[ −1 ]
[ −2 ]
[  0 ] .        (1.11)
[  1 ]
The basic solutions of a system are columns constructed from the coefficients on parameters in the solution.
If X1 and X2 are the basic solutions (1.10) and (1.11), then the general solution (1.9) is of the form sX1 +
tX2. We say that the general solution of the homogeneous system is a linear combination of its basic
solutions.
We explore this further in the following example.
x + 4y + 3z = 0
(1.12)
3x + 12y + 9z = 0.
Solution. The augmented matrix of this system and the resulting reduced echelon form are
[ 1 4 3 | 0 ]
[ 3 12 9 | 0 ]

≃

[ 1 4 3 | 0 ]
[ 0 0 0 | 0 ] .

The corresponding equation is
x + 4y + 3z = 0.
Notice that x is the only pivot variable, and y and z are free variables. Let y = s and z = t for parameters s
and t. Then the general solution is
x = −4s − 3t
y=s
z = t,
which can be written as
[ x ]        [ −4 ]        [ −3 ]
[ y ]  =  s  [  1 ]  +  t  [  0 ] .
[ z ]        [  0 ]        [  1 ]
You can see here that we have two columns of coefficients corresponding to parameters, specifically one
for s and one for t. Therefore, this system has two basic solutions! They are
        [ −4 ]           [ −3 ]
X1  =   [  1 ] ,  X2  =  [  0 ] .
        [  0 ]           [  1 ]
♠
We can take any non-homogeneous system of equations and get a new homogeneous system by keeping
the left-hand sides the same and setting all of the constant terms equal to 0. This is called the associated
homogeneous system of the system of equations. We end this section by investigating how the solutions
of a system of equations are related to the solutions of its associated homogeneous system.
Consider the system of equations

x + 4y + 3z = 2
3x + 12y + 9z = 6.     (1.13)

Solution. We note that the associated homogeneous system of (1.13) is the system we saw in Exam-
ple 1.39. We solve the system (1.13) in the usual way by reducing its augmented matrix to reduced
echelon form
[ 1 4 3 | 2 ]
[ 3 12 9 | 6 ]

≃

[ 1 4 3 | 2 ]
[ 0 0 0 | 0 ]
and then assigning parameters y = s, z = t to the free variables. From the equation x + 4y + 3z = 2, the
general solution is
x = 2 − 4s − 3t
y=s
z = t,
which can be written as
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}
+ s \begin{bmatrix} -4 \\ 1 \\ 0 \end{bmatrix}
+ t \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}. \tag{1.14}
\]
We see that the general solution is almost exactly the same as that of the homogeneous system in Exam-
ple 1.39. The only difference is the additional column
\[
\begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix} \tag{1.15}
\]
♠
Note that the column (1.15), by itself, is a solution of the non-homogeneous system. It is not the most
general solution, but rather the particular solution resulting from the parameters s = 0 and t = 0. We can
therefore interpret equation (1.14) as saying that the general solution of the non-homogeneous system is
equal to a particular solution of the non-homogeneous system, plus the general solution of the associated
homogeneous system. The same is true in general, and we summarize it as a theorem.
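The same fact can be spot-checked numerically. The following sketch (NumPy assumed; an illustration only) verifies that a particular solution plus any solution of the associated homogeneous system again solves the non-homogeneous system (1.13).

```python
import numpy as np

# Non-homogeneous system (1.13):  x + 4y + 3z = 2,  3x + 12y + 9z = 6.
A = np.array([[1, 4, 3],
              [3, 12, 9]], dtype=float)
b = np.array([2, 6], dtype=float)

particular = np.array([2, 0, 0], dtype=float)   # solution with s = t = 0
X1 = np.array([-4, 1, 0], dtype=float)          # basic solutions of the
X2 = np.array([-3, 0, 1], dtype=float)          # associated homogeneous system

# A particular solution plus any homogeneous solution solves (1.13).
for s, t in [(0.0, 0.0), (1.0, 2.0), (-3.0, 0.5)]:
    x = particular + s * X1 + t * X2
    assert np.allclose(A @ x, b)
print("general solution (1.14) verified for several parameter values")
```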
Exercises
Exercise 1.6.1 Find the basic solutions of each of the following homogeneous systems of equations.
(a)
2x + 3y + 4z = 0
x − 2y + z = 0
4x − y + 6z = 0

(b)
x − y + z = 0
−x − 2y − 4z = 0
2x + y + 5z = 0

(c)
x + y − z + 2w = 0
x + 3y + z + 6w = 0
x + 2y + 4w = 0
Exercise 1.6.2 Which of the following homogeneous systems of linear equations have non-trivial solu-
tions?
Exercise 1.6.3 My system of equations has a solution (x, y, z) = (1, 2, 4). The associated homogeneous
system has basic solutions (x, y, z) = (1, 0, 1) and (x, y, z) = (0, 1, −1). What is the general solution of my
system of equations?
1.7 Uniqueness of the reduced echelon form

Outcomes
A. Determine whether two systems of equations are row equivalent, by comparing their reduced
echelon form.
B. For two homogeneous systems of equations that are not row equivalent, find a solution to one
system that is not a solution to the other.
We have seen in earlier sections that every matrix can be brought into reduced echelon form by a sequence
of elementary row operations. Here we will prove that the resulting matrix is unique; in other words, the
resulting matrix in reduced echelon form does not depend upon the particular sequence of elementary row
operations or the order in which they were performed.
Let A be the augmented matrix of a homogeneous system of linear equations in the variables x1 , x2 , . . . , xn
which is also in reduced echelon form. Recall that the matrix A divides the set of variables into two different
types: xi is a pivot variable when column i is a pivot column, and a free variable otherwise.
x + 2y − z + w = 0
x+y−z+w = 0
x + 3y − z + w = 0
The reduced echelon form of the augmented matrix of this system is
\[
\left[\begin{array}{cccc|c} 1 & 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right].
\]
From this, we see that columns 1 and 2 are pivot columns. Therefore, x and y are pivot variables and z and
w are free variables. We can write the solution to this system as
x = s−t
y=0
z=s
w = t.
♠
In general, all solutions can be written in terms of the free variables. In such a description, the free
variables are written as parameters, while the pivot variables are written as functions of these parameters.
Indeed, a pivot variable xi is a function of only those free variables x j with j > i. This leads to the following
observation.
Using this proposition, we prove a lemma which will be used in the proof of the main result of this section.
Proof. With respect to the linear systems associated with the matrices A and B, there are two cases to
consider:
• Case 1: the two systems have the same pivot variables;
• Case 2: the two systems do not have the same pivot variables.
In case 1, the two matrices will have exactly the same pivot positions. However, since A and B are not
identical, there is some row of A which is different from the corresponding row of B and yet the rows each
have a pivot in the same column position. Let i be the index of this column position. Since the matrices
are in reduced echelon form, the two rows must differ at some entry in a column j > i. Let these entries be
a in A and b in B, where a ≠ b. Since A is in reduced echelon form, if x j were a pivot variable for its linear
system, we would have a = 0. Similarly, if x j were a pivot variable for the linear system of the matrix B,
we would have b = 0. Since a and b are unequal, they cannot both be equal to 0, and hence x j cannot be a
pivot variable for both linear systems. However, since the systems have the same pivot variables, x j must
then be a free variable for each system. We now look at the solutions of the systems in which x j is set
equal to 1 and all other free variables are set equal to 0. For this choice of parameters, the solution of the
system for matrix A has xi = −a, while the solution of the system for matrix B has xi = −b, so that the
two systems have different solutions.
In case 2, there is a variable xi which is a pivot variable for one matrix, let’s say A, and a free variable
for the other matrix B. The system for matrix B has a solution in which xi = 1 and x j = 0 for all other free
variables x j . However, by Proposition 1.43 this cannot be a solution of the system for the matrix A. This
completes the proof of case 2. ♠
Now, we say that the matrix B is row equivalent to the matrix A if B can be obtained from A by performing
a sequence of elementary row operations. By Theorem 1.11, we know that row equivalent systems have
exactly the same solutions. Now, we can use Lemma 1.44 to prove the main result of this section, which
is that each matrix A has a unique reduced echelon form.
Proof. By Gauss-Jordan elimination, we already know that every matrix is row equivalent to some reduced
echelon form. What we must show is that the resulting reduced echelon form is unique, i.e., does not
depend on the order in which row operations are performed.
Therefore, let A be an m × n-matrix and let B and C be matrices in reduced echelon form, each row
equivalent to A. We have to show that B = C.
Let A+ be the matrix A augmented with a new rightmost column consisting entirely of zeros. Similarly,
augment matrices B and C each with a rightmost column of zeros to obtain B+ and C+. Note that B+ and
C+ are augmented matrices in reduced echelon form, and that both B+ and C+ are row equivalent to A+ ,
because the addition of a column of zeros does not change the effect of any row operations.
Now, A+ , B+ , and C+ can all be considered as augmented matrices of homogeneous linear systems
in the variables x1 , x2 , . . . , xn . Because all three systems are row equivalent, they have exactly the same
solutions. By Lemma 1.44, we conclude that B+ = C+. Omitting the final column of zeros, we must also
have B = C. ♠
2x + 3y + z = 12
x − 2y + 4z = −1
x + 2z = 3

and

x + 2y = 7
3x − y + 7z = 7
y − z = 2.
Since both systems have the same reduced echelon form, they are row equivalent. ♠
x − 2y − 5z = 0
x + z = 0
x + y + 4z = 0

and

2x + 2y + z = 0
x + y + 3z = 0
−x − y + 2z = 0.
Since the two systems have different reduced echelon forms, they are not row equivalent. Following the
proof of Lemma 1.44, we see that z is a free variable for the first system, but a pivot variable for the second
system. Therefore, there exists a solution of the first system with z = 1, namely (x, y, z) = (−1, −3, 1). But
there exists no solution for the second system with z = 1, and in particular, (x, y, z) = (−1, −3, 1) is not a
solution of the second system. ♠
We finish this section by pointing out an important consequence of Theorem 1.45, namely that the rank of
a matrix is well-defined. Recall that in Definition 1.24, we defined the rank of a matrix A to be the number
of pivot entries of “any” echelon form of A. It was not clear, however, why different echelon forms of A
could not have different numbers of pivot entries. Now we can answer this question. By the Gauss-Jordan
algorithm, we know that every echelon form can be converted to a reduced echelon form without changing
the number or position of the pivots. Since the reduced echelon form is unique, it follows that all echelon
forms of A have the same number of pivot entries (and in fact the same pivot columns). Therefore, the
rank of A is a well-defined quantity.
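In practice, computer algebra systems compute the reduced echelon form, and therefore the rank, directly. The sketch below uses SymPy's Matrix.rref() and Matrix.rank(); the matrix A is an arbitrary illustrative example, not one taken from the text.

```python
from sympy import Matrix

# Different sequences of row operations can never lead to different reduced
# echelon forms; sympy's rref() computes the (unique) one directly.
A = Matrix([[1, 2, -1, 3],
            [2, 4, 1, 0],
            [1, 2, 2, -3]])

R, pivot_columns = A.rref()   # R is the reduced echelon form of A
print(R)
print("pivot columns:", pivot_columns)
print("rank:", A.rank())      # equals the number of pivot columns
assert A.rank() == len(pivot_columns)
```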
Exercises
Exercise 1.7.1 The following are augmented matrices for four systems of equations. Determine which of
them, if any, are row equivalent.
\[
\text{(a)}\ \left[\begin{array}{cccc|c} 1 & 3 & 5 & 1 & 12 \\ -1 & 1 & -1 & 2 & 5 \\ 2 & 0 & 4 & -2 & 0 \end{array}\right]
\quad
\text{(b)}\ \left[\begin{array}{cccc|c} 1 & 2 & 4 & -1 & 8 \\ 2 & 4 & 3 & 1 & 15 \\ 3 & 6 & 1 & -1 & 8 \end{array}\right]
\quad
\text{(c)}\ \left[\begin{array}{cccc|c} 3 & 6 & -3 & 1 & 6 \\ 2 & 4 & 2 & 1 & 13 \\ 0 & 0 & 1 & 0 & 2 \end{array}\right]
\quad
\text{(d)}\ \left[\begin{array}{cccc|c} 1 & 2 & 4 & 1 & 10 \\ 0 & 1 & 1 & 1 & 5 \\ 2 & 1 & 5 & 0 & 8 \end{array}\right]
\]
Exercise 1.7.2 Find a tuple (x, y, z) that is a solution to one system of equations but not the other.
\[
\left[\begin{array}{cccc|c} 1 & 0 & 1 & 5 & 0 \\ 2 & 1 & 4 & 3 & 0 \\ 3 & 1 & 5 & 9 & 0 \end{array}\right]
\quad\text{and}\quad
\left[\begin{array}{cccc|c} 1 & 0 & 3 & 6 & 0 \\ 2 & 1 & 8 & 2 & 0 \\ 2 & 0 & 6 & 13 & 0 \end{array}\right]
\]
1.8 Fields
Outcomes
A. Solve systems of equations using scalars from a field other than the real numbers, such as Z2
or Z5 .
So far in this chapter, we have worked with real numbers: all of the scalars we used, for coefficients,
constant terms, variables, and parameters, were real numbers. But in fact, we have not used very many
properties of the real numbers, except for the fact that we can add, subtract, multiply, and divide them. For
example, we have never needed to take a square root or to compute a trigonometric function.
In fact, most of linear algebra only requires addition, subtraction, multiplication, and division. This
opens the door to doing linear algebra using other kinds of scalars besides the real numbers. For example,
we can do linear algebra over the rational numbers, complex numbers, or even over some more exotic
number systems that you will learn about in this section. A system of scalars that one can do linear
algebra with is called a field.
Properties (A1)–(A4) are about addition, properties (M1)–(M4) are about multiplication, and property (D)
is about both addition and multiplication. Here are some examples and non-examples of fields:
(c) The set Z of integers satisfies all field properties except for (M4). It is therefore not a field.
(d) The set N = {0, 1, 2, . . .} of natural numbers satisfies all field properties except (A4) and (M4).
It is therefore not a field.
A field doesn’t have to be infinite. The following is an example of a field with only two elements.
+ | 0 1        · | 0 1
--+------      --+------
0 | 0 1        0 | 0 0
1 | 1 0        1 | 0 1
This particular alternative arithmetic is called “arithmetic modulo 2”. In computer science, the
addition is also called the “logical exclusive or” operation, and multiplication is also called the
“logical and” operation. You can also think of 0 as “even” and 1 as “odd”, and note that odd plus
odd makes even. For example, we can calculate like this:
1 · ((1 + 0) + 1) + 1 = 1 · (1 + 1) + 1
= 1·0+1
= 0+1
= 1.
The binary digits form a field Z2 = {0, 1}, also called the field of integers modulo 2.
You can convince yourself that the 9 properties of fields are all satisfied by the integers modulo 2. This
is a bit tedious, but it can be checked by calculations. For example, to verify (A1), we have to check that
0 + 0 = 0 + 0, 0 + 1 = 1 + 0, 1 + 0 = 0 + 1, and 1 + 1 = 1 + 1. Perhaps the most interesting properties are
(A4) and (M4). For (A4), we can set (−0) = 0 and (−1) = 1. It may be surprising that (−1) = 1, but
you can check for yourself that 1 + (−1) = 1 + 1 = 0 when calculating modulo 2. For (M4), we can set
1⁻¹ = 1.
When solving systems of linear equations, we only used addition, subtraction, multiplication, and
division. Therefore, we can solve systems of equations using the elements of any field as the scalars,
instead of the real numbers.
x+y = 0
x+z = 1
y + z = 1.
Solution. As usual, we write the augmented matrix of the system of equations, then reduce it to reduced
echelon form using elementary row operations. The only difference is that we will perform all arithmetic
operations modulo 2, rather than in the real numbers. The augmented matrix is:
1 1 0 0
1 0 1 1 .
0 1 1 1
The first pivot entry is the 1 in the upper left. We use a row operation to create a zero below it. Note that,
because we are working modulo 2, adding 1 and subtracting 1 is the same thing, so 0 − 1 = 1.
\[
\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix}
\ \overset{R_2 \leftarrow R_2 - R_1}{\simeq}\
\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix}.
\]
The next pivot entry is in row 2 and column 2. We create a zero below it by subtracting row 2 from row 3,
and a zero above it by subtracting row 2 from row 1:
\[
\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix}
\ \overset{R_3 \leftarrow R_3 - R_2}{\simeq}\
\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\ \overset{R_1 \leftarrow R_1 - R_2}{\simeq}\
\begin{bmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]
The resulting system is in reduced echelon form. We can see that the system is consistent, because there is
no row whose left-hand side is zero and whose right-hand side is non-zero. We also see that there are two
pivot columns, and therefore two pivot variables, x and y. On the other hand, z is a free variable, so we set
it equal to a parameter: z = t. Notice that this time, the parameter t is not a real number, but an element
of Z2 . From the equation x + z = 1, we get x = 1 − z = 1 + t. Can you guess why I have written 1 + t
instead of 1 −t? This is because (−1) = 1 in the integers modulo 2. So 1 −t = 1 + (−1)t = 1 + 1t = 1 +t.
Similarly, from the equation y + z = 1, we get that y = 1 + t. Therefore, the general solution to the system
of equations is
x = 1+t
y = 1+t
z = t,
where t ∈ {0, 1} is an arbitrary parameter. Recall that this means that each time we plug in a particular
value for t, we get a solution.
There is one difference between solving equations in the real numbers and solving equations in Z2 .
In the real numbers, a system of equations has either no solution, a unique solution, or infinitely many
solutions. This is because when there is a parameter, we automatically get infinitely many solutions. By
contrast, in Z2 , there are only two scalars, and therefore only two possible values for the parameter t,
namely t = 0 and t = 1. For t = 0 we get the solution (x, y, z) = (1, 1, 0), and for t = 1 we get the solution
(x, y, z) = (0, 0, 1). Thus, when the general solution has one parameter in Z2 , there are only two solutions,
instead of infinitely many. ♠
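The same elimination procedure can be automated. The following is a minimal sketch of Gauss-Jordan elimination over Zp for a prime p (the function name rref_mod_p is ours, and Python 3.8+ is assumed for pow(a, -1, p)); it reproduces the reduced echelon form computed above.

```python
def rref_mod_p(M, p):
    """Reduce the augmented matrix M (a list of lists of integers) to
    reduced echelon form, doing all arithmetic modulo the prime p."""
    M = [[entry % p for entry in row] for row in M]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols - 1):          # last column is the right-hand side
        # Find a row at or below pivot_row with a non-zero entry in this column.
        pivot = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        # Scale the pivot row so the pivot entry becomes 1 (multiply by the inverse).
        inv = pow(M[pivot_row][col], -1, p)
        M[pivot_row] = [(entry * inv) % p for entry in M[pivot_row]]
        # Create zeros above and below the pivot.
        for r in range(rows):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [(a - factor * b) % p for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M

# The Z2 example above: x + y = 0, x + z = 1, y + z = 1.
augmented = [[1, 1, 0, 0],
             [1, 0, 1, 1],
             [0, 1, 1, 1]]
for row in rref_mod_p(augmented, 2):
    print(row)
```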
Each light is also a button. When a button is pressed, its own light, and all the lights neighboring it
(i.e., above, below, to the left and to the right) are toggled (i.e., any light that was off is turned on
and vice versa). Figure out which buttons to press to turn off all the lights if the starting position is
as shown above.
Solution. We number the lights and buttons from top to bottom, left to right, like this:
1 2 3
4 5 6 .
7 8 9
Let xi be a variable in Z2 , corresponding to the event “button i is pressed” (or more precisely, “button i is
pressed an odd number of times”, because pressing a button twice is the same as not pressing it at all. That
is why we are working modulo 2). The light in position 1 is initially on. It is toggled each time buttons 1,
2, and 4 are pressed, i.e., it is toggled x1 + x2 + x4 times. We want this light to be off in the end. So we
must have 1 + x1 + x2 + x4 = 0. Similarly, the light in position 2 is initially off. To ensure that it stays off,
we must have 0 + x1 + x2 + x3 + x5 = 0. In this way, we obtain 9 linear equations in 9 variables:
1 + x1 + x2 + x4 = 0
0 + x1 + x2 + x3 + x5 = 0
1 + x2 + x3 + x6 = 0
0 + x1 + x4 + x5 + x7 = 0
1 + x2 + x4 + x5 + x6 + x8 = 0
0 + x3 + x5 + x6 + x9 = 0
0 + x4 + x7 + x8 = 0
0 + x5 + x7 + x8 + x9 = 0
1 + x6 + x8 + x9 = 0.
If we write this system in standard form, we obtain the following augmented matrix:
1 1 0 1 0 0 0 0 0 1
1 1 1 0 1 0 0 0 0 0
0 1 1 0 0 1 0 0 0 1
1 0 0 1 1 0 1 0 0 0
0 1 0 1 1 1 0 1 0 1 .
0 0 1 0 1 1 0 0 1 0
0 0 0 1 0 0 1 1 0 0
0 0 0 0 1 0 1 1 1 0
0 0 0 0 0 1 0 1 1 1
We solve this system of equations by doing Gauss-Jordan elimination with scalars in Z2 . The reduced
echelon form is
1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 1
0 0 0 1 0 0 0 0 0 1
0 0 0 0 1 0 0 0 0 1
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 1 0 1
0 0 0 0 0 0 0 0 1 0
The unique solution is (x1 , x2 , x3 , x4 , x5 , x6 , x7 , x8 , x9 ) = (0, 0, 1, 1, 1, 0, 0, 1, 0). This means that we must press
buttons 3, 4, 5, and 8. ♠
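As a sanity check, the following short sketch (plain Python; the helper name neighbours is ours) verifies that pressing buttons 3, 4, 5, and 8 indeed turns off every light in the starting position used above.

```python
# Verify the button presses found above: the net effect on each light must
# cancel its initial state, working modulo 2.
initial = [1, 0, 1, 0, 1, 0, 0, 0, 1]        # 1 = on, read row by row
presses = [0, 0, 1, 1, 1, 0, 0, 1, 0]        # x1, ..., x9 from the solution

def neighbours(i):
    """Buttons (0-based) whose presses toggle light i in the 3x3 grid."""
    r, c = divmod(i, 3)
    cells = [(r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [3 * rr + cc for rr, cc in cells if 0 <= rr < 3 and 0 <= cc < 3]

final = [(initial[i] + sum(presses[j] for j in neighbours(i))) % 2
         for i in range(9)]
print(final)                     # expected: all zeros
assert final == [0] * 9
```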
[Figure: a clock face with only five hours, labelled 0, 1, 2, 3, 4.]
If we want to calculate 3 o’clock plus 4 hours, we get 2 o’clock, because whenever the clock reaches
5, it resets to 0. This is how addition and multiplication modulo 5 are defined:
+ | 0 1 2 3 4        · | 0 1 2 3 4
--+----------        --+----------
0 | 0 1 2 3 4        0 | 0 0 0 0 0
1 | 1 2 3 4 0        1 | 0 1 2 3 4
2 | 2 3 4 0 1        2 | 0 2 4 1 3
3 | 3 4 0 1 2        3 | 0 3 1 4 2
4 | 4 0 1 2 3        4 | 0 4 3 2 1
We note that the integers modulo 5 form a field. Most of the properties are tedious but easy to verify.
Perhaps the most interesting of the field properties is (M4). It says that for each non-zero element a, there
is another element a⁻¹ such that aa⁻¹ = 1. By looking at the multiplication table, we see that 1 · 1 = 1,
2 · 3 = 1, 3 · 2 = 1, and 4 · 4 = 1. Therefore we can set 1⁻¹ = 1, 2⁻¹ = 3, 3⁻¹ = 2, and 4⁻¹ = 4.
Solution. There are no fractions in Z5 . The key to dividing is this: instead of dividing by a, multiply by
a⁻¹. So we have:
2/3 = 2 · 3⁻¹ = 2 · 2 = 4.
So 2 divided by 3 equals 4. This makes sense, because 4 times 3 equals 2, when calculating modulo 5.
♠
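Incidentally, Python (version 3.8 or later) can compute inverses modulo p directly with the built-in pow function, which makes division in Zp easy to experiment with; this is an aside, not part of the text's method.

```python
# pow(a, -1, p) returns the element b with a*b = 1 (mod p), when it exists.
p = 5
for a in range(1, p):
    print(a, "has inverse", pow(a, -1, p), "modulo", p)

# Division in Z5 is multiplication by the inverse: 2/3 = 2 * 3^(-1) = 4.
print((2 * pow(3, -1, 5)) % 5)   # prints 4
```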
2x + z = 1
x + 4y + z = 3
x + 2y + 3z = 2.
Solution. We perform the usual Gauss-Jordan algorithm on the augmented matrix. The only thing to keep
in mind is that, instead of dividing a row by a, we should multiply it by a⁻¹. And of course, we should
reduce all intermediate results modulo 5. For example, to change the first pivot entry from 2 to 1, we
multiply by 2⁻¹ = 3, instead of dividing by 2.
\[
\left[\begin{array}{ccc|c} 2 & 0 & 1 & 1 \\ 1 & 4 & 1 & 3 \\ 1 & 2 & 3 & 2 \end{array}\right]
\ \overset{R_1 \leftarrow 3R_1}{\simeq}\
\left[\begin{array}{ccc|c} 1 & 0 & 3 & 3 \\ 1 & 4 & 1 & 3 \\ 1 & 2 & 3 & 2 \end{array}\right]
\ \overset{\substack{R_2 \leftarrow R_2 - R_1 \\ R_3 \leftarrow R_3 - R_1}}{\simeq}\
\left[\begin{array}{ccc|c} 1 & 0 & 3 & 3 \\ 0 & 4 & 3 & 0 \\ 0 & 2 & 0 & 4 \end{array}\right]
\ \overset{R_2 \leftarrow 4R_2}{\simeq}\
\left[\begin{array}{ccc|c} 1 & 0 & 3 & 3 \\ 0 & 1 & 2 & 0 \\ 0 & 2 & 0 & 4 \end{array}\right]
\]
\[
\overset{R_3 \leftarrow R_3 - 2R_2}{\simeq}\
\left[\begin{array}{ccc|c} 1 & 0 & 3 & 3 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 1 & 4 \end{array}\right]
\ \overset{\substack{R_1 \leftarrow R_1 - 3R_3 \\ R_2 \leftarrow R_2 - 2R_3}}{\simeq}\
\left[\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 4 \end{array}\right].
\]
The final matrix is in reduced echelon form, and we see that the system has the unique solution (x, y, z) =
(1, 2, 4). Please double-check the solution with respect to the original equations. ♠
Solution. The integers modulo 6 are the set Z6 = {0, 1, 2, 3, 4, 5}, with the addition and multiplication
modulo 6:
+ | 0 1 2 3 4 5        · | 0 1 2 3 4 5
--+------------        --+------------
0 | 0 1 2 3 4 5        0 | 0 0 0 0 0 0
1 | 1 2 3 4 5 0        1 | 0 1 2 3 4 5
2 | 2 3 4 5 0 1        2 | 0 2 4 0 2 4
3 | 3 4 5 0 1 2        3 | 0 3 0 3 0 3
4 | 4 5 0 1 2 3        4 | 0 4 2 0 4 2
5 | 5 0 1 2 3 4        5 | 0 5 4 3 2 1
Then Z6 satisfies all of the field axioms except (M4). To see why (M4) fails, let a = 2, and note, by looking
at the multiplication table, that there is no b ∈ Z6 such that ab = 1. Therefore, Z6 is not a field. ♠
Another field that is very useful in mathematics and the natural sciences is the field C of complex num-
bers. You can read about the complex numbers in Appendix A.
x − y + z = −1 + i,
x + iy + 3z = 1 + 3i.
Solution. We perform Gauss-Jordan elimination on the augmented matrix. When multiplying or dividing,
we have to use complex number arithmetic. For example,
\[
\frac{2}{1+i} = \frac{2}{1+i} \cdot \frac{1-i}{1-i} = \frac{2-2i}{2} = 1 - i.
\]
The row operations are:
\[
\left[\begin{array}{ccc|c} 1 & -1 & 1 & -1+i \\ 1 & i & 3 & 1+3i \end{array}\right]
\ \overset{R_2 \leftarrow R_2 - R_1}{\simeq}\
\left[\begin{array}{ccc|c} 1 & -1 & 1 & -1+i \\ 0 & 1+i & 2 & 2+2i \end{array}\right]
\ \overset{R_2 \leftarrow R_2/(1+i)}{\simeq}\
\left[\begin{array}{ccc|c} 1 & -1 & 1 & -1+i \\ 0 & 1 & 1-i & 2 \end{array}\right]
\]
\[
\overset{R_1 \leftarrow R_1 + R_2}{\simeq}\
\left[\begin{array}{ccc|c} 1 & 0 & 2-i & 1+i \\ 0 & 1 & 1-i & 2 \end{array}\right].
\]
Therefore the general solution is z = t, y = 2 − (1 − i)t, x = 1 + i − (2 − i)t, where t ∈ C is a parameter,
i.e., t is any complex number. In vector form, the general solution is
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \begin{bmatrix} 1+i \\ 2 \\ 0 \end{bmatrix}
+ t \begin{bmatrix} -2+i \\ -1+i \\ 1 \end{bmatrix}.
\]
♠
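Numerical libraries handle complex scalars natively, so the general solution found above can be spot-checked as follows (a sketch assuming NumPy is available).

```python
import numpy as np

# Coefficient matrix and right-hand side of the complex system above.
A = np.array([[1, -1, 1],
              [1, 1j, 3]], dtype=complex)
b = np.array([-1 + 1j, 1 + 3j], dtype=complex)

# General solution found above: (x, y, z) = (1+i, 2, 0) + t(-2+i, -1+i, 1).
particular = np.array([1 + 1j, 2, 0], dtype=complex)
direction = np.array([-2 + 1j, -1 + 1j, 1], dtype=complex)

for t in [0, 1, 2 - 3j, 0.5j]:
    x = particular + t * direction
    assert np.allclose(A @ x, b)
print("general solution verified for several complex parameters")
```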
Exercises
Exercise 1.8.1 Solve each of the following systems of equations with scalars in Z2 . If there is more
than one solution, write the general solution in parametric form and also write down all of the solutions
individually. How many solutions are there?
\[
\text{(a)}\ \left[\begin{array}{cccc|c} 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 & 1 \end{array}\right]
\quad
\text{(b)}\ \left[\begin{array}{ccc|c} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \end{array}\right]
\quad
\text{(c)}\ \left[\begin{array}{cccc|c} 0 & 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 \end{array}\right]
\quad
\text{(d)}\ \left[\begin{array}{ccc|c} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 \end{array}\right]
\]
Exercise 1.8.2 Solve each of the following systems of equations with scalars in Z3 . How many solutions
does each system have?
\[
\text{(a)}\ \left[\begin{array}{ccc|c} 0 & 1 & 1 & 1 \\ 1 & 2 & 0 & 1 \\ 2 & 0 & 2 & 1 \end{array}\right]
\quad
\text{(b)}\ \left[\begin{array}{cccc|c} 2 & 1 & 2 & 1 & 0 \\ 0 & 2 & 1 & 1 & 2 \\ 1 & 0 & 0 & 1 & 1 \end{array}\right]
\quad
\text{(c)}\ \left[\begin{array}{ccc|c} 1 & 2 & 0 & 2 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 \end{array}\right]
\]
Exercise 1.8.3 Solve each of the following systems of equations with scalars in Z5 .
\[
\text{(a)}\ \left[\begin{array}{cccc|c} 0 & 2 & 1 & 4 & 0 \\ 1 & 1 & 2 & 3 & 2 \\ 2 & 4 & 0 & 0 & 4 \end{array}\right]
\quad
\text{(b)}\ \left[\begin{array}{ccc|c} 1 & 2 & 4 & 1 \\ 3 & 0 & 1 & 1 \\ 2 & 4 & 3 & 2 \end{array}\right]
\]
Exercise 1.8.4 In Z7, calculate 1⁻¹, 2⁻¹, 3⁻¹, 4⁻¹, 5⁻¹, and 6⁻¹. Hint: write down the multiplication
table.
Exercise 1.8.5 Consider a game similar to Example 1.52, with 6 lights arranged in a rectangle:
Again, each light doubles as a button. Pressing it toggles its own light, as well as all of its neighbors.
Which buttons do you have to press to turn off all the lights from the starting position shown above? Is the
answer unique? Is every pattern of lights reachable from this starting position?
Exercise 1.8.6 Solve each of the following systems of equations with scalars in the complex numbers.
\[
\text{(a)}\ \left[\begin{array}{ccc|c} 1 & 1 & 1+i & 2 \\ 1+i & 2+i & 3i & 3+2i \end{array}\right]
\qquad
\text{(b)}\ \left[\begin{array}{ccc|c} 1 & i & 1+i & -1+2i \\ 1+i & 2 & 1-i & 4+4i \\ 1 & -1+i & -i & 0 \end{array}\right]
\]
1.9 Application: Balancing chemical reactions
The tools of linear algebra can also be used in the subject area of Chemistry, specifically for balancing
chemical reactions. Consider the chemical reaction
SnO2 + H2 → Sn + H2 O.
Here the elements involved are tin (Sn), oxygen (O), and hydrogen (H). A chemical reaction occurs that
transforms a combination of tin dioxide (SnO2 ) and hydrogen (H2 ) into a combination of tin (Sn) and
water (H2 O). When considering chemical reactions, we want to investigate how much of each substance
we began with and how much of each substance is involved in the result.
An important theory we will use here is the mass balance theory. It tells us that we cannot create
or delete elements within a chemical reaction. For example, in the above expression, we must have the
same number of atoms of oxygen, tin, and hydrogen on both sides of the reaction. Notice that this is not
currently the case. For example, there are two oxygen atoms on the left and only one on the right. In order
to fix this, we want to find numbers x, y, z, w such that
x SnO2 + y H2 → z Sn + w H2 O,
where both sides of the reaction have the same number of atoms of the various elements.
This is a familiar problem. We can solve it by setting up a system of equations in the variables x, y, z, w.
Thus we need
Sn : x = z
O : 2x = w
H : 2y = 2w.
We can rewrite these equations as
Sn : x − z = 0
O : 2x − w = 0
H : 2y − 2w = 0.
The augmented matrix for this system of equations is given by
1 0 −1 0 0
2 0 0 −1 0 .
0 2 0 −2 0
The reduced echelon form of this matrix is
\[
\left[\begin{array}{cccc|c} 1 & 0 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 1 & -\frac{1}{2} & 0 \end{array}\right],
\]
and the solution is given in parametric form as
x = (1/2)t
y = t
z = (1/2)t
w = t.
For example, let t = 2 and this would yield x = 1, y = 2, z = 1, and w = 2. We can put these values back
into the expression for the reaction, which yields

SnO2 + 2 H2 → Sn + 2 H2 O.
Observe that each side of the expression contains the same number of atoms of each element. This means
that the chemical reaction is balanced. Of course, because it is a homogeneous system of equations, any
multiple of a solution is also a solution. For example,

2 SnO2 + 4 H2 → 2 Sn + 4 H2 O

is also correct. It just means that we have doubled the amount of every substance involved. In
chemistry, the numbers you are finding would typically be the number of mols of the molecules on each
side. Thus one mol of SnO2 added to two mols of H2 yields one mol of Sn and two mols of H2 O.
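Balancing a reaction amounts to computing a basic solution of a homogeneous system, which a computer algebra system can do directly. The sketch below (SymPy assumed; the integer-scaling step is our own choice) recovers the coefficients 1, 2, 1, 2 for the tin dioxide reaction.

```python
from sympy import Matrix, lcm

# Mass-balance equations for  x SnO2 + y H2 -> z Sn + w H2O,
# written as a homogeneous system in (x, y, z, w).
A = Matrix([[1, 0, -1, 0],    # Sn:  x - z = 0
            [2, 0, 0, -1],    # O:  2x - w = 0
            [0, 2, 0, -2]])   # H:  2y - 2w = 0

basis = A.nullspace()          # one basic solution, since the rank is 3
v = basis[0]
# Scale to the smallest positive integer coefficients.
v = v * lcm([term.q for term in v])
print(v.T)                     # expected: [1, 2, 1, 2]
```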
Here is another example.
Solution. We will use the same procedure as above to solve this problem. We need to find values for
x, y, z, w such that
x KOH + y H3 PO4 → z K3 PO4 + w H2 O
preserves the total number of atoms of each element. Finding these values can be done by finding the
solution to the following system of equations.
K: x = 3z
O: x + 4y = 4z + w
H: x + 3y = 2w
P: y = z.
Setting w = t, the general solution in parametric form is

x = t
y = (1/3)t
z = (1/3)t
w = t.
Choose a value for t, say 3. This yields x = 3, y = 1, z = 1, and w = 3. It follows that the balanced reaction
is given by
3KOH + H3 PO4 → K3 PO4 + 3H2 O
Note that this results in the same number of atoms of each element on both sides. ♠
Exercises
This section shows how solving systems of equations can be used to determine appropriate dimensionless
variables. It is only an introduction to this topic and considers a specific example of a simple airplane
wing shown below. We assume for simplicity that it is a flat plane at an angle to the wind which is blowing
against it with speed V as shown.
[Figure: a flat airplane wing at angle θ to the oncoming wind, with span B and chord A.]
The angle θ is called the angle of incidence, B is the span of the wing and A is called the chord.
Denote by l the lift. Then this should depend on various quantities like θ ,V , B, A and so forth. Here is a
table which indicates various quantities on which it is reasonable to expect l to depend:

Quantity              Symbol   Units
chord                 A        m
span                  B        m
angle of incidence    θ        (dimensionless)
speed of wind         V        m s⁻¹
speed of sound        V0       m s⁻¹
density of air        ρ        kg m⁻³
viscosity             µ        kg s⁻¹ m⁻¹

Here m denotes meters, s refers to seconds and kg refers to kilograms. All of these are likely familiar
except for µ , which we will discuss in further detail now.
Viscosity is a measure of how much internal friction is experienced when the fluid moves. It is roughly
a measure of how “sticky" the fluid is. Consider a piece of area parallel to the direction of motion of the
fluid. To say that the viscosity is large is to say that the tangential force applied to this area must be large
in order to achieve a given change in speed of the fluid in a direction normal to the tangential force. Thus
(units of µ) · m² · (m s⁻¹)/m = kg · m · s⁻².
Thus the units of µ are
kg s⁻¹ m⁻¹,
as claimed above.
Returning to our original discussion, you may think that we would want
l = f (A, B, θ ,V ,V0 , ρ , µ )
This is very cumbersome because it depends on seven variables. Also, it is likely that without much care,
a change in the units such as going from meters to centimeters would result in an incorrect value for l.
The way to get around this problem is to look for l as a function of dimensionless variables multiplied by
something which has units of force. It is helpful because first of all, you will likely have fewer independent
variables and secondly, you could expect the formula to hold independent of the way of specifying length,
mass and so forth. One looks for
l = f (g1 , . . . , gk )ρ V 2 AB
where the units of ρ V 2 AB are
\[
\frac{\mathrm{kg}}{\mathrm{m}^3} \left(\frac{\mathrm{m}}{\mathrm{s}}\right)^2 \mathrm{m}\,\mathrm{m} = \frac{\mathrm{kg} \cdot \mathrm{m}}{\mathrm{s}^2},
\]
which are the units of force. Each of these gi is of the form

gi = A^{x1} B^{x2} θ^{x3} V^{x4} V0^{x5} ρ^{x6} µ^{x7}, (1.16)

and each gi is independent of the dimensions. That is, this expression must not depend on meters, kilo-
grams, seconds, etc. Thus, placing in the units for each of these quantities, one needs
\[
\mathrm{m}^{x_1}\, \mathrm{m}^{x_2}\, (\mathrm{m}^{x_4} \mathrm{s}^{-x_4})(\mathrm{m}^{x_5} \mathrm{s}^{-x_5})(\mathrm{kg}\, \mathrm{m}^{-3})^{x_6} (\mathrm{kg}\, \mathrm{s}^{-1} \mathrm{m}^{-1})^{x_7} = \mathrm{m}^0\, \mathrm{kg}^0\, \mathrm{s}^0.
\]
Notice that there are no units of θ because it is just the radian measure of an angle. Hence its dimensions
consist of length divided by length, thus it is dimensionless. Then this leads to the following equations for
the xi .
m : x1 + x2 + x4 + x5 − 3x6 − x7 = 0
s: −x4 − x5 − x7 = 0
kg : x6 + x7 = 0
The augmented matrix for this system is
1 1 0 1 1 −3 −1 0
0 0 0 1 1 0 1 0
0 0 0 0 0 1 1 0
The reduced echelon form is given by
1 1 0 0 0 0 1 0
0 0 0 1 1 0 1 0
0 0 0 0 0 1 1 0
and so the solutions are of the form
x1 = −x2 − x7
x3 = x3
x4 = −x5 − x7
x6 = −x7
Thus, in terms of vectors, the solution is
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} -x_2 - x_7 \\ x_2 \\ x_3 \\ -x_5 - x_7 \\ x_5 \\ -x_7 \\ x_7 \end{bmatrix}.
\]
Thus the free variables are x2 , x3 , x5 , x7 . By assigning values to these, we can obtain dimensionless vari-
ables by placing the values obtained for the xi in the formula (1.16). For example, let x2 = 1 and all the
rest of the free variables are 0. This yields
x1 = −1, x2 = 1, x3 = 0, x4 = 0, x5 = 0, x6 = 0, x7 = 0
The dimensionless variable is then A⁻¹B. This is the ratio between the span and the chord. It is called
the aspect ratio, denoted as AR. Next let x3 = 1 and all others equal zero. This gives for a dimensionless
quantity the angle θ . Next let x5 = 1 and all others equal zero. This gives
x1 = 0, x2 = 0, x3 = 0, x4 = −1, x5 = 1, x6 = 0, x7 = 0
Then the dimensionless variable is V⁻¹V0. However, it is written as V/V0. This is called the Mach number
M. Finally, let x7 = 1 and all the other free variables equal 0. This gives x1 = −1, x4 = −1, x6 = −1, and
the dimensionless variable which results from this is A⁻¹V⁻¹ρ⁻¹µ. It is customary to write it as
Re = (AV ρ)/µ. This one is called the Reynolds number. It is the one which involves viscosity. Thus we
would look for
l = f (Re, AR, θ, M) ρV²AB, where ρV²AB carries the units of force, kg·m/s².
This is quite interesting because it is easy to vary Re by simply adjusting the velocity or A, but it is hard to
vary quantities like µ or ρ directly. Note that Re, AR, θ, and M are all easy to adjust. Now this could be used, along with
wind tunnel experiments, to get a formula for the lift that would be reasonable. You could also consider
more variables and more complicated situations in the same way.
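The exponent equations above are themselves a homogeneous linear system, so the four dimensionless combinations can be recomputed mechanically. Here is a brief sketch using SymPy's nullspace (the sign of the seconds equation is flipped, which does not change the solution set).

```python
from sympy import Matrix

# Unit-balance equations for the exponents (x1, ..., x7) of A, B, theta,
# V, V0, rho, mu.
A = Matrix([[1, 1, 0, 1, 1, -3, -1],   # metres
            [0, 0, 0, 1, 1, 0, 1],     # seconds (equation multiplied by -1)
            [0, 0, 0, 0, 0, 1, 1]])    # kilograms

# Each basis vector of the nullspace gives one dimensionless combination
# (aspect ratio, angle, Mach number, Reynolds number).
for v in A.nullspace():
    print(v.T)
```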
Exercises
Exercise 1.10.1 In this section, we observed that ρ V 2 AB has the units of force. Describe a systematic
way to obtain such combinations of the variables that will yield something that has the units of force.
1.11 Application: Resistor networks

The tools of linear algebra can be used to study the application of resistor networks. An example of an
electrical circuit is below.
[Figure: a single circuit loop with an 18 V voltage source and resistors of 2 Ω, 4 Ω, and 2 Ω; the current I1 is labelled counterclockwise.]
The jagged lines in the diagram denote resistors, and the numbers next to them give their resistance in ohms,
written as Ω. The voltage source causes the current to flow in the direction from the longer of
the two lines toward the shorter one.³ Voltage is measured in volts, written as V. The current for a circuit is
labelled Ik, and is measured in amperes, written as A.

³ By current, we always mean the conventional current, which flows from plus to minus. It is the opposite of the electron
flow, which goes from minus to plus.
In the above figure, the current I1 has been labelled with an arrow in the counterclockwise direction.
This is an entirely arbitrary decision, and we could just as well have chosen to label the current in the clockwise
direction. With our choice of direction here, we define a positive current to flow in the counterclockwise
direction and a negative current to flow in the clockwise direction.
The goal of this section is to use the values of resistors and voltage sources in a circuit to determine
the current. An essential theorem for this application is Kirchhoff’s law.
Kirchhoff’s law allows us to set up a system of linear equations and solve for any unknown variables.
When setting up this system, it is important to trace the circuit in the counterclockwise direction. If a
resistor or voltage source is crossed against this direction, the related term must be given a negative sign.
We will explore this in the next example where we determine the value of the current in the initial
diagram.
[Figure: the circuit from the start of this section: an 18 V voltage source and resistors of 2 Ω, 4 Ω, and 2 Ω, with the current I1 labelled counterclockwise.]
Solution. Begin in the bottom left corner, and trace the circuit in the counterclockwise direction. At the
first resistor, multiplying resistance and current gives 2I1 . Continuing in this way through all three resistors
gives 2I1 + 4I1 + 2I1 . This must equal the voltage source in the same direction. Notice that the direction
of the voltage source matches the counterclockwise direction specified, so the voltage is positive.
Therefore the equation and solution are given by

2I1 + 4I1 + 2I1 = 18
8I1 = 18
I1 = 18/8 A = 2.25 A. ♠
[Figure: a single circuit loop with a 27 V voltage source and resistors of 3 Ω, 4 Ω, 1 Ω, and 6 Ω; the current I1 is labelled counterclockwise.]
Solution. Begin in the top left corner this time, and trace the circuit in the counterclockwise direction.
At the first resistor, multiplying resistance and current gives 4I1 . Continuing in this way through the four
resistors gives 4I1 + 6I1 + 1I1 + 3I1 . This must equal the voltage source in the same direction. Notice that
the direction of the voltage source is opposite to the counterclockwise direction, so the voltage is negative.
Therefore the equation and solution are given by

4I1 + 6I1 + 1I1 + 3I1 = −27
14I1 = −27
I1 = −27/14 A ≈ −1.93 A. ♠
A more complicated example follows. Two of the circuits below may be familiar; they were examined in
the examples above. However as they are now part of a larger system of circuits, the answers will differ.
[Figure: a network of four circuit loops with counterclockwise currents I1 (lower left), I2 (upper left), I3 (upper right), and I4 (lower right); voltage sources of 18 V, 27 V, and 23 V; and resistors of 2 Ω, 3 Ω, 4 Ω, 1 Ω, 2 Ω, 6 Ω, 1 Ω, 2 Ω, 5 Ω, and 3 Ω, as described in the solution below.]
Solution. Starting with the top left circuit, multiply the resistance by the current and sum the resulting
products. Specifically, consider the resistor labelled 2 Ω that is part of the circuits of I1 and I2 . Notice that
current I2 runs through this in a positive (counterclockwise) direction, and I1 runs through in the opposite
(negative) direction. The product of resistance and current is then 2(I2 − I1 ) = 2I2 − 2I1 . Continue in this
way for each resistor, and set the sum of the products equal to the voltage source to write the equation:
2I2 − 2I1 + 4I2 − 4I3 + 2I2 = 18.
The above process is used on each of the other three circuits, and the resulting equations are:
Upper right circuit:
4I3 − 4I2 + 6I3 − 6I4 + I3 + 3I3 = −27.
Lower right circuit:
3I4 + 2I4 + 6I4 − 6I3 + I4 − I1 = 0.
Lower left circuit:
5I1 + I1 − I4 + 2I1 − 2I2 = −23.
Notice that the voltage for the upper right and lower left circuits are negative due to the clockwise
direction they indicate. The resulting system has four equations in four variables. Simplifying and rear-
ranging with variables in order, we have:
−2I1 + 8I2 − 4I3 = 18,
−4I2 + 14I3 − 6I4 = −27,
−I1 − 6I3 + 12I4 = 0,
8I1 − 2I2 − I4 = −23.
The augmented matrix is
\[
\left[\begin{array}{cccc|c} -2 & 8 & -4 & 0 & 18 \\ 0 & -4 & 14 & -6 & -27 \\ -1 & 0 & -6 & 12 & 0 \\ 8 & -2 & 0 & -1 & -23 \end{array}\right].
\]
The solution to this system of equations is
I1 = −3 A,
I2 = 1/4 A,
I3 = −5/2 A,
I4 = −3/2 A.
This tells us that currents I1 , I3 , and I4 travel clockwise while I2 travels counterclockwise. ♠
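The 4-by-4 system can of course be solved numerically as well; the following sketch (NumPy assumed) reproduces the currents found above.

```python
import numpy as np

# Kirchhoff equations for the four-loop network, in the order used above.
A = np.array([[-2, 8, -4, 0],
              [0, -4, 14, -6],
              [-1, 0, -6, 12],
              [8, -2, 0, -1]], dtype=float)
b = np.array([18, -27, 0, -23], dtype=float)

I = np.linalg.solve(A, b)
print(I)   # expected: [-3.0, 0.25, -2.5, -1.5]
```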
Exercises
Exercise 1.11.1 Consider the following network of four circuits.

[Figure: a network of four circuit loops with currents I1 (lower left), I2 (upper left), I3 (upper right), and I4 (lower right); voltage sources of 5 V, 20 V, and 10 V; and resistors of 3 Ω, 1 Ω, 2 Ω, 5 Ω, 1 Ω, 6 Ω, 1 Ω, 3 Ω, 4 Ω, and 2 Ω.]
The current in amperes in the four circuits is denoted by I1 , I2 , I3 , and I4 . It is understood that a positive
current means a current flowing in the counterclockwise direction. If Ik ends up being negative, then it just
means the current flows in the clockwise direction. In the above diagram, the top left circuit should give
the equation
2I2 − 2I1 + 5I2 − 5I3 + 3I2 = 5.
Write equations for each of the other three circuits and then give a solution to the resulting system of
equations.
Exercise 1.11.2 Find I1 , I2 , and I3 , the counterclockwise currents in amperes in the three circuits of the
following diagram.
[Figure: a network of three circuit loops with currents I1, I2, and I3; voltage sources of 10 V and 12 V; and resistors of 3 Ω, 7 Ω, 5 Ω, 3 Ω, 2 Ω, 1 Ω, 2 Ω, 4 Ω, and 4 Ω.]
2. Vectors in Rn
2.1 Points and vectors

Outcomes
A. Understand the geometric and algebraic meaning of points and vectors in Rn .
In this section, we define points and vectors in n-dimensional space, and discuss some of their interpreta-
tions. We start with a brief review of Cartesian coordinate systems.
Points in n-dimensional space. You are probably already familiar with Cartesian coordinates, which let
you describe points in 2- or 3-dimensional space. Consider the familiar coordinate plane, with an x-axis
and a y-axis. Any point within this coordinate plane is identified by its x- and y-coordinates. For example,
the point P in the following diagram has x-coordinate 2 and y-coordinate 1. We write these coordinates
as an ordered pair P = (2, 1). Here, “ordered” means that the x-coordinate comes first, and then the y-
coordinate, i.e., (1, 2) is not the same point as (2, 1). Coordinates can be positive, negative, or zero. The
special point with coordinates (0, 0) is called the origin of the coordinate system, and also written as 0.
[Figure: the coordinate plane, with the point P = (2, 1) and the point Q = (−3, 4) marked.]
The situation in 3 dimensions is analogous. Here, the coordinate system has three axes, and each point
is described by a triple of coordinates, which we can write as (x, y, z). We can extend these ideas beyond
n = 3. A coordinate system for n-dimensional space has n axes, which we may call x1 , . . . , xn (as there
are not enough letters in the alphabet to continue after z). A point of n-dimensional space is described by
an ordered n-tuple (x1 , . . . , xn ) of coordinates. For example, P = (2, 1, 0, −1) is a point in 4-dimensional
space which has x1 -coordinate 2, x2 -coordinate 1, and so on. While most people cannot really picture
space beyond 3 dimensions, it is easy to imagine tuples of n real numbers. Thus, although we may not be
able to “see” the points in higher dimensions, we can still talk about their coordinates.
Vectors in n-dimensional space. Unlike a point, which describes a location in a coordinate system, a
vector describes an offset or a distance and direction. We usually picture a vector as an arrow, starting at
one point (called the tail of the arrow) and ending at another point (called the tip of the arrow).
Two vectors are considered equal if they have the same direction and length. Thus, all four blue arrows
in the above image describe exactly the same vector. Mathematically, a vector in 2-dimensional space is
described as an offset in the x-direction and an offset in the y-direction. For example, a certain vector v
may be described by the instruction: “move 4 units in the direction parallel to the x-axis, and move 3 units
in the direction parallel to the y-axis”. This situation is pictured here:
[Figure: the vector v, drawn as an arrow that moves 4 units parallel to the x-axis and 3 units parallel to the y-axis.]
The numbers 4 and 3 are also called the x-component and the y-component of the vector. Notice that
a point has “coordinates”, but a vector has “components”. We write the components of a vector as an
ordered column within square brackets: $v = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$. Note that components can also be negative; for example,
a negative x-component indicates to move left instead of right, and a negative y-component indicates to
move down instead of up. The vector with all components equal to 0 is called the zero vector, and is
written 0.
The situation in 3 dimensions is similar. Here, a vector is described by three components, namely, its
x-component, y-component, and z-component. The three components are written as $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$. The same idea
generalizes to n-dimensional vectors when n is greater than 3.
We write Rn for the set of all n-dimensional column vectors. It is also known as n-dimensional
Euclidean space.
Vectors are usually denoted by boldface lower-case letters such as v, w, a, b. Some people write a small
arrow above the vector, but we do not do this here.
Vectors from points. If Q and P are two points in n-dimensional space, we can define a vector from Q
to P. This vector is written $\overrightarrow{QP}$, and is described by the arrow whose tail is at Q and whose tip is at P, as
in the following picture:

[Figure: the vector $\overrightarrow{QP}$, drawn as an arrow from the point Q to the point P.]
If the point Q has coordinates (q1, . . . , qn) and the point P has coordinates (p1, . . . , pn), then the components
of $\overrightarrow{QP}$ are
\[
\overrightarrow{QP} = \begin{bmatrix} p_1 - q_1 \\ \vdots \\ p_n - q_n \end{bmatrix}.
\]
An important special case of this is the case when the point Q is the origin. The following definition is
concerned with that situation.
[Figure: the position vector p of a point P, drawn as an arrow from the origin to P.]

If the point P has coordinates (p1, . . . , pn), then the components of the position vector are
\[
p = \begin{bmatrix} p_1 \\ \vdots \\ p_n \end{bmatrix}.
\]
Thus, the coordinates of a point are the same as the components of its position vector. For this
reason, the position vector is also sometimes called the coordinate vector of P.
Points from vectors. Conversely, given any vector p, we may find a point P that has p as its position
vector. To do so geometrically, we first have to move the vector p around until its tail is at the origin. The
point P will then be located at its tip.

[Figure: the vector p moved so that its tail is at the origin; the point P sits at its tip.]

Algebraically, if p = [p1, . . . , pn]^T, then the point P will have coordinates (p1, . . . , pn). This is just the opposite process of Definition 2.2.
So although we went to some lengths to point out that vectors and points are different geometric
objects, as soon as an origin of a coordinate system has been fixed, we can always talk about a point by
talking about its coordinate vector. We will systematically do so, and eventually the distinction between a
point and its coordinate vector will become blurred, so that we will be able to talk about Rn as “a set of
points” or “a set of vectors” interchangeably.
Equality of vectors. Two vectors are equal precisely when all corresponding components are equal. In
symbols, if
\[
u = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}
\quad\text{and}\quad
v = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix},
\]
then u = v if and only if u1 = v1 and u2 = v2 and . . . and un = vn .
Notation. In the text, it is often awkward to write column vectors, because they take up so much space.
To save space, we sometimes use a superscript “T ” to denote a column vector. For example, we write
$\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}^T$, or sometimes [1, 2, 3]^T, to denote the vector
\[
\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.
\]
The letter “T ” stands for “transpose”. To transpose a vector means to turn a row into a column or vice
versa.
Exercises
Exercise 2.1.3 Find x and y so that u = [5x − 3y, 4]T and v = [2x − 2y, 2y]T are equal in R2 .
2.2 Addition
Outcomes
A. Compute sums and differences of vectors algebraically and geometrically.
B. Use the laws of vector addition to prove equalities between vector expressions.
To add vectors, we simply add corresponding components. Therefore, in order to add vectors, they must
be the same size. For example, [1, 2, 3]T + [4, 5, 6]T = [1 + 4, 2 + 5, 3 + 6]T = [5, 7, 9]T .
The geometric significance of vector addition in Rn is given in the following proposition.
[Figure: the sum u + v, drawn by placing the tail of v at the tip of u; the arrow u + v runs from the tail of u to the tip of v.]
Solution. In the following diagram, the vectors u and v form a parallelogram. Therefore, whether we line
up the tail of u with the tip of v or vice versa, we obtain the same vector, which is both u + v and v + u.
[Figure: the parallelogram formed by u and v; following u then v, or v then u, leads to the same diagonal vector u + v = v + u.]
Geometrically, the vector −u has the same magnitude as u, but the opposite direction.
[Figure: the vector u and its negation −u, which has the same length but points in the opposite direction.]
To define the subtraction of two vectors, we simply regard u − v as an abbreviation for u + (−v),
exactly as we do with real numbers. Algebraically, this just amounts to componentwise subtraction:
\[
\begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}
- \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}
= \begin{bmatrix} u_1 - v_1 \\ \vdots \\ u_n - v_n \end{bmatrix}.
\]
[Figure: two vectors u and v.]
Solution. We will first sketch u + v. Begin by drawing u and then at the point of u, place the tail of v as
shown. Then u + v is the vector which results from drawing a vector from the tail of u to the tip of v.
[Figure: u + v, drawn by placing the tail of v at the tip of u.]
Next consider u − v. This means u + (−v). From the above geometric description of vector addition,
−v is the vector which has the same length but which points in the opposite direction to v. Here is a picture
of −v
[Figure: the vector −v, and the difference u − v obtained by placing the tail of −v at the tip of u.]
Given any two vectors u and v, one can create a parallelogram with these vectors as sides and with diagonals
u + v and u − v:

[Figure: the parallelogram with sides u and v; one diagonal is u + v and the other is u − v.]
♠
Addition of vectors satisfies some important properties which are outlined in the following proposition.
Recall that 0 is the zero vector, the vector from Rn in which all components are equal to 0.
(u + v) + w = u + (v + w).
u + (−u) = 0.
Exercises
Exercise 2.2.1 Find [1, 2, 3]^T + [1, 5, 1]^T + [−1, 2, −4]^T.
Exercise 2.2.2 Use the properties of vector addition from Proposition 2.8 to show the following equalities.
Justify every step.
(a) (u + v) + w = (v + w) + u.
(b) (u + 0) + (v + (−u)) = v.
2.3 Scalar multiplication
Outcomes
A. Multiply a scalar by a vector algebraically and geometrically.
B. Use the laws of scalar multiplication to prove equalities between vector expressions.
For example 3 [1, 2, 3]T = [3, 6, 9]T and −2 [1, 2, 3]T = [−2, −4, −6]T .
Solution. Here is a picture of the seven vectors. We draw their tails in different places to make their
relationship easier to see.
[Figure: the vectors 2u, u, ½u, 0u, −½u, −u, and −2u, drawn with their tails at different points.]
We see that the vector ku has the same direction as u when k is positive, and the opposite direction when k
is negative. Further, the length of the vector is scaled by a factor of |k|. It increases if |k| > 1 and decreases
if |k| < 1. For example, the vector 2u is exactly twice as long as u. (It is because of this scaling property
that scalars are called scalars). ♠
Just as with addition, scalar multiplication of vectors satisfies several important properties. These are
outlined in the following proposition.
k(u + v) = ku + kv.
(k + ℓ)u = ku + ℓu.
k(ℓu) = (kℓ)u.
k(u + v) = k [u1 + v1 , . . . , un + vn ]T
= [k(u1 + v1 ), . . ., k(un + vn )]T
= [ku1 + kv1 , . . . , kun + kvn ]T
= [ku1 , . . . , kun ]T + [kv1 , . . . , kvn ]T
= ku + kv.
Exercises
Exercise 2.3.2 Find −3[5, −1, 2, −3]^T + 5[−8, 2, −3, 6]^T.
Exercise 2.3.3 Use the properties of scalar multiplication from Proposition 2.11 and the properties of
vector addition from Proposition 2.8 to prove the following equalities. Justify every step.
(b) 0u = 0.
2.4 Linear combinations

Outcomes
A. Compute linear combinations of vectors algebraically and geometrically.
Now that we have studied both vector addition and scalar multiplication, we can combine the two opera-
tions. You may remember that when we talked about the solutions to homogeneous systems of equations
in Section 1.6, we briefly mentioned that the general solution of a homogeneous system is a linear combi-
nation of its basic solutions. We now return to the concept of a linear combination.
For the specific case of R3 , there are three special vectors which we often use. They are given by
\[
i = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
j = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad\text{and}\quad
k = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.
\]
We can write any vector u = [a1 , a2 , a3 ]T as a linear combination of these vectors, namely
u = a1 i + a2 j + a3 k.
Solution. This question can be rephrased as: can we find scalars x, y, z such that xu1 + yu2 + zu3 = v? Writing out the components of this vector equation gives the system of equations
2x + y + 3z = 1
2x + 6y + 8z = 3
6x + 8y + 18z = 5.
We solve this system by reducing its augmented matrix to reduced echelon form:
\[
\left[\begin{array}{ccc|c} 2 & 1 & 3 & 1 \\ 2 & 6 & 8 & 3 \\ 6 & 8 & 18 & 5 \end{array}\right]
\simeq \cdots \simeq
\left[\begin{array}{ccc|c} 1 & 0 & 0 & \frac{3}{10} \\ 0 & 1 & 0 & \frac{2}{5} \\ 0 & 0 & 1 & 0 \end{array}\right].
\]
We are in the case where we have a unique solution:
x = 3/10
y = 2/5
z = 0.
This means that v is a linear combination of u1 , u2 , and u3 :
v = (3/10) u1 + (2/5) u2 + 0 u3.

The coefficients are 3/10, 2/5, and 0. In fact, v is also a linear combination of just u1 and u2. ♠
In the following example, we examine the geometric meaning of linear combinations.
[Figure: the linear combinations u + 2v and u − ½v, constructed geometrically from the vectors u and v.]
♠
Given any two non-parallel vectors u and v in R2 , we can create a grid of their linear combinations. The
integer ones are pictured below. From this we can see that all vectors in R2 can be written as a linear
combination of u and v.
[Figure: the grid of integer linear combinations mu + nv of two non-parallel vectors u and v, ranging from −1u − 1v to 3u + 3v.]
Exercises
Exercise 2.4.1 Find −7[6, 0, 4, −1]^T + 6[−13, −1, 1, 6]^T.
2.5 Length of a vector

Outcomes
A. Compute the distance between points in n-dimensional space.
E. Normalize a vector.
In this section, we explore what is meant by the length of a vector in Rn . We develop this concept by first
looking at the distance between two points in Rn . Consider two points P = (p1 , p2 ) and Q = (q1 , q2 ) in
the plane, as in the following picture.
[Figure: the points P = (p1, p2) and Q = (q1, q2) in the plane, joined by a line segment that forms the hypotenuse of a right triangle with legs parallel to the axes.]
The distance between P and Q is shown in the picture as a solid line, which is the hypotenuse of a right
triangle. The lengths of the two other sides of this triangle are |p1 − q1 | and |p2 − q2 |. Therefore, the
Pythagorean Theorem implies the length of the hypotenuse (and thus the distance between P and Q)
equals
\[
d(P, Q) = \sqrt{|p_1 - q_1|^2 + |p_2 - q_2|^2} = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2}.
\]
Now consider two points P = (p1 , p2 , p3 ) and Q = (q1 , q2 , q3 ) in 3-dimensional space.
[Figure: the points P = (p1, p2, p3), Q = (q1, q2, q3), R = (p1, p2, q3), and S = (p1, q2, q3) in 3-dimensional space.]
We will use the Pythagorean Theorem twice to find the length of the solid line connecting P and Q. First,
by the Pythagorean Theorem applied to the right triangle QSR, the length of the line joining R and Q
equals
\[
d(R, Q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2}.
\]
Second, by the Pythagorean Theorem applied to the triangle QRP, the length of the line joining P and Q equals
\[
d(P, Q) = \sqrt{d(R, Q)^2 + (p_3 - q_3)^2} = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + (p_3 - q_3)^2}.
\]
This discussion motivates the following definition for the distance between points in Rn .
In the following example, we use Definition 2.16 to find the distance between two points in R4 .
Solution. Let P = (p1 , p2 , p3 ) be such a point. Then P is the same distance from Q and R, thus d(P, Q) =
d(P, R). By the distance formula, we have
\[
\sqrt{(p_1 - 1)^2 + (p_2 - 2)^2 + (p_3 - 3)^2} = \sqrt{(p_1 - 0)^2 + (p_2 - 1)^2 + (p_3 - 2)^2}.
\]
Squaring both sides, we get
\[
(p_1 - 1)^2 + (p_2 - 2)^2 + (p_3 - 3)^2 = p_1^2 + (p_2 - 1)^2 + (p_3 - 2)^2,
\]
and so
(p1² − 2p1 + 1) + (p2² − 4p2 + 4) + (p3² − 6p3 + 9) = p1² + (p2² − 2p2 + 1) + (p3² − 4p3 + 4).
Simplifying, this becomes
−2p1 − 4p2 − 6p3 + 14 = −2p2 − 4p3 + 5,
which can finally be written as
2p1 + 2p2 + 2p3 = 9. (2.1)
Therefore, the points P = (p1 , p2 , p3 ) that are the same distance from Q and R form a plane whose equation
is given by (2.1). ♠
We can now use our understanding of the distance between two points to define what is meant by the
length of a vector.
The length of a vector is also sometimes called its magnitude or its norm.
This definition corresponds to Definition 2.16, if we consider the vector u to have its tail at the point
0 = (0, . . . , 0) and its tip at the point U = (u1, . . . , un). Then the length of u is equal to the distance between
0 and U. In general,
\[
\|\overrightarrow{PQ}\| = d(P, Q).
\]
Reconsider Example 2.17. We could have also computed the distance between P and Q as the length
of the vector connecting them. This vector is $\overrightarrow{PQ}$ = [1, 1, 3, −6]^T, and its length is
\[
\|\overrightarrow{PQ}\| = \sqrt{1^2 + 1^2 + 3^2 + 6^2} = \sqrt{47}.
\]
The following proposition states a few important properties of the length of vectors.
• ‖u‖ ≥ 0
‖u‖ = 1.
Let v be a non-zero vector in Rn . Then there is a unit vector u that points in the same direction as v, but
has length 1. This vector is given by
\[
u = \frac{1}{\|v\|}\, v.
\]
We often use the term normalize to refer to this process. When we normalize a vector, we find the
corresponding unit vector.
Solution. We have ‖v‖ = √(1² + (−3)² + 4²) = √26, and therefore
\[
u = \frac{1}{\|v\|}\, v = \frac{1}{\sqrt{26}}\, [1, -3, 4]^T = \left[ \frac{1}{\sqrt{26}}, -\frac{3}{\sqrt{26}}, \frac{4}{\sqrt{26}} \right]^T.
\]
♠
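Normalization is a one-line computation in code. The sketch below (NumPy assumed; the helper name normalize is ours) normalizes the vector v = [1, −3, 4]^T from the example above.

```python
import numpy as np

def normalize(v):
    """Return the unit vector (1/||v||) v pointing in the same direction as v."""
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("the zero vector cannot be normalized")
    return v / norm

v = np.array([1.0, -3.0, 4.0])
u = normalize(v)
print(u)                                  # approximately [0.196, -0.588, 0.784]
print(np.linalg.norm(u))                  # 1.0, up to rounding
assert np.isclose(np.linalg.norm(u), 1.0)
```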
Exercises
Exercise 2.5.1 Find the distance between the points P = (0, 1, 3) and Q = (2, −1, 0) in R3 .
Exercise 2.5.2 Find the distance between the points P = (1, 3, −1, 0) and Q = (2, 2, 3, 3) in R4 .
Exercise 2.5.3 Describe the points in R3 that are equally distant from the two points Q = (1, 1, 1) and
R = (−1, −1, −1).
Exercise 2.5.4 Describe the points in R3 that have distance 1 from the origin.
Exercise 2.5.7 Prove that for all vectors u ∈ Rn , we have k−uk = kuk.
2.6 The dot product

Outcomes
A. Compute the dot product of vectors geometrically and algebraically.
B. Use properties of the dot product, including the Cauchy-Schwarz inequality and the triangle
inequality, to prove further equalities and inequalities.
D. Compute the scalar and vector projection of one vector onto another.
There are two ways of multiplying vectors that are useful in applications. The first of these is called the
dot product, and the second is called the cross product. We will consider the dot product here, and the
cross product in the next section.
When we take the dot product of two vectors, the result is a scalar. For this reason, the dot product is also
called the scalar product. Sometimes it is also called the inner product. The definition is as follows.
u · v = u1v1 + u2v2 + . . . + unvn.
Find u · v for u = [1, 2, 0, −1]^T and v = [0, 1, 2, 3]^T.
Solution. We have
u · v = (1)(0) + (2)(1) + (0)(2) + (−1)(3)
= 0 + 2 + 0 − 3
= −1.
♠
The dot product satisfies a number of important properties.
• u · v = v · u.
• u · u = ‖u‖².
The proof is left as an exercise. Note that, by the last part of the proposition, we can also use the dot
product to find the length of a vector.
Solution. By the last part of Proposition 2.25, we have ‖u‖ = √(u · u). We have u · u = 2² + 1² + 4² + 2² = 25,
and therefore ‖u‖ = √(u · u) = √25 = 5. ♠
The Cauchy-Schwarz inequality is a fundamental inequality satisfied by the dot product. It is given in
the following proposition.
|u · v| ≤ ‖u‖ ‖v‖ (2.2)
Furthermore equality is obtained if and only if one of u or v is a scalar multiple of the other.
Proof. First note that if u = 0, then both sides of (2.2) are equal to zero, and so the inequality holds in this
case. Therefore, we will assume in what follows that u ≠ 0. Define a function of t ∈ R by
f(t) = (tu + v) · (tu + v).
Then by Proposition 2.25, f(t) ≥ 0 for all t ∈ R. Also from Proposition 2.25, we have
f(t) = tu · (tu + v) + v · (tu + v)
= t²(u · u) + t(u · v) + t(v · u) + v · v
= t²‖u‖² + 2t(u · v) + ‖v‖².
This means the graph of y = f (t) is a parabola which opens upwards and is never negative. It follows
that this function has at most one root. From the quadratic formula, we know that a quadratic function
at² + bt + c has one or zero roots if and only if b² − 4ac ≤ 0. Applying this reasoning to the function f(t),
we obtain
(2(u · v))² − 4‖u‖²‖v‖² ≤ 0,
which is equivalent to |u · v| ≤ ‖u‖ ‖v‖. ♠
An important consequence of the Cauchy-Schwarz inequality is the so-called triangle inequality, which
states that the length of one side of a triangle is less than or equal the sum of the lengths of the two other
sides.
[Figure: a triangle with sides u, v, and u + v, illustrating the triangle inequality.]
Proof. By properties of the dot product and the Cauchy-Schwarz inequality, we have
‖u + v‖² = (u + v) · (u + v)
= (u · u) + (u · v) + (v · u) + (v · v)
= ‖u‖² + 2(u · v) + ‖v‖²
≤ ‖u‖² + 2‖u‖ ‖v‖ + ‖v‖², by the Cauchy-Schwarz inequality.
Therefore,
‖u + v‖² ≤ (‖u‖ + ‖v‖)².
Taking square roots of both sides, we obtain (2.3). ♠
‖u‖ − ‖v‖ ≤ ‖u − v‖
Solution. We have
‖u‖ = ‖(u − v) + v‖ ≤ ‖u − v‖ + ‖v‖,
where we have used the triangle inequality in the last step. Note that this is an inequality between real
numbers. Bringing ‖v‖ to the other side of the inequality, we have
‖u‖ − ‖v‖ ≤ ‖u − v‖. ♠
The included angle of two vectors u and v is the angle θ between the vectors such that 0 ≤ θ ≤ π .
[Figure: two vectors u and v with the included angle θ between them.]
The dot product can be used to determine the included angle between two vectors.
In words, the dot product of two vectors equals the product of the magnitude (or length) of the two vectors
multiplied by the cosine of the included angle. Note that this gives a geometric description of the dot
product that does not depend explicitly on the coordinates of the vectors.
Therefore, we have
cos θ = 6/(3√8) = 1/√2.
Taking the inverse cosine of both sides of the equation, we find that θ = π/4 radians, or 45 degrees. ♠
Two non-zero vectors are said to be orthogonal, sometimes also called perpendicular, if the included
angle is π /2 radians (90◦ ). By convention, we also say that the zero vector is orthogonal to all vectors.
u · v = 0.
Proof. If u or v is zero, the vectors are orthogonal by definition, and the dot product is 0 in that case, so the
proposition holds. Now assume u and v are both non-zero. Then by Proposition 2.30, we have u · v = 0
if and only if ‖u‖ ‖v‖ cos θ = 0, if and only if cos θ = 0. Recall that the included angle is between 0 and π.
Therefore, cos θ = 0 if and only if θ = π/2. ♠
Determine whether the vectors u = [2, 1, −1]^T and v = [1, 3, 5]^T are orthogonal.
Solution. In order to determine if these two vectors are orthogonal, we compute the dot product. We have
u · v = (2)(1) + (1)(3) + (−1)(5) = 0,
and therefore the vectors are orthogonal. ♠
2.6.5. Projections
It is sometimes important to find the component of a vector in a particular direction. Consider the following
picture:
[Figure: a vector v with tip Q, a vector u, and the point P on the line through 0 in the direction of u that is closest to Q.]
Here, u is a non-zero vector specifying a direction, and v is any vector. We have given the label Q to
the tip of v. The point P lies at the place along u that is closest to Q, or equivalently, such that (0, P, Q)
forms a right triangle. The distance from 0 to P (measured positively in the direction of u) is called the
component of v in the direction of u, and is denoted compu(v). The vector $\overrightarrow{0P}$ is called the projection of
v onto u, and is denoted proju(v). We wish to find formulas for these quantities.
Let θ be the included angle between v and u. From trigonometry, considering the right triangle
(0, P, Q), we know that
\[
\cos\theta = \frac{|0P|}{|0Q|} = \frac{|0P|}{\|v\|},
\]
and therefore
\[
|0P| = \|v\| \cos\theta. \tag{2.4}
\]
On the other hand, from Proposition 2.30, we have
\[
u \cdot v = \|u\| \|v\| \cos\theta,
\]
and therefore
\[
\|v\| \cos\theta = \frac{u \cdot v}{\|u\|}. \tag{2.5}
\]
Putting equations (2.4) and (2.5) together, we obtain the desired formula for the component of v in the
direction of u:
\[
\mathrm{comp}_u(v) = |0P| = \frac{u \cdot v}{\|u\|}. \tag{2.6}
\]
Note that it is possible for this quantity to be negative; this happens when the angle between v and u is
obtuse. In this case, v will have a negative component along u.
The vector proju(v) = $\overrightarrow{0P}$ can now be computed by re-scaling u to the correct length. Specifically, we
first normalize u by dividing it by its own length, and then multiply by |0P|. In formulas:
\[
\mathrm{proj}_u(v) = \overrightarrow{0P} = \frac{|0P|}{\|u\|}\, u = \frac{u \cdot v}{\|u\|^2}\, u = \frac{u \cdot v}{u \cdot u}\, u. \tag{2.7}
\]
The following definition summarizes what we have just found.
\[
\mathrm{comp}_u(v) = \frac{u \cdot v}{\|u\|}
\qquad\text{and}\qquad
\mathrm{proj}_u(v) = \frac{u \cdot v}{u \cdot u}\, u = \frac{u \cdot v}{\|u\|^2}\, u.
\]
These two operations are also called the scalar projection and vector projection, respectively.
Solution. We can use the formula provided in Definition 2.35 to find proju(v). First, compute u · v. This
is given by
\[
\begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}
= (2)(1) + (3)(-2) + (-4)(1) = -8.
\]
Similarly, u · u is given by
\[
\begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix}
= 2^2 + 3^2 + (-4)^2 = 29.
\]
Therefore,
\[
\mathrm{proj}_u(v) = \frac{u \cdot v}{u \cdot u}\, u = -\frac{8}{29} \begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix}.
\]
♠
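The projection formulas translate directly into code. Here is a small sketch (NumPy assumed; the helper names comp and proj are ours) that recomputes this example and checks that v − proj_u(v) is orthogonal to u.

```python
import numpy as np

def comp(u, v):
    """Scalar component of v in the direction of the non-zero vector u."""
    return np.dot(u, v) / np.linalg.norm(u)

def proj(u, v):
    """Vector projection of v onto the non-zero vector u."""
    return (np.dot(u, v) / np.dot(u, u)) * u

u = np.array([2.0, 3.0, -4.0])
v = np.array([1.0, -2.0, 1.0])

p = proj(u, v)
print(p)                                   # (-8/29) * u
print(comp(u, v))                          # -8 / sqrt(29)

# The remainder v - proj_u(v) is orthogonal to u: its dot product with u vanishes.
assert np.isclose(np.dot(u, v - p), 0.0)
```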
An important application of projections is that every vector v can be uniquely written as a sum of two
orthogonal vectors, one of which is a scalar multiple of some given non-zero vector u, and the other of
which is orthogonal to u.
Proof.
[Figure: the vector v decomposed as v = v∥ + v⊥, where v∥ points along u and v⊥ is orthogonal to u.]
Let
\[
v_{\parallel} = \mathrm{proj}_u(v) = \frac{u \cdot v}{u \cdot u}\, u,
\]
and define v⊥ = v − v∥. By definition, (2.8) is satisfied, and v∥ is a scalar multiple of u. We must show
that v⊥ is orthogonal to u. For this, we verify that their dot product equals zero:
u · v⊥ = u · (v − v∥)
= u · v − u · v∥
= u · v − ((u · v)/(u · u)) (u · u)
= u · v − u · v
= 0.
To show uniqueness, suppose that (2.8) holds and v∥ = ku. Taking the dot product of both sides of (2.8)
with u and using u · v⊥ = 0, this yields
u · v = u · (v∥ + v⊥)
= u · (ku) + u · v⊥
= k‖u‖²,
which implies k = (u · v)/‖u‖². Thus there can be no more than one such vector v∥. Since v⊥ must equal
v − v∥, it follows that there can be no more than one choice for both v∥ and v⊥, proving their uniqueness.
♠
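The decomposition just proved is easy to verify numerically. The following is a minimal sketch using Python with NumPy (not part of the original text); it re-uses the vectors u = [2, 3, −4]^T and v = [1, −2, 1]^T from the projection example above.

    import numpy as np

    u = np.array([2.0, 3.0, -4.0])   # direction vector
    v = np.array([1.0, -2.0, 1.0])   # vector to decompose

    comp = np.dot(u, v) / np.linalg.norm(u)    # scalar projection comp_u(v)
    v_par = (np.dot(u, v) / np.dot(u, u)) * u  # vector projection proj_u(v) = v_parallel
    v_perp = v - v_par                         # orthogonal component

    print(comp, v_par)
    print(np.dot(u, v_perp))   # approximately 0, confirming orthogonality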
Exercises
Exercise 2.6.1 Find [1, 2, 3, 4]^T · [2, 0, 1, 3]^T.
Exercise 2.6.2 Let a, b be vectors. Show that a · b = (1/4)(‖a + b‖² − ‖a − b‖²).
Exercise 2.6.3 Using the properties of the dot product, prove the parallelogram identity: ‖a + b‖² + ‖a − b‖² = 2 ‖a‖² + 2 ‖b‖².
Exercise 2.6.4 Find cos θ where θ is the angle between the vectors u = [3, −1, −1]^T and v = [1, 4, 2]^T.
Exercise 2.6.5 Find cos θ where θ is the angle between the vectors u = [1, −2, 1]^T and v = [1, 2, −7]^T.
Exercise 2.6.6 Use the formula given in Proposition 2.30 to verify the Cauchy Schwarz inequality and to
show that equality occurs if and only if one of the vectors is a scalar multiple of the other.
Exercise 2.6.7 Show that the triangle with vertices A = (2, 0, −3), B = (5, −2, 1) and C = (7, 5, 3) is a
right triangle.
Exercise 2.6.8 Find proj_v(w) where w = [1, 0, −2]^T and v = [1, 2, 3]^T.
Exercise 2.6.9 Find proj_v(w) where w = [1, 2, −2]^T and v = [1, 0, 3]^T.
Exercise 2.6.10 Find proj_v(w) where w = [1, 2, −2, 1]^T and v = [1, 2, 3, 0]^T.

Exercise 2.6.11 Find comp_v(w) where w = [1, 1, 2]^T and v = [0, 3, 3]^T.
Exercise 2.6.12 Find comp_v(w) where w = [1, −1, 0]^T and v = [1, 1, 2]^T.
Exercise 2.6.14 Decompose the vector v into v = a + b, where a is parallel to u and b is orthogonal to u, for v = [3, 2, −5]^T and u = [1, −1, 2]^T.
Exercise 2.6.15 Prove the Cauchy-Schwarz inequality in Rn as follows. For vectors u, v, consider

(u − proj_v(u)) · (u − proj_v(u)) ≥ 0.

Simplify using the axioms of the dot product and then substitute the formula for the projection. Notice that
this expression equals 0, and you get equality in the Cauchy-Schwarz inequality, if and only if u = proj_v(u).
What is the geometric meaning of u = proj_v(u)?
Exercise 2.6.16 Let v, w, u be vectors. Show that (w + u)⊥ = w⊥ + u⊥, where w⊥ = w − proj_v(w).
Outcomes
A. Compute the cross product of vectors algebraically and geometrically.
F. Use properties of the cross product and the dot product to prove algebraic equalities.
Unlike the dot product, the cross product is only defined in R3 , i.e., only in 3-dimensional space. The
cross product of two vectors is a vector.
We will first discuss the geometric meaning of the cross product, and then give an algebraic description.
Both descriptions are equally important: the geometric description is essential for applications in physics
and geometry, whereas the algebraic description is necessary for computing.
[Figure: three vectors u, v, and w forming a right-handed system.]
You should consider how a right-handed system would differ from a left-handed system. Try using your
left hand and you will see that the vector w would need to point in the opposite direction.
Recall the special vectors i = [1, 0, 0]T , j = [0, 1, 0]T , and k = [0, 0, 1]T we saw in Section 2.4. We
always assume that our coordinate system is drawn in such a way that the vectors i, j, k form a right-
handed system. Thus, if the thumb of your right hand points along the x-axis and your index finger points
along the y-axis, your middle finger should point along the z-axis.
[Figure: the standard basis vectors i, j, and k drawn as a right-handed coordinate system.]
When all three vectors lie in a plane, then we say that the vectors are coplanar. In this case, the system is
neither right-handed nor left-handed.
The following is the geometric description of the cross product. Recall that the dot product of two vectors
results in a scalar. In contrast, the cross product results in a vector, as the cross product gives a direction
as well as a magnitude.
1. Its length is ‖u × v‖ = ‖u‖ ‖v‖ sin θ, where θ is the included angle between u and v.

2. Its direction is perpendicular to both u and v, chosen so that u, v, and u × v form a right-handed system.
We note that the length of the cross product, ‖u × v‖, given by the formula ‖u‖ ‖v‖ sin θ, is the area of the
parallelogram determined by u and v, as shown in the following picture.

[Figure: the parallelogram determined by u and v, with base ‖u‖ and height ‖v‖ sin(θ).]
From its geometric description, we can prove that the cross product satisfies the following properties.

1. u × v = −(v × u).

2. u × u = 0.

3. (ku) × v = k(u × v) = u × (kv).

4. u × (v + w) = u × v + u × w.

5. (v + w) × u = v × u + w × u.
Proof. Formula 1. follows immediately from the definition. The vectors u × v and v × u have the same
magnitude, ‖u‖ ‖v‖ sin θ, and an application of the right-hand rule shows they have opposite direction.
Formula 2. holds because the included angle between u and itself is 0, so ‖u × u‖ = ‖u‖ ‖u‖ sin 0 = 0.

Formula 3. is proven as follows. If k is a non-negative scalar, the direction of (ku) × v is the same as
the direction of u × v, k(u × v) and u × (kv). The magnitude is k times the magnitude of u × v, which is the
same as the magnitude of k(u × v) and u × (kv). Using this yields equality in 3. In the case where k < 0,
everything works the same way except the vectors all point in the opposite direction and you must
multiply by |k| when comparing their magnitudes.

The distributive laws, 4. and 5., are harder to establish. For now, we will content ourselves with
noticing that if we know that 4. is true, 5. follows. Namely, assuming 4., and using 1., we have

(v + w) × u = −u × (v + w)
            = −(u × v + u × w)
            = v × u + w × u.
♠
In turn, we can use the properties from Proposition 2.41 to get an algebraic description of the cross product.
We begin by determining the cross products of the special vectors i, j, and k. They are as follows:
i × j = k, j × i = −k, i × i = 0,
k × i = j, i × k = −j, j × j = 0,
j × k = i, k × j = −i, k × k = 0.
With this information and the laws of Proposition 2.41, we can compute the cross product of any two
vectors from their coordinates. Let
u = [u1, u2, u3]^T and v = [v1, v2, v3]^T.
Then we have:
u × v = (u1 i + u2 j + u3 k) × (v1 i + v2 j + v3 k)
= u1 v1 (i × i) + u1 v2 (i × j) + u1 v3 (i × k)
+ u2 v1 (j × i) + u2 v2 (j × j) + u2 v3 (j × k)
+ u3 v1 (k × i) + u3 v2 (k × j) + u3 v3 (k × k)
= u1 v1 0 + u1 v2 k − u1 v3 j
− u2 v1 k + u2 v2 0 + u2 v3 i
+ u3 v1 j − u3 v2 i + u3 v3 0
= (u2 v3 − u3 v2 )i + (u3 v1 − u1 v3 )j + (u1 v2 − u2 v1 )k.
The resulting formula for the cross product is summarized in the following Proposition.
Solution. Notice that these vectors are the same as the ones given in Example 2.43. Recall from the
geometric description of the cross product that the area of the parallelogram is the magnitude of u × v.
From Example 2.43, u × v = [3, 5, 1]^T. Thus the area of the parallelogram is

‖u × v‖ = √(3² + 5² + 1²) = √35.
♠
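Readers who want to experiment can check the coordinate formula and the area interpretation with a short computation. The sketch below uses Python with NumPy and is not part of the text; the vectors u and v here are chosen purely for illustration.

    import numpy as np

    u = np.array([1.0, 2.0, 3.0])   # illustrative vectors
    v = np.array([0.0, 1.0, 1.0])

    w = np.cross(u, v)              # (u2 v3 - u3 v2, u3 v1 - u1 v3, u1 v2 - u2 v1)
    area = np.linalg.norm(w)        # area of the parallelogram determined by u and v

    print(w)                        # [-1. -1.  1.]
    print(area)                     # sqrt(3)
    print(np.dot(w, u), np.dot(w, v))   # both 0: u x v is orthogonal to u and v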
Solution. Let P = (1, 0, 1), Q = (2, 2, 3), R = (−1, 1, 3), and S = (0, 3, 5).
[Figure: the parallelogram with vertices P, Q, R, and S.]

First, we check that this really is a parallelogram. We have to have →PQ = →RS. Indeed, this is the case,
as →PQ = [2 − 1, 2 − 0, 3 − 1]^T = [1, 2, 2]^T and →RS = [0 − (−1), 3 − 1, 5 − 3]^T = [1, 2, 2]^T. We also compute
→PR = →QS = [−2, 1, 2]^T. The area of the parallelogram is

‖→PQ × →PR‖ = ‖[1, 2, 2]^T × [−2, 1, 2]^T‖ = ‖[2, −6, 5]^T‖ = √(2² + (−6)² + 5²) = √65.
♠
We can also use this concept to find the area of a triangle, as in the following example.
Solution. Let P = (1, 2, 3), Q = (0, 2, 5), and R = (5, 1, 2). The area of the triangle is exactly half of the
area of the parallelogram determined by the vectors →PQ and →PR.

[Figure: the triangle with vertices P, Q, and R.]

We have →PQ = [−1, 0, 2]^T and →PR = [4, −1, −1]^T. The area of the parallelogram is the magnitude of the
cross product:

‖→PQ × →PR‖ = ‖[−1, 0, 2]^T × [4, −1, −1]^T‖ = ‖[2, 7, 1]^T‖ = √(2² + 7² + 1²) = √54.

Hence the area of the triangle is (1/2)√54 = (3/2)√6. ♠
In this section, we explore another application of the cross product. Recall that we can use the cross
product to find the area of a parallelogram. As we will now show, we can also use the cross product
together with the dot product to find the volume of a parallelepiped. We begin with a definition.
Let u, v, and w be three vectors in R3. The parallelepiped determined by u, v, and w is the set of all vectors of the form

r u + s v + t w,

where r, s, t are real numbers between 0 and 1, inclusive. The parallelepiped is a 3-dimensional
body bounded by parallelograms, as shown in this picture.

[Figure: the parallelepiped determined by u, v, and w.]
Notice that the base of the parallelepiped is the parallelogram determined by the vectors u and v. Therefore,
its area is equal to ‖u × v‖. The height of the parallelepiped is ‖w‖ cos θ, where θ is the angle
between w and u × v, as shown in this picture.

[Figure: the parallelepiped with base spanned by u and v, the vector w, and the angle θ between w and u × v.]

The volume of this parallelepiped is the area of the base times the height, which is just

‖u × v‖ ‖w‖ cos θ = (u × v) · w.
This expression is known as the box product and is sometimes written as [u, v, w].
Consider what happens if you interchange v with w or u with w. Geometrically, we can see that this
merely introduces a minus sign. We find that the box product of three vectors equals the volume of the
parallelepiped determined by the three vectors if the three vectors form a right-handed system, and the
negative of the volume if the vectors form a left-handed system. We summarize this in the following
proposition:
Let u, v, w be three vectors in R3 that define a parallelepiped. The box product (u × v) · w is equal
to:

• the volume of the parallelepiped, if u, v, w form a right-handed system;

• the negative of the volume, if u, v, w form a left-handed system;

• zero, if u, v, w are coplanar.

In any case, the volume of the parallelepiped can be computed as the absolute value of the box
product, given by |(u × v) · w|.
Solution. According to the above discussion, we can take the cross product of any two of these vectors,
and then the dot product with the third vector. The result will be either plus or minus the desired volume.
Therefore we can obtain the volume by taking the absolute value.
We first compute the cross product of u and v:

u × v = [1, 2, −5]^T × [1, 3, −6]^T = [(2)(−6) − (−5)(3), (−5)(1) − (1)(−6), (1)(3) − (2)(1)]^T = [3, 1, 1]^T.

The box product (u × v) · w is:
Solution.

(a) We have [u, v, w] = (u × v) · w = [2, −1, 0]^T · [1, −1, 1]^T = 3, so the box product is positive and the
system of vectors u, v, w is right-handed.

(b) We have [u, v, w] = (u × v) · w = [1, −2, 1]^T · [0, 1, 1]^T = −1, so the box product is negative and the
system is left-handed.

(c) We have [u, v, w] = (u × v) · w = [−2, 2, −1]^T · [1, 1, 0]^T = 0, so the box product is zero and the
vectors are coplanar.
♠
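A short computation along the same lines classifies a system of three vectors by the sign of its box product. The following Python/NumPy sketch is not part of the text; the vector w is an illustrative choice, since the original third vector of the volume example is not shown here.

    import numpy as np

    u = np.array([1.0, 2.0, -5.0])
    v = np.array([1.0, 3.0, -6.0])
    w = np.array([1.0, 1.0, 1.0])      # illustrative third vector

    box = np.dot(np.cross(u, v), w)    # box product [u, v, w] = (u x v) . w
    volume = abs(box)                  # volume of the parallelepiped

    if box > 0:
        handedness = "right-handed"
    elif box < 0:
        handedness = "left-handed"
    else:
        handedness = "coplanar"
    print(box, volume, handedness)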
We finish this section with a law involving the dot product and the cross product. It represents a funda-
mental observation that comes directly from the geometric definition of the box product.
Let u, v, and w be vectors. Then (u × v) · w = u · (v × w).

Proof. This follows from observing that both (u × v) · w and u · (v × w) compute the same box product,
i.e., they either both give the volume of the parallelepiped or they both give the negative of the volume.
Alternatively, we can calculate each product explicitly:

(u × v) · w = u2 v3 w1 − u3 v2 w1 + u3 v1 w2 − u1 v3 w2 + u1 v2 w3 − u2 v1 w3,
u · (v × w) = u2 v3 w1 − u3 v2 w1 + u3 v1 w2 − u1 v3 w2 + u1 v2 w3 − u2 v1 w3.
In Chapter 7, you will learn that these expressions are a special case of a determinant. ♠
Exercises
Exercise 2.7.1 Which of the following systems of vectors are right-handed? Which are left-handed?
Exercise 2.7.5 Find the area of the parallelogram with vertices (−2, 3, 1), (2, 1, 1), (1, 2, −1), and (5, 0, −1).
Exercise 2.7.6 Find the area of the triangle determined by the three points, (1, 0, 3), (4, 1, 0) and (−3, 1, 1).
Exercise 2.7.7 Find the area of the triangle determined by the three points, (1, 2, 3), (2, 3, 4) and (3, 4, 5).
Did something interesting happen here? What does it mean geometrically?
Exercise 2.7.9 Verify directly from the coordinate description of the cross product that u × v is orthogonal
to both u and v. Then show by direct computation that this coordinate description satisfies
where θ is the angle included between the two vectors. Explain why ku × vk has the correct magnitude.
Exercise 2.7.10 Prove the following formula by direct calculation: u × (v × w) = (u · w) v − (u · v) w.
Exercise 2.7.11 Use the formula from Exercise 2.7.10 to prove that the cross product satisfies the so-called
Jacobi identity:
u × (v × w) + v × (w × u) + w × (u × v) = 0.
Exercise 2.7.12 Find the volume of the parallelepiped determined by the vectors [1, −7, −5]^T, [1, −2, −6]^T, and [3, 2, 3]^T.
Exercise 2.7.13 Which of the following systems of vectors u, v, w are right-handed? Which are left-
handed? Which are coplanar?
Exercise 2.7.14 Suppose u, v, and w are three vectors whose components are all integers. Can you
conclude the volume of the parallelepiped determined from these three vectors will always be an integer?
Exercise 2.7.15 What does it mean geometrically if the box product of three vectors equals zero?
(u × v) · ((v × w) × (w × z)) = ((v × w) · z) ((u × v) · w).
Exercise 2.7.18 Simplify ‖u × v‖² + (u · v)² − ‖u‖² ‖v‖².
Exercise 2.7.19 This problem uses calculus. For u, v, w functions of t, prove that the derivative satisfies
the following product rules:
(u × v)′ = u′ × v + u × v′
(u · v)′ = u′ · v + u · v′
3. Lines and planes in Rn
3.1 Lines
Outcomes
A. Find the vector, parametric, and symmetric equations of a line.
We can use the concept of vectors and points to find equations for lines in Rn . Consider a straight line L
that passes through a point P in the direction given by a non-zero vector d.
[Figure: the line L through the point P with direction vector d, and another point Q on L.]
The line L is infinitely long in both directions, although the picture only shows a finite part of it. To find
an equation for this line, first suppose that Q is an arbitrary point on L. Then the vector →PQ is parallel to
d. In other words, there exists some real number t such that

→PQ = t d.
If p is the position vector of P and q is the position vector of Q, we can write
→PQ = q − p.
Putting together the last two equations, we get q − p = t d, which we can write as
q = p + t d.
This is called the vector equation of the line L. The vector d is called the direction vector, and t is called
a parameter. The parameter t can be any real number; each time we plug in a different number for t, we
get a different point Q on the line. The following picture shows the effect of the parameter:
[Figure: the points on the line L corresponding to the parameter values t = −1, 0, 1, 2, 3, obtained as p + t d.]
q = p+t d
is the vector equation of a straight line L. Specifically, as the parameter t ranges over the real
numbers, q ranges over the position vectors of all the points Q on the line L. The vector d is called
the direction vector of the line.
Solution. The position vector of the point P is p = [2, 0, 3]T . The equation of the line is q = p + t d, which
we can write as
[x, y, z]^T = [2, 0, 3]^T + t [1, 2, 1]^T.
♠
Solution. We can use P as the base point; its position vector is p = [1, 2, 0, 1]^T. We can use d = →PR =
[1, −6, 6, 2]^T as the direction vector. Then a vector equation of the line is q = p + t d, which we can also
write as

[x, y, z, w]^T = [1, 2, 0, 1]^T + t [1, −6, 6, 2]^T.
♠
Find two other equations for the same line, by changing the parameter to 3s and to 1 − r.

If we let t = 1 − r, we get

[x, y, z]^T = [2, 0, 3]^T + (1 − r) [1, 2, 1]^T = ([2, 0, 3]^T + [1, 2, 1]^T) − r [1, 2, 1]^T = [3, 2, 4]^T + r [−1, −2, −1]^T.
♠
x1 = p1 + t d1,
x2 = p2 + t d2,
⋮
xn = pn + t dn.
When written in this form, they are called the parametric equations of the line.
Solution. This is the same line as in Example 3.3. We can easily convert the vector equation
[x, y, z, w]^T = [1, 2, 0, 1]^T + t [1, −6, 6, 2]^T
to a set of parametric equations:
x = 1 + t,
y = 2 − 6t,
z = 6t,
w = 1 + 2t.
♠
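The vector and parametric descriptions are convenient to evaluate on a computer. The following Python/NumPy sketch is not part of the text; it generates points on the line of the previous example for several parameter values.

    import numpy as np

    p = np.array([1.0, 2.0, 0.0, 1.0])    # position vector of the base point P
    d = np.array([1.0, -6.0, 6.0, 2.0])   # direction vector

    for t in [-1, 0, 1, 2]:
        q = p + t * d                     # vector equation q = p + t d
        print(t, q)                       # a different point on the line for each t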
Solution. The point P is on the line L if and only if there exists some t ∈ R such that
[1, 2, 1]^T + t [2, 3, 1]^T = [5, 8, 4]^T.
Subtracting [1, 2, 1]^T from both sides of the equation, this is equivalent to

t [2, 3, 1]^T = [4, 6, 3]^T.
2t = 4,
3t = 6,
t = 3.
This is a system of three linear equations in one variable, and we quickly see that it is inconsistent.
Therefore, the point P does not lie on the line L. ♠
Solution. The two lines intersect if and only if there exist t, s ∈ R such that
[3, 1, 0]^T + t [2, 0, 1]^T = [1, −1, 5]^T + s [2, 1, −2]^T.
Bringing s to the left-hand side, and subtracting [3, 1, 0]T from both sides of the equation, this is equivalent
to
t [2, 0, 1]^T − s [2, 1, −2]^T = [−2, −2, 5]^T.
If we write this vector equation as a set of three parametric equations, it is a system of 3 linear equations
in 2 variables. The augmented matrix of the system is
[ 2 −2 | −2 ]
[ 0 −1 | −2 ]
[ 1  2 |  5 ]
and has the unique solution t = 1 and s = 2. Therefore, the lines intersect. (Other possible cases are: If
the system is inconsistent, the lines do not intersect. If the system has more than one solution, the lines
are identical). We find the point of intersection by plugging the parameter t = 1 into the equation of the
first line (or equivalently, by plugging s = 2 into the equation of the second line; doing it both ways is a
good way to double-check your answer). Therefore, the point of intersection is
[3, 1, 0]^T + 1 [2, 0, 1]^T = [5, 1, 1]^T.
♠
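The same intersection test can be carried out numerically. The sketch below (Python with NumPy, not part of the text) solves the small linear system for t and s and checks consistency.

    import numpy as np

    # Lines q = p1 + t d1 and q = p2 + s d2 from the example above.
    p1, d1 = np.array([3.0, 1.0, 0.0]), np.array([2.0, 0.0, 1.0])
    p2, d2 = np.array([1.0, -1.0, 5.0]), np.array([2.0, 1.0, -2.0])

    # Solve t d1 - s d2 = p2 - p1 in the least-squares sense, then test consistency.
    A = np.column_stack([d1, -d2])            # 3x2 coefficient matrix
    b = p2 - p1
    ts, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
    t, s = ts
    if np.allclose(A @ ts, b):
        print("intersection at", p1 + t * d1)  # [5. 1. 1.]
    else:
        print("the lines do not intersect")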
There is one other form for a line which is useful, which is the symmetric form. Consider the line given
by
x = 1 + 2t,
y = 1 − t,
z = 3 + 2t.
We can solve each equation for t:
t = (x − 1)/2,
t = (y − 1)/(−1),
t = (z − 3)/2.
Finally, we can eliminate t from the equations by setting all three equations equal to one another:
(x − 1)/2 = (y − 1)/(−1) = (z − 3)/2.
The latter is really a system of 2 equations in 3 variables. This is the symmetric form of the equation of
the line. In the following example, we look at how to convert the equation of a line from symmetric form
to parametric form.
These are the parametric equations for the line. The vector equation is
[x, y, z]^T = [2, 1, −3]^T + t [3, 2, 1]^T.
♠
We can use the dot product to find the angle between two intersecting lines. This is simply the smallest
angle between (any of) their direction vectors. The only subtlety here is that if u is a direction vector for a
line, then so is −u, and thus we will find pairs of supplementary angles. We will take the smaller of the
two angles.
[Figure: a direction vector u and its negative −u, together with a second direction vector v, showing the two supplementary angles θ and π − θ.]
and
L2 : [x, y, z]^T = [0, 3, 2]^T + s [2, 1, −1]^T.
[Figure: the point Q, the line L through P with direction vector d, and the point R on L closest to Q.]
Solution. In order to determine the shortest distance from Q to L, we will first find the vector →PQ and then
find the projection of this vector onto L. The vector →PQ is given by

→PQ = [1, 3, 5]^T − [0, 4, −2]^T = [1, −1, 7]^T.
Then, if R is the point on L closest to Q, it follows that

→PR = proj_d(→PQ) = (d · →PQ / ‖d‖²) d = (15/9) [2, 1, 2]^T = (5/3) [2, 1, 2]^T.
Now, the distance from Q to L is given by

‖→RQ‖ = ‖→PQ − →PR‖ = √26.
The point R is found by adding the vector →PR to the position vector →0P of P as follows:

[0, 4, −2]^T + (5/3) [2, 1, 2]^T = [10/3, 17/3, 4/3]^T.

Therefore, R = (10/3, 17/3, 4/3). ♠
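The projection-based distance computation translates directly into code. The following Python/NumPy sketch (not part of the text) reproduces the numbers of this example.

    import numpy as np

    P = np.array([0.0, 4.0, -2.0])   # point on the line
    d = np.array([2.0, 1.0, 2.0])    # direction vector of the line
    Q = np.array([1.0, 3.0, 5.0])    # the point whose distance we want

    PQ = Q - P
    PR = (np.dot(d, PQ) / np.dot(d, d)) * d   # projection of PQ onto d
    R = P + PR                                # closest point on the line
    distance = np.linalg.norm(Q - R)

    print(R)          # approximately (10/3, 17/3, 4/3)
    print(distance)   # sqrt(26)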
Exercises
Exercise 3.1.1 Find the vector equation for the line through (−7, 6, 0) and (−1, 1, 4). Then, find the
parametric equations for this line.
Exercise 3.1.2 Find parametric equations for the line through the point (7, 7, 1) with direction vector d = [1, 6, 2]^T.
Exercise 3.1.3 Consider the line given by the parametric equations:

x = t + 2,
y = 6 − 3t,
z = −t − 6.

Find a direction vector for the line and a point on the line.
Exercise 3.1.4 The equation of a line in two dimensions is written as y = x − 5. Find a vector equation
for this line.
Exercise 3.1.5 Find parametric equations for the line through (6, 5, −2, 3) and (5, 1, 2, 1).
Find a new vector equation for the same line by doing the change of parameter t = 2 − s.
Exercise 3.1.7 Consider the line given by the following parametric equations:
x = 2t + 2,
y = 5 − 4t,
z = −t − 3.
Exercise 3.1.8 Find the point on the line segment from P = (−4, 7, 5) to Q = (2, −2, −3) which is 1/7 of the
way from P to Q.
Exercise 3.1.9 Suppose a triangle in Rn has vertices at P, Q, and R. Consider the lines which are drawn
from a vertex to the mid point of the opposite side. Show these three lines intersect in a point and find the
coordinates of this point.
Exercise 3.1.14 Let P = (0, 2, 1) be a point in R3 . Let L be the line through the points P0 = (1, 1, 1) and
P1 = (4, 1, 2). Find the shortest distance from P to L, and find the point Q on L that is closest to P.
Exercise 3.1.15 When we computed the angle between two lines in Example 3.10, we calculated two
different angles and took the smaller of the two. Show that one can get the same answer by taking the
absolute value of the dot product, i.e., by solving
cos θ = ·
|u v|
.
kuk kvk
3.2 Planes
Outcomes
A. Find the vector and parametric equations of a plane in Rn .
D. Find the angle between two planes, or between a line and a plane.
Much like the above discussion with lines, vectors can be used to determine planes in Rn . Consider a point
P and two direction vectors d and e that are not parallel to each other. Then there is a unique plane passing
through P and containing d and e:
[Figure: the plane through the point P spanned by the direction vectors d and e.]
The plane is infinite in each direction, although in the picture, we have only shown a small part of it. If p
is the position vector of P and q is the position vector of some other point in the plane, we have
q = p+t d+se
for some real numbers t and s. This is called the vector equation of the plane.
q = p+t d+se
Solution. We can use P as the base point and →PQ and →PR as the direction vectors. We have

→PQ = [2, 2, 0, 1]^T − [1, 2, 0, 0]^T = [1, 0, 0, 1]^T and →PR = [0, 1, 1, 0]^T − [1, 2, 0, 0]^T = [−1, −1, 1, 0]^T.

Therefore, a vector equation of the plane is q = [1, 2, 0, 0]^T + t [1, 0, 0, 1]^T + s [−1, −1, 1, 0]^T, which we can write in parametric form as
x = 1 + t − s,
y = 2 − s,
z = s,
w = t.
♠
Note that the vector and parametric equations of a plane are not unique. For example, in Example 3.13,
we could have equally used Q or R as the base point, and/or used →QR as one of the direction vectors. In
each case we would have obtained a different equation for the same plane.
Solution. We already found the parametric equations for this plane in Example 3.13. To determine whether
the point S = (4, 4, −2, 1) lies on this plane, we must substitute its coordinates into the parametric equa-
tions:
4 = 1 + t − s,
4 = 2 − s,
−2 = s,
1 = t.
This is a system of linear equations. We solve it to find that it has the unique solution (t, s) = (1, −2).
Therefore, the point S lies on the given plane, and more specifically, it is the point that corresponds to the
parameters t = 1 and s = −2. ♠
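Checking whether a point lies on a plane amounts to solving a small linear system for the parameters t and s, which is easy to do numerically. The sketch below (Python with NumPy, not part of the text) uses the base point and direction vectors read off the parametric equations above.

    import numpy as np

    p = np.array([1.0, 2.0, 0.0, 0.0])    # base point of the plane
    d = np.array([1.0, 0.0, 0.0, 1.0])    # first direction vector
    e = np.array([-1.0, -1.0, 1.0, 0.0])  # second direction vector
    S = np.array([4.0, 4.0, -2.0, 1.0])   # the point to test

    # Solve t d + s e = S - p; S lies on the plane iff the system is consistent.
    A = np.column_stack([d, e])
    b = S - p
    ts, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
    print(ts)                       # approximately [1, -2]
    print(np.allclose(A @ ts, b))   # True: S lies on the plane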
In the special case of 3 dimensions, a plane can also be described by a point and a normal vector. A
normal vector of a plane is a vector that is perpendicular to the plane.
[Figure: a plane containing the points P and Q, with normal vector n perpendicular to the plane.]
Given a non-zero vector n in R3 and a point P, there exists a unique plane that contains P and has n as a
normal vector. We wish to find an equation for this plane. If Q is an arbitrary point on the plane, then by
definition, the normal vector is orthogonal to the vector →PQ. Writing this as a formula, we have n · →PQ = 0.
If p and q are the position vectors of P and Q, respectively, we have →PQ = q − p, and therefore the equation
of the plane can be written as

n · (q − p) = 0,

or equivalently,

n · q = n · p.
This is called the normal equation of the plane. Note that in this equation, n and p are given and fixed,
whereas q is a variable ranging over the position vectors of all points on the plane.
n · q = n · p.
This equation is called the normal equation of the plane.
Solution. Let p = [1, 3, 0]T be the position vector of P, and let q = [x, y, z]T be the position vector of some
arbitrary point Q in the plane. The normal equation is n · q = n · p, which we can write in component form:
[2, 1, 1]^T · [x, y, z]^T = [2, 1, 1]^T · [1, 3, 0]^T.
We can pre-compute the dot product on the right-hand side: n · p = (1)(2) + (3)(1) + (0)(1) = 5. Therefore,
the normal equation can also be written as
[2, 1, 1]^T · [x, y, z]^T = 5.
♠
Notice that the last equation in Example 3.16 can also be written in the form
2x + y + z = 5.
Every plane in R3 can be described by an equation of the form ax + by + cz = d, where n = [a, b, c]^T is a normal vector of the plane. This is called the standard equation of the plane.
Solution. We first need to find a normal vector for the plane. Since the normal vector must be perpendicular
to the plane, it must be orthogonal to both →PQ and →PR. We can therefore use the cross product to compute
a normal vector for the plane:
n = →PQ × →PR = [2, −2, −3]^T × [1, 1, −1]^T = [5, −1, 4]^T.
[Figure: the plane through P, Q, and R, with normal vector n.]
Now we can easily obtain the normal equation from any point on the plane (say P) and the normal vector
we just calculated:
[5, −1, 4]^T · [x, y, z]^T = [5, −1, 4]^T · [0, 1, 3]^T.
We get the standard equation by computing the dot products on the left- and right-hand sides:
5x − y + 4z = 11.
It is worthwhile to double-check the answer by substituting each of the three original points P, Q, and R
into this equation. For example, for Q = (2, −1, 0), we obtain 5(2) − (−1) + 4(0), which is indeed 11.
♠
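The normal-vector construction can be verified with a few lines of code. The following Python/NumPy sketch (not part of the text) uses the points P = (0, 1, 3), Q = (2, −1, 0), and R = (1, 2, 2), as they can be read off the computation above.

    import numpy as np

    P = np.array([0.0, 1.0, 3.0])
    Q = np.array([2.0, -1.0, 0.0])
    R = np.array([1.0, 2.0, 2.0])

    n = np.cross(Q - P, R - P)   # normal vector PQ x PR
    d = np.dot(n, P)             # right-hand side of the standard equation
    print(n, d)                  # [5. -1. 4.], 11  ->  5x - y + 4z = 11

    for X in (P, Q, R):
        print(np.dot(n, X))      # 11 each time: all three points satisfy the equation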
[2, 3, −1]^T · [x, y, z]^T = 7.

Therefore,

n = [2, 3, −1]^T

is a normal vector for the plane. ♠
n · q = n · p,

where p and q are the position vectors of P and Q, respectively. Given n, P, and Q as above, this equation
becomes

[1, 2, 3]^T · [5, 4, 1]^T = [1, 2, 3]^T · [2, 1, 4]^T.
Since both sides of the equation are equal to 16, the equation is true. So the point Q is indeed in the plane
determined by n and P. ♠
Solution. This is the same thing as finding the general solution of a system of one linear equation in 3
variables. Since there is only a single equation x + 3y − 2z = 7, it is already in echelon form. The variables
y and z are free, so we set them equal to parameters: z = t and y = s. The variable x is a pivot variable, and
we get x = 7 + 2t − 3s. So the general solution of the equation is
[x, y, z]^T = [7, 0, 0]^T + t [2, 0, 1]^T + s [−3, 1, 0]^T.
Solution. Finding the intersection means finding all of the points (x, y, z) that are on both planes simulta-
neously. This is the same as solving the system of equations
x − 2y + z = 0,
2x − 3y − z = 4.
Then the equation of the line is q = p + t d and the equation of the plane is n · q = 2. Substituting the
first equation into the second one, we get n · (p + t d) = 2. Using distributivity of the dot product, we can
write this last equation as n · p + t(n · d) = 2. By computing the dot products n · p = 6 and n · d = −2, this
equation simplifies to 6 − 2t = 2, or t = 2. Therefore, the line intersects the plane when t = 2, or at the
point

[x, y, z]^T = [1, 2, 0]^T + 2 [−1, 1, 2]^T = [−1, 4, 4]^T.
An alternative method is to directly substitute the parametric equation of the line, x = 1 − t, y = 2 + t, and
z = 2t, into the equation of the plane, 2x + 2y − z = 2. In this case, we get 2(1 − t) + 2(2 + t) − (2t) = 2,
which we can solve for t to obtain t = 2. ♠
The next few examples are concerned with calculating angles between planes, angles between lines and
planes, and finding the distance between points and planes.
Solution. The angle between two planes is the same thing as the angle between their normal vectors.
[Figure: two intersecting planes and their normal vectors n1 and n2.]

The normal vectors are n1 = [7, −1, 0]^T and n2 = [4, 3, 5]^T. The angle between them is given by

cos θ = (n1 · n2) / (‖n1‖ ‖n2‖) = 25/50 = 1/2.

Therefore, θ = π/3, or 60 degrees. ♠
Solution. To get the angle θ between the plane and the line, we can compute the angle φ between the
direction vector of the line and the normal vector of the plane, and then take θ = π/2 − φ.
[Figure: the angle φ between the plane's normal vector n and the line's direction vector d, and the angle θ between the line and the plane.]
The direction vector of the line is [2, −1, −2]^T and the normal vector of the plane is [2, 2, −1]^T. We have

cos φ = (n · d) / (‖n‖ ‖d‖) = 4/9,

and therefore φ = arccos(4/9) ≈ 1.11 radians. We have θ = π/2 − φ ≈ 0.46 radians, or about 26.4 degrees.
♠
Solution. In this problem, we are going to use the projection of one vector onto another, which was
introduced in Section 2.6.5. Pick an arbitrary point R on the plane. Then, it follows that
→QP = proj_n(→RP),

and ‖→QP‖ is the shortest distance from P to the plane. Further, the position vector of the point Q can be
computed as q = p − →QP, where p is the position vector of P.
[Figure: the point P, a point R on the plane, the normal vector n, and the point Q on the plane closest to P, with →QP = proj_n(→RP).]
From the above scalar equation, we have that n = [2, 1, 2]^T. Now, choose any point on the plane, for example,
R = (1, 0, 0) (notice that this satisfies 2x + y + 2z = 2). Then,

→RP = [3, 2, 3]^T − [1, 0, 0]^T = [2, 2, 3]^T.
−→ −
→
Next, compute QP = projn RP.
→!
− 2 2
−→ −
→
QP = projn RP = ·
n RP
n =
12
1
4
= 1 .
knk2 9
2
3
2
Then ‖→QP‖ = 4, so the shortest distance from P to the plane is 4. To find the point Q on the plane that is
closest to P, we have
q = p − →QP = [3, 2, 3]^T − (4/3) [2, 1, 2]^T = [1/3, 2/3, 1/3]^T.

Therefore, Q = (1/3, 2/3, 1/3). ♠
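The same projection formula gives the distance from a point to a plane in code. The sketch below (Python with NumPy, not part of the text) reproduces the numbers of this example.

    import numpy as np

    n = np.array([2.0, 1.0, 2.0])    # normal vector of the plane 2x + y + 2z = 2
    P = np.array([3.0, 2.0, 3.0])    # the point
    R = np.array([1.0, 0.0, 0.0])    # any point on the plane

    QP = (np.dot(n, P - R) / np.dot(n, n)) * n   # projection of RP onto n
    distance = np.linalg.norm(QP)                # shortest distance from P to the plane
    Q = P - QP                                   # closest point on the plane

    print(distance)   # 4.0
    print(Q)          # approximately (1/3, 2/3, 1/3)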
Exercises
Exercise 3.2.1 Find vector and parametric equations for the plane through the points P = (0, 1, 1), Q =
(−1, 2, 1), and R = (1, 1, 2).
Exercise 3.2.3 Determine which of the following points lie on the plane through the points P = (2, 6, 1),
R = (1, 4, 1), and Q = (1, 2, −1).
Exercise 3.2.4 Use cross products to find the normal vector to the plane going through the points P =
(1, 2, 3), Q = (−2, 1, 8) and R = (2, 2, 2).
Exercise 3.2.5 Find normal and standard equations of the plane through the point P = (1, 1, 2) and
orthogonal to n = [1, 0, −1]T .
Exercise 3.2.6 Find normal and standard equations for the plane through the points P = (2, 1, 0), Q =
(1, −1, 0), and R = (1, 1, −1).
Exercise 3.2.8 The chapter mentions that the normal equation and standard equation of a plane only
work in R3 , and not in general Rn . Why does the equation ax + by + cz + dw = e not describe a plane in
R4 ?
Exercise 3.2.9 Find the intersection between the planes x + 3y + 4z = 3 and 2x + 5y − z = 2. Is the
intersection a line, a plane, or empty?
Exercise 3.2.13 In Example 3.25, we calculated the angle θ between a line and a plane by calculating
the angle φ between the direction vector of the line and the normal vector of the plane according to the
formula
cos φ = (n · d) / (‖n‖ ‖d‖)

and then taking θ = π/2 − φ.

(a) Explain what happens when the dot product is negative. How should we adjust the formula to ensure
that θ is always between 0 and π/2?

(b) Show that one can get the answer in a single step with the formula

sin θ = |n · d| / (‖n‖ ‖d‖).
Exercise 3.2.14 Find the shortest distance from the point P = (1, 1, −1) to the plane given by x+2y+2z =
6, and find the point Q on the plane that is closest to P.
Exercise 3.2.15 Use Exercise 2.7.15 to find an equation of a plane containing the two vectors p and q
and the point 0. Hint: If (x, y, z) is a point in this plane, the volume of the parallelepiped determined by
(x, y, z) and the vectors p, q equals 0.
4. Matrices
Outcomes
A. Identify the dimension and entries of a matrix.
We have solved systems of equations by writing them in terms of an augmented matrix and then doing
row operations. It turns out that matrices are important not only for systems of equations but also for many
other purposes.
where the ai j are scalars, called the entries or components of A. The size or dimension of a matrix
is defined as m × n, where m is the number of rows and n is the number of columns.
This is a 3 ×4-matrix because there are three rows and four columns. When specifying the size of a matrix,
we always list the number of rows before the number of columns.
Entries of the matrix are identified according to their position. The (i, j)-entry of a matrix is the entry
in the i th row and j th column, and is often denoted ai j . For example, in the above matrix, the (2, 3)-entry
is the entry in the second row and the third column, and is equal to 8. We sometimes use A = [ai j] as
a short-hand notation for the entire m × n-matrix whose (i, j)-entry is equal to ai j for all i = 1, . . . , m and
j = 1, . . . , n.
There are various operations which are done on matrices of appropriate sizes. Matrices can be added
and subtracted, multiplied by a scalar, and multiplied by other matrices. We will never divide a matrix by
another matrix, but we will see later how matrix inverses play a similar role.
For example,

[ 0 0 ]       [ 0 0 ]
[ 0 0 ]   ≠   [ 0 0 ]
              [ 0 0 ]
because they are different sizes. Also,
[ 0 1 ]       [ 1 0 ]
[ 3 2 ]   ≠   [ 2 3 ]
because, although they are the same size, their corresponding entries are not identical.
There are special names for matrices of certain dimensions: some matrices are called square matrices,
column vectors, or row vectors.
We have already encountered column vectors in Chapter 2. When we use the term vector without fur-
ther qualification, we always mean a column vector. Also recall from Definition 2.1 that the set of n-
dimensional column vectors is called Rn .
Exercises
Exercise 4.1.2 Find scalars x, y, z such that the following two matrices are equal.
[ x −1 ]       [ 2 y ]
[ 2  4 ]  and  [ z 4 ].
Exercise 4.1.4 What is the (2, 3)-entry of the matrix [1 2 1; −4 4 7; 6 −5 3]?
4.2 Addition
Outcomes
A. Perform the operations of matrix addition and subtraction.
C. Apply the algebraic properties of matrix addition to manipulate an algebraic expression in-
volving matrices.
To add two matrices, the matrices have to be of the same size. The addition works by simply adding
corresponding entries of the matrices.
Solution. Notice that both A and B are of size 2 × 3. Since A and B are of the same size, the addition is
possible. Using Definition 4.5, the addition is done as follows.
A + B = [1 2 3; 1 0 4] + [5 2 3; −6 2 1] = [1+5 2+2 3+3; 1+(−6) 0+2 4+1] = [6 4 6; −5 2 5].
♠
On the other hand, the matrices
[ 1 2 ]
[ 3 4 ]    and    [ −1 4 8 ]
[ 5 2 ]           [  2 8 5 ]
cannot be added, because one has size 3 × 2 while the other has size 2 × 3.
Note there is a zero matrix for every size. For example, there is a 2 × 3 zero matrix, a 3 × 4 zero matrix,
and so on.
Solution.
A − B = [1 2 3; 1 0 4] − [5 2 3; −6 2 1] = [1−5 2−2 3−3; 1−(−6) 0−2 4−1] = [−4 0 0; 7 −2 3].
♠
Addition of matrices obeys the same properties as addition of vectors.
(A + B) +C = A + (B +C).
A + (−A) = 0.
Proof. To prove the commutative law of addition, let A and B be matrices of the same size. We want to
show that A + B = B + A. To do so, we use the definition of matrix addition given in Definition 4.5. We
have
A + B = ai j + bi j = bi j + ai j = B + A.
The proof of the other properties are similar, and are left as an exercise. ♠
Exercises
Exercise 4.2.1 For the following pairs of matrices, determine if the sum A + B and the difference A − B
are defined. If so, calculate them.
(a) A = [1 0; 0 1], B = [0 1; 1 0].
(b) A = [2 1 2; 1 1 0], B = [−1 0 3; 0 1 4].
(c) A = [1 0; −2 3; 4 2], B = [2 7 −1; 0 3 4].
Exercise 4.2.3 Let A = [1 2 −1; −1 4 0] and B = [0 3 0; 1 −1 1].
Find a matrix X such that (A + X ) − (B + 0) = B + A. Hint: first use the properties of matrix addition to
simplify the equation and solve for X .
Exercise 4.2.4 Using only the properties given in Proposition 4.11, show that if A + B = 0, then B = −A.
Exercise 4.2.5 Using only the properties given in Proposition 4.11, show A + B = A implies B = 0.
Outcomes
A. Multiply a matrix by a scalar, and take linear combinations of matrices.
C. Apply the algebraic properties of matrix addition and scalar multiplication to manipulate an
algebraic expression involving matrices.
The multiplication of a scalar by a matrix is called the scalar multiplication of matrices. The new matrix
is obtained by multiplying every entry of the original matrix by the given scalar, as in the following
example.
3 [1 2 3 4; 5 2 8 7; 6 −9 1 2] = [3 6 9 12; 15 6 24 21; 18 −27 3 6].
Solution.
2A − 3B = 2 [1 2 3; 1 0 4] − 3 [5 2 3; −6 2 1] = [2 4 6; 2 0 8] − [15 6 9; −18 6 3] = [−13 −2 −3; 20 −6 5].
♠
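Matrix addition, subtraction, and scalar multiplication are all entrywise operations, so they are straightforward to check numerically. The following Python/NumPy sketch (not part of the text) re-uses the matrices A and B from the examples above.

    import numpy as np

    A = np.array([[1, 2, 3],
                  [1, 0, 4]])
    B = np.array([[5, 2, 3],
                  [-6, 2, 1]])

    print(A + B)          # [[6 4 6], [-5 2 5]]
    print(A - B)          # [[-4 0 0], [7 -2 3]]
    print(2 * A - 3 * B)  # [[-13 -2 -3], [20 -6 5]]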
Scalar multiplication of matrices obeys the same properties as scalar multiplication of vectors.
k(A + B) = kA + kB.
(k + ℓ)A = kA + ℓA.
k(ℓA) = (kℓ)A.
Exercises
Exercise 4.3.1 For each matrix A, find the products (−2)A, 0A, and 3A.
(a) A = [1 2; 2 1]

(b) A = [−2 3; 0 2]

(c) A = [0 1 2; 1 −1 3; 4 2 0]
Exercise 4.3.3 Using only the properties given in Propositions 4.11 and 4.14, show that 0A = 0. Here the
0 on the left is the scalar 0 and the 0 on the right is the zero matrix of appropriate size.
Outcomes
A. Use two different methods to multiply a matrix and a vector.
B. Multiply two matrices using the componentwise method, the column method, or the row
method.
One of the most important uses of a matrix is to multiply a matrix by a vector. In fact, this is one of the
reasons matrices were invented. Let us start by considering an alternative way of writing a system of linear
equations.
a11 x1 + . . . + a1n xn = b1,
a21 x1 + . . . + a2n xn = b2,
⋮
am1 x1 + . . . + amn xn = bm.
Notice that each vector used here is one column from the corresponding augmented matrix. There is
one vector for each variable in the system, along with the constant vector. The left-hand side is a linear
combination of column vectors. Linear combinations of column vectors are so important that we introduce
a special notation for them.
In other words, we can think of the vector x as encoding instructions for how to take a linear combination
of the columns of A. The product Ax is computed by taking x1 times the first column of A, plus x2 times
the second column of A, and so on. For this to work, A must have the same number of columns as x has
components.
Solution. We have
[1 2 3; 4 5 6] [7, 8, 9]^T = 7 [1, 4]^T + 8 [2, 5]^T + 9 [3, 6]^T = [50, 122]^T.
♠
There is another way of looking at the product of a matrix and a vector. Instead of looking at the columns
of A, we can look at the rows.
Solution. We have
[1 2 3; 4 5 6] [7, 8, 9]^T = [1 · 7 + 2 · 8 + 3 · 9, 4 · 7 + 5 · 8 + 6 · 9]^T = [50, 122]^T.
a11 x1 + . . . + a1n xn = b1,
a21 x1 + . . . + a2n xn = b2,
⋮
am1 x1 + . . . + amn xn = bm.
The matrix form of a system of equations is therefore written as Ax = b, where A is the coefficient matrix
of the system, x is an n-dimensional column vector constructed from the variables of the system, and b is
an m-dimensional column vector constructed from the constant terms of the system. Any system of linear
equations can be written in this form.
The multiplication of a matrix and a vector from the previous section is a special case of the operation of
multiplying two matrices, which we now define.
For matrices A and B, in order to form the product AB, the number of columns of A must equal the number
of rows of B. Consider a product AB where A has dimensions m × n and B has dimensions n × p. Then the
dimensions of the product are given by

(m × n) (n × p) = m × p,

where the two middle numbers must match.
Note that the two outside numbers give the dimensions of the product. If the two middle numbers do not
match, we cannot multiply the matrices.
In other words, the (i, k)-entry of the matrix product AB is a kind of dot product of the i th row of A with
the k th column of B.
Solution. First, let us note that since A has size 2 × 3 and B has size 3 × 3, the product AB is well-defined
and has size 2 × 3. Let C = AB. We compute each of the six entries of C:
• The (1, 1)-entry is the first row of A times the first column of B: c11 = 1 · 1 + 2 · 0 + 1 · (−2) = −1.
• The (1, 2)-entry is the first row of A times the second column of B: c12 = 1 · 2 + 2 · 3 + 1 · 1 = 9.
• The (1, 3)-entry is the first row of A times the third column of B: c13 = 1 · 0 + 2 · 1 + 1 · 1 = 3.
• The (2, 1)-entry is the second row of A times the first column of B: c21 = 0 · 1 + 2 · 0 + 1 · (−2) = −2.
• The (2, 2)-entry is the second row of A times the second column of B: c22 = 0 · 2 + 2 · 3 + 1 · 1 = 7.
• The (2, 3)-entry is the second row of A times the third column of B: c23 = 0 · 0 + 2 · 1 + 1 · 1 = 3.
Therefore, we have

AB = [−1 9 3; −2 7 3].
♠
As this example shows, calculating matrix products one component at a time can be an extremely repetitive
and tedious process. Fortunately, we can speed this up by considering whole columns at once.
Ab1 , Ab2 , . . . , Ab p .
In other words, the k th column of the matrix product AB is equal to A times the k th column of B.
♠
Of course, the answer in Example 4.24 is the same as that in Example 4.22. Please convince yourself that
both methods of matrix multiplication give the same answer, since they each ultimately calculate the same
thing. Nevertheless, with a bit of practice, the column method is much faster, and you can even learn to
multiply matrices in your head! The key to understanding the column method is that each column of B
provides instructions for taking a linear combination of the columns of A. The method works especially
well if B contains many zeros and ones.
Since column vectors are simply n × 1-matrices, and row vectors are 1 × m-matrices, we can also
multiply a column vector by a row vector or vice versa.
Solution. Here we are multiplying a 3 × 1-matrix by a 1 × 4-matrix, so the result will be a 3 × 4-matrix.
Using the column method, we can compute this product as follows:
[ 1 ]                [ 1 2 1 0 ]
[ 2 ] [ 1 2 1 0 ]  = [ 2 4 2 0 ],
[ 1 ]                [ 1 2 1 0 ]

where the first, second, third, and fourth columns of the result are 1, 2, 1, and 0 times [1, 2, 1]^T, respectively.
Solution. Here we are multiplying a 1 × 3-matrix by a 3 × 1-matrix, so the result will be a 1 × 1-matrix,
or in other words, a scalar. (We regard a scalar and a 1 × 1-matrix as the same thing). We have:
[1 2 3] [1, 2, −1]^T = 1 · 1 + 2 · 2 + 3 · (−1) = 2.
Therefore, multiplying a row vector by a column vector works very similarly to an ordinary dot product
(except that the dot product is defined between two column vectors, not a row vector and a column vector).
♠
Solution. The product BA is not defined, since B is a 3 × 3-matrix and A is a 2 × 3-matrix. Since the
number of columns of B does not match the number of rows of A, the product is not defined. ♠
Notice that the matrices in Example 4.27 are the same as those in Example 4.22. This demonstrates an
important property of matrix multiplication: it is possible that AB is defined but BA is undefined. Even if
AB and BA are both defined, they may not be equal, as the following example shows. Therefore, matrix
multiplication is not commutative.
Solution. We have
AB = [1 2; 3 4] [0 1; 1 0] = [2 1; 4 3]

and

BA = [0 1; 1 0] [1 2; 3 4] = [3 4; 1 2].
Therefore, AB and BA are not equal. Matrix multiplication is not commutative. ♠
We have seen two methods for matrix multiplication: one component at a time, and by the column method.
There is also a third method, called the row method. It is exactly symmetric to the column method.
[1 2 1] [1 2 0; 0 3 1; −2 1 1] = 1 [1 2 0] + 2 [0 3 1] + 1 [−2 1 1] = [−1 9 3],
[0 2 1] [1 2 0; 0 3 1; −2 1 1] = 0 [1 2 0] + 2 [0 3 1] + 1 [−2 1 1] = [−2 7 3].
The resulting two row vectors form the rows of AB. Thus,
AB = [−1 9 3; −2 7 3].
Once again this is the same answer as in Examples 4.24 and 4.22. All three methods give the same result.
But notice how in the row method, each row of A provides instructions for taking a linear combination of
the rows of B. ♠
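All three methods of matrix multiplication can be compared directly in code. The following Python/NumPy sketch (not part of the text) uses the matrices A and B from the examples above and confirms that the built-in product, the column method, and the row method agree.

    import numpy as np

    A = np.array([[1, 2, 1],
                  [0, 2, 1]])
    B = np.array([[1, 2, 0],
                  [0, 3, 1],
                  [-2, 1, 1]])

    print(A @ B)   # [[-1 9 3], [-2 7 3]]

    # column method: the k-th column of AB is A times the k-th column of B
    cols = np.column_stack([A @ B[:, k] for k in range(B.shape[1])])
    # row method: the i-th row of AB is the i-th row of A times B
    rows = np.vstack([A[i, :] @ B for i in range(A.shape[0])])
    print(np.array_equal(cols, A @ B), np.array_equal(rows, A @ B))   # True True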
We finish this section by introducing an important square matrix called the identity matrix.
When it is necessary to distinguish which size of identity matrix is being discussed, we will use the
notation In for the n × n identity matrix.
which is exactly the same as the first column of A. Similarly, the second column of AI is
[a11 a12; a21 a22; a31 a32] [0, 1]^T = 0 [a11, a21, a31]^T + 1 [a12, a22, a32]^T = [a12, a22, a32]^T,
♠
The calculation of the last example generalizes to matrices of all sizes, and is summarized in the following
proposition.
Solution. We have
A³ = A · A · A = [1 2; −2 3] [1 2; −2 3] [1 2; −2 3] = [−3 8; −8 5] [1 2; −2 3] = [−19 18; −18 −1].
We have already seen that matrix multiplication is not in general commutative, i.e., AB and BA may be
different, even if they are both defined. Sometimes it can happen that AB = BA for specific matrices A and
B. In this case, we say that A and B commute.
The following are some properties of matrix multiplication. Notice that these properties hold only
when the size of matrices are such that the products are defined.
(AB)C = A(BC).
Im A = A = AIn ,
where A is an m × n-matrix.
Proof. First, we will prove the associative law. In the proof, it will be useful to use summation notation.
We write
We write

∑_{i=1}^{n} x_i = x_1 + x_2 + . . . + x_n
for the sum of the n numbers x1 , . . . , xn . Assume A is an m × n-matrix, B is an n × p-matrix, and C is a
p × q-matrix. Then both (AB)C and A(BC) are m × q-matrices. We must show that they have the same
entries. The (i, ℓ)-entry of the matrix (AB)C is
p p n p n
((AB)C)iℓ = ∑ (AB)ik ckℓ = ∑ ( ∑ ai j b jk )ckℓ = ∑ ∑ ai j b jk ckℓ .
k=1 k=1 j=1 k=1 j=1
Both sums are equal, since they are both summing over all the terms where j = 1, . . . , n and k = 1, . . . , p.
Therefore, (AB)C = A(BC). The fact that identity matrices act as multiplicative units was already men-
tioned in Proposition 4.33. We leave compatibility with scalar multiplication as an exercise. To prove the
first distributive law, assume A is an m × n-matrix, and B and C are n × p-matrices. Then both A(B +C)
and AB + AC are m × p-matrices. We have
(A(B + C))_{ik} = ∑_{j=1}^{n} a_{ij} (B + C)_{jk} = ∑_{j=1}^{n} a_{ij} (b_{jk} + c_{jk}) = ∑_{j=1}^{n} a_{ij} b_{jk} + ∑_{j=1}^{n} a_{ij} c_{jk} = (AB + AC)_{ik}.
Thus A(B +C) = AB + AC as claimed. The proof of the other distributive law is similar. ♠
Exercises
Exercise 4.4.1 Let A = [1 2 0; −1 1 1]. Multiply A by each of the following vectors.

(a) [1, 0, 0]^T, (b) [0, 1, −1]^T, (c) [2, 1, 0]^T, (d) [0, 0, 0]^T, (e) [1, 0, 3]^T.
Exercise 4.4.2 Write the following system of equations in vector form and matrix form.
2x + 3y + z = 4,
x − 2z = 3,
2y + z = 1.
Exercise 4.4.3 Compute the following by columns and by rows. Convince yourself that both methods give
the same result.

[a11 a12 a13; a21 a22 a23; a31 a32 a33] [x1, x2, x3]^T.
Exercise 4.4.11 For each pair of matrices, find the (1, 2)-entry and (2, 3)-entry of the product AB.
(a) A = [1 2 −1; 3 4 0; 2 5 1], B = [4 6 −2; 7 2 1; −1 0 0]

(b) A = [1 3 1; 0 2 4; 1 0 5], B = [2 3 0; −4 16 1; 0 2 2]
Exercise 4.4.16 Give an example of a matrix A such that A² = I and yet A ≠ I and A ≠ −I.
Exercise 4.4.18 Suppose A and B are square matrices of the same size. Which of the following are
necessarily true?
(c) (AB)2 = A2 B2 .
(d) (A + B)2 = A2 + AB + BA + B2 .
(e) A2 B2 = A(AB)B.
(g) (A + B)(A − B) = A2 − B2 .
Outcomes
A. Determine whether a matrix is invertible, and compute the inverse if it exists.
D. Determine whether a matrix is a left inverse, right inverse, or inverse of another matrix.
We now define a matrix operation which in some ways plays the role of division. We cannot divide by a
matrix, but we can multiply by the inverse of a matrix, which is almost as good.
BA = I and AB = I.
If this is the case, we also write B = A−1 . When a matrix has an inverse, it is called invertible.
and
BA = [2 −1; −1 1] [1 1; 1 2] = [1 0; 0 1] = I.
This shows that B is indeed an inverse of A. ♠
Unlike multiplication of scalars, it can happen that A ≠ 0 but A does not have an inverse. This is illustrated
in the following example.
Solution. One might think A has an inverse because it does not equal zero. However, note that
[1 1; 1 1] [−1, 1]^T = [0, 0]^T.
If an inverse A−1 existed, we would have the following:

[−1, 1]^T = I [−1, 1]^T
          = (A−1 A) [−1, 1]^T
          = A−1 (A [−1, 1]^T)
          = A−1 [0, 0]^T
          = [0, 0]^T.

This says that

[−1, 1]^T = [0, 0]^T,
which is impossible! Therefore, A does not have an inverse. ♠
Can a matrix have more than one inverse? It turns out that this is not the case: the following theorem
shows that if A has an inverse, then the inverse is unique. We can therefore speak of “the” inverse, rather
than just “an” inverse, of A.
In Example 4.37, we verified that a matrix A had an inverse. But we did not actually compute the inverse:
the inverse B was already given, and we merely checked that AB = I and BA = I. We now explore a method
for finding the inverse when it is not already known what it is.
Solution. To find A−1, we need to find a matrix [x z; y w] such that

[1 −2; 2 −3] [x z; y w] = [1 0; 0 1].
We can multiply these two matrices, and see that in order for this equation to be true, we must solve the
systems of equations
x − 2y = 1,
2x − 3y = 0,
and
z − 2w = 0,
2z − 3w = 1.
Writing the augmented matrix for these two systems gives
[ 1 −2 | 1 ]
[ 2 −3 | 0 ]

for the first system and

[ 1 −2 | 0 ]
[ 2 −3 | 1 ]
for the second one. Note that both systems have A as their coefficient matrix. Since both systems have the
same coefficient matrix, they both require exactly the same row operations, and we can use the method
of Example 1.33 to solve both systems at the same time. To do so, we create a single augmented matrix
containing both of the right-hand sides:
[ 1 −2 | 1 0 ]
[ 2 −3 | 0 1 ].
Then we perform row operations until the coefficient matrix is in reduced echelon form:
[ 1 −2 | 1 0 ]   R2 ← R2 − 2R1   [ 1 −2 | 1 0 ]   R1 ← R1 + 2R2   [ 1 0 | −3 2 ]
[ 2 −3 | 0 1 ]        ≃          [ 0  1 | −2 1 ]       ≃          [ 0 1 | −2 1 ].     (4.1)
This corresponds to the following reduced echelon forms for the two original systems of equations:
[ 1 0 | −3 ]        [ 1 0 | 2 ]
[ 0 1 | −2 ]  and   [ 0 1 | 1 ].
The solution of the first system is x = −3 and y = −2. The solution for the second system is z = 2 and
w = 1. If we take the values found for x, y, z, and w and put them into our inverse matrix, we see that the
inverse is
A−1 = [x z; y w] = [−3 2; −2 1].
Notice that this is exactly the right-hand side in the last augmented matrix of (4.1). In other words, all we
really had to do to find the inverses were the row operations in (4.1). The inverse can be read off directly
from the result. ♠
The example suggests a general method for finding the inverse of a matrix, which we summarize in the
following algorithm.
Form the n × 2n augmented matrix [A | I], and row-reduce it. If possible, carry it to the form [I | B].
If this can be done, then A is invertible and A−1 = B. If it is not possible (i.e., if the reduced echelon
form of A has less than n pivot entries), then A is not invertible.
This algorithm shows how to find the inverse if it exists. It also tells us if A does not have an inverse.
Solution. We set up the augmented matrix and reduce it to reduced echelon form.
[A | I] =
[ 1 2  2 | 1 0 0 ]
[ 1 0  2 | 0 1 0 ]
[ 3 1 −1 | 0 0 1 ]

R2 ← R2 − R1, R3 ← R3 − 3R1:
[ 1  2  2 |  1 0 0 ]
[ 0 −2  0 | −1 1 0 ]
[ 0 −5 −7 | −3 0 1 ]

R1 ← 7R1, R3 ← −2R3:
[ 7 14 14 |  7 0  0 ]
[ 0 −2  0 | −1 1  0 ]
[ 0 10 14 |  6 0 −2 ]

R1 ← R1 + 7R2, R3 ← R3 + 5R2:
[ 7  0 14 |  0 7  0 ]
[ 0 −2  0 | −1 1  0 ]
[ 0  0 14 |  1 5 −2 ]

R1 ← R1 − R3:
[ 7  0  0 | −1 2  2 ]
[ 0 −2  0 | −1 1  0 ]
[ 0  0 14 |  1 5 −2 ]

R1 ← (1/7)R1, R2 ← −(1/2)R2, R3 ← (1/14)R3:
[ 1 0 0 | −1/7   2/7   2/7 ]
[ 0 1 0 |  1/2  −1/2    0  ]
[ 0 0 1 | 1/14  5/14  −1/7 ]
Notice that the last augmented matrix is of the form [I | B], where the left-hand side is the 3 × 3 identity
matrix. Therefore, the inverse is the 3 × 3-matrix on the right-hand side, given by
A−1 =
[ −1/7   2/7   2/7 ]
[  1/2  −1/2    0  ]
[ 1/14  5/14  −1/7 ].
♠
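It is a good habit to double-check a computed inverse. The sketch below (Python with NumPy, not part of the text) computes the inverse of the matrix from this example numerically; np.linalg.inv is not the row-reduction procedure described above, but it produces the same matrix.

    import numpy as np

    A = np.array([[1.0, 2.0, 2.0],
                  [1.0, 0.0, 2.0],
                  [3.0, 1.0, -1.0]])

    A_inv = np.linalg.inv(A)
    print(A_inv)                                 # matches [-1/7 2/7 2/7; 1/2 -1/2 0; 1/14 5/14 -1/7]
    print(np.allclose(A @ A_inv, np.eye(3)))     # True
    print(np.allclose(A_inv @ A, np.eye(3)))     # True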
When looking for the inverse of a matrix, it can happen that the left-hand side cannot be row reduced to
the identity matrix. The following is an example of this situation.
At this point, we see that the coefficient matrix has rank 2, i.e., there are only two pivot entries. This
means there is no way to obtain I on the left-hand side of this augmented matrix. Hence, there is no way
to complete the algorithm, and the inverse of A does not exist. ♠
If the algorithm provides an inverse, it is always possible to double-check that your answer is correct. To
do so, use the method demonstrated in Example 4.37. Check that the products AA−1 and A−1 A both equal
the identity matrix. Through this method, you can always ensure that you have calculated A−1 properly.
One way in which the inverse of a matrix is useful is to find the solution of a system of linear equations.
Recall from Definition 4.20 that we can write a system of equations in matrix form, which is in the form
Ax = b.
Suppose we find the inverse A−1 of the matrix A. Then we can multiply both sides of this equation by A−1
on the left and simplify to obtain
x = A−1 b.
Therefore we can find x, the solution to the system, by computing x = A−1 b. Note that once we have found
A−1 , we can easily get the solution for different right-hand sides (different b). It is always just A−1 b.
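As a quick illustration of this idea, the following Python/NumPy sketch (not part of the text) solves Ax = b for two different right-hand sides using a single inverse; the matrix and vectors here are chosen for illustration only.

    import numpy as np

    A = np.array([[2.0, 4.0],
                  [1.0, 1.0]])
    b1 = np.array([1.0, 2.0])
    b2 = np.array([2.0, 0.0])

    A_inv = np.linalg.inv(A)
    print(A_inv @ b1)   # solution of A x = b1
    print(A_inv @ b2)   # the same inverse reused for a different right-hand side
    # In practice np.linalg.solve(A, b) is preferred, but x = A^{-1} b illustrates the idea.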
x+z = 1
x−y+z = 3
x+y−z = 2
1. I is invertible and I −1 = I .
So far, we have only talked about the inverses of square matrices. But what about matrices that are not
square? Can they be invertible? It turns out that non-square matrices can never be invertible. However,
they can have left inverses or right inverses.
BA = I.
Solution. We compute

AB = [1 0 0; 0 1 0] [1 0; 0 1; 0 0] = [1 0; 0 1] = I,

and

BA = [1 0; 0 1; 0 0] [1 0 0; 0 1 0] = [1 0 0; 0 1 0; 0 0 0] ≠ I.
Therefore, B is a right inverse, but not a left inverse, of A. ♠
Recall from Definition 4.36 that B is called an inverse of A if it is both a left inverse and a right inverse. A
crucial fact is that invertible matrices are always square.
• If A is invertible, then m = n.
Proof. To prove the first claim, assume that A is left invertible, i.e., assume that BA = I for some n × m-
matrix B. We must show that m ≥ n. Assume, for the sake of obtaining a contradiction, that this is not the
case, i.e., that m < n. Then the matrix A has more columns than rows. It follows that the homogeneous
system of equations Ax = 0 has a non-trivial solution; let x be such a solution. We obtain a contradiction
by a similar method as in Example 4.38. Namely, we have
x = I x = (BA)x = B(Ax) = B0 = 0,
contradicting the fact that x was non-trivial. Since we got a contradiction from the assumption that m < n,
it follows that m ≥ n.
The second claim is proved similarly, but exchanging the roles of A and B. The third claim follows
directly from the first two claims, because every invertible matrix is both left and right invertible. ♠
Of course, not all square matrices are invertible. In particular, zero matrices are not invertible, along with
many other square matrices.
Exercises
Exercise 4.5.1 For each of the following pairs of matrices, determine whether B is an inverse of A.
(a) A = [2 4; 1 3], B = (1/2) [3 −4; −1 2].

(b) A = [1 −2; 4 −7], B = [−1 2; −4 7].

(c) A = [4 1 3; 2 1 2; 1 0 1], B = [1 −1 −1; 0 1 −2; −1 1 2].
Exercise 4.5.2 Suppose AB = AC and A is an invertible n × n-matrix. Does it follow that B = C? Explain
why or why not.
Exercise 4.5.4 For each of the following matrices, find the inverse if possible. If the inverse does not exist,
explain why.
2 1 0 1 2 1 2 1 0 1 2
A= , B= , C= , D= , E= .
−1 3 5 3 3 0 4 2 1 2 5
Exercise 4.5.5 For each of the following matrices, find the inverse if possible. If the inverse does not exist,
explain why.
A = [1 2 3; 2 1 4; 1 0 2], B = [1 0 3; 2 3 4; 1 0 2], C = [1 2 3; 2 1 4; 4 5 10], D = [1 2 0 2; 1 1 2 0; 2 1 −3 2; 1 2 1 2].
Exercise 4.5.6 Let A be a 2 × 2 invertible matrix, with A = [a b; c d]. Find a formula for A−1 in terms of
a, b, c, d.
Exercise 4.5.7 Using the inverse of the matrix, find the solution to the systems:
(a) [2 4; 1 1] [x, y]^T = [1, 2]^T,

(b) [2 4; 1 1] [x, y]^T = [2, 0]^T.
Exercise 4.5.8 Using the inverse of the matrix, find the solution to the systems:
(a) [1 0 3; 2 3 4; 1 0 2] [x, y, z]^T = [1, 0, 1]^T,

(b) [1 0 3; 2 3 4; 1 0 2] [x, y, z]^T = [3, −1, −2]^T.
Exercise 4.5.9 Show that if A is an n × n invertible matrix and X and B are n × 1-matrices such that
AX = B, then X = A−1 B.
Exercise 4.5.12 Is it possible to have matrices A and B such that AB = I, while BA = 0? If it is possible,
give an example of such matrices. If it is not possible, explain why.
Exercise 4.5.13 Show that (ABC)−1 = C−1 B−1 A−1 by verifying that (ABC)(C−1 B−1 A−1) = I and (C−1 B−1 A−1)(ABC) = I.
Exercise 4.5.14 If A is invertible, show that A2 is invertible and (A2 )−1 = (A−1 )2 .
Exercise 4.5.15 If A is invertible, show (A−1 )−1 = A. Hint: Use the uniqueness of inverses.
Exercise 4.5.16 Determine whether B is a right inverse, left inverse, both, or neither of A.
(a) A = [4 2 1; 2 1 1], B = [1 −1; −1 1; −1 2].

(b) A = [1 0; 0 1; 0 2], B = [1 2 −1; 0 1 0].

(c) A = [1 2; 3 7], B = [7 −2; −3 1].

(d) A = [1 2; 3 1], B = [−1 2; 1 −1].
Exercise 4.5.17 Show that right inverses are not unique by giving an example of matrices A, B,C such
that both B and C are right inverses of A, but B 6= C.
Exercise 4.5.18 Solve the following system of equations by using the inverse of a suitable matrix.
8x + 2y + 3z = −1
y − 2z = 2
x+z = 1
Exercise 4.5.19 Suppose that A, B,C, D are n × n-matrices, and that all relevant matrices are invertible.
Further, suppose that (A + B)−1 = CB−1 . Solve this equation for A (in terms of B and C), B (in terms of A
and C), and C (in terms of A and B).
Exercise 4.5.20 Which of the following matrices is right invertible? Find a right inverse if one exists. If
possible, find two different right inverses.
A = [1 2 3; 0 1 0], B = [1 1 2; 2 2 4], C = [1 2; 1 0; 0 1], D = [1 2; 2 3].
Exercise 4.5.21 Which of the following matrices is left invertible? Find a left inverse if one exists. If
possible, find two different left inverses.
A = [1 2; 0 1], B = [1 1 1; 0 2 3], C = [1 0; 0 1; 2 1], D = [1 0; 2 0; 3 0].
Outcomes
A. Use multiplication by elementary matrices to apply row operations.
C. Write the reduced echelon form of a matrix A in the form R = UA, where U is invertible.
Recall from Definition 1.14 that there are three kinds of elementary row operations on matrices:
The purpose of this section is to show that each of these row operations corresponds to a special type of
invertible matrix called an elementary matrix.
So the effect of multiplying A by E on the left is exactly the same as switching rows 2 and 3. We say that
E is the elementary matrix for switching rows 2 and 3. ♠
Example 4.51: Elementary matrix for adding a multiple of one row to another row
Let

E = [1 0 0; 0 1 0; 0 k 1].
What is the effect of multiplying E by an arbitrary 3 × n-matrix A?
Solution. Following Definition 4.52, all we have to do is apply the desired row operation to the 4 × 4-
identity matrix:
[1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1]   ≃ (R1 ← R1 + 5R3)   [1 0 5 0; 0 1 0 0; 0 0 1 0; 0 0 0 1] = E.
♠
We can double-check that multiplying E by any 4 × n-matrix does indeed have the desired effect:
[ 1 0 5 0 ] [ a11 a12 · · · a1n ]   [ a11 + 5a31  a12 + 5a32  · · ·  a1n + 5a3n ]
[ 0 1 0 0 ] [ a21 a22 · · · a2n ] = [ a21         a22         · · ·  a2n        ]
[ 0 0 1 0 ] [ a31 a32 · · · a3n ]   [ a31         a32         · · ·  a3n        ]
[ 0 0 0 1 ] [ a41 a42 · · · a4n ]   [ a41         a42         · · ·  a4n        ]
The fact that this always works is the content of the following theorem.
Suppose we have applied a row operation to a matrix A. Consider the row operation required to return A to
its original form, i.e., to undo the row operation. It turns out that this action is described by the inverse of
an elementary matrix. The following theorem ensures that the inverse of each elementary matrix is itself
an elementary matrix.
In fact, the inverse of an elementary matrix is constructed by doing the reverse row operation on I. E −1 is
obtained by performing the row operation which would carry E back to I.
• If E is obtained by switching rows i and j, then E −1 is also obtained by switching rows i and j.
• If E is obtained by multiplying row i by the scalar k, then E −1 is obtained by multiplying row i by the scalar $\frac{1}{k}$.
• If E is obtained by adding k times row i to row j, then E −1 is obtained by subtracting k times row i
from row j.
Solution. E is obtained from the 2 × 2 identity matrix by multiplying the second row by 2. In order to carry E back to the identity, we need to multiply the second row of E by $\frac{1}{2}$. Hence, E −1 is given by
$$E^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{2} \end{bmatrix}.$$
♠
Suppose an m × n-matrix A is row reduced to its reduced echelon form. By tracking each row operation
completed, this row reduction can be performed through multiplication by elementary matrices. The
following theorem uses this fact.
Solution. To find the reduced echelon form R, we row reduce A. For each step, we will record the appro-
priate elementary matrix. First, switch rows 1 and 2.
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 2 & 0 \end{bmatrix} \overset{R_1 \leftrightarrow R_2}{\simeq} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 2 & 0 \end{bmatrix}.$$
The corresponding elementary matrix is $E_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, i.e.,
$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 2 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 2 & 0 \end{bmatrix}.$$
Next, subtract 2 times the first row from the third row.
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 2 & 0 \end{bmatrix} \overset{R_3 \leftarrow R_3 - 2R_1}{\simeq} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
The corresponding elementary matrix is $E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}$, i.e.,
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 2 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
Notice that the resulting matrix is R, the required reduced echelon form of A. We can then write
$$R = E_2E_1A = UA.$$
It remains to compute U :
$$U = E_2E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & -2 & 1 \end{bmatrix}.$$
♠
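For readers who like to double-check such computations with software, here is a minimal sketch using the Python library sympy (not otherwise used in this text); the matrices are the ones from this example.

    from sympy import Matrix

    A  = Matrix([[0, 1], [1, 0], [2, 0]])
    E1 = Matrix([[0, 1, 0], [1, 0, 0], [0, 0, 1]])   # elementary matrix for R1 <-> R2
    E2 = Matrix([[1, 0, 0], [0, 1, 0], [-2, 0, 1]])  # elementary matrix for R3 <- R3 - 2 R1

    U = E2 * E1
    print(U)        # Matrix([[0, 1, 0], [1, 0, 0], [0, -2, 1]])
    print(U * A)    # Matrix([[1, 0], [0, 1], [0, 0]]), the reduced echelon form R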
While the process used in the above example is reliable and simple when only a few row operations are
used, it becomes cumbersome in a case where many row operations are needed to carry A to R. The
following theorem provides an alternate way to find the matrix U .
Let’s revisit the above example using the process outlined in Theorem 4.59.
Recall from Algorithm 4.41 that an n × n-matrix A is invertible if and only if A can be carried to the
n × n identity matrix using elementary row operations. Combining this with our discussion of elementary
matrices we see that A is invertible if and only if it can be written as a product of elementary matrices.
This is the content of the following theorem.
Proof. If A is an invertible n × n-matrix, then its reduced echelon form is the n × n identity matrix I. By
Theorem 4.57, we can write I = UA, where U = Ek · · · E2 E1 is a product of elementary matrices. Then
$$A = U^{-1} = (E_k \cdots E_2 E_1)^{-1} = E_1^{-1} E_2^{-1} \cdots E_k^{-1}.$$
By Theorem 4.55, if Ei is an elementary matrix, then so is Ei−1 . Therefore, A has been written as a product
of elementary matrices. Conversely, if A can be written as a product of elementary matrices, then A is
clearly invertible, because each elementary matrix is invertible. ♠
Solution. Following the process of Theorem 4.61, we first row-reduce A to its reduced echelon form,
recording each row operation as an elementary matrix.
$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \overset{R_1 \leftrightarrow R_2}{\simeq} \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \quad\text{with elementary matrix}\quad E_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \overset{R_1 \leftarrow R_1 - R_2}{\simeq} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \quad\text{with elementary matrix}\quad E_2 = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \overset{R_3 \leftarrow R_3 + 2R_2}{\simeq} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{with elementary matrix}\quad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}.$$
Notice that the reduced echelon form of A is I. Hence I = UA where U = E3 E2 E1 . It follows that
A = U −1 = E1−1 E2−1 E3−1 , and so we have succeeded in writing A as a product of elementary matrices
$$A = E_1^{-1}E_2^{-1}E_3^{-1} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix}.$$
♠
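As a quick software check of this factorization, the following sketch (again using sympy; the matrices are the ones from this example) verifies both that UA = I and that the product of the inverse elementary matrices equals A.

    from sympy import Matrix, eye

    A  = Matrix([[0, 1, 0], [1, 1, 0], [0, -2, 1]])
    E1 = Matrix([[0, 1, 0], [1, 0, 0], [0, 0, 1]])   # R1 <-> R2
    E2 = Matrix([[1, -1, 0], [0, 1, 0], [0, 0, 1]])  # R1 <- R1 - R2
    E3 = Matrix([[1, 0, 0], [0, 1, 0], [0, 2, 1]])   # R3 <- R3 + 2 R2

    assert E3 * E2 * E1 * A == eye(3)                # U A = I
    assert E1.inv() * E2.inv() * E3.inv() == A       # A as a product of elementary matrices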
In this section, we will use elementary matrices to prove a useful theorem about the inverse of a square
matrix. We start with an observation about the echelon form of a right invertible matrix.
Proof. Let R be the reduced echelon form of A. Then by Theorem 4.57, we can write R = UA for some
invertible square matrix U . By assumption, we have AB = I, and therefore
$$R(BU^{-1}) = (UA)(BU^{-1}) = U(AB)U^{-1} = UIU^{-1} = I.$$
If R had a row of zeros, then so would the product $R(BU^{-1}) = I$. But since the identity matrix I does not have
a row of zeros, neither does R. ♠
Proof. Assume A and B are square matrices such that AB = I. Let R be the reduced echelon form of A.
Then by Theorem 4.57, we can write R = UA where U is an invertible matrix. Since AB = I, we know by
Lemma 4.63 that R does not have a row of zeros. Since R is a square reduced echelon form with no row
of zeros, each column must be a pivot column, and it follows that R = I. Hence, UA = I, and therefore A
is left invertible. Moreover, we have
B = IB = (UA)B = U(AB) = UI = U.
Exercises
Exercise 4.6.1 For each of the following pairs of matrices, suppose a row operation is applied to A and
the result is B. Find the elementary matrix E that represents this row operation.
(a) $A = \begin{bmatrix} 2 & 3 \\ 1 & 2 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}$.

(b) $A = \begin{bmatrix} 4 & 0 \\ 2 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 8 & 0 \\ 2 & 1 \end{bmatrix}$.

(c) $A = \begin{bmatrix} 1 & -3 \\ 0 & 5 \end{bmatrix}$, $B = \begin{bmatrix} 1 & -3 \\ 2 & -1 \end{bmatrix}$.
Exercise 4.6.2 For each of the following pairs of matrices, suppose a row operation is applied to A and the result is B. Find the elementary matrix E that represents this row operation.
(a) $A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 5 & 1 \\ 2 & -1 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 & 1 \\ 2 & -1 & 4 \\ 0 & 5 & 1 \end{bmatrix}$.

(b) $A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 5 & 1 \\ 2 & -1 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 10 & 2 \\ 2 & -1 & 4 \end{bmatrix}$.

(c) $A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 5 & 1 \\ 2 & -1 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 5 & 1 \\ 1 & -\frac{1}{2} & 2 \end{bmatrix}$.

(d) $A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 5 & 1 \\ 2 & -1 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 5 \\ 2 & -1 & 4 \end{bmatrix}$.
Exercise 4.6.3 Find the reduced echelon form of each of the following matrices A, and write it in the form
R = UA where U is invertible.
(a) $\begin{bmatrix} 1 & 3 & 1 \\ 0 & 2 & 4 \\ 2 & 6 & -2 \end{bmatrix}$, (b) $\begin{bmatrix} 0 & 1 \\ 3 & -1 \\ 2 & 6 \end{bmatrix}$, (c) $\begin{bmatrix} 0 & 1 & -2 \\ 1 & 3 & 2 \end{bmatrix}$, (d) $\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$.
Exercise 4.6.4 Write each of the following matrices as a product of elementary matrices, if possible, or
else say why it is not possible.
(a) $\begin{bmatrix} 1 & 3 \\ 0 & 2 \end{bmatrix}$, (b) $\begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}$, (c) $\begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & 3 \\ 2 & 4 & 1 \end{bmatrix}$, (d) $\begin{bmatrix} 0 & 1 & 2 \\ 1 & 3 & 7 \\ 1 & 1 & 3 \end{bmatrix}$.
4.7 The transpose
Outcomes
A. Calculate the transpose of a matrix.
Another important operation on matrices is that of taking the transpose. The transpose of a matrix is
obtained by turning the rows into columns and vice versa.
1. (AT )T = A.
2. (A + B)T = AT + BT .
3. (rA)T = rAT .
4. (AB)T = BT AT .
5. 0T = 0.
6. I T = I .
Recall that a column vector is the same thing as an n × 1-matrix. Using the transpose, we can make precise the connection between the dot product and the matrix product. Namely, let
$$v = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} \quad\text{and}\quad w = \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix}$$
be column vectors. Then
$$v \cdot w = v_1w_1 + \ldots + v_nw_n = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix} = v^Tw.$$
In other words, the dot product of column vectors v and w is the same thing as the matrix product vT w.
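The following small sketch (using the Python library numpy; the vectors are arbitrary example values, not taken from the text) illustrates that the dot product and the matrix product vT w agree.

    import numpy as np

    v = np.array([1, 2, 3])   # example column vectors, stored as 1-D arrays
    w = np.array([4, 5, 6])

    print(np.dot(v, w))                                   # 32, the dot product v . w
    print((v.reshape(3, 1).T @ w.reshape(3, 1)).item())   # 32, the 1x1 matrix product v^T w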
We can also use the notion of transpose to define what it means for a matrix to be symmetric and
antisymmetric.
Exercises
Exercise 4.7.1 Let X = −1 −1 1 and Y = 0 1 2 . Find X T Y and XY T if possible.
(a) −3AT .
(b) 3B − AT .
(c) E T B.
(d) EE T .
(e) BT B.
(f) CAT .
(g) DT BE.
Exercise 4.7.3 Which of the following matrices are symmetric, antisymmetric, both, or neither?
$$A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 2 \\ -2 & 0 \end{bmatrix}, \quad D = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
Exercise 4.7.4 Suppose A is a matrix that is both symmetric and antisymmetric. Show that A = 0.
Exercise 4.7.5 Let A be an n × n-matrix. Show A equals the sum of a symmetric and an antisymmetric
matrix. Hint: Show that $\frac{1}{2}(A^T + A)$ is symmetric and then consider using this as one of the matrices.
Exercise 4.7.6 Show that the main diagonal of every antisymmetric matrix consists of only zeros. Recall
that the main diagonal consists of every entry of the matrix which is of the form aii .
Exercise 4.7.7 Show that for m × n-matrices A, B and scalars r, s, the following holds:
(rA + sB)T = rAT + sBT .
Exercise 4.7.8 Let A be a real m × n-matrix and let u ∈ Rn and v ∈ Rm . Show that (Au) · v = u · (AT v).
Exercise 4.7.9 Show that if A is an invertible n × n-matrix, then so is AT and (AT )−1 = (A−1 )T .
Exercise 4.7.10 Suppose A is invertible and symmetric. Show that A−1 is symmetric.
4.8 Matrix arithmetic modulo p
Outcomes
A. Perform matrix operations over the field Z p .
In Section 1.8, you learned that most of linear algebra can be done over scalars from any field K, and not
just the real numbers. You also learned that Z p , the set of integers modulo p, is a field whenever p is a
prime number.
Indeed, all of the operations on matrices that we covered in this chapter make sense over any field:
addition, scalar multiplication, matrix multiplication, inverses, elementary matrices, and the transpose.
Solution. For example, the (1, 1)-entry of AB is calculated by multiplying the first row of A by the first
column of B, i.e.,
c11 = 1 · 3 + 0 · 4 + 4 · 2 = 3 + 0 + 3 = 1,
keeping in mind that all arithmetic operations are done in Z5 . We repeat the same for the other entries and
obtain
$$AB = \begin{bmatrix} 1 & 0 & 4 \\ 2 & 3 & 1 \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 4 & 0 \\ 2 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 4 \\ 0 & 4 \end{bmatrix}.$$
♠
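If one wants to experiment with such computations, one possible approach is to compute the ordinary integer matrix product and then reduce every entry modulo p, as in the following sketch (using numpy; the matrices are the ones from this example).

    import numpy as np

    A = np.array([[1, 0, 4], [2, 3, 1]])
    B = np.array([[3, 1], [4, 0], [2, 2]])

    print((A @ B) % 5)   # [[1 4]
                         #  [0 4]]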
Solution. We use exactly the method of Algorithm 4.41, i.e., we set up the augmented matrix [A | I] and
reduce it to reduced echelon form. The only thing we have to keep in mind is that all operations are done
modulo 7. Also, as usual, instead of dividing by a scalar, we must multiply by its inverse.
$$[A \mid I] = \left[\begin{array}{ccc|ccc} 1 & 4 & 2 & 1 & 0 & 0 \\ 0 & 6 & 2 & 0 & 1 & 0 \\ 5 & 0 & 3 & 0 & 0 & 1 \end{array}\right] \overset{R_3 \leftarrow R_3 + 2R_1}{\simeq} \left[\begin{array}{ccc|ccc} 1 & 4 & 2 & 1 & 0 & 0 \\ 0 & 6 & 2 & 0 & 1 & 0 \\ 0 & 1 & 0 & 2 & 0 & 1 \end{array}\right] \overset{R_2 \leftrightarrow R_3}{\simeq} \left[\begin{array}{ccc|ccc} 1 & 4 & 2 & 1 & 0 & 0 \\ 0 & 1 & 0 & 2 & 0 & 1 \\ 0 & 6 & 2 & 0 & 1 & 0 \end{array}\right]$$
$$\overset{R_3 \leftarrow R_3 + R_2}{\simeq} \left[\begin{array}{ccc|ccc} 1 & 4 & 2 & 1 & 0 & 0 \\ 0 & 1 & 0 & 2 & 0 & 1 \\ 0 & 0 & 2 & 2 & 1 & 1 \end{array}\right] \overset{R_3 \leftarrow 2^{-1}R_3 = 4R_3}{\simeq} \left[\begin{array}{ccc|ccc} 1 & 4 & 2 & 1 & 0 & 0 \\ 0 & 1 & 0 & 2 & 0 & 1 \\ 0 & 0 & 1 & 1 & 4 & 4 \end{array}\right]$$
$$\overset{R_1 \leftarrow R_1 - 4R_2}{\simeq} \left[\begin{array}{ccc|ccc} 1 & 0 & 2 & 0 & 0 & 3 \\ 0 & 1 & 0 & 2 & 0 & 1 \\ 0 & 0 & 1 & 1 & 4 & 4 \end{array}\right] \overset{R_1 \leftarrow R_1 - 2R_3}{\simeq} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 5 & 6 & 2 \\ 0 & 1 & 0 & 2 & 0 & 1 \\ 0 & 0 & 1 & 1 & 4 & 4 \end{array}\right].$$
Therefore, the inverse is
$$A^{-1} = \begin{bmatrix} 5 & 6 & 2 \\ 2 & 0 & 1 \\ 1 & 4 & 4 \end{bmatrix}.$$
As usual, we can double-check that we didn't make any mistakes by calculating
$$AA^{-1} = \begin{bmatrix} 1 & 4 & 2 \\ 0 & 6 & 2 \\ 5 & 0 & 3 \end{bmatrix} \begin{bmatrix} 5 & 6 & 2 \\ 2 & 0 & 1 \\ 1 & 4 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I.$$
So indeed, we have calculated the inverse correctly. ♠
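Computer algebra systems can compute such modular inverses directly. The following is a minimal sketch using sympy's inv_mod method; the matrix is the one from this example.

    from sympy import Matrix

    A = Matrix([[1, 4, 2], [0, 6, 2], [5, 0, 3]])
    A_inv = A.inv_mod(7)                           # inverse of A over Z_7
    print(A_inv)                                   # Matrix([[5, 6, 2], [2, 0, 1], [1, 4, 4]])
    print((A * A_inv).applyfunc(lambda x: x % 7))  # the 3 x 3 identity matrix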
Exercises
(a) 3A,
(b) A2 ,
(c) AB,
(d) BC,
(e) C−1 .
4.9 Application: Cryptography

Cryptography is about encoding a message so that it is hard for a third party to read. The original message
is called the plaintext and the encrypted message is called the ciphertext. The process of turning a
plaintext into the corresponding ciphertext is called encryption, and the process of turning a ciphertext
into the corresponding plaintext is called decryption. An encryption and decryption method is also called
a cipher. Modern ciphers are designed in such a way that the cipher itself is not secret, but the encryption
depends on a secret key. A cipher should be designed so that decryption is easy for a person who knows
the key, but difficult for everybody else. The art of designing ciphers is called cryptography, and the art
of breaking ciphers is called cryptanalysis.
In order to be able to define ciphers using algebraic operations, we start by encoding strings as se-
quences of numbers. To that end, we assign a number to each letter of the alphabet, as well as the special
symbols “space”, “comma”, and “period”, according to the following scheme: the space character is assigned the number 0, the letters A–Z are assigned the numbers 1–26, the comma is assigned 27, and the period is assigned 28.
In practical applications, one would probably use a larger set of symbols and a standard encoding such as
ASCII or UTF-8. But the above 29 symbols will be sufficient for our purposes. It will also come in handy
that 29 is prime.
A cipher that encrypts a message one letter at a time has the property that changing one letter of the plaintext always changes exactly one letter of the ciphertext. This is not a
desirable property, because it makes the cipher easy to break. Therefore, modern ciphers are designed to
satisfy a property called diffusion: changing one letter of the plaintext should change many letters of the
ciphertext.
In a block cipher, the plaintext is first divided into blocks of equal size, and then each block is en-
crypted separately. The block size is the number of plaintext symbols in each block. If the length of the
plaintext is not divisible by the block size, we pad the final block with additional spaces. In the context
of a block cipher, the diffusion property means that changing one symbol of a plaintext block potentially
affects every symbol of the ciphertext block. The following is an example of a block cipher.
The matrix A is called the encryption matrix of the cipher. Its inverse A−1 is called the decryption
matrix.
Solution. We start by converting the message “Meet me tomorrow” to a sequence of scalars. We have
M = 13, E = 5, and so on. The encoded plaintext is 13, 5, 5, 20, 0, 13, 5, 0, 20, 15, 13, 15, 18, 18, 15, 23. Next,
we divide the plaintext into blocks of length 3. Since the length of the plaintext is not a multiple of three,
we pad the final block with spaces, i.e., with zeros.
Plaintext blocks: (13, 5, 5), (20, 0, 13), (5, 0, 20), (15, 13, 15), (18, 18, 15), (23, 0, 0).
To compute the ciphertext, we regard each plaintext block as a 3-dimensional column vector and multiply
by the encryption matrix A. All calculations are done modulo 29. For example, for the first block, we have
$$A \begin{bmatrix} 13 \\ 5 \\ 5 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 1 \\ 3 & 1 & 5 \\ 1 & 3 & 2 \end{bmatrix} \begin{bmatrix} 13 \\ 5 \\ 5 \end{bmatrix} = \begin{bmatrix} 22 \\ 11 \\ 9 \end{bmatrix},$$
so the first ciphertext block is (22, 11, 9). We repeat the same with the remaining plaintext blocks.
$$A \begin{bmatrix} 20 \\ 0 \\ 13 \end{bmatrix} = \begin{bmatrix} 24 \\ 9 \\ 17 \end{bmatrix}, \quad A \begin{bmatrix} 5 \\ 0 \\ 20 \end{bmatrix} = \begin{bmatrix} 1 \\ 28 \\ 16 \end{bmatrix}, \quad A \begin{bmatrix} 15 \\ 13 \\ 15 \end{bmatrix} = \begin{bmatrix} 10 \\ 17 \\ 26 \end{bmatrix}, \quad A \begin{bmatrix} 18 \\ 18 \\ 15 \end{bmatrix} = \begin{bmatrix} 7 \\ 2 \\ 15 \end{bmatrix}, \quad A \begin{bmatrix} 23 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 17 \\ 11 \\ 23 \end{bmatrix}.$$
Ciphertext blocks: (22, 11, 9), (24, 9, 17), (1, 28, 16), (10, 17, 26), (7, 2, 15), (17, 11, 23).
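The block-by-block encryption is easy to automate. The following sketch (using numpy) encrypts a single plaintext block with the encryption matrix A of this example; the helper function name is ours, not part of the text.

    import numpy as np

    A = np.array([[2, 4, 1], [3, 1, 5], [1, 3, 2]])   # encryption matrix from this example

    def encrypt_block(block, A, p=29):
        # multiply a length-3 plaintext block by A and reduce modulo p
        return [int(x) for x in (A @ np.array(block)) % p]

    print(encrypt_block([13, 5, 5], A))   # [22, 11, 9], the first ciphertext block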
Solution. The process is analogous to encryption, except that we need to use the decryption matrix A−1
instead of A. We first calculate A−1 , keeping in mind that scalars are from the field Z29 . The method is the
same as in Example 4.71; we skip the individual steps in the interest of brevity.
$$[A \mid I] = \left[\begin{array}{ccc|ccc} 2 & 4 & 1 & 1 & 0 & 0 \\ 3 & 1 & 5 & 0 & 1 & 0 \\ 1 & 3 & 2 & 0 & 0 & 1 \end{array}\right] \simeq \ldots \simeq \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 23 & 20 & 11 \\ 0 & 1 & 0 & 4 & 17 & 28 \\ 0 & 0 & 1 & 26 & 8 & 11 \end{array}\right] = [I \mid A^{-1}].$$
Next, we convert the 15 ciphertext symbols “ RNOLFPHHCIGH DE ” to scalars and divide them into blocks
of length 3:
Ciphertext blocks: (18, 14, 15), (12, 6, 16), (8, 8, 3), (9, 7, 8), (0, 4, 5).
Now we decrypt each ciphertext block by a matrix multiplication with A−1 .
$$A^{-1} \begin{bmatrix} 18 \\ 14 \\ 15 \end{bmatrix} = \begin{bmatrix} 18 \\ 5 \\ 20 \end{bmatrix}, \quad A^{-1} \begin{bmatrix} 12 \\ 6 \\ 16 \end{bmatrix} = \begin{bmatrix} 21 \\ 18 \\ 14 \end{bmatrix}, \quad A^{-1} \begin{bmatrix} 8 \\ 8 \\ 3 \end{bmatrix} = \begin{bmatrix} 0 \\ 20 \\ 15 \end{bmatrix}, \quad A^{-1} \begin{bmatrix} 9 \\ 7 \\ 8 \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}, \quad A^{-1} \begin{bmatrix} 0 \\ 4 \\ 5 \end{bmatrix} = \begin{bmatrix} 19 \\ 5 \\ 0 \end{bmatrix}.$$
Plaintext blocks: (18, 5, 20), (21, 18, 14), (0, 20, 15), (0, 2, 1), (19, 5, 0).
Converting these back to letters, and omitting the trailing space, we find that the plaintext is “return to
base”. ♠
It is important to note that, despite its good diffusion properties, the Hill cipher is not secure. The cipher
has many weaknesses. For one, because A0 = 0, a block of spaces in the plaintext will always be encrypted
as a block of spaces in the ciphertext, regardless of the encryption matrix A. More importantly, the cipher
is subject to a so-called known plaintext attack. If an eavesdropper intercepts some ciphertext for which
a small amount of the corresponding plaintext happens to be known, it is immediately possible to recover
the key and therefore decrypt the rest of the ciphertext. Carrying out this attack only requires some basic
knowledge of linear algebra. The following example illustrates how this is done.
“ EFNOR.AHIFNEPL.TSZS,RSKT.ZBBRFVUPFVZLFHNTV ” .
Eve knows that Alice uses a Hill cipher with block length 3, but she does not know the secret
encryption matrix. Eve also knows that Alice begins all of her correspondence with “My dear
love”. Decrypt the message.
Solution. The first three blocks of the ciphertext are “EFNOR.AHI”, i.e., (5, 6, 14), (15, 18, 28), and (1, 8, 9). Eve also knows that the first three blocks of the plaintext are “MY DEAR L”, i.e., (13, 25, 0), (4, 5, 1), and (18, 0, 12). These facts allow Eve to deduce the following information about the unknown decryption matrix A−1 :
$$A^{-1} \begin{bmatrix} 5 \\ 6 \\ 14 \end{bmatrix} = \begin{bmatrix} 13 \\ 25 \\ 0 \end{bmatrix}, \quad A^{-1} \begin{bmatrix} 15 \\ 18 \\ 28 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \\ 1 \end{bmatrix}, \quad A^{-1} \begin{bmatrix} 1 \\ 8 \\ 9 \end{bmatrix} = \begin{bmatrix} 18 \\ 0 \\ 12 \end{bmatrix}.$$
Since Eve remembers the column method of matrix multiplication, she knows that these three equations can be written as a single equation in matrix form:
$$A^{-1} \begin{bmatrix} 5 & 15 & 1 \\ 6 & 18 & 8 \\ 14 & 28 & 9 \end{bmatrix} = \begin{bmatrix} 13 & 4 & 18 \\ 25 & 5 & 0 \\ 0 & 1 & 12 \end{bmatrix}.$$
Note that this equation is of the form A−1C = P. (Here, C stands for “ciphertext” and P for “plaintext”).
Multiplying both sides of the equation by C−1 on the right, we get A−1 = PC−1 . Thus, assuming that C is
invertible, Eve can easily compute the decryption matrix A−1 . Eve computes:
$$C^{-1} = \begin{bmatrix} 5 & 15 & 1 \\ 6 & 18 & 8 \\ 14 & 28 & 9 \end{bmatrix}^{-1} = \begin{bmatrix} 19 & 8 & 23 \\ 0 & 5 & 2 \\ 22 & 1 & 0 \end{bmatrix}.$$
Armed with the decryption matrix A−1 , Eve can now decrypt Alice’s entire message, using the same
method as in Example 4.75. The plaintext is “My dear love, run away with me at midnight”. ♠
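The attack itself is a one-line computation once the known blocks are arranged into matrices. The following sketch (using sympy) recovers the decryption matrix from the matrices C and P of this example via A−1 = PC−1 modulo 29.

    from sympy import Matrix

    C = Matrix([[5, 15, 1], [6, 18, 8], [14, 28, 9]])   # known ciphertext blocks as columns
    P = Matrix([[13, 4, 18], [25, 5, 0], [0, 1, 12]])   # corresponding plaintext blocks as columns

    A_inv = (P * C.inv_mod(29)).applyfunc(lambda x: x % 29)   # Eve's decryption matrix
    assert (A_inv * C).applyfunc(lambda x: x % 29) == P       # maps ciphertext blocks to plaintext blocks
    print(A_inv)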
As the example shows, the Hill cipher is not secure at all. The main problem is that the cipher is linear, i.e.,
each component of a ciphertext block is a simple linear combination of the components of the plaintext
block. This linearity property enables Eve to break the cipher by solving a system of linear equations.
For this reason, all modern block ciphers have a non-linear component. Often this takes the form of
so-called S-boxes. An S-box is an operation that scrambles the symbols of the alphabet in a non-linear
way. For example, consider the following S-box, which is an operation from Z29 to Z29 :
x: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
S(x): 17 9 27 2 20 12 21 26 16 18 4 24 23 7 19 14 28 0 1 15 10 22 6 5 25 11 13 3 8
The inputs of the S-box are shown in the top row, and the corresponding outputs in the bottom row. For
example, this S-box maps the input 7 to the output 26. We write S(7) = 26.
• Key mixing: add the next three components of the key to the components of the vector.
• Diffusion: multiply the vector by the fixed 3 × 3-matrix $A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 2 & 3 & 1 \end{bmatrix}$.
• S-box application: apply the S-box to each component of the vector.
Finally, apply one more key mixing step at the end. The resulting vector is the ciphertext block.
The cipher can be visualized as follows:
[Diagram: the three components of a block pass through four key-mixing steps (with key components k1 , . . . , k12 ), alternating with diffusion steps and S-box applications S.]
Note that the three basic steps (key mixing, diffusion, and S-box application) are repeated several times;
each such repetition is called a round of the block cipher. The more rounds a block cipher has, the better
its diffusion and non-linearity properties. The final round is short: it only consists of a key mixing step,
with no final diffusion or S-box application. The reason is that performing a final diffusion and S-box
application would not add anything to the security of the cipher. An attacker could simply undo these last
two steps, since they do not depend on the key.
The matrix A is called the diffusion matrix of the cipher. Note that, unlike for the Hill cipher, the
matrix A is fixed once and for all and is not part of the key. Instead, the key consists of scalars that are
added to the current block at the beginning of each round.
Solution. We first represent the plaintext as a sequence of blocks, padding the final block with zeros:
Plaintext blocks: (9, 0, 12), (9, 11, 5), (0, 13, 1), (20, 8, 0).
To encrypt the first block, we start with the vector [9, 0, 12]T and apply the following steps:
Round 1:
• Key mixing: the first three components of the key are 1, 1, 3. We add them to the plaintext.
$$\begin{bmatrix} 9 \\ 0 \\ 12 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 10 \\ 1 \\ 15 \end{bmatrix}.$$
• Diffusion:
$$\begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 2 & 3 & 1 \end{bmatrix} \begin{bmatrix} 10 \\ 1 \\ 15 \end{bmatrix} = \begin{bmatrix} 28 \\ 3 \\ 9 \end{bmatrix}.$$
• S-box application:
$$\begin{bmatrix} S(28) \\ S(3) \\ S(9) \end{bmatrix} = \begin{bmatrix} 8 \\ 2 \\ 18 \end{bmatrix}.$$
Round 2:
• Key mixing: the next three components of the key are 3, 5, 5.
$$\begin{bmatrix} 8 \\ 2 \\ 18 \end{bmatrix} + \begin{bmatrix} 3 \\ 5 \\ 5 \end{bmatrix} = \begin{bmatrix} 11 \\ 7 \\ 23 \end{bmatrix}.$$
• Diffusion:
$$\begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 2 & 3 & 1 \end{bmatrix} \begin{bmatrix} 11 \\ 7 \\ 23 \end{bmatrix} = \begin{bmatrix} 7 \\ 28 \\ 8 \end{bmatrix}.$$
• S-box application:
$$\begin{bmatrix} S(7) \\ S(28) \\ S(8) \end{bmatrix} = \begin{bmatrix} 26 \\ 8 \\ 16 \end{bmatrix}.$$
Round 3:
• Key mixing: the next three components of the key are 7, 7, 9.
$$\begin{bmatrix} 26 \\ 8 \\ 16 \end{bmatrix} + \begin{bmatrix} 7 \\ 7 \\ 9 \end{bmatrix} = \begin{bmatrix} 4 \\ 15 \\ 25 \end{bmatrix}.$$
• Diffusion:
$$\begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 2 & 3 & 1 \end{bmatrix} \begin{bmatrix} 4 \\ 15 \\ 25 \end{bmatrix} = \begin{bmatrix} 22 \\ 19 \\ 20 \end{bmatrix}.$$
• S-box application:
$$\begin{bmatrix} S(22) \\ S(19) \\ S(20) \end{bmatrix} = \begin{bmatrix} 6 \\ 15 \\ 10 \end{bmatrix}.$$
Final round:
• Key mixing: the last three components of the key are 9, 11, 11.
$$\begin{bmatrix} 6 \\ 15 \\ 10 \end{bmatrix} + \begin{bmatrix} 9 \\ 11 \\ 11 \end{bmatrix} = \begin{bmatrix} 15 \\ 26 \\ 21 \end{bmatrix}.$$
Therefore, the first ciphertext block is (15, 26, 21). We repeat the same procedure with the remaining
plaintext blocks, and obtain the following ciphertext blocks:
Ciphertext blocks: (15, 26, 21), (7, 24, 1), (2, 16, 23), (7, 20, 22).
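The whole procedure is mechanical enough to implement in a few lines. The following sketch (using numpy) encrypts one block with the S-box and diffusion matrix of this section; the key components are the ones used in this example.

    import numpy as np

    SBOX = [17, 9, 27, 2, 20, 12, 21, 26, 16, 18, 4, 24, 23, 7, 19, 14, 28, 0,
            1, 15, 10, 22, 6, 5, 25, 11, 13, 3, 8]        # S(0), S(1), ..., S(28)
    A = np.array([[1, 2, 3], [3, 1, 2], [2, 3, 1]])        # fixed diffusion matrix
    key = [1, 1, 3, 3, 5, 5, 7, 7, 9, 9, 11, 11]           # the 12 key scalars of this example

    def encrypt_block(block, key, p=29):
        v = np.array(block)
        for r in range(3):                                 # three full rounds
            v = (v + np.array(key[3*r : 3*r + 3])) % p     # key mixing
            v = (A @ v) % p                                # diffusion
            v = np.array([SBOX[int(x)] for x in v])        # S-box application
        return [int(x) for x in (v + np.array(key[9:12])) % p]   # final key mixing

    print(encrypt_block([9, 0, 12], key))                  # [15, 26, 21], the first ciphertext block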
Modern real-world block ciphers, such as the Advanced Encryption Standard (AES), are designed along the same lines as our toy cipher, and indeed, all such ciphers rely on key mixing, diffusion, and non-linear S-boxes as their
key components.
For example, AES uses an alphabet size of 256 instead of 29 (i.e., it operates on bytes, rather than
elements of Z29 ). Although Z256 is not a field (because 256 is not prime), it nevertheless turns out that
there exists a field with 256 elements, and AES uses it for its algebraic operations. Our toy cipher’s
block size of 3 is much too small to achieve effective diffusion; modern real-world ciphers use block sizes
between 16 and 32 bytes (128 to 256 bits). The design of the S-boxes is a bit of a black art; at minimum,
they must be designed to withstand two common types of cryptanalysis known as linear cryptanalysis
and differential cryptanalysis. Among other things, this means that the S-box should be “as far from
linear” as possible.
A detailed discussion of the design and cryptanalysis of modern block ciphers is far beyond the scope
of this book, but we hope that you have gotten a taste of this fascinating subject, and the role that linear
algebra over finite fields plays in it.
Exercises
Exercise 4.9.1 Encrypt the message “Rendezvous at dawn” using the Hill cipher with block size 3 and
encryption matrix
$$A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 3 & 1 \\ 1 & 1 & 4 \end{bmatrix}.$$
Exercise 4.9.2 Decrypt the message “ ERM DXYBJUWW.JWQLD,HL ” using the Hill cipher with block size 3
and encryption matrix
$$A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 3 & 1 \\ 1 & 1 & 4 \end{bmatrix}.$$
Exercise 4.9.3 Eve intercepts the following encrypted message sent by Bob:
“ TGVXKHGSW,JU,JHYJSCDSBQIRPEV ”
Eve knows that Alice uses a Hill cipher with block length 2, but she does not know the secret encryption
matrix. Eve also knows that Bob begins all of his letters with “Hello”. Decrypt the message.
Exercise 4.9.4 Encrypt the message “Lost contact” using the block cipher of Definition 4.77 and the key
2, 3, 4, 1, 1, 1, 5, 5, 5, 4, 3, 2.
Exercise 4.9.5 Decrypt the message “ NRQEUAPOM GLFN ”, using the block cipher of Definition 4.77 and
the key 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4. Hint: To decrypt, we must perform all the encryption steps in reverse.
To undo a key mixing step, we subtract the relevant key components. To undo an S-box application, we
apply the S-box in reverse. To undo a diffusion step, multiply by the inverse of the diffusion matrix.
5. Spans, linear independence, and bases in Rn
5.1 Spans
Outcomes
A. Determine the span of a set of vectors.
Let u and v be two non-parallel vectors in Rn . We can picture the set of their linear combinations as
follows:
[Figure: the linear combinations a u + b v for various coefficients a and b (for example 3u + 3v, −1u + 2v, 2u − 1v) form a grid filling out a plane through the origin, with u and v drawn as arrows based at 0.]
As the picture shows, the linear combinations of u and v form a 2-dimensional plane through the origin.
We say that this plane is spanned by the vectors u and v. This concept generalizes to more than two
vectors. For example, three vectors may span a 3-dimensional space (although sometimes, they span only
a 2-dimensional space, or even a line). This motivates the following definition.
Solution. (a) For a vector to be in span {u, v}, it must be a linear combination of u and v. Therefore,
w ∈ span {u, v} if and only if we can find scalars a, b such that a u + b v = w. We must therefore solve
the equation
$$a \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + b \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}.$$
We write this as an augmented matrix and solve.
$$\left[\begin{array}{cc|c} 1 & 3 & 2 \\ 1 & 2 & 3 \\ 1 & 1 & 4 \end{array}\right] \simeq \ldots \simeq \left[\begin{array}{cc|c} 1 & 0 & 5 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{array}\right].$$
The solution is a = 5 and b = −1. This means that w = 5u + (−1)v. Therefore, w is an element of
span {u, v}.
(b) We repeat the same method with the vector z. This time, we have to find a, b such that a u + b v = z.
The system of equations is
$$\left[\begin{array}{cc|c} 1 & 3 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{array}\right] \simeq \ldots \simeq \left[\begin{array}{cc|c} 1 & 3 & 1 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{array}\right],$$
which is inconsistent. Therefore, there is no solution. We conclude that z is not an element of span {u, v}.
♠
Solution. Let w = [x, y, z]T be any vector. Proceeding as in the previous example, we know that w is an
element of span {u, v} if and only if the equation
$$a \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + b \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$
is consistent. Note that the variables of this equation are a, b; we regard x, y, z as constants for the moment.
We write the augmented matrix of this system and reduce to echelon form:
$$\left[\begin{array}{cc|c} 1 & 3 & x \\ 1 & 2 & y \\ 1 & 1 & z \end{array}\right] \overset{\substack{R_2 \leftarrow R_2 - R_1 \\ R_3 \leftarrow R_3 - R_1}}{\simeq} \left[\begin{array}{cc|c} 1 & 3 & x \\ 0 & -1 & y - x \\ 0 & -2 & z - x \end{array}\right] \overset{R_3 \leftarrow R_3 - 2R_2}{\simeq} \left[\begin{array}{cc|c} 1 & 3 & x \\ 0 & -1 & y - x \\ 0 & 0 & (z - x) - 2(y - x) \end{array}\right].$$
From the echelon form, we see that the system is consistent if and only if (z − x) − 2(y − x) = 0, or
equivalently x − 2y + z = 0. Therefore, the vector w is in span {u, v} if and only if x − 2y + z = 0. In other
words, the span of u and v is the plane x − 2y + z = 0. ♠
Solution. Observe that w = 2 u + 3 v. Therefore, w is already in the span of u and v. Two sets are equal if
they have the same elements, i.e., each element of the first set is an element of the second set and vice versa.
Therefore, to show span {u, v, w} = span {u, v}, we must show (a) that every element of span {u, v, w} is
an element of span {u, v} and (b) vice versa.
(a) Let z be an arbitrary element of span {u, v, w}. Then, by definition of span, there exist scalars a, b, c
such that
$$z = a u + b v + c w.$$
Substituting w = 2 u + 3 v, we get
$$z = a u + b v + c(2 u + 3 v) = (a + 2c)u + (b + 3c)v.$$
It follows that z is a linear combination of u and v, and therefore, z ∈ span {u, v}.
(b) Clearly every linear combination of u and v is also a linear combination of u, v, and w, namely,
taking the coefficient of w to be 0. Therefore, every element of span {u, v} is an element of span {u, v, w}.
Because we have shown that every element of span {u, v, w} is an element of span {u, v} and vice
versa, it follows that span {u, v, w} and span {u, v} are the same set of vectors. ♠
In the situation of the last example, we say that the vector w is redundant; it does not contribute anything
to span {u, v, w}. Geometrically, the three vectors u, v, and w lie in a plane. Since the two vectors u and v
are sufficient to span this plane, the third vector w is not really needed. We will study this situation more
systematically in the next section.
Solution. Consider what happens when we compute the sum of three numbers. We usually write this as
b1 + b2 + b3 . We can also compute the sum of three numbers by starting from 0 and then adding each of
the three numbers to it. I.e., the sum can be computed as 0 + b1 + b2 + b3 . Similarly, we can write the
sum of two numbers as 0 + b1 + b2 , and the sum of just one number as 0 + b1 . Continuing the pattern, it
follows that the sum of zero numbers should be 0:
Sum of 3 numbers: 0 + b1 + b2 + b3 .
Sum of 2 numbers: 0 + b1 + b2 .
Sum of 1 number: 0 + b1 .
Sum of 0 numbers: 0.
The sum of zero numbers is also called the empty sum. It is equal to the unit of addition, i.e., 0. By an
analogous argument, the empty sum of vectors is equal to the unit of vector addition, i.e., to the zero vector
0. In general, if we have k vectors u1 , . . . , uk , the span consists of all vectors of the form a1 u1 + . . . + ak uk ,
which is a sum of k vectors. In case k = 0, the span consists only of the empty sum, i.e., the zero vector 0,
which we also call the empty linear combination. Therefore, the span of the empty set of vectors is {0}.
♠
Exercises
Which of the following vectors are in span {u1 , . . . , u4 }? For each vector that is in the span, exhibit a linear
combination of u1 , . . . , u4 that equals this vector.
(a) $x = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}$, (b) $y = \begin{bmatrix} 2 \\ -3 \\ -4 \end{bmatrix}$, (c) $z = \begin{bmatrix} 1 \\ 5 \\ -2 \end{bmatrix}$.
Exercise 5.1.2 Describe the span of the vectors $u = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$ and $v = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}$ in R3 .
Exercise 5.1.4 Let $u = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$, $v = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}$, and $w = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$. Show that span {u, v, w} = span {u, v}.
Exercise 5.1.5 Suppose {u1 , . . ., uk } is a set of vectors from Rn . Show that 0 ∈ span {u1 , . . . , uk }.
Exercise 5.1.6 In this exercise, we use scalars from the field Z5 of integers modulo 5 instead of real
numbers (see Section 1.8, “Fields”). Consider the vectors
$$u_1 = \begin{bmatrix} 1 \\ 4 \\ 3 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 0 \\ 2 \\ 3 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}, \quad u_4 = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}.$$
Which of the following vectors are in span {u1 , . . . , u4 }? For each vector that is in the span, exhibit a linear
combination of u1 , . . . , u4 that equals this vector.
(a) $x = \begin{bmatrix} 0 \\ 1 \\ 4 \end{bmatrix}$, (b) $y = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$, (c) $z = \begin{bmatrix} 2 \\ 0 \\ 4 \end{bmatrix}$.
5.2 Linear independence

Outcomes
A. Find the redundant vectors in a set of vectors.
In Example 5.4, we encountered three vectors u, v, and w such that span {u, v, w} = span {u, v}. If this
happens, then the vector w does not contribute anything to the span of {u, v, w}, and we say that w is
redundant. The following definition generalizes this notion.
u j = a1 u1 + a2 u2 + . . . + a j−1 u j−1
for some scalars a1 , . . . , a j−1 . We say that the sequence of vectors u1 , . . . , uk is linearly depen-
dent if it contains one or more redundant vectors. Otherwise, we say that the vectors are linearly
independent.
Solution.
• The vector u1 is redundant, because it is a linear combination of earlier vectors. (Although there are
no earlier vectors, recall from Example 5.5 that the empty sum of vectors is equal to the zero vector
0. Therefore, u1 is indeed an (empty) linear combination of earlier vectors.)
• The vector u2 is not redundant, because it cannot be written as a linear combination of u1 . This is
because the system of equations
$$\left[\begin{array}{c|c} 0 & 1 \\ 0 & 2 \\ 0 & 2 \\ 0 & 3 \end{array}\right]$$
has no solution.
• The vector u3 is not redundant, because it cannot be written as a linear combination of u1 and u2 .
This is because the system of equations
$$\left[\begin{array}{cc|c} 0 & 1 & 1 \\ 0 & 2 & 1 \\ 0 & 2 & 1 \\ 0 & 3 & 1 \end{array}\right]$$
has no solution.
• The vector u4 is redundant, because it is a linear combination of earlier vectors, namely u4 = u2 + u3 .
• The vector u5 is not redundant, because it cannot be written as a linear combination of u1 , u2 , u3 , and u4 . This is because the system of equations
$$\left[\begin{array}{cccc|c} 0 & 1 & 1 & 2 & 0 \\ 0 & 2 & 1 & 3 & 1 \\ 0 & 2 & 1 & 3 & 2 \\ 0 & 3 & 1 & 4 & 3 \end{array}\right]$$
has no solution.
• The vector u6 is redundant, because it is a linear combination of earlier vectors, namely u6 = u2 + 2u3 − u5 .
In summary, the vectors u1 , u4 , and u6 are redundant, and the vectors u2 , u3 , and u5 are not. It follows
that the vectors u1 , . . . , u6 are linearly dependent. ♠
The last example shows that it can be a lot of work to find the redundant vectors in a sequence of k vectors.
Doing so in the naive way requires us to solve up to k systems of linear equations! Fortunately, there is a
much faster and easier method, the so-called casting-out algorithm.
Solution. Following the casting-out algorithm, we write the vectors u1 , . . . , u6 as the columns of a matrix
and reduce to echelon form.
$$\begin{bmatrix} 0 & 1 & 1 & 2 & 0 & 3 \\ 0 & 2 & 1 & 3 & 1 & 3 \\ 0 & 2 & 1 & 3 & 2 & 2 \\ 0 & 3 & 1 & 4 & 3 & 2 \end{bmatrix} \simeq \ldots \simeq \begin{bmatrix} 0 & 1 & 1 & 2 & 0 & 3 \\ 0 & 0 & 1 & 1 & -1 & 3 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
The pivot columns are columns 2, 3, and 5. The non-pivot columns are columns 1, 4, and 6. Therefore,
the vectors u1 , u4 , and u6 are redundant. Note that this is the same answer we got in Example 5.7. ♠
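In software, the pivot columns can be read off from the reduced echelon form. The following sketch (using sympy) applies this to the matrix of this example; note that sympy numbers columns starting from 0.

    from sympy import Matrix

    M = Matrix([[0, 1, 1, 2, 0, 3],
                [0, 2, 1, 3, 1, 3],
                [0, 2, 1, 3, 2, 2],
                [0, 3, 1, 4, 3, 2]])   # the vectors u1, ..., u6 as columns

    R, pivots = M.rref()
    print(pivots)                      # (1, 2, 4): columns 2, 3, 5 are the pivot columns
    # The remaining columns 1, 4, 6 correspond to the redundant vectors u1, u4, u6.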
The above version of the casting-out algorithm only tells us which of the vectors (if any) are redundant,
but it does not give us a specific way to write the redundant vectors as linear combinations of previous
vectors. However, we can easily get this additional information if we reduce the matrix all the way to
reduced echelon form. We call this version of the algorithm the extended casting-out algorithm.
Solution. Once again, we write the vectors u1 , . . . , u6 as the columns of a matrix. This time we use the
extended casting-out algorithm, which means we reduce the matrix to reduced echelon form instead of
echelon form.
$$\begin{bmatrix} 0 & 1 & 1 & 2 & 0 & 3 \\ 0 & 2 & 1 & 3 & 1 & 3 \\ 0 & 2 & 1 & 3 & 2 & 2 \\ 0 & 3 & 1 & 4 & 3 & 2 \end{bmatrix} \simeq \ldots \simeq \begin{bmatrix} 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
As before, the non-pivot columns are columns 1, 4, and 6, and therefore, the vectors u1 , u4 , and u6 are
redundant. The non-redundant vectors are u2 , u3 , and u5 . Moreover, the entries in the sixth column are 1,
2, and −1. Note that this means that the sixth column can be written as 1 times the second column plus 2
times the third column plus (−1) times the fifth column. The same coefficients can be used to write u6
as a linear combination of previous non-redundant columns, namely:
u6 = 1 u2 + 2 u3 − 1 u5 .
Also, the entries in the fourth column are 1 and 1, which are the coefficients for writing u4 as a linear
combination of previous non-redundant columns, namely:
u4 = 1 u2 + 1 u3 .
Finally, there are no non-zero entries in the first column. This means that u1 is the empty linear combina-
tion
u1 = 0.
♠
Our definition of redundant vectors depends on the order in which the vectors are written. This is because
each redundant vector must be a linear combination of earlier vectors in the sequence. For example, consider the vectors u, v, and w of Example 5.4, written in this order. Here, w is redundant, because w = 2u + 3v. Now consider the same three vectors written in the order u, w, v. In this second sequence, v is redundant, because $v = \frac{1}{3}w - \frac{2}{3}u$, but neither u nor w is redundant. Note that none of the vectors have changed; only the order in which they are written is different. Yet w is the redundant vector in the first sequence, and v is the redundant vector in the second sequence.
Because we defined linear independence in terms of the absence of redundant vectors, you may sus-
pect that the concept of linear independence also depends on the order in which the vectors are written.
However, this is not the case. The following theorem gives an alternative characterization of linear inde-
pendence that is more symmetric (it does not depend on the order of the vectors).
Proof. Let A be the n × k-matrix whose columns are u1 , . . . , uk . We know from the theory of homogeneous
systems that the system a1 u1 + . . . + ak uk = 0 has no non-trivial solution if and only if every column of
the echelon form of A is a pivot column. By the casting-out algorithm, this is the case if and only if none
of the vectors u1 , . . . , uk are redundant, i.e., if and only if the vectors are linearly independent. ♠
Solution. We must check whether the equation $a_1u_1 + a_2u_2 + a_3u_3 + a_4u_4 = 0$ has a non-trivial solution. If it does, the vectors are linearly dependent. On the other hand, if there is only
the trivial solution, the vectors are linearly independent. We write the augmented matrix and solve:
$$\left[\begin{array}{cccc|c} 1 & 0 & 1 & 2 & 0 \\ 1 & 1 & 2 & 3 & 0 \\ 2 & 1 & 3 & 7 & 0 \\ 0 & 1 & 2 & 1 & 0 \end{array}\right] \simeq \ldots \simeq \left[\begin{array}{cccc|c} 1 & 0 & 1 & 2 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 2 & 0 \end{array}\right].$$
Since every column is a pivot column, there are no free variables; the system of equations has a unique
solution, which is a1 = a2 = a3 = a4 = 0, i.e., the trivial solution. Therefore, the vectors u1 , . . . , u4 are
linearly independent. ♠
Solution. As in the previous example, we must check whether the equation a1 u1 + a2 u2 + a3 u3 = 0 has a
non-trivial solution. Once again, we write the augmented matrix and solve:
$$\left[\begin{array}{ccc|c} 1 & 1 & 0 & 0 \\ 1 & 3 & 4 & 0 \\ 0 & 1 & 2 & 0 \end{array}\right] \simeq \ldots \simeq \left[\begin{array}{ccc|c} 1 & 1 & 0 & 0 \\ 0 & 2 & 4 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right].$$
Since column 3 is not a pivot column, a3 is a free variable. Therefore, the system has a non-trivial solution,
and the vectors are linearly dependent.
With a small amount of extra work, we can find an actual non-trivial solution of a1 u1 + a2 u2 + a3 u3 =
0. All we have to do is set a3 = 1 and do a back substitution. We find that (a1 , a2 , a3 ) = (2, −2, 1) is a
solution. In other words,
2u1 − 2u2 + u3 = 0.
We can also use this information to write u3 as a linear combination of previous vectors, namely, u3 =
−2u1 + 2u2 . ♠
The characterization of linear independence in Theorem 5.12 is mostly useful for theoretical reasons.
However, it can also help in solving problems such as the following.
Solution. By Theorem 5.12, to check whether the vectors are linearly independent, we must check whether
the equation
a(u + v) + b(2u + w) + c(v − 5w) = 0 (5.1)
has non-trivial solutions. If it does, the vectors are linearly dependent, if it does not, they are linearly
independent. We can simplify the equation as follows:
Since u, v, and w are linearly independent, we know, again by Theorem 5.12, that equation (5.2) only has
the trivial solution. Therefore,
a + 2b = 0,
a + c = 0,
b − 5c = 0.
We can solve this system of three equations in three variables, and we find that it has the unique solution
a = b = c = 0. Therefore, a = b = c = 0 is the only solution to equation (5.1), which means that the vectors
u + v, 2u + w, and v − 5w are linearly independent. ♠
Proof.
1. This follows from Theorem 5.12, because whether or not the equation a1 u1 + . . . + ak uk = 0 has a
non-trivial solution does not depend on the order in which the vectors are written.
2. If one of the vectors in the sequence u1 , . . . , u j were redundant, then it would be redundant in the
longer sequence u1 , . . . , uk as well.
3. Let A be the n × k-matrix that has the vectors u1 , . . . , uk as its columns and suppose that k > n. Then
the rank of A is at most n, so the echelon form of A has some non-pivot columns. Therefore, the
system a1 u1 + . . . + ak uk = 0 has non-trivial solutions, and the vectors are linearly dependent by
Theorem 5.12.
♠
Solution. Since these are 3 vectors in R2 , they are linearly dependent by Proposition 5.16. No calculation
is necessary. ♠
In general, there is more than one way of writing a given vector as a linear combination of some spanning
vectors. For example, consider
$$u_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad v = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}.$$
v = −u1 + 2u3 ,
v= u2 + u3 ,
v= u1 + 2u2 ,
v = 2u1 + 3u2 − u3 .
However, when the vectors u1 , . . . , uk are linearly independent, this does not happen. In this case, the linear
combination is always unique, as the following theorem shows.
Proof. We already know that every vector v ∈ span {u1 , . . . , uk } can be written as a linear combination of
u1 , . . . , uk , because that is the definition of span. So what must be proved is the uniqueness. Suppose,
therefore, that there are two ways of writing v as such a linear combination, i.e., that
v = a1 u1 + a2 u2 + . . . + ak uk and
v = b1 u1 + b2 u2 + . . . + bk uk .
Subtracting the second equation from the first, we get
$$(a_1 - b_1)u_1 + (a_2 - b_2)u_2 + \ldots + (a_k - b_k)u_k = 0.$$
Since u1 , . . . , uk are linearly independent, we know by Theorem 5.12 that the last equation only has the
trivial solution, i.e., a1 − b1 = 0, a2 − b2 = 0, . . . , ak − bk = 0. It follows that a1 = b1 , a2 = b2 , . . . , ak = bk .
We have shown that any two ways of writing v as a linear combination of u1 , . . . , uk are equal. Therefore,
there is only one way of doing so. ♠
Consider the span of some vectors u1 , . . . , uk . As we just saw in the previous subsection, the span is
especially nice when the vectors u1 , . . . , uk are linearly independent, because in that case, every element v
of the span can be uniquely written in the form v = a1 u1 + . . . + ak uk .
But what if we have a span of some vectors u1 , . . . , uk that are not linearly independent? It turns out
that we can always find some linearly independent vectors that span the same set. In fact, this can be done
by simply removing the redundant vectors from u1 , . . . , uk . This is the subject of the following theorem.
Proof. Remove the redundant vectors one by one, from right to left. Each time a redundant vector is
removed, the span does not change; the proof of this is similar to Example 5.4. Moreover, the resulting
sequence of vectors $u_{j_1}, \ldots, u_{j_\ell}$ is linearly independent, because if any of these vectors were a
linear combination of earlier ones, then it would have been redundant in the original sequence of vectors,
and would have therefore been removed. ♠
Therefore, the redundant vectors are u2 and u4 . We remove them (“cast them out”) and are left with u1 and
u3 . Therefore, by Theorem 5.19, {u1 , u3 } is linearly independent and span {u1 , u3 } = span {u1 , . . . , u4 }.
♠
Exercises
Exercise 5.2.1 Which of the following vectors are redundant? If there are redundant vectors, write each
of them as a linear combination of previous vectors.
$$u_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 2 \\ 0 \\ 2 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad u_4 = \begin{bmatrix} 1 \\ 6 \\ 1 \end{bmatrix}.$$
Exercise 5.2.2 Which of the following vectors are redundant? If there are redundant vectors, write each
of them as a linear combination of previous vectors.
$$u_1 = \begin{bmatrix} -1 \\ -2 \\ 2 \\ 3 \end{bmatrix}, \quad u_2 = \begin{bmatrix} -3 \\ -4 \\ 3 \\ 3 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 0 \\ -1 \\ 4 \\ 3 \end{bmatrix}, \quad u_4 = \begin{bmatrix} 0 \\ 2 \\ -3 \\ -6 \end{bmatrix}.$$
Exercise 5.2.3 Use the method of Theorem 5.12 to determine whether the following vectors are linearly
independent. If they are linearly dependent, find a non-trivial linear combination of the vectors that is
equal to 0.
$$u_1 = \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}, \quad u_2 = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}, \quad u_3 = \begin{bmatrix} -3 \\ -4 \\ -2 \end{bmatrix}, \quad u_4 = \begin{bmatrix} 5 \\ 6 \\ 2 \end{bmatrix}.$$
Exercise 5.2.4 Use the method of Theorem 5.12 to determine whether the following vectors are linearly
independent. If they are linearly dependent, find a non-trivial linear combination of the vectors that is
equal to 0.
$$u_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 1 \\ 6 \\ 7 \\ 1 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 3 \\ 5 \\ 8 \\ 3 \end{bmatrix}, \quad u_4 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \end{bmatrix}.$$
Exercise 5.2.5 Are the following vectors linearly independent? If not, write one of them as a linear
combination of the others.
$$u = \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}, \quad v = \begin{bmatrix} 1 \\ 4 \\ 2 \end{bmatrix}, \quad w = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}.$$
Exercise 5.2.6 Find a linearly independent set of vectors that has the same span as the given vectors.
$$u_1 = \begin{bmatrix} 2 \\ 0 \\ 3 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 3 \\ 3 \\ 8 \end{bmatrix}, \quad u_4 = \begin{bmatrix} 3 \\ -3 \\ 1 \end{bmatrix}.$$
Exercise 5.2.7 Find a linearly independent set of vectors that has the same span as the given vectors.
$$u_1 = \begin{bmatrix} 1 \\ 3 \\ 3 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 2 \\ 6 \\ 6 \\ 2 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 1 \\ 0 \\ -3 \\ 1 \end{bmatrix}, \quad u_4 = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 1 \end{bmatrix}.$$
Exercise 5.2.12 In this exercise, we use scalars from the field Z3 of integers modulo 3 instead of real
numbers (see Section 1.8, “Fields”). Use the extended casting-out algorithm to determine which of the
following vectors are redundant. If there are redundant vectors, write each of them as a linear combination
of previous vectors.
$$u_1 = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 2 \\ 0 \\ 2 \end{bmatrix}, \quad u_4 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}.$$
Exercise 5.2.13 Let u, v, w be linearly independent vectors in Rn . Are the vectors u + v, 2u + w, and
w − 2v linearly independent?
Exercise 5.2.14 Let u, v, w be linearly independent vectors in Rn . Are the vectors u + v, u + w, and w + v
linearly independent?
Exercise 5.2.15 Suppose A is an m × n-matrix and {w1 , . . . , wk } is a linearly independent set of vectors
in Rm . Now suppose Azi = wi . Show {z1 , . . . , zk } is also linearly independent.
5.3 Subspaces of Rn
Outcomes
A. Determine whether a subset of Rn is a subspace.
As we saw earlier, the span of 0 vectors in Rn is a point, namely the set {0}. The span of one non-zero
vector is a line through the origin, and the span of two linearly independent vectors is a plane through the
origin.
Span of 0 vectors: a point Span of one vector: a line Span of two vectors: a plane
We also call these sets, respectively, a 0-dimensional subspace, a 1-dimensional subspace, and a 2-
dimensional subspace of Rn . The purpose of this section is to generalize this concept of subspace to
arbitrary dimensions.
1. V contains the zero vector, i.e., 0 ∈ V .
2. V is closed under addition, i.e., for all u, w ∈ V , we have u + w ∈ V .
3. V is closed under scalar multiplication, i.e., for all u ∈ V and scalars k, we have k u ∈ V .
Notice that the subset V = {0} is a subspace of Rn (called the zero subspace). Every line or plane through
the origin is a subspace. Moreover, the entire space Rn is a subspace of itself. A subspace that is not the
entire space Rn is referred to as a proper subspace of Rn .
Proof. Let S = span {u1 , . . . , uk }. To verify that S is a subspace of Rn , we must check that the three
conditions of Definition 5.21 hold.
• We have 0 ∈ S because 0 = 0u1 + . . . + 0uk .
• Suppose u, w ∈ S. By definition of span, there exist scalars a1 , . . . , ak and b1 , . . . , bk such that u =
a1 u1 + . . . + ak uk and w = b1 u1 + . . . + bk uk . Therefore,
u + w = (a1 + b1 )u1 + . . . + (ak + bk )uk .
It follows that u + w ∈ S = span {u1 , . . . , uk }, so that S is closed under addition.
• Suppose u ∈ S and t is a scalar. Then by definition of span, there exist scalars a1 , . . . , ak such that
u = a1 u1 + . . . + ak uk . Then
t u = (ta1)u1 + . . . + (tak )uk ,
and thus t u ∈ S. It follows that S is closed under scalar multiplication.
Since S = span {u1 , . . . , uk } satisfies all three conditions, it follows that it is a subspace of Rn . ♠
Solution. The line L is simply the span of the vector d, i.e., L = span {d}. Therefore, it is a subspace by
Proposition 5.22. ♠
Proof. To show that V is a subspace of Rn , we check the three conditions of Definition 5.21.
• We have 0 ∈ V because A0 = 0.
• To show that V is closed under addition, suppose u, w ∈ V . Then by definition of V , we have Au = 0 and Aw = 0. It follows that
$$A(u + w) = Au + Aw = 0 + 0 = 0,$$
and therefore u + w ∈ V .
• To show that V is closed under scalar multiplication, suppose u ∈ V and t is a scalar. Then by definition of V , we have Au = 0. It follows that
$$A(t u) = t(Au) = t 0 = 0.$$
Therefore, t u ∈ V .
Since all three conditions are satisfied, V is a subspace of Rn . ♠
Solution. None of them are subspaces. Neither the line (a) nor the plane (b) contains the origin 0, so they
fail to satisfy the first condition of subspaces. The set of vectors in (c) contains 0. It is also closed under
addition. However, it fails to be closed under scalar multiplication. For example, let u = [1, 1, 1]T . Then
u ∈ W , but (−1)u ∉ W . ♠
Exercises
Is M a subspace of R4 ? Explain.
Is M a subspace of R4 ? Explain.
Exercise 5.3.4 In this exercise, we use scalars from the field Z2 of integers modulo 2 instead of real
numbers (see Section 1.8, “Fields”). Which of the following sets are subspaces of (Z2 )3 ?
(a) $V_1 = \left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right\}$.

(b) $V_2 = \left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right\}$.

(c) $V_3 = \left\{ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right\}$.

(d) $V_4 = \left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}$.
Exercise 5.3.5 Suppose V ,W are subspaces of Rn . Let V ∩W be the set of all vectors that are in both V
and W . Show that V ∩W is also a subspace.
Exercise 5.3.6 Let V be a subset of Rn . Show that V is a subspace if and only if it is non-empty and the
following condition holds: for all u, v ∈ V and all scalars a, b ∈ R,
au + bv ∈ V .
Exercise 5.3.7 Let u1 , . . . , uk be vectors in Rn , and let S = span {u1 , . . ., uk }. Show that S is the smallest
subspace of Rn that contains u1 , . . . , uk . Specifically, this means you have to show: if V is any other
subspace of Rn such that u1 , . . . , uk ∈ V , then S ⊆ V .
5.4 Basis and dimension

Outcomes
A. Find a basis for a subspace of Rn .
B. Use the casting-out algorithm to find a basis for a subspace given as a span.
C. Use basic solutions to find a basis for a subspace given as the solution space of a homogeneous
system of equations.
D. Find the coordinates of a vector with respect to a basis.
E. Find the dimension of a subspace of Rn .
F. Extend a set of linearly independent vectors to a basis.
G. Shrink a spanning set to a basis by removing redundant vectors.
H. Determine whether k vectors form a basis of a k-dimensional space.
We saw in Proposition 5.22 that spans are subspaces of Rn . Interestingly, the converse is also true: every
subspace of Rn is the span of some finite set of vectors.
V = span {u1 , . . . , uk } .
If V = {0}, then V is the span of the empty set of vectors, and we are done. Otherwise, we proceed as follows:
1. V contains some non-zero vector. Pick a non-zero vector u1 in V . If V = span {u1 }, we
are done.
2. Otherwise, pick a vector u2 in V that is not in span {u1 }. If V = span {u1 , u2 }, we are done.
3. Otherwise, pick a vector u3 in V that is not in span {u1 , u2 }. If V = span {u1 , u2 , u3 }, we are done.
Continue in this way. Note that after the j th step of this process, the vectors u1 , . . . , u j are linearly inde-
pendent. This is because, by construction, no vector is in the span of the previous vectors, and therefore
no vector is redundant. By Proposition 5.16(3), there can be at most n linearly independent vectors in Rn .
Therefore the process must stop after k steps for some k ≤ n. But then V = span {u1 , . . . , uk }, as desired.
♠
In summary, every subspace of Rn is spanned by a finite, linearly independent collection of vectors. Such
a collection of vectors is called a basis of the subspace.
Proof. To see that it is a basis of Rn , first notice that the vectors e1 , e2 , . . . , en span Rn . Indeed, every
vector v = [x1 , . . . , xn ]T ∈ Rn can be written as v = x1 e1 + . . . + xn en . Second, the vectors e1 , e2 , . . . , en are
evidently linearly independent, because none of these vectors can be written as a linear combination of
previous vectors. Since the vectors span Rn and are linearly independent, they form a basis of Rn . ♠
Show that the vectors $u_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$, $u_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, $u_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$ form a basis of R3 .
Solution. We must check that the vectors u1 , u2 , u3 are linearly independent and span R3 . To check linear
independence, we use the casting-out algorithm.
$$\begin{bmatrix} 1 & 0 & -1 \\ 2 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \simeq \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 2 \end{bmatrix}.$$
Since all columns are pivot columns, there are no redundant vectors, so u1 , u2 , u3 are linearly independent.
To check that they span all of R3 , let w = [x, y, z]T be an arbitrary element of R3 . We must show that w is
a linear combination of u1 , u2 , u3 . This amounts to solving the system of equations
a1 u1 + a2 u2 + a3 u3 = w,
or in augmented matrix form,
$$\left[\begin{array}{ccc|c} 1 & 0 & -1 & x \\ 2 & 1 & 0 & y \\ 1 & 0 & 1 & z \end{array}\right] \simeq \left[\begin{array}{ccc|c} 1 & 0 & -1 & x \\ 0 & 1 & 2 & y - 2x \\ 0 & 0 & 2 & z - x \end{array}\right].$$
The system is clearly consistent, so it has a solution, and therefore w is indeed a linear combination of
u1 , u2 , u3 . Since w was an arbitrary vector of R3 , it follows that u1 , u2 , u3 span R3 . ♠
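As a quick software check of the preceding example, the rank of the matrix whose columns are u1 , u2 , u3 can be computed with sympy; rank 3 means the matrix is invertible, so the columns form a basis of R3.

    from sympy import Matrix

    M = Matrix([[1, 0, -1],
                [2, 1, 0],
                [1, 0, 1]])   # u1, u2, u3 as columns
    print(M.rank())           # 3, so u1, u2, u3 form a basis of R^3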
Generalizing the last example, we find that a set of n vectors forms a basis of Rn if and only if the matrix
having those vectors as its columns is invertible. This is the content of the following proposition.
Solution. Let S = span {u1 , . . . , u5 }. By Theorem 5.19, we know that if we remove the redundant vectors
from {u1 , . . . , u5 }, then the remaining vectors will be linearly independent and will still span S. In other
words, the remaining vectors will be a basis for S. We use the casting-out algorithm to identify the
redundant vectors:
$$\begin{bmatrix} 2 & -1 & 1 & 3 & -1 \\ 0 & 0 & 3 & 5 & 1 \\ -2 & 1 & 5 & 7 & 3 \end{bmatrix} \simeq \begin{bmatrix} 2 & -1 & 1 & 3 & -1 \\ 0 & 0 & 3 & 5 & 1 \\ 0 & 0 & 6 & 10 & 2 \end{bmatrix} \simeq \begin{bmatrix} 2 & -1 & 1 & 3 & -1 \\ 0 & 0 & 3 & 5 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
Since columns 2, 4, and 5 are the non-pivot columns, it follows that the vectors u2 , u4 , and u5 are redun-
dant. Therefore, the desired basis is {u1 , u3 }. ♠
x + y − z + 3w − 2v = 0,
x + y + z − 11w + 8v = 0,
4x + 4y − 3z + 5w − 3v = 0.
From the reduced echelon form, we see that y, w, and v are free variables. The general solution is:
$$\begin{bmatrix} x \\ y \\ z \\ w \\ v \end{bmatrix} = t \begin{bmatrix} -3 \\ 0 \\ -5 \\ 0 \\ 1 \end{bmatrix} + s \begin{bmatrix} 4 \\ 0 \\ 7 \\ 1 \\ 0 \end{bmatrix} + r \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$
Thus, the solution space is spanned by the vectors
$$\begin{bmatrix} -3 \\ 0 \\ -5 \\ 0 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 4 \\ 0 \\ 7 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$
Moreover, these vectors are evidently linearly independent, because each vector contains a 1 in a position
where all the previous vectors have 0 (and therefore, none of the vectors can be written as a linear com-
bination of previous vectors). It follows that the above three vectors form a basis of the solution space.
♠
Note that the basis vectors of the solution space are exactly what we called the basic solutions in Sec-
tion 1.6.
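The basic solutions can also be computed with software: sympy's nullspace method returns exactly one basis vector of the solution space per free variable. The following sketch uses the coefficient matrix of the system from the example above.

    from sympy import Matrix

    A = Matrix([[1, 1, -1, 3, -2],
                [1, 1, 1, -11, 8],
                [4, 4, -3, 5, -3]])
    for b in A.nullspace():
        print(b.T)            # the basic solutions, printed as row vectors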
Let V be a subspace of Rn . A basis of V is essentially the same thing as a coordinate system for V .
To see why, let B = {u1 , . . . , uk } be some basis of V . This means that the vectors u1 , . . . , uk are linearly
independent and span V . Because the basis vectors are spanning, every vector v ∈ V can be written as a
linear combination of basis vectors
v = a1 u1 + . . . + ak uk .
Moreover, because the basis vectors are linearly independent, it follows by Theorem 5.18 that the coeffi-
cients a1 , . . . , ak are unique. We say that a1 , . . . , ak are the coordinates of v with respect to the basis B,
and we write
$$[v]_B = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_k \end{bmatrix}.$$
[Figure: a basis used as a coordinate system, with a grid of coordinate lines drawn along the basis vectors. Caption: Basis as coordinate system.]
♠
In case the basis is the standard basis, the coordinates are just the usual ones, as the following example
illustrates:
Example 5.35: Find a vector from its coordinates in the standard basis
Find the vector v that has coordinates
$$[v]_B = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$$
with respect to the standard basis B = {e1 , e2 , e3 } of R3 .
We have to calculate
$$v = 1e_1 - 1e_2 + 2e_3 = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}.$$
We see that the coordinates of any vector with respect to the standard basis are just the usual components
of the vector. ♠
We can also ask to find the coordinates of a given vector in a given basis.
5.4.4. Dimension
One of the most important properties of bases is that any two bases for the same space must be of the same
size. To show this, we will need the following fundamental result, called the Exchange Lemma. This
lemma states that spanning sets have at least as many vectors as linearly independent sets.
Proof. Since each u j is an element of span {v1 , . . . , vs }, there exist scalars ai j such that
u j = a1 j v1 + . . . + as j vs .
Let $A = [a_{ij}]$. Note that this matrix has s rows and r columns, i.e., it is an s × r-matrix. Now suppose,
for the sake of obtaining a contradiction, that r > s. Then by Theorem 1.35, the system Ax = 0 has a
non-trivial solution x, i.e., there exists x 6= 0 such that Ax = 0. In other words, for all i = 1, . . ., s,
ai1 x1 + . . . + air xr = 0.
Therefore,
$$x_1u_1 + \ldots + x_ru_r = x_1(a_{11}v_1 + \ldots + a_{s1}v_s) + \ldots + x_r(a_{1r}v_1 + \ldots + a_{sr}v_s) = (a_{11}x_1 + \ldots + a_{1r}x_r)v_1 + \ldots + (a_{s1}x_1 + \ldots + a_{sr}x_r)v_s = 0.$$
Since x ≠ 0, this is a non-trivial linear combination of u1 , . . . , ur that equals 0, contradicting the assumption that u1 , . . . , ur are linearly independent. Therefore r ≤ s, as claimed. ♠
Proof. This follows right away from the Exchange Lemma. Indeed, observe that B1 = {u1 , . . . , us } is a
spanning set for V while B2 = {v1 , . . . , vr } is linearly independent, so s ≥ r. Similarly B2 = {v1 , . . . , vr } is
a spanning set for V while B1 = {u1 , . . . , us } is linearly independent, so r ≥ s. ♠
Because every basis of V has the same number of vectors, we give this number a special name. It is called
the dimension of V .
Solution. We know that V is a subspace of R3 , because it is the solution space of a system of a homoge-
neous system of equations (in this case, one equation in three variables). We can take y = t and z = s as
the free variables and solve for x = y − 2z = t − 2s. Therefore, a general element of V is of the form
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} t - 2s \\ t \\ s \end{bmatrix} = t \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + s \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}.$$
Thus,
$$V = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix} \right\}.$$
Since the two spanning vectors are linearly independent, they form a basis of V , and thus dim(V ) = 2.
♠
Note that the dimension of the solution space of a system of equations is equal to the number of parameters
in the general solution, which is equal to the number of free variables. For this reason, the dimension is
also sometimes called the number of degrees of freedom.
Solution. Let
$$u_1 = \begin{bmatrix} 1 \\ 2 \\ -1 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 1 \\ 3 \\ -1 \\ 1 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 8 \\ 19 \\ -8 \\ 8 \end{bmatrix}, \quad u_4 = \begin{bmatrix} -6 \\ -15 \\ 6 \\ -6 \end{bmatrix}, \quad u_5 = \begin{bmatrix} 1 \\ 3 \\ 0 \\ 1 \end{bmatrix}, \quad u_6 = \begin{bmatrix} 1 \\ 5 \\ 0 \\ 1 \end{bmatrix},$$
so that W = span {u1 , . . . , u6 }. We use the casting-out algorithm to remove any redundant vectors from u1 , . . . , u6 . The remaining vectors will be linearly independent, and therefore a basis of the span.
$$\begin{bmatrix} 1 & 1 & 8 & -6 & 1 & 1 \\ 2 & 3 & 19 & -15 & 3 & 5 \\ -1 & -1 & -8 & 6 & 0 & 0 \\ 1 & 1 & 8 & -6 & 1 & 1 \end{bmatrix} \simeq \begin{bmatrix} 1 & 0 & 5 & -3 & 0 & -2 \\ 0 & 1 & 3 & -3 & 0 & 2 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
Therefore, the vectors u3 , u4 , and u6 are redundant, and {u1 , u2 , u5 } is a basis of W . It follows that
dim(W ) = 3. ♠
Of course, the theorem does not mean that the basis is unique. Usually, a subspace of Rn will have many
different bases. The theorem just states that there exists at least one.
Sometimes, when we are looking for a basis of a space, we may already have a number of linearly
independent vectors. We would like to obtain a basis by adding some additional linearly independent
vectors to the ones we already have. The following lemma guarantees that this can always be done.
Proof. By Theorem 5.43, we know that V has some basis, say {v1 , . . . , vk }. However, this may not be the
basis we are looking for, because maybe it does not contain the vectors u1 , . . . , uℓ . Consider the sequence
of ℓ + k vectors
u1 , . . . , uℓ , v1 , . . . , vk .
Since V is spanned by the vectors v1 , . . . , vk , it is certainly also spanned by the larger set of vectors
u1 , . . . , uℓ , v1 , . . . , vk . From Theorem 5.19, we know that we can obtain a basis of V by removing the re-
dundant vectors from u1 , . . . , uℓ , v1 , . . . , vk . On the other hand, u1 , . . . , uℓ are linearly independent, so none
of them can be redundant. It follows that the resulting basis of V contains all of the vectors u1 , . . . , uℓ . In
other words, we have found a basis of V that is an extension of u1 , . . . , uℓ , which is what had to be shown.
♠
Solution. Let {e1 , . . . , e4 } be the standard basis of R4 . We obtain the desired basis by applying the casting-
out algorithm to u1 , u2 , e1 , e2 , e3 , e4 :
$$\begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 1 & 0 & 0 \\ -1 & -2 & 0 & 0 & 1 & 0 \\ 2 & 4 & 0 & 0 & 0 & 1 \end{bmatrix} \simeq \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 2 & 1 \end{bmatrix}.$$
Therefore, we cast out the vectors e1 and e4 and keep the rest. The resulting basis is
$$\{u_1, u_2, e_2, e_3\} = \left\{ \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ -2 \\ 4 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} \right\}.$$
However, this is not the basis we are looking for, because it does not extend {u1 , u2 }. To get a basis of V
that extends {u1 , u2 }, we perform the casting-out algorithm on the vectors u1 , u2 , v1 , v2 , v3 :
$$\begin{bmatrix} 1 & -2 & -2 & -1 & 1 \\ -1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \end{bmatrix} \simeq \ldots \simeq \begin{bmatrix} 1 & -2 & -2 & -1 & 1 \\ 0 & 1 & 1 & 1 & -1 \\ 0 & 0 & 1 & 1 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
The pivot columns are columns 1, 2, and 3, so the vectors v2 and v3 are cast out. The resulting basis of V that extends {u1 , u2 } is {u1 , u2 , v1 }. ♠
We also have a kind of opposite of Lemma 5.44: every spanning set can be shrunk to a basis.
Proof. This is merely a restatement of Theorem 5.19. We obtain the linearly independent subset by remov-
ing the redundant vectors, which can be achieved by the casting-out algorithm. See also Example 5.20.
♠
The following proposition tells us something about the size of a linearly independent set of vectors or the
size of a spanning set of vectors.
Proof. Both properties follow from the Exchange Lemma (Lemma 5.37). Since V is k-dimensional, it has
some basis consisting of k vectors v1 , . . . , vk .
(a) Suppose u1 , . . . ur are linearly independent vectors in V . Since u1 , . . . ur are linearly independent and
v1 , . . . , vk are spanning, the Exchange Lemma implies that r ≤ k.
(b) Suppose the vectors u1 , . . . us span V . Since v1 , . . . , vk are linearly independent and u1 , . . . us are
spanning, the Exchange Lemma implies that k ≤ s.
♠
The next proposition often comes in handy when we need to check that some set of vectors is a basis for
a subspace V , where the dimension of V is already known. If dim(V ) = k, we know that any basis has to
have size k. Interestingly, to check that a set of k vectors is a basis of V , it is sufficient to check either that
it is linearly independent or that it is spanning. This can save half the work in checking that some set of
vectors is a basis (but it only works if the number of vectors is exactly k, the dimension of V ).
Proof. The first claim is an easy consequence of Lemma 5.44. Assume that u1 , . . . , uk are linearly inde-
pendent. By Lemma 5.44, we can add zero or more vectors to u1 , . . . , uk to obtain a basis of V . On the
other hand, since V is k-dimensional, every basis must have exactly k elements, so that the only possibility
is that we have added zero vectors. Therefore, {u1 , . . . , uk } is already a basis of V , as claimed.
To prove the second claim, assume that V = span {u1 , . . . , uk }. By Theorem 5.27, there exists a linearly
independent subset of {u1 , . . . , uk } that spans V , i.e., that is a basis for V . But since dim(V ) = k, every
basis must have exactly k elements, so that the only possible such subset is {u1 , . . . , uk } itself. Therefore,
{u1 , . . . , uk } is a basis of V , as claimed. ♠
It is important to note that Proposition 5.49 does not say that every linearly independent set of vectors in
V is a basis. For example, a set of k − 1 or fewer linearly independent vectors will not be spanning. Also,
the proposition does not say that every spanning set of vectors in V is a basis. For example, a set of k + 1
or more spanning vectors will not be linearly independent. Rather, what the proposition is saying is that if
we have exactly k vectors in a k-dimensional space, then linear independence implies spanning and vice
versa.
Solution. This is similar to Example 5.30. But because we know that R3 is a 3-dimensional space, and
because we have exactly 3 vectors, by Proposition 5.49, we only need to check either whether u1 , u2 , u3
are linearly independent or whether they are spanning. We check whether they are linearly independent
by using the casting-out algorithm.
\[ \begin{bmatrix} 1 & 1 & 1 \\ 2 & 1 & 1 \\ 3 & 1 & 2 \end{bmatrix} \simeq \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -1 \\ 0 & 0 & -1 \end{bmatrix}. \]
Since the matrix has rank 3, the vectors u1 , u2 , u3 are linearly independent. Therefore, by Proposition 5.49,
they form a basis of R3 . ♠
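As an aside, the rank test above is easy to carry out on a computer; a minimal sketch, assuming NumPy (not part of this text):

```python
# Minimal sketch: k vectors in a k-dimensional space form a basis exactly
# when the matrix having them as columns has rank k (Proposition 5.49).
import numpy as np

u1, u2, u3 = [1, 2, 3], [1, 1, 1], [1, 1, 2]
M = np.column_stack([u1, u2, u3])
print(np.linalg.matrix_rank(M) == 3)   # True: u1, u2, u3 form a basis of R^3
```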
The following proposition is also a consequence of Lemma 5.44. It says that smaller subspaces have
smaller dimension.
Proof. Consider any basis u1 , . . . , uk of V . Because V ⊆ W , the vectors u1 , . . . , uk are linearly independent
elements of W , and therefore can be extended to a basis of W by Lemma 5.44. The resulting basis
of W has at least k elements, i.e., dim(V ) ≤ dim(W ). To prove the last claim, assume moreover that
dim(V ) = dim(W ). In that case, dim(W ) = k, so that the k linearly independent vectors u1 , . . . , uk form a
basis of W by Proposition 5.49. Since both V and W are spanned by u1 , . . . , uk , we must have V = W , as
claimed. ♠
Exercises
Exercise 5.4.1 For each of the following subspaces of R4 , find a basis and determine the dimension.
(a) \( V_1 = \mathrm{span}\left\{ \begin{bmatrix} 2\\1\\1\\1 \end{bmatrix}, \begin{bmatrix} -1\\0\\-1\\-1 \end{bmatrix}, \begin{bmatrix} 5\\2\\3\\3 \end{bmatrix}, \begin{bmatrix} -1\\1\\-2\\-2 \end{bmatrix} \right\} \)

(b) \( V_2 = \mathrm{span}\left\{ \begin{bmatrix} 0\\1\\1\\-1 \end{bmatrix}, \begin{bmatrix} -1\\-1\\-2\\2 \end{bmatrix}, \begin{bmatrix} 2\\3\\5\\-5 \end{bmatrix}, \begin{bmatrix} 0\\1\\2\\-2 \end{bmatrix} \right\} \)

(c) \( V_3 = \mathrm{span}\left\{ \begin{bmatrix} -2\\1\\1\\-3 \end{bmatrix}, \begin{bmatrix} -9\\4\\3\\-9 \end{bmatrix}, \begin{bmatrix} -33\\15\\12\\-36 \end{bmatrix}, \begin{bmatrix} -22\\10\\8\\-24 \end{bmatrix} \right\} \)

(d) \( V_4 = \mathrm{span}\left\{ \begin{bmatrix} -1\\1\\-1\\-2 \end{bmatrix}, \begin{bmatrix} -4\\3\\-2\\-4 \end{bmatrix}, \begin{bmatrix} -3\\2\\-1\\-2 \end{bmatrix}, \begin{bmatrix} -1\\1\\-2\\-4 \end{bmatrix}, \begin{bmatrix} -7\\5\\-3\\-6 \end{bmatrix} \right\} \)
Exercise 5.4.2 Find a basis and the dimension of each of the following subspaces of Rn .
(a) \( S_1 = \left\{ \begin{bmatrix} 4u + v - 5w \\ 12u + 6v - 6w \\ 4u + 4v + 4w \end{bmatrix} \;\middle|\; u, v, w \in \mathbb{R} \right\} \)

(b) \( S_2 = \left\{ \begin{bmatrix} 2u + 6v + 7w \\ -3u - 9v - 12w \\ 2u + 6v + 6w \\ u + 3v + 3w \end{bmatrix} \;\middle|\; u, v, w \in \mathbb{R} \right\} \)

(c) \( S_3 = \left\{ \begin{bmatrix} 2u + v \\ 6v - 3u + 3w \\ 3v - 6u + 3w \end{bmatrix} \;\middle|\; u, v, w \in \mathbb{R} \right\} \)
Exercise 5.4.3 Find a basis and the dimension of each of the following subspaces of Rn .
(a) \( W_1 = \left\{ \begin{bmatrix} u \\ v \\ w \end{bmatrix} \;\middle|\; u + v = 0 \text{ and } u - 2w = 0 \right\} \)

(b) \( W_2 = \left\{ \begin{bmatrix} u \\ v \\ w \end{bmatrix} \;\middle|\; u + v + w = 0 \right\} \)

(c) \( S = \left\{ \begin{bmatrix} u \\ v \\ w \\ x \end{bmatrix} \;\middle|\; u + v = w + x \text{ and } u + w = v + x \right\} \)
Exercise 5.4.5 Find the coordinates of each of v, w with respect to the basis B = {u1 , u2 , u3 }, where
\[ v = \begin{bmatrix} 4\\3\\8 \end{bmatrix}, \quad w = \begin{bmatrix} -1\\-1\\3 \end{bmatrix}, \quad u_1 = \begin{bmatrix} 1\\0\\3 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 0\\1\\1 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 2\\2\\1 \end{bmatrix}. \]
Exercise 5.4.9 Use one of the basis tests of Proposition 5.49 to determine whether the vectors
\[ u_1 = \begin{bmatrix} 4\\-2\\1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} -2\\4\\1 \end{bmatrix}, \quad\text{and}\quad u_3 = \begin{bmatrix} 1\\-2\\4 \end{bmatrix} \]
form a basis of R3 .
Exercise 5.4.10 Use one of the basis tests of Proposition 5.49 to determine whether the vectors
\[ u_1 = \begin{bmatrix} 2\\-1\\-1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} -1\\2\\-1 \end{bmatrix}, \quad\text{and}\quad u_3 = \begin{bmatrix} -1\\-1\\2 \end{bmatrix} \]
form a basis of R3 .
Exercise 5.4.11 In this exercise, we use scalars from the field Z5 of integers modulo 5 instead of real
numbers (see Section 1.8, “Fields”). Find a basis and the dimension of each of the following subspaces
of (Z5 )n .
(a) \( V_1 = \mathrm{span}\left\{ \begin{bmatrix} 2\\3\\1\\4 \end{bmatrix}, \begin{bmatrix} 1\\4\\3\\2 \end{bmatrix}, \begin{bmatrix} 4\\1\\2\\3 \end{bmatrix}, \begin{bmatrix} 2\\2\\1\\0 \end{bmatrix} \right\} \)

(b) \( V_2 = \left\{ \begin{bmatrix} u \\ v \\ w \end{bmatrix} \;\middle|\; 2u + v = 0 \text{ and } u + 4w = 0 \right\} \)

(c) \( V_3 = \left\{ \begin{bmatrix} u \\ v \\ w \end{bmatrix} \;\middle|\; u + 2v + 3w = 0 \right\} \)
Exercise 5.4.12 In this exercise, we use scalars from the field Z7 of integers modulo 7 instead of real
numbers. Find the coordinates of v with respect to the basis B = {u1 , u2 , u3 } in (Z7 )3 , where
\[ v = \begin{bmatrix} 3\\1\\4 \end{bmatrix}, \quad u_1 = \begin{bmatrix} 2\\4\\2 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 1\\3\\3 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 4\\1\\2 \end{bmatrix}. \]
Exercise 5.4.13 In this exercise, we use scalars from the field Z2 of integers modulo 2 instead of real
numbers. Extend {u1 , u2 } to a basis of (Z2 )4 , where
\[ u_1 = \begin{bmatrix} 1\\0\\1\\1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 0\\1\\1\\1 \end{bmatrix}. \]
Exercise 5.4.15 If you have 6 vectors in R5 , is it possible they are linearly independent? Explain.
Exercise 5.4.16 Suppose V and W both have dimension equal to 7 and they are subspaces of R10. What are the possibilities for the dimension of V ∩ W? Hint: Remember that a linearly independent set can be extended to form a basis.
Outcomes
A. Find a basis for the column space, row space, and null space of a matrix.
There are three important spaces we can associate to a matrix. They are called the column space, row
space, and null space, and are defined as follows.
Note that the column space is a subspace of Rm and the null space is a subspace of Rn . The row space, on
the other hand, is a set of row vectors. It can be regarded as a subspace of Rn , but only if we regard Rn as
the set of n-dimensional row vectors (and not column vectors, as usual).
Before we give an example, recall that two matrices are called row equivalent if one can be obtained
from the other by performing a sequence of elementary row operations. The point of elementary row
operations is that they do not affect the row space or the null space of the matrix. (They do, however,
affect the column space). The following proposition makes this more precise.
Proof. The fact that elementary row operations do not change the null space is a special case of Theo-
rem 1.11, applied to a homogeneous system. To prove that they do not change the row space is also easy;
we just need to look at each kind of elementary row operation. For example, adding a multiple of one row
to another clearly does not change the span of the rows. ♠
Example 5.54: Basis of column space, row space, and null space
Find a basis for the column space, row space, and null space of the matrix
\[ A = \begin{bmatrix} 1 & 2 & 1 & 3 & 2 \\ 1 & 3 & 6 & 0 & 2 \\ 3 & 7 & 8 & 6 & 6 \end{bmatrix}. \]
To find a basis for the column space, we use the casting-out algorithm. The reduced echelon form of A is
\[ \begin{bmatrix} 1 & 0 & -9 & 9 & 2 \\ 0 & 1 & 5 & -3 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \tag{5.3} \]
Note that the first two columns of the reduced echelon form are pivot columns. Therefore, by the casting-out algorithm, the first two columns of A form a basis for the column space. Thus, the following is a basis of the column space:
\[ \left\{ \begin{bmatrix} 1\\1\\3 \end{bmatrix}, \begin{bmatrix} 2\\3\\7 \end{bmatrix} \right\}. \]
A basis of the row space is given by the non-zero rows of the reduced echelon form (5.3), namely
\[ \{ [\,1, 0, -9, 9, 2\,],\; [\,0, 1, 5, -3, 0\,] \}. \]
Finally, to find a basis of the null space, we solve the homogeneous system Ax = 0. Reading the general solution off the reduced echelon form (5.3), we obtain the basic solutions
\[ \begin{bmatrix} 9\\-5\\1\\0\\0 \end{bmatrix}, \quad \begin{bmatrix} -9\\3\\0\\1\\0 \end{bmatrix}, \quad \begin{bmatrix} -2\\0\\0\\0\\1 \end{bmatrix}, \]
which form a basis of the null space. ♠
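As a computational aside, the three bases of this example can be reproduced with SymPy; a minimal sketch (the method names belong to SymPy, not to this text):

```python
# Sketch: bases of the column space, row space, and null space of A,
# together with a check of the dimension formulas of Proposition 5.55.
from sympy import Matrix

A = Matrix([[1, 2, 1, 3, 2],
            [1, 3, 6, 0, 2],
            [3, 7, 8, 6, 6]])

col_basis  = A.columnspace()   # pivot columns of A
row_basis  = A.rowspace()      # non-zero rows of an echelon form
null_basis = A.nullspace()     # basic solutions of Ax = 0

r = A.rank()
assert len(col_basis) == r and len(row_basis) == r
assert len(null_basis) == A.cols - r      # rank-nullity: nullity = n - rank
```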
Proposition 5.55: Dimension of column space, row space, and null space
Let A be an m × n-matrix. Then the dimensions of the column space, row space, and null space of
A are as follows:
dim(col(A)) = rank(A),
dim(row(A)) = rank(A),
dim(null(A)) = n − rank(A).
Proof. Let r = rank(A). Following the same method as in Example 5.54, we can use the casting-out
algorithm to find a basis for the column space. Since the reduced echelon form of A has r pivot columns,
the basis has r elements, and therefore dim(col(A)) = r. Also, the reduced echelon form has r non-zero
rows (since each non-zero row contains exactly one pivot entry). These form a basis of the row space, and
therefore dim(row(A)) = r. Finally, the dimension of the null space is equal to the number of parameters
in the general solution of the system of equations Ax = 0. There is one parameter for each non-pivot
column, and since A has n columns and r pivot columns, it follows that dim(null(A)) = n − r. ♠
Among other things, the proposition states that the “row rank” of a matrix (the dimension of its row space)
is always equal to the “column rank” (the dimension of the column space). This fact is not at all obvious
when one first considers the definition of a matrix. It is often called the rank theorem and is one of
the deep and mysterious facts of linear algebra. It means, for example, that if we do elementary column
operations instead of elementary row operations, we end up with exactly the same number of pivots. Since
the “row rank” and “column rank” are always equal, we are justified in simply calling this quantity the
“rank” of the matrix.
There is also a name for the dimension of the null space. It is called the nullity of the matrix, and is
written nullity(A). The last part of Proposition 5.55 is also called the rank-nullity theorem, and is often
written in the form
rank(A) + nullity(A) = n.
Theorem 5.57:
The following are equivalent for an m × n-matrix A.
1. rank(A) = n.
Theorem 5.58:
The following are equivalent for an m × n-matrix A.
1. rank(A) = m.
Exercises
Exercise 5.5.1 Determine the rank and nullity and find a basis of the column space, row space, and null
space of each of the following matrices.
(a)
\[ A = \begin{bmatrix} 1 & 3 & 2 \\ 3 & 9 & 6 \\ 1 & 3 & 2 \end{bmatrix} \]
(b)
\[ B = \begin{bmatrix} 1 & 3 & 0 & 2 \\ 3 & 9 & 1 & 7 \\ 1 & 3 & 1 & 3 \end{bmatrix} \]
(c)
\[ C = \begin{bmatrix} 1 & 0 & 3 \\ 3 & 1 & 10 \\ 1 & 1 & 4 \\ 1 & -1 & 2 \end{bmatrix} \]
(d)
\[ D = \begin{bmatrix} 0 & 0 & -1 & 0 & 1 \\ 1 & 2 & 3 & -2 & -18 \\ 1 & 2 & 2 & -1 & -11 \\ -1 & -2 & -2 & 1 & 11 \end{bmatrix} \]
(e)
\[ E = \begin{bmatrix} 1 & 0 & 3 & 0 \\ 3 & 1 & 10 & 0 \\ -1 & 1 & -2 & 1 \\ 1 & -1 & 2 & -2 \end{bmatrix} \]
(a)
\[ A = \begin{bmatrix} 2 & 3 \\ 4 & 6 \end{bmatrix} \]
(b)
\[ A = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 3 \\ 3 & 2 & 1 \end{bmatrix} \]
(c)
\[ A = \begin{bmatrix} 2 & 4 & 0 \\ 3 & 6 & -2 \\ 1 & 2 & -2 \end{bmatrix} \]
(d)
\[ A = \begin{bmatrix} 2 & -1 & 3 & 5 \\ 2 & 0 & 1 & 2 \\ 6 & 4 & -5 & -6 \\ 0 & 2 & -4 & -6 \end{bmatrix} \]
Exercise 5.5.3 In this exercise, we use scalars from the field Z5 of integers modulo 5 instead of real
numbers (see Section 1.8, “Fields”). Determine the rank and nullity and find a basis of the column space,
row space, and null space of the following matrix over Z5 .
\[ A = \begin{bmatrix} 2 & 4 & 1 \\ 1 & 2 & 3 \\ 3 & 1 & 2 \end{bmatrix} \]
Exercise 5.5.7 For invertible matrices B and C of appropriate size, show that rank(A) = rank(BA) =
rank(AC).
Hint: Consider the subspace col(B) ∩ null(A) and suppose a basis for this subspace is {w1 , . . . , wk }.
Let {z1 , . . . , zk } be such that Bzi = wi . Now suppose {u1 , . . . , ur } is a basis for null(B), and argue that
null(AB) ⊆ span {u1 , . . . , ur , z1 , . . . , zk }.
6. Linear transformations in Rn
Outcomes
A. Determine whether a vector function T : Rn → Rm is a linear transformation.
In calculus, a function (or map) f : R → R is a rule that maps a real number x ∈ R to a real number
f (x) ∈ R. In linear algebra, we can generalize this concept to vectors. A vector function T : Rn → Rm is
a rule that inputs an n-dimensional vector v ∈ Rn and outputs an m-dimensional vector T (v) ∈ Rm . The
following are some examples of vector functions:
\[ T_1\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x^2 \\ x+y \\ y^2 \end{bmatrix}, \qquad T_2\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x+y \\ x+y+z \\ 0 \end{bmatrix}, \qquad T_3\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} e^{x+z} \\ \sqrt{y} \end{bmatrix}. \tag{6.1} \]
Of these, the first is a function T1 : R2 → R3 , the second is a function T2 : R3 → R3 , and the third is a
function T3 : R3 → R2 . We can evaluate a vector function by applying it to a vector, for example,
\[ T_1\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1^2 \\ 1+2 \\ 2^2 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}, \qquad T_1\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0^2 \\ 0+1 \\ 1^2 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \]
and so on. The study of arbitrary vector functions and their derivatives and integrals is the subject of
multivariable calculus. In linear algebra, we will only be concerned with linear vector functions, which
are also called linear transformations or linear maps. They are defined as follows.
A vector function T : Rn → Rm is called a linear transformation (or linear map) if:

1. T preserves addition, i.e., for all v, w ∈ Rn, we have T(v + w) = T(v) + T(w).

2. T preserves scalar multiplication, i.e., for all v ∈ Rn and scalars k, we have T(kv) = kT(v).
Solution.
(a) The function T1 is not a linear transformation. For example, let \( v = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \). Then
\[ T_1(v) = T_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \quad\text{and}\quad T_1(2v) = T_1\begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \\ 0 \end{bmatrix} \neq 2\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}. \]
Since T1(2v) ≠ 2T1(v), the vector function T1 does not preserve scalar multiplication, and therefore it is not a linear transformation.
(b) The function T2 is a linear transformation. For example, to prove that T2 preserves addition, consider
two arbitrary vectors
\[ v = \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix} \quad\text{and}\quad w = \begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix}. \]
We have
\[ T_2(v+w) = T_2\begin{bmatrix} x_1+x_2 \\ y_1+y_2 \\ z_1+z_2 \end{bmatrix} = \begin{bmatrix} (x_1+x_2)+(y_1+y_2) \\ (x_1+x_2)+(y_1+y_2)+(z_1+z_2) \\ 0 \end{bmatrix} \]
and
\[ T_2(v)+T_2(w) = \begin{bmatrix} x_1+y_1 \\ x_1+y_1+z_1 \\ 0 \end{bmatrix} + \begin{bmatrix} x_2+y_2 \\ x_2+y_2+z_2 \\ 0 \end{bmatrix} = \begin{bmatrix} (x_1+y_1)+(x_2+y_2) \\ (x_1+y_1+z_1)+(x_2+y_2+z_2) \\ 0 \end{bmatrix}. \]
Since the two sides are evidently equal, T2 preserves addition. The fact that it preserves scalar
multiplication can be shown by a similar calculation.
(c) The function T3 is not a linear transformation. For example, consider \( v = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \) and \( w = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \). Then
\[ T_3(v+w) = T_3\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} = \begin{bmatrix} e \\ \sqrt{2} \end{bmatrix}, \]
and
\[ T_3(v)+T_3(w) = \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} e \\ 1 \end{bmatrix} = \begin{bmatrix} e+1 \\ 2 \end{bmatrix}. \]
Since T3(v + w) ≠ T3(v) + T3(w), the vector function T3 does not preserve addition, and therefore it is not linear.
♠
An easy fact about linear transformations is that they preserve the origin, i.e., they satisfy T(0) = 0. This can be seen, for example, by considering T(0) = T(0 + 0) = T(0) + T(0) and then subtracting T(0) from both sides of the equation. This gives an easier way to see that T3 in the above example is not a linear transformation, since T3(0) ≠ 0. On the other hand, of course not every function that preserves the origin is linear. For example, T1 is not linear although it satisfies T1(0) = 0.
The following characterization of linearity is often useful, as it permits us to check just one property
instead of two.
Proof. First, assume that T is linear. Then from preservation of addition and scalar multiplication, we
have T (av + bw) = T (av) + T (bw) = aT (v) + bT (w). Conversely, assume that T satisfies T (av + bw) =
aT (v) + bT (w) for all vectors v, w and scalars a, b. Then we get preservation of addition by setting
a = b = 1, and preservation of scalar multiplication by setting b = 0. ♠
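As an aside, this one-condition characterization also gives a convenient (but of course non-rigorous) numerical test; a small sketch, assuming NumPy and using the function T2 from equation (6.1):

```python
# Rough numerical check of T(a v + b w) = a T(v) + b T(w) on random inputs.
import numpy as np

def T2(u):
    x, y, z = u
    return np.array([x + y, x + y + z, 0.0])

rng = np.random.default_rng(0)
for _ in range(100):
    v, w = rng.normal(size=3), rng.normal(size=3)
    a, b = rng.normal(size=2)
    assert np.allclose(T2(a*v + b*w), a*T2(v) + b*T2(w))
```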
Exercises
Exercise 6.1.1 Which of the following vector functions are linear transformations?
\[ T_1\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2x+y \\ x-2y \\ -x-y \end{bmatrix}, \qquad T_2\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x+y^2 \\ (x+y)z \\ 0 \end{bmatrix}, \qquad T_3\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \]
Exercise 6.1.2 Consider the following functions T : R3 → R2 . Explain why each of these functions T is
not linear.
(a) \( T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + 2y + 3z + 1 \\ 2y - 3x + z \end{bmatrix} \)

(b) \( T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + 2y^2 + 3z \\ 2y + 3x + z \end{bmatrix} \)

(c) \( T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \sin x + 2y + 3z \\ 2y + 3x + z \end{bmatrix} \)
(d) \( T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + 2y + 3z \\ 2y + 3x - \ln z \end{bmatrix} \)
Exercise 6.1.3 Let A be an m × n-matrix. Show the vector function T : Rn → Rm defined by T (v) = Av is
a linear transformation.
Exercise 6.1.4 Let u ∈ Rn be a fixed vector. Show that the function T defined by T (v) = v − proju (v) is a
linear transformation.
Exercise 6.1.5 Let u ∈ Rn be a fixed non-zero vector. The function T defined by T (v) = u + v has the
effect of translating all vectors by adding u. Show this is not a linear transformation.
Outcomes
A. Find the matrix corresponding to a linear transformation T : Rn → Rm .
Proof. This follows from the laws of matrix multiplication. Namely, by the distributive law, we have
A(v+w) = Av +Aw, showing that T preserves addition. And by the compatibility of matrix multiplication
and scalar multiplication, we have A(kv) = k(Av), showing that T preserves scalar multiplication. ♠
In fact, matrix transformations are not just an example of linear transformations, but they are essentially
the only example. One of the central theorems in linear algebra is that all linear transformations T : Rn →
Rm are in fact matrix transformations. Therefore, a matrix can be regarded as a notation for a linear
transformation, and vice versa. This is the subject of the following theorem.
Proof. Suppose T : Rn → Rm is a linear transformation and consider the standard basis {e1 , . . . , en } of Rn .
For all i, define ui = T (ei ), and let A be the matrix that has u1 , . . . , un as its columns. We claim that A is
the desired matrix, i.e., that T (v) = Av holds for all v ∈ Rn .
To see this, let
\[ v = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \]
be some arbitrary element of Rn . Then v = x1 e1 + . . . + xn en , and we have:
T (v) = T (x1 e1 + . . . + xn en )
= T (x1 e1 ) + . . . + T (xn en ) by linearity
= x1 T (e1 ) + . . . + xn T (en ) by linearity
= x1 u1 + . . . + xn un by definition of ui
= Av by the column method of matrix multiplication.
In summary, the matrix corresponding to the linear transformation T has as its columns the vectors
T (e1 ), . . . , T (en ), i.e., the images of the standard basis vectors. We can visualize this matrix as follows:
\[ A = \begin{bmatrix} | & & | \\ T(e_1) & \cdots & T(e_n) \\ | & & | \end{bmatrix}. \]
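As a computational aside, this recipe translates directly into code; a minimal sketch, assuming NumPy, again using T2 from equation (6.1) as the sample linear map:

```python
# Sketch: build the matrix of a linear map T : R^n -> R^m by applying T
# to the standard basis vectors and using the images as columns.
import numpy as np

def matrix_of(T, n):
    cols = [T(e) for e in np.eye(n)]      # T(e1), ..., T(en)
    return np.column_stack(cols)

T2 = lambda u: np.array([u[0] + u[1], u[0] + u[1] + u[2], 0.0])
A = matrix_of(T2, 3)
v = np.array([1.0, 2.0, 3.0])
assert np.allclose(A @ v, T2(v))          # T(v) = Av
```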
Solution. By Theorem 6.5, the columns of A are T (e1 ), T (e2 ), and T (e3 ). Therefore,
\[ A = \begin{bmatrix} 1 & 9 & 1 \\ 2 & -3 & 1 \end{bmatrix}. \]
♠
In the next example, \( u = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \) is a fixed vector and T : R3 → R3 is defined by T(v) = proj_u(v) for all v ∈ R3. We show that T is a linear transformation and find its matrix.
Solution.
(a) Recall the formula for the projection:
\[ \mathrm{proj}_{u}(v) = \frac{u \cdot v}{u \cdot u}\,u. \]
In the situation we are interested in, u is a fixed vector, and v is the input to the function T . Given
any two vectors v, w, and using the distributive laws of the dot product and scalar multiplication, we
have:
\[ \mathrm{proj}_{u}(v+w) = \frac{u \cdot (v+w)}{u \cdot u}\,u = \frac{u \cdot v + u \cdot w}{u \cdot u}\,u = \frac{u \cdot v}{u \cdot u}\,u + \frac{u \cdot w}{u \cdot u}\,u = \mathrm{proj}_{u}(v) + \mathrm{proj}_{u}(w). \]
Therefore, the function T (v) = proju (v) preserves addition. Also, given any scalar k, we have
\[ \mathrm{proj}_{u}(kv) = \frac{u \cdot (kv)}{u \cdot u}\,u = k\,\frac{u \cdot v}{u \cdot u}\,u = k\,\mathrm{proj}_{u}(v). \]
Therefore, the function T preserves scalar multiplication. It follows that T is a linear transformation.
(b) To find the matrix of T , we must compute the images of the standard basis vectors T (e1 ), . . . , T (e3 ).
We compute
\[ T(e_1) = \mathrm{proj}_{u}(e_1) = \frac{u \cdot e_1}{u \cdot u}\,u = \frac{1}{14}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \qquad T(e_2) = \mathrm{proj}_{u}(e_2) = \frac{u \cdot e_2}{u \cdot u}\,u = \frac{2}{14}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \qquad T(e_3) = \mathrm{proj}_{u}(e_3) = \frac{u \cdot e_3}{u \cdot u}\,u = \frac{3}{14}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}. \]
Therefore, the matrix of T is
\[ A = \frac{1}{14}\begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{bmatrix}. \]
♠
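As an aside, the same projection matrix can be written as \( \frac{1}{u \cdot u}\, u u^T \), which is easy to check numerically; a minimal sketch, assuming NumPy:

```python
# Sketch: the matrix of T(v) = proj_u(v) is (1/(u.u)) * u u^T; here u = (1,2,3).
import numpy as np

u = np.array([1.0, 2.0, 3.0])
A = np.outer(u, u) / np.dot(u, u)          # columns are proj_u(e1), proj_u(e2), proj_u(e3)

v = np.array([2.0, -1.0, 5.0])
proj = (np.dot(u, v) / np.dot(u, u)) * u   # direct projection formula
assert np.allclose(A @ v, proj)
print(A * 14)                              # the matrix (1/14)[[1,2,3],[2,4,6],[3,6,9]]
```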
Exercises
Exercise 6.2.1 For each of the following vector functions T : Rn → Rn , show that T is a linear transfor-
mation and find the corresponding matrix A such that T (x) = Ax.
Exercise 6.2.5 Consider the following linear transformations T : R3 → R2 . For each, determine the
matrix A such that T (x) = Ax.
(a) \( T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + 2y + 3z \\ 2y - 3x + z \end{bmatrix} \)

(b) \( T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 7x + 2y + z \\ 3x - 11y + 2z \end{bmatrix} \)

(c) \( T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 3x + 2y + z \\ x + 2y + 6z \end{bmatrix} \)

(d) \( T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2y - 5x + z \\ x + y + z \end{bmatrix} \)
Exercise 6.2.6 Find the matrix for T (w) = projv (w), where v = [1, −2, 3]T .
Exercise 6.2.7 Find the matrix for T (w) = projv (w), where v = [1, 5, 3]T .
Outcomes
In this section, we will examine some special examples of linear transformations in R2 and R3 including
rotations and reflections.
[Before-and-after picture for a counterclockwise rotation by 90 degrees: the standard basis vectors e1, e2 and the letter “F”, and their images under T, with \( T(e_1) = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \) and \( T(e_2) = \begin{bmatrix} -1 \\ 0 \end{bmatrix} \).]
The picture illustrates how the function T rotates the entire plane (including the pink letter “F”) by 90
degrees counterclockwise. The picture also illustrates that when we apply the rotation T to the first and
second standard basis vectors e1 and e2 , we obtain the vectors
\[ T(e_1) = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \quad\text{and}\quad T(e_2) = \begin{bmatrix} -1 \\ 0 \end{bmatrix}. \]
The matrix of T has these vectors as its columns. Therefore, the matrix of T is
\[ A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}. \]
Finally, we can use this to find a formula for the counterclockwise 90 degree rotation T :
\[ T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -y \\ x \end{bmatrix}. \]
To illustrate how this works, consider the top right corner of the letter “F”. It has the coordinates (0.6, 1).
Applying the function T to the coordinate vector, we get
\[ T\begin{bmatrix} 0.6 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0.6 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 0.6 \end{bmatrix}. \]
These are precisely the coordinates of the corresponding point on the letter “F” after the rotation. ♠
Solution. The before-and-after picture for a reflection about the y-axis looks like this:
[Before-and-after picture for the reflection about the y-axis: \( T(e_1) = \begin{bmatrix} -1 \\ 0 \end{bmatrix} \) and \( T(e_2) = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \).]
We see that
\[ T(e_1) = -e_1 = \begin{bmatrix} -1 \\ 0 \end{bmatrix} \quad\text{and}\quad T(e_2) = e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \]
Therefore, the matrix of T is
\[ A = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}. \]
The formula for a reflection about the y-axis is:
\[ T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -x \\ y \end{bmatrix}. \]
♠
[Before-and-after picture for a counterclockwise rotation by the angle θ: \( T(e_1) = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \) and \( T(e_2) = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix} \).]
Reading off the columns, the matrix of a counterclockwise rotation by the angle θ is therefore
\[ \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \]
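As a computational aside, here is a small sketch of this rotation matrix in code (assuming NumPy), checked against the 90-degree rotation computed earlier:

```python
# Sketch: R_theta = [[cos t, -sin t], [sin t, cos t]] rotates the plane
# counterclockwise by the angle theta.
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

R90 = rotation(np.pi / 2)
assert np.allclose(R90, np.array([[0.0, -1.0], [1.0, 0.0]]))
print(rotation(np.pi / 3))    # e.g. the matrix asked for in Exercise 6.3.1
```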
Solution. To draw each before-and-after picture, we can start by drawing the images of the two standard
basis vectors e1 and e2 , which are the columns of the transformation matrix. We have also drawn the image
of the letter “F”, to better illustrate the effect of each transformation.
[Before-and-after pictures for parts (a) and (b), showing e1, e2, the letter “F”, and their images T(e1), T(e2) under the transformations A and B.]
The transformation A is a reflection about the line x = y. The transformation B is a scaling by a factor of 2. The transformation C is also a scaling, but by a different factor in the x- and y-directions. It scales the x-direction by a factor of 1/2 (or equivalently, shrinks it by a factor of 2), and scales the y-direction by a factor of 2. The transformation D is called a shearing. It keeps one line (the x-axis) fixed, while shifting all other points by varying distances along lines that are parallel to the x-axis. ♠
Solution. Here is the before-and-after picture. A rotation in 3-dimensional space is usually harder to
visualize than in the plane, but fortunately, the rotation is about the z-axis, so all the “action” is taking
place in the xy-plane.
[Before-and-after picture for the rotation about the z-axis by the angle θ: \( T(e_1) = \begin{bmatrix} \cos\theta \\ \sin\theta \\ 0 \end{bmatrix} \), \( T(e_2) = \begin{bmatrix} -\sin\theta \\ \cos\theta \\ 0 \end{bmatrix} \), and \( T(e_3) = e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \).]
Therefore, the matrix of this rotation is
\[ \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]
♠
Exercises
Exercise 6.3.1 Find the matrix for the linear transformation that rotates every vector in R2 by an angle
of π /3.
Exercise 6.3.2 Find the matrix for the linear transformation that reflects every vector in R2 about the
x-axis.
Exercise 6.3.3 Find the matrix for the linear transformation that reflects every vector in R2 about the line
y = −x.
Exercise 6.3.4 Find the matrix for the linear transformation that stretches R2 by a factor of 3 in the
vertical direction.
Exercise 6.3.5 Find the matrix of the linear transformation that reflects every vector in R3 about the
xy-plane.
Exercise 6.3.6 Find the matrix of the linear transformation that reflects every vector in R3 about the plane x = z.
Exercise 6.3.7 Describe the linear transformation that is given by each of the following matrices. Draw
a before-and-after picture for each.
\[ \text{(a) } A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, \quad \text{(b) } B = \begin{bmatrix} 0 & 2 \\ 1 & 0 \end{bmatrix}, \quad \text{(c) } C = \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}, \quad \text{(d) } D = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. \]
Exercise 6.3.8 Let \( u = \begin{bmatrix} a \\ b \end{bmatrix} \) be a unit vector in R2. Find the matrix that reflects all vectors about this vector, as shown in the following picture.
[Figure: reflection of the plane about the line through the origin in the direction of u.]
Outcomes
A. Use properties of linear transformations to solve problems.
We begin by noting that linear transformations preserve the zero vector, negation, and linear combinations.
Solution. Using the third property in Proposition 6.14, we can find \( T\begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix} \) by writing \( \begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix} \) as a linear combination of \( \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} \) and \( \begin{bmatrix} 4 \\ 0 \\ 5 \end{bmatrix} \). By solving the appropriate system of equations, we find that
\[ \begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} - 2\begin{bmatrix} 4 \\ 0 \\ 5 \end{bmatrix}. \]
Therefore,
\[ T\begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix} = T\left( \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} - 2\begin{bmatrix} 4 \\ 0 \\ 5 \end{bmatrix} \right) = T\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} - 2\,T\begin{bmatrix} 4 \\ 0 \\ 5 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \\ -2 \end{bmatrix} - 2\begin{bmatrix} 4 \\ 5 \\ 5 \end{bmatrix} = \begin{bmatrix} -4 \\ -6 \\ -12 \end{bmatrix}. \]
♠
Suppose that we first apply a linear transformation T to a vector, and then the linear transformation S to
the result. The resulting two-step transformation is also a linear transformation, called the composition of
T and S.
T ◦ S : Rk → Rm
that is defined by
(T ◦ S)(v) = T (S(v)),
for all v ∈ Rk .
Notice that the resulting vector will be in Rm . Be careful to observe the order of transformations. The
composite transformation T ◦ S means that we are first applying S, and then T . Composition of linear
transformations is written from right to left. The composition T ◦ S is sometimes pronounced “T after S”.
Solution. Let Aθ be the matrix of a rotation by θ , and let Aφ be the matrix of a rotation by angle φ . We
calculated these matrices in Example 6.11. Then a rotation by the angle θ + φ is given by the product of
these two matrices:
\[ A_\theta A_\phi = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} = \begin{bmatrix} \cos\theta\cos\phi - \sin\theta\sin\phi & -\cos\theta\sin\phi - \sin\theta\cos\phi \\ \sin\theta\cos\phi + \cos\theta\sin\phi & \cos\theta\cos\phi - \sin\theta\sin\phi \end{bmatrix}. \]
On the other hand, we can compute the matrix for a rotation by angle θ + φ directly:
\[ A_{\theta+\phi} = \begin{bmatrix} \cos(\theta+\phi) & -\sin(\theta+\phi) \\ \sin(\theta+\phi) & \cos(\theta+\phi) \end{bmatrix}. \]
The fact that these matrices are equal amounts to the well-known trigonometric identities for the sum of two angles, which we have here derived using linear algebra concepts:
\[ \cos(\theta+\phi) = \cos\theta\cos\phi - \sin\theta\sin\phi, \qquad \sin(\theta+\phi) = \sin\theta\cos\phi + \cos\theta\sin\phi. \]
Solution. It would be quite difficult to picture the transformation T in one step. Fortunately, we don’t have to do this. All we have to do is find the matrix for each rotation separately, then multiply the two matrices. We have to be careful to multiply the matrices in the correct order.
Let B be the matrix for a 30-degree rotation about the z-axis. It is given exactly as in Example 6.13:
\[ B = \begin{bmatrix} \cos 30^\circ & -\sin 30^\circ & 0 \\ \sin 30^\circ & \cos 30^\circ & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} & 0 \\ \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]
Let C be the matrix for a 45-degree rotation about the x-axis. It is analogous to Example 6.13, except that the rotation takes place in the yz-plane instead of the xy-plane:
\[ C = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos 45^\circ & -\sin 45^\circ \\ 0 & \sin 45^\circ & \cos 45^\circ \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}. \]
Finally, to apply the linear transformation T to a vector v, we must first apply B and then C. This means
that T (v) = C(Bv). Therefore, the matrix corresponding to T is CB. Note that it is important that we
multiply the matrices corresponding to each subsequent rotation from right to left.
\[ A = CB = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} & 0 \\ \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} & 0 \\ \frac{1}{2\sqrt{2}} & \frac{\sqrt{3}}{2\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{2\sqrt{2}} & \frac{\sqrt{3}}{2\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}. \]
♠
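As an aside, the order of composition is easy to get wrong; the following sketch (assuming NumPy) recomputes A = CB numerically and checks it against applying B first and then C:

```python
# Sketch of the composition in the last example: first B (30 degrees about z),
# then C (45 degrees about x), so the combined matrix is A = C @ B.
import numpy as np

def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def rot_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

B = rot_z(np.radians(30))
C = rot_x(np.radians(45))
A = C @ B                      # note the order: (C o B)(v) = C(B(v))
v = np.array([1.0, 0.0, 0.0])
assert np.allclose(A @ v, C @ (B @ v))
```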
We can also consider the inverse of a linear transformation. The inverse of T , if it exists, is a linear
transformation that undoes the effect of T .
(S ◦ T )(v) = v
and
(T ◦ S)(v) = v.
Then S is called the inverse of T , and we write S = T −1 .
Exercises
Exercise 6.4.1 Find the matrix for the linear transformation that reflects every vector in R2 about the
x-axis and then reflects about the y-axis.
Exercise 6.4.2 Find the matrix for the linear transformation that rotates every vector in R2 by an angle
of 2π /3 and then reflects about the x-axis.
Exercise 6.4.3 Find the matrix for the linear transformation that rotates every vector in R2 by an angle
of π /6 and then reflects about the x-axis followed by a reflection about the y-axis.
Exercise 6.4.4 Find the matrix for the linear transformation that reflects every vector in R2 about the
x-axis and then rotates by an angle of π /4.
Exercise 6.4.5 Find the matrix of the linear transformation that rotates every vector in R3 counterclock-
wise about the z-axis when viewed from the positive z-axis by an angle of 30 degrees and then reflects
about the xy-plane.
Exercise 6.4.6 Prove the three properties in Proposition 6.14, using only the definition of a linear trans-
formation (i.e., the fact that it preserves addition and scalar multiplication).
Exercise 6.4.7 Let T be the linear transformation with matrix \( A = \begin{bmatrix} 3 & 1 \\ -1 & 2 \end{bmatrix} \) and S the linear transformation with matrix \( B = \begin{bmatrix} 0 & -2 \\ 4 & 2 \end{bmatrix} \). Find the matrix of S ◦ T. Compute (S ◦ T)(v) for \( v = \begin{bmatrix} 2 \\ -1 \end{bmatrix} \).
Exercise 6.4.8 Let T be a linear transformation and suppose \( T\begin{bmatrix} 1 \\ -4 \end{bmatrix} = \begin{bmatrix} 2 \\ -3 \end{bmatrix} \). Suppose S is the linear transformation with matrix \( B = \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix} \). Find (S ◦ T)(v) for \( v = \begin{bmatrix} 1 \\ -4 \end{bmatrix} \).
[Figure: the camera coordinate system, with the camera at the origin and the image plane in front of it; a point of the object is rendered as the point p′ where the line from the camera to the point crosses the image plane.]
Object coordinates
It is convenient to describe each object in its own coordinate system, called the object coordinate system.
To illustrate this concept, we will consider a cube of side length 2, centered at the origin. The 8 corners of
this cube have the following coordinates in the object coordinate system:
\[ \begin{bmatrix} 1\\1\\1 \end{bmatrix}, \begin{bmatrix} 1\\1\\-1 \end{bmatrix}, \begin{bmatrix} 1\\-1\\1 \end{bmatrix}, \begin{bmatrix} 1\\-1\\-1 \end{bmatrix}, \begin{bmatrix} -1\\1\\1 \end{bmatrix}, \begin{bmatrix} -1\\1\\-1 \end{bmatrix}, \begin{bmatrix} -1\\-1\\1 \end{bmatrix}, \begin{bmatrix} -1\\-1\\-1 \end{bmatrix}. \]
For later reference, let us call this the standard cube. The following picture shows the standard cube
within its object coordinate system:
[Figure: the standard cube, shown together with the x-, y-, and z-axes of its object coordinate system.]
Before we render an object, we need to place it in some appropriate location relative to the camera. We do
this by specifying four vectors q, ax , ay , and az in R3 . Here, q is the origin of the object coordinate system,
relative to the camera coordinate system. The vectors ax , ay , and az are the axes of the object coordinate
system, relative to the camera coordinate system, as shown in the following illustration:
[Figure: the object coordinate system, with origin q and axes ax, ay, az, positioned relative to the camera coordinate system.]
Thus, given a point with object coordinates \( v = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \), we can find its camera coordinates \( p = \begin{bmatrix} p_x \\ p_y \\ p_z \end{bmatrix} \) by the following formula:
\[ p = q + x\,a_x + y\,a_y + z\,a_z. \]
If we write A for the 3 × 3-matrix whose columns are ax , ay , and az , we can also write this formula more
succinctly as
p = q + Av.
We can also write f : R3 → R3 for the function that converts object coordinates to camera coordinates,
i.e.,
f (v) = q + Av.
We note that this is not a linear function, because f(0) ≠ 0. The function f is called an affine function, which means that it is a linear function v ↦ Av followed by a translation v ↦ q + v.
Rendering
Once we know the camera coordinates \( p = \begin{bmatrix} p_x \\ p_y \\ p_z \end{bmatrix} \) of a point, we need to render the point, i.e., find its coordinates in the image plane.
[Figure: the camera at the origin, the point p, and its rendered image p′ on the image plane.]
Since the camera is located at the origin, the line that passes through the camera and the point p has the
parametric equation
\[ r = t\,p = \begin{bmatrix} t\,p_x \\ t\,p_y \\ t\,p_z \end{bmatrix}. \]
Since the image plane is the plane z = 1, we must set t such that t p_z = 1, i.e., t = 1/p_z. Therefore, the coordinates of the rendered point are
\[ p' = \frac{1}{p_z}\,p = \begin{bmatrix} p_x/p_z \\ p_y/p_z \\ 1 \end{bmatrix}. \]
Finally, since the image plane is 2-dimensional, we can forget the now useless z-coordinate, and render the point at the coordinates \( \begin{bmatrix} p_x/p_z \\ p_y/p_z \end{bmatrix} \) in the 2-dimensional image plane.
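As a computational aside, the whole rendering pipeline described above fits in a few lines of code; a minimal sketch, assuming NumPy, with placeholder values for q, ax, ay, az:

```python
# Sketch: object coordinates v -> camera coordinates p = q + A v
# -> image-plane coordinates (px/pz, py/pz).
import numpy as np

q  = np.array([0.0, 0.0, 5.0])            # hypothetical object origin
ax = np.array([1.0, 0.0, 0.0])            # hypothetical object axes
ay = np.array([0.0, 1.0, 0.0])
az = np.array([0.0, 0.0, 1.0])
A  = np.column_stack([ax, ay, az])

def render(v):
    p = q + A @ v                         # camera coordinates
    return np.array([p[0] / p[2], p[1] / p[2]])   # point on the image plane z = 1

corners = [np.array([x, y, z]) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
print([render(c) for c in corners])       # 2D images of the standard cube's corners
```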
Animation
We placed our object in the camera coordinate system using a coordinate transformation function
f (v) = q + Av.
One of the advantages of using such a coordinate transformation (as opposed to specifying the object
points directly in the camera coordinate system) is that this makes it very easy to move the objects around,
rotate them, scale and shrink them, etc. For example:
1. To move the object to a different location, we only have to change the vector q.
2. To rotate the object about its own z-axis, we only have to replace v by Rθ v, where Rθ is the matrix
for a rotation about the z-axis by angle θ :
\[ R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]
Similarly to Rθ , we can also insert other transformation matrices (for example, we could rotate the object
about its x-axis instead of its z-axis, scale the object, etc). We can even make an animation by rendering
the object repeatedly for different values of these parameters.
The transformation matrix A is as in Example 6.24. Moreover, the cube should make one quarter rotation about its z-axis during the time of the animation, i.e., it should be transformed by Rθ, where θ = (π/10)t. Compute 6 frames of the animation, for t = 0, t = 1, . . . , t = 5.
Solution. For each of the animation frames t ∈ {0, 1, 2, 3, 4, 5}, we do a calculation very similar to that of Example 6.24 to convert the cube coordinates to camera coordinates, using the coordinate transformation f(v) = q + A Rθ v, and then render the resulting points as before.
[Figure: the six rendered frames of the animation, for t = 0, t = 1, . . . , t = 5.]
Note that there is a bit of distortion in the first and last cubes. This is because the camera is very close to
the image plane (the scene has been “filmed” with a wide-angle camera). The distortion goes away if you
close one eye and bring the other eye very close to the page. ♠
7. Determinants
Outcomes
A. Calculate the determinant of 2 × 2-matrices and 3 × 3-matrices.
Let A be an n × n-matrix. The determinant of A, denoted by det(A), is a very important number which
we will explore throughout this chapter.
The determinant of a 2 × 2-matrix is given by the following formula:
\[ \det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc. \]
The determinant of a 3 × 3-matrix can be computed by the following formula:
\[ \det(A) = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{31}a_{22}a_{13} - a_{32}a_{23}a_{11} - a_{33}a_{21}a_{12}. \]
Here, we have written down the matrix A, then repeated the first two columns next to it. The blue lines
correspond to the positive terms of the determinant: a11 a22 a33 , a12 a23 a31 , and a13 a21 a32 . The pink lines
correspond to the negative terms: a31 a22 a13 , a32 a23 a11 , and a33 a21 a12 .
Solution. We have
\[ \det(A) = \begin{vmatrix} 0 & 1 & 2 \\ 3 & 1 & 0 \\ 1 & 1 & -1 \end{vmatrix} = 0\cdot 1\cdot(-1) + 1\cdot 0\cdot 1 + 2\cdot 3\cdot 1 - 1\cdot 1\cdot 2 - 1\cdot 0\cdot 0 - (-1)\cdot 3\cdot 1 = 7. \]
Exercises
Outcomes
A. Compute minors and cofactors of matrices.
Determinants of larger matrices can be computed in terms of the determinants of smaller matrices. We
begin with the following definition.
Hence, there is a minor associated with each entry of A. The following example illustrates this definition.
Solution. First we will find M12. By definition, this is the determinant of the 2 × 2-matrix that results when we delete the first row and the second column of A. This minor is given by
\[ M_{12} = \begin{vmatrix} 4 & 2 \\ 3 & 1 \end{vmatrix} = 4\cdot 1 - 3\cdot 2 = -2. \]
Similarly, M33 is the determinant of the 2 × 2-matrix that is obtained by deleting the third row and the third column of A. This minor is therefore
\[ M_{33} = \begin{vmatrix} 1 & 2 \\ 4 & 3 \end{vmatrix} = -5. \]
We now define the ij th cofactor of a matrix A, which is either plus or minus the ij th minor.
Ci j = (−1)i+ j Mi j
In other words, the ij th cofactor is equal to the corresponding minor if i + j is even, and the negative of the
minor if i + j is odd. For remembering the signs, the following picture is sometimes helpful:
+ − + −
− + − +
+ − + − .
− + − +
Solution. We have already computed the corresponding minors in Example 7.6. For the cofactors, we
have:
C12 = (−1)1+2 M12 = − M12 = − (−2) = 2,
C33 = (−1)3+3 M33 = + M33 = + (−5) = − 5.
Note that 1 + 2 is odd, so C12 = −M12 . On the other hand, 3 + 3 is even, so C33 = M33 . ♠
You may wish to find the remaining cofactors of the above matrix. Remember that there is a cofactor for
every entry in the matrix.
We have now established the tools we need to find the determinant of an n × n-matrix.
When calculating the determinant, you can choose to expand any row or any column. Regardless of which
row or column you expand, you will always get the same number, which is the determinant of the matrix
A. This method of evaluating a determinant by expanding along a row or a column is also called cofactor
expansion or Laplace expansion.
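As a computational aside, cofactor expansion translates directly into a short recursive program; a minimal sketch in Python (expansion along the first row):

```python
# Sketch: determinant by cofactor (Laplace) expansion along the first row.
# Fine for small matrices, but very inefficient for large ones.
def minor(A, i, j):
    return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

A = [[1, 2, 3],
     [4, 3, 2],
     [3, 2, 1]]
print(det(A))   # 0; compare the cofactor expansions in the next example
```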
Solution. First, we will calculate det(A) by expanding along the first column. Using Definition 7.9, the determinant is
\[ \det(A) = 1\begin{vmatrix} 3 & 2 \\ 2 & 1 \end{vmatrix} - 4\begin{vmatrix} 2 & 3 \\ 2 & 1 \end{vmatrix} + 3\begin{vmatrix} 2 & 3 \\ 3 & 2 \end{vmatrix} = 1(-1) - 4(-4) + 3(-5) = 0. \]
As mentioned in Definition 7.9, we can choose to expand along any row or column. Let’s try now by
expanding along the second row. The calculation is as follows.
\[ \det(A) = -4\begin{vmatrix} 2 & 3 \\ 2 & 1 \end{vmatrix} + 3\begin{vmatrix} 1 & 3 \\ 3 & 1 \end{vmatrix} - 2\begin{vmatrix} 1 & 2 \\ 3 & 2 \end{vmatrix} = -4(-4) + 3(-8) - 2(-4) = 0. \]
Solution. Using the cofactor method, we can expand this determinant along any row or column. But notice
that the third row contains two zeros. This makes the cofactor expansion particularly convenient. So let us
expand along the third row. We have:
\[ \begin{vmatrix} 1 & 2 & 3 & 0 \\ 2 & 4 & 2 & 3 \\ 0 & 3 & 0 & 5 \\ 3 & 0 & 3 & 2 \end{vmatrix} = 0\begin{vmatrix} 2 & 3 & 0 \\ 4 & 2 & 3 \\ 0 & 3 & 2 \end{vmatrix} - 3\begin{vmatrix} 1 & 3 & 0 \\ 2 & 2 & 3 \\ 3 & 3 & 2 \end{vmatrix} + 0\begin{vmatrix} 1 & 2 & 0 \\ 2 & 4 & 3 \\ 3 & 0 & 2 \end{vmatrix} - 5\begin{vmatrix} 1 & 2 & 3 \\ 2 & 4 & 2 \\ 3 & 0 & 3 \end{vmatrix}. \]
Note that we only need to compute two of the 3 × 3 determinants, since the remaining two are multiplied
by 0. We can compute each 3 × 3 determinant using the method of Definition 7.3. We find:
\[ \begin{vmatrix} 1 & 2 & 3 & 0 \\ 2 & 4 & 2 & 3 \\ 0 & 3 & 0 & 5 \\ 3 & 0 & 3 & 2 \end{vmatrix} = -3\begin{vmatrix} 1 & 3 & 0 \\ 2 & 2 & 3 \\ 3 & 3 & 2 \end{vmatrix} - 5\begin{vmatrix} 1 & 2 & 3 \\ 2 & 4 & 2 \\ 3 & 0 & 3 \end{vmatrix} = -3\cdot 10 - 5\cdot(-24) = 90. \]
♠
We remark that the cofactor expansion is mainly useful for calculating determinants of small matrices, or
matrices containing many zeros. Indeed, imagine calculating the determinant of a 10 × 10-matrix by the
cofactor method. This requires calculating the determinants of ten 9 × 9-matrices, each of which requires
calculating the determinants of nine 8 × 8-matrices, each of which requires calculating the determinants of
eight 7 × 7-matrices, and so on. Calculating the determinant of a 10 × 10-matrix by the cofactor method
would therefore require 10 · 9 · 8 · . . . · 2 · 1 = 3628800 steps!
In the next few sections, we will explore some important properties and characteristics of the determi-
nant, including a much more efficient method of calculating determinants of large matrices.
Exercises
Exercise 7.2.1 Let \( A = \begin{bmatrix} 1 & 2 & 4 \\ 0 & 1 & 3 \\ -2 & 5 & 1 \end{bmatrix} \). Find the following minors and cofactors:
(a) M11 ,
(b) M21 ,
(c) M32 ,
(d) C11 ,
(e) C21 ,
(f) C32 .
Exercise 7.2.2 Let \( A = \begin{bmatrix} 0 & -1 & 3 & 1 \\ 1 & 0 & 2 & 2 \\ 2 & 3 & -1 & 0 \\ 1 & 1 & 0 & 1 \end{bmatrix} \). Find M11, M21, M32, C11, C21, and C32.
Exercise 7.2.3 Compute the determinants of the following matrices using cofactor expansion along any
row or column.
(a) \( \begin{bmatrix} 1 & 2 & 0 \\ 3 & -2 & 2 \\ 0 & 3 & 1 \end{bmatrix} \)
(b) \( \begin{bmatrix} 1 & -2 & 2 \\ 3 & 0 & 0 \\ 4 & 3 & 1 \end{bmatrix} \)
(c) \( \begin{bmatrix} 1 & 2 & -2 & 2 \\ 1 & 3 & 2 & 3 \\ 4 & 0 & 1 & 0 \\ 1 & 2 & 1 & 2 \end{bmatrix} \)
Exercise 7.2.4 Find the following determinant by expanding (a) along the first row and (b) along the
second column.
\[ \begin{vmatrix} 1 & 2 & 1 \\ 2 & 1 & 3 \\ 2 & 1 & 1 \end{vmatrix} \]
Exercise 7.2.5 Find the following determinant by expanding (a) along the first column and (b) along the
third row.
\[ \begin{vmatrix} 2 & 3 & 1 & 1 \\ 4 & 3 & 1 & 2 \\ 1 & 1 & 0 & 1 \\ 3 & 2 & 1 & 2 \end{vmatrix} \]
Exercise 7.2.6 Find the following determinant by expanding (a) along the second row and (b) along the
first column.
\[ \begin{vmatrix} 1 & 2 & -1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 2 & 1 & 3 \\ 1 & 4 & 0 & 2 \end{vmatrix} \]
Exercise 7.2.7 Compute the determinant by cofactor expansion. Pick the easiest row or column to use.
\[ \begin{vmatrix} 1 & 0 & 0 & 1 \\ 2 & 1 & 1 & 0 \\ 0 & 0 & 0 & 2 \\ 2 & 1 & 3 & 1 \end{vmatrix} \]
Outcomes
A. Calculate the determinant of an upper or lower triangular matrix.
There is a certain type of matrix for which finding the determinant is a very simple procedure: a triangular
matrix.
Similarly, a square matrix is lower triangular if all entries above the main diagonal are 0.
The following theorem provides a useful way to calculate the determinant of a triangular matrix.
Solution. By Theorem 7.14, it suffices to take the product of the elements on the main diagonal. Thus
det(A) = 1 · 2 · 3 · (−1) = − 6.
♠
For comparison, let us compute the determinant without Theorem 7.14, i.e., by using cofactor expansion.
If we expand the determinant along the first column, we get:
\[ \det(A) = 1\begin{vmatrix} 2 & 6 & -7 \\ 0 & 3 & 33 \\ 0 & 0 & -1 \end{vmatrix} - 0\begin{vmatrix} 2 & 3 & 16 \\ 0 & 3 & 33 \\ 0 & 0 & -1 \end{vmatrix} + 0\begin{vmatrix} 2 & 3 & 16 \\ 2 & 6 & -7 \\ 0 & 0 & -1 \end{vmatrix} - 0\begin{vmatrix} 2 & 3 & 16 \\ 2 & 6 & -7 \\ 0 & 3 & 33 \end{vmatrix}. \]
We can in turn expand this 3 × 3 determinant by the cofactor method along the first column:
\[ \begin{vmatrix} 2 & 6 & -7 \\ 0 & 3 & 33 \\ 0 & 0 & -1 \end{vmatrix} = 2\begin{vmatrix} 3 & 33 \\ 0 & -1 \end{vmatrix} - 0\begin{vmatrix} 6 & -7 \\ 0 & -1 \end{vmatrix} + 0\begin{vmatrix} 6 & -7 \\ 3 & 33 \end{vmatrix}. \]
Of course this is just the same as the product of the diagonal entries of A, which is the point of Theo-
rem 7.14.
Exercises
Outcomes
A. Determine the effect of a row operation on the determinant of a matrix.
Recall that there are three kinds of elementary row operations on matrices:
The following theorem examines the effect of these row operations on the determinant of a matrix.
1. If B is obtained from A by switching two rows of A, then det(B) = −det(A).

2. If B is obtained from A by multiplying one row of A by a non-zero scalar k, then det(B) = k det(A).

3. If B is obtained from A by adding a multiple of one row of A to another row, then det(B) = det(A).
Notice that the second part of this theorem is true when we multiply one row of the matrix by k. If we
were to multiply two rows of A by k to obtain B, we would have det(B) = k2 det(A).
Solution. If we switch the second and third rows, we obtain a triangular matrix, of which the determinant
is easy to compute. By Theorem 7.16, switching two rows negates the determinant. We therefore have:
\[ \begin{vmatrix} 1 & 5 & 5 \\ 0 & 0 & -3 \\ 0 & 2 & 7 \end{vmatrix} = -\begin{vmatrix} 1 & 5 & 5 \\ 0 & 2 & 7 \\ 0 & 0 & -3 \end{vmatrix} = -(1\cdot 2\cdot(-3)) = 6. \]
Solution. We can use elementary row operations to reduce this matrix to triangular form:
\[ \begin{bmatrix} 1 & 4 & -2 \\ 1 & 8 & 1 \\ 2 & 4 & -9 \end{bmatrix} \overset{R_2 \leftarrow R_2 - R_1}{\simeq} \begin{bmatrix} 1 & 4 & -2 \\ 0 & 4 & 3 \\ 2 & 4 & -9 \end{bmatrix} \overset{R_3 \leftarrow R_3 - 2R_1}{\simeq} \begin{bmatrix} 1 & 4 & -2 \\ 0 & 4 & 3 \\ 0 & -4 & -5 \end{bmatrix} \overset{R_3 \leftarrow R_3 + R_2}{\simeq} \begin{bmatrix} 1 & 4 & -2 \\ 0 & 4 & 3 \\ 0 & 0 & -2 \end{bmatrix}. \]
Each of the row operations is of the form “add a multiple of one row to another row”, and therefore does
not change the determinant. We therefore have:
\[ \begin{vmatrix} 1 & 4 & -2 \\ 1 & 8 & 1 \\ 2 & 4 & -9 \end{vmatrix} = \begin{vmatrix} 1 & 4 & -2 \\ 0 & 4 & 3 \\ 2 & 4 & -9 \end{vmatrix} = \begin{vmatrix} 1 & 4 & -2 \\ 0 & 4 & 3 \\ 0 & -4 & -5 \end{vmatrix} = \begin{vmatrix} 1 & 4 & -2 \\ 0 & 4 & 3 \\ 0 & 0 & -2 \end{vmatrix} = 1\cdot 4\cdot(-2) = -8. \]
♠
In general, we can convert any square matrix to triangular form using elementary row operations. In fact,
it is always possible to do so using only elementary operations of the first and third kind (swap two rows
or add a multiple of one row to another). This gives us a very efficient way to compute determinants. If
the matrices are large, this method is much more efficient than the cofactor method.
Solution. We use elementary row operations to reduce the matrix to triangular form:
\[ \begin{bmatrix} 0 & 2 & 1 & 4 \\ 2 & 2 & -4 & -1 \\ 1 & 1 & -2 & -1 \\ 1 & 3 & 2 & 5 \end{bmatrix} \overset{R_1 \leftrightarrow R_3}{\simeq} \begin{bmatrix} 1 & 1 & -2 & -1 \\ 2 & 2 & -4 & -1 \\ 0 & 2 & 1 & 4 \\ 1 & 3 & 2 & 5 \end{bmatrix} \overset{R_2 \leftarrow R_2 - 2R_1}{\simeq} \begin{bmatrix} 1 & 1 & -2 & -1 \\ 0 & 0 & 0 & 1 \\ 0 & 2 & 1 & 4 \\ 1 & 3 & 2 & 5 \end{bmatrix} \]
\[ \overset{R_4 \leftarrow R_4 - R_1}{\simeq} \begin{bmatrix} 1 & 1 & -2 & -1 \\ 0 & 0 & 0 & 1 \\ 0 & 2 & 1 & 4 \\ 0 & 2 & 4 & 6 \end{bmatrix} \overset{R_4 \leftarrow R_4 - R_3}{\simeq} \begin{bmatrix} 1 & 1 & -2 & -1 \\ 0 & 0 & 0 & 1 \\ 0 & 2 & 1 & 4 \\ 0 & 0 & 3 & 2 \end{bmatrix} \]
\[ \overset{R_2 \leftrightarrow R_3}{\simeq} \begin{bmatrix} 1 & 1 & -2 & -1 \\ 0 & 2 & 1 & 4 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 3 & 2 \end{bmatrix} \overset{R_3 \leftrightarrow R_4}{\simeq} \begin{bmatrix} 1 & 1 & -2 & -1 \\ 0 & 2 & 1 & 4 \\ 0 & 0 & 3 & 2 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \]
By Theorem 7.16, the determinant changes signs each time we swap two rows. The determinant is un-
changed when we add a multiple of one row to another. Therefore, we have
\[ \begin{vmatrix} 0 & 2 & 1 & 4 \\ 2 & 2 & -4 & -1 \\ 1 & 1 & -2 & -1 \\ 1 & 3 & 2 & 5 \end{vmatrix} = -\begin{vmatrix} 1 & 1 & -2 & -1 \\ 2 & 2 & -4 & -1 \\ 0 & 2 & 1 & 4 \\ 1 & 3 & 2 & 5 \end{vmatrix} = -\begin{vmatrix} 1 & 1 & -2 & -1 \\ 0 & 0 & 0 & 1 \\ 0 & 2 & 1 & 4 \\ 1 & 3 & 2 & 5 \end{vmatrix} = -\begin{vmatrix} 1 & 1 & -2 & -1 \\ 0 & 0 & 0 & 1 \\ 0 & 2 & 1 & 4 \\ 0 & 2 & 4 & 6 \end{vmatrix} \]
\[ = -\begin{vmatrix} 1 & 1 & -2 & -1 \\ 0 & 0 & 0 & 1 \\ 0 & 2 & 1 & 4 \\ 0 & 0 & 3 & 2 \end{vmatrix} = +\begin{vmatrix} 1 & 1 & -2 & -1 \\ 0 & 2 & 1 & 4 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 3 & 2 \end{vmatrix} = -\begin{vmatrix} 1 & 1 & -2 & -1 \\ 0 & 2 & 1 & 4 \\ 0 & 0 & 3 & 2 \\ 0 & 0 & 0 & 1 \end{vmatrix} = -6. \]
In practice, the last calculation could have been done in a single step. All we had to do is count the number
of swap operations we performed during the row operations. If there is an odd number of swap operations,
the sign of the determinant changes; otherwise, it stays the same. ♠
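As a computational aside, this is essentially how determinants are computed in practice; a minimal sketch, assuming NumPy, that tracks the sign changes caused by row swaps:

```python
# Sketch: determinant by Gaussian elimination, using only row swaps and
# "add a multiple of one row to another", while tracking the sign.
import numpy as np

def det_by_row_reduction(A):
    A = np.array(A, dtype=float)
    n, sign = A.shape[0], 1.0
    for j in range(n):
        p = j + np.argmax(np.abs(A[j:, j]))      # choose a pivot row
        if np.isclose(A[p, j], 0.0):
            return 0.0                           # no pivot: determinant is 0
        if p != j:
            A[[j, p]] = A[[p, j]]                # row swap flips the sign
            sign = -sign
        for i in range(j + 1, n):
            A[i] -= (A[i, j] / A[j, j]) * A[j]   # does not change the determinant
    return sign * np.prod(np.diag(A))

A = [[0, 2, 1, 4], [2, 2, -4, -1], [1, 1, -2, -1], [1, 3, 2, 5]]
print(det_by_row_reduction(A))   # -6.0 (up to rounding), as in the example above
```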
Exercises
Outcomes
A. Use the determinant of a square matrix to decide whether the matrix is invertible.
B. From the determinants of two matrices, calculate the determinant of their product.
F. Without calculation, find the determinant of a matrix containing a row or column of zeros, or
a matrix containing a row (or column) that is a scalar multiple of another row (or column).
One reason that the determinant is such an important quantity is that it permits us to tell whether a square
matrix is invertible.
Proof. We know that every matrix A can be converted to echelon form by elementary row operations. We
also know from Theorem 7.16 that no elementary row operation changes whether the determinant is zero
or not. Let R be an echelon form of A. Because R is an echelon form, it is also an upper triangular matrix.
Case 1: A is invertible. In that case, the rank of R is n, and every diagonal entry of R is a pivot entry
(therefore non-zero). It follows that det(R) ≠ 0, which implies det(A) ≠ 0. Case 2: A is not invertible.
In that case, the triangular matrix R contains a row of zeros. It follows that det(R) = 0, and therefore
det(A) = 0. ♠
We have
\[ \det(C) = 3\begin{vmatrix} 2 & -5 \\ 0 & 2 \end{vmatrix} - 1\begin{vmatrix} 1 & -5 \\ 2 & 2 \end{vmatrix} = 3\cdot 4 - 1\cdot 12 = 0. \]
Therefore, C is not invertible. ♠
As an application of Theorem 7.20, we note that the determinant of an n × n-matrix can be used to predict
whether a homogeneous system of equations has non-trivial solutions.
Proof. We know from Theorem 1.35 that the homogeneous system has a non-trivial solution if and only if
rank(A) < n. This is the case if and only if A is not invertible, i.e., if and only if det(A) = 0. ♠
Another reason the determinant is important is that it is compatible with matrix products.
Proof. We first prove this in case A = E is an elementary matrix. Remember from Section 4.6 that ele-
mentary matrices correspond to elementary row operations.
1. If E is an elementary matrix for swapping two rows, then det(E) = −1. Also, by Theorem 7.16(1),
det(EB) = − det(B). Therefore det(EB) = det(E) det(B).
2. If E is an elementary matrix for multiplying a row by a non-zero scalar k, then det(E) = k. Also, by
Theorem 7.16(2), det(EB) = k det(B). Therefore det(EB) = det(E) det(B).
3. If E is an elementary matrix for adding a multiple of one row to another, then det(E) = 1. Also, by
Theorem 7.16(3), det(EB) = det(B). Therefore det(EB) = det(E) det(B).
Now consider the case where A is an arbitrary matrix. Case 1: A is invertible. Then by Theorem 4.61, we can write A as a product of elementary matrices A = E1 E2 · · · Ek. By repeatedly using the formula det(EB) = det(E) det(B) that we proved above, we have
\[ \det(AB) = \det(E_1 E_2 \cdots E_k B) = \det(E_1)\det(E_2)\cdots\det(E_k)\det(B) = \det(E_1 E_2 \cdots E_k)\det(B) = \det(A)\det(B). \]
Case 2: A is not invertible. Then AB is also not invertible (because if C were an inverse of AB, we
would have ABC = I, and therefore, BC would be an inverse of A). Therefore, by Theorem 7.20, we have
det(A) = 0 and det(AB) = 0. It follows that det(AB) = det(A) det(B). ♠
1. det(AB) = det(A) det(B).

2. det(I) = 1.

3. A is invertible if and only if det(A) ≠ 0. Moreover, if A is invertible, then det(A^{-1}) = 1/det(A).

4. det(kA) = k^n det(A).

5. det(A^T) = det(A).
Proof. Property 1 is a restatement of Theorem 7.23. Property 2 follows from Theorem 7.14, because the
identity matrix is an upper triangular matrix. Property 3: The first part is Theorem 7.20. For the second
part, assume A is invertible. Then by properties 1 and 2, det(A) det(A−1 ) = det(AA−1 ) = det(I) = 1. The
claim follows by dividing both sides of the equation by det(A). Property 4 follows from Theorem 7.16(2),
because kA is obtained from A by multiplying all n rows by k. Each time we multiply one row by k, the
determinant is multiplied by k. Property 5 follows because expanding det(A) along columns amounts to
the same thing as expanding det(AT ) along rows. ♠
We end this section with a few useful ways of spotting matrices of determinant 0.
1. If A has a row consisting only of zeros, or a column consisting only of zeros, then det(A) = 0.
2. If A has a row that is a scalar multiple of another row, or a column that is a scalar multiple of
another column, then det(A) = 0.
Proof. The first property follows by cofactor expansion: simply expand the determinant along the row
or column that consists only of zeros. For the second property, assume that A has a row that is a scalar
multiple of another row. We can then perform an elementary row operation to create a row of zeros.
By Theorem 7.16(3), the determinant is unchanged, so that det(A) = 0. In the case that A has a column
that is a scalar multiple of another column, we apply the same reasoning to AT and use the fact that
det(A) = det(AT ). ♠
Exercises
(a) \( B = \begin{bmatrix} a & c \\ b & d \end{bmatrix} \)

(b) \( B = \begin{bmatrix} c & d \\ a & b \end{bmatrix} \)

(c) \( B = \begin{bmatrix} a & b \\ a+c & b+d \end{bmatrix} \)

(d) \( B = \begin{bmatrix} a & b \\ 2c & 2d \end{bmatrix} \)

(e) \( B = \begin{bmatrix} b & a \\ d & c \end{bmatrix} \)
Exercise 7.5.2 Let A be an n × n-matrix and suppose there are n − 1 rows such that the remaining row is a linear combination of these n − 1 rows. Show that det(A) = 0.
Exercise 7.5.4 Using only Theorems 7.14 and 7.23, show that det(kA) = kn det(A) for an n × n-matrix A
and scalar k.
Exercise 7.5.5 Construct two random 2 × 2-matrices A and B and verify that det(A) det(B) = det(AB).
Exercise 7.5.6 Is it true that det(A + B) = det(A) + det(B)? If this is so, explain why. If it is not so, give
a counterexample.
Exercise 7.5.7 An n × n-matrix is called nilpotent if there exists some positive integer k such that Ak = 0.
If A is a nilpotent matrix, what are the possible values of det(A)?
Exercise 7.5.8 A square matrix is said to be orthogonal if AT A = I. Thus the inverse of an orthogonal
matrix is its transpose. What are the possible values of det(A) if A is an orthogonal matrix?
Exercise 7.5.9 Let A and B be two n × n-matrices. We say that A is similar to B, in symbols A ∼ B, if there
exists an invertible matrix P such that A = P−1 BP. Show that if A ∼ B, then det(A) = det(B).
For which values of a and b is this matrix invertible? Hint: after you compute the determinant, you can
factor out (a − 1) and (b − 1) from it.
Exercise 7.5.11 Assume A, B, and C are n × n-matrices and ABC is invertible. Use determinants to show
that each of A, B, and C is invertible.
Exercise 7.5.12 Suppose A is an upper triangular matrix. Show that A−1 exists if and only if all elements
of the main diagonal are non-zero. Is it true that A−1 will also be upper triangular? Explain. Could the
same be concluded for lower triangular matrices?
Exercise 7.5.13 Specify whether each statement is true or false. If true, provide a proof. If false, provide
a counterexample.
(a) If A is a 3 × 3-matrix with determinant zero, then one column must be a multiple of some other
column.
(b) If any two columns of a square matrix are equal, then the determinant of the matrix equals zero.
Outcomes
A. Find the cofactor matrix and the adjugate of a matrix.
The determinant of a matrix also provides a way to find the inverse of a matrix. Recall the definition of
the inverse of a matrix from Definition 4.36. If A is an n × n-matrix, we say that A−1 is the inverse of A if
AA−1 = I and A−1 A = I.
We now define a new matrix called the cofactor matrix of A. The cofactor matrix of A is the matrix
whose ij th entry is the ij th cofactor of A.
We will use the cofactor matrix to create a formula for the inverse of A. First, we define the adjugate of
A, denoted adj(A), to be the transpose of the cofactor matrix:
adj(A) = cof(A)T .
Solution. We first find cof(A). To do so, we need to compute the cofactors of A. We have:
\[ C_{11} = +M_{11} = \begin{vmatrix} 0 & 1 \\ 2 & 1 \end{vmatrix} = -2, \qquad C_{12} = -M_{12} = -\begin{vmatrix} 3 & 1 \\ 1 & 1 \end{vmatrix} = -2, \qquad C_{13} = +M_{13} = \begin{vmatrix} 3 & 0 \\ 1 & 2 \end{vmatrix} = 6, \]
\[ C_{21} = -M_{21} = -\begin{vmatrix} 2 & 3 \\ 2 & 1 \end{vmatrix} = 4, \qquad C_{22} = +M_{22} = \begin{vmatrix} 1 & 3 \\ 1 & 1 \end{vmatrix} = -2, \qquad C_{23} = -M_{23} = -\begin{vmatrix} 1 & 2 \\ 1 & 2 \end{vmatrix} = 0, \]
\[ C_{31} = +M_{31} = \begin{vmatrix} 2 & 3 \\ 0 & 1 \end{vmatrix} = 2, \qquad C_{32} = -M_{32} = -\begin{vmatrix} 1 & 3 \\ 3 & 1 \end{vmatrix} = 8, \qquad C_{33} = +M_{33} = \begin{vmatrix} 1 & 2 \\ 3 & 0 \end{vmatrix} = -6. \]
Therefore,
\[ \mathrm{cof}(A) = \begin{bmatrix} -2 & -2 & 6 \\ 4 & -2 & 0 \\ 2 & 8 & -6 \end{bmatrix} \quad\text{and}\quad \mathrm{adj}(A) = \mathrm{cof}(A)^T = \begin{bmatrix} -2 & 4 & 2 \\ -2 & -2 & 8 \\ 6 & 0 & -6 \end{bmatrix}. \]
♠
The following theorem provides a formula for A−1 using the determinant and the adjugate of A.
Proof. Recall that the (i, j)-entry of adj(A) is equal to C_{ji}. Thus the (i, j)-entry of B = A adj(A) is:
\[ B_{ij} = a_{i1}C_{j1} + a_{i2}C_{j2} + \ldots + a_{in}C_{jn}. \]
By the cofactor expansion theorem, we see that this expression for B_{ij} is equal to the determinant of the matrix obtained from A by replacing its jth row by [a_{i1}, a_{i2}, \ldots, a_{in}], i.e., by its ith row.
If i = j then this matrix is A itself and therefore Bii = det(A). If on the other hand i 6= j, then this
matrix has its ith row equal to its jth row, and therefore Bi j = 0 in this case. Thus we obtain:
A adj(A) = det(A) I. By an analogous argument, using cofactor expansion along columns instead of rows, one also obtains adj(A) A = det(A) I.
This proves the first part of the theorem. For the second part, assume that A is invertible. Then by
Theorem 7.20, det(A) ≠ 0. Dividing the formula from the first part of the theorem by det(A), we obtain
\[ \frac{1}{\det(A)}\,A\,\mathrm{adj}(A) = \frac{1}{\det(A)}\,\mathrm{adj}(A)\,A = I, \]
and therefore
\[ A^{-1} = \frac{1}{\det(A)}\,\mathrm{adj}(A). \]
This completes the proof. ♠
Since it is very easy to make a mistake in this calculation, we double-check our answer by computing
A−1 A:
\[ A^{-1}A = \begin{bmatrix} -\frac{1}{6} & \frac{1}{3} & \frac{1}{6} \\ -\frac{1}{6} & -\frac{1}{6} & \frac{2}{3} \\ \frac{1}{2} & 0 & -\frac{1}{2} \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 3 & 0 & 1 \\ 1 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I. \]
♠
We therefore have
\[ A^{-1} = \frac{1}{\det(A)}\,\mathrm{adj}(A) = \frac{1}{2}\begin{bmatrix} 0 & 2 & 2 \\ 2 & -2 & -1 \\ -2 & 4 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 1 \\ 1 & -1 & -\frac{1}{2} \\ -1 & 2 & 1 \end{bmatrix}. \]
♠
It is always a good idea to double-check your work. At the end of the calculation, it is very easy to compute
A−1 A and check whether it is equal to I. If they are not equal, be sure to go back and double-check each
step. One common mistake is to forget to take the transpose of the cofactor matrix, so be sure not to forget
this step.
In practice, it is usually much faster to compute the inverse by the method of Section 4.5.2, because
this only requires solving a single system of equations, rather than computing a large number of cofactors.
However, there are some situations where the adjugate formula is useful. One such situation is when the
matrix has complicated entries that are functions rather than numbers. The following example illustrates
this.
The adjugate is the transpose of the cofactor matrix, and therefore the inverse is
\[ A(t)^{-1} = \frac{1}{\det(A(t))}\,\mathrm{adj}(A(t)) = \frac{1}{e^t}\begin{bmatrix} 1 & 0 & 0 \\ 0 & e^t\cos t & -e^t\sin t \\ 0 & e^t\sin t & e^t\cos t \end{bmatrix} = \begin{bmatrix} e^{-t} & 0 & 0 \\ 0 & \cos t & -\sin t \\ 0 & \sin t & \cos t \end{bmatrix}. \]
♠
Another situation where the adjugate formula is useful is the case of a 2 × 2-matrix. In this case both the
determinant and the adjugate are especially easy to compute. For a 2 × 2-matrix
\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \]
we have
\[ \det(A) = ad - bc \quad\text{and}\quad \mathrm{adj}(A) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \]
Therefore, A is invertible if and only if ad − bc ≠ 0, and in that case, the inverse is given by
\[ A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \tag{7.1} \]
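As a computational aside, the adjugate formula is easy to check with SymPy; a minimal sketch, using the matrix A = [1 2 3; 3 0 1; 1 2 1] from the earlier example:

```python
# Sketch of the adjugate formula A^{-1} = adj(A)/det(A).
from sympy import Matrix, eye

A = Matrix([[1, 2, 3],
            [3, 0, 1],
            [1, 2, 1]])

adjA = A.adjugate()            # transpose of the cofactor matrix
d = A.det()                    # 12 for this matrix
A_inv = adjA / d
assert A_inv * A == eye(3)
assert A_inv == A.inv()        # agrees with SymPy's own inverse
```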
Exercises
Exercise 7.6.1 Find the cofactor matrix and the adjugate of each of the following matrices.
\[ A = \begin{bmatrix} 1 & 0 \\ 3 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 & 1 \\ -1 & 1 & -2 \\ 2 & -1 & 3 \end{bmatrix}, \qquad C = \begin{bmatrix} 1 & 1 & 3 \\ 2 & 3 & 1 \\ 1 & -1 & 1 \end{bmatrix}. \]
Exercise 7.6.2 For each of the following matrices, determine whether it is invertible by checking whether
the determinant is non-zero. If the determinant is non-zero, use the adjugate formula to find the inverse.
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 2 & 1 \\ 3 & 1 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 2 & 1 \\ 3 & 1 & 1 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 3 & 3 \\ 2 & 4 & 1 \\ 0 & 1 & 1 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 2 & 1 \\ 2 & 6 & 7 \end{bmatrix}, \quad E = \begin{bmatrix} 1 & 0 & 3 \\ 1 & 0 & 1 \\ 3 & 1 & 0 \end{bmatrix}. \]
Exercise 7.6.3 Determine whether each of the following matrices is invertible. If so, use the adjugate
formula to find the inverse. If the inverse does not exist, explain why.
\[ A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 2 & 1 \\ 4 & 1 & 1 \end{bmatrix}, \qquad C = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 3 & 0 \\ 0 & 1 & 2 \end{bmatrix}. \]
Exercise 7.6.4 Use the adjugate formula to find the inverse of the matrix
\[ A = \begin{bmatrix} 3 & 0 & 3 \\ -1 & 2 & -3 \\ -5 & 4 & -3 \end{bmatrix}. \]
Exercise 7.6.5 Use the adjugate formula to find the inverse of the matrix
\[ A = \begin{bmatrix} 1 & 1 & 0 \\ 3 & 1 & 2 \\ 2 & -2 & 5 \end{bmatrix}. \]
Does there exist a value of t for which this matrix fails to be invertible? Explain.
Does there exist a value of t for which this matrix fails to be invertible? Explain.
Does there exist a value of t for which this matrix fails to be invertible? Explain.
Exercise 7.6.10 Use the adjugate formula to find the inverse of the matrix
\[ A = \begin{bmatrix} e^t & 0 & 0 \\ 0 & \cos t & \sin t \\ 0 & \cos t - \sin t & \cos t + \sin t \end{bmatrix}. \]
Outcomes
A. Use Cramer’s rule to solve a system of equations with invertible coefficient matrix.
Another application of determinants is Cramer’s rule for solving a system of equations. Recall that we
can represent a system of linear equations in the form Ax = b, where x is a vector of variables. Cramer’s
rule gives a formula for the solutions x in the special case that the coefficient matrix A is a square invertible
matrix. Note that Cramer’s rule does not apply if you have a system of equations in which there is a
different number of equations than variables (in other words, when A is not square), or when A is not
invertible.
\[ x_i = \frac{\det(A_i)}{\det(A)}, \]
where A_i denotes the matrix obtained from A by replacing its i th column by the vector b.
Proof. Since A is invertible, the solution to the system Ax = b is given by x = A−1 b. By Theorem 7.29,
we have
\[ A^{-1} = \frac{1}{\det(A)}\,\mathrm{adj}(A), \]
and therefore
\[ x = \frac{1}{\det(A)}\,\mathrm{adj}(A)\,b. \]
Let xi be the i th component of x and b j be the j th component of b. Recall that the ij th entry of adj(A) is
C ji , the ji th cofactor of A. By definition of matrix multiplication, we have
\[ x_i = \frac{1}{\det(A)}\,(C_{1i}b_1 + \ldots + C_{ni}b_n). \]
By the formula for the expansion of a determinant along a column, this is equal to
\[ x_i = \frac{1}{\det(A)} \begin{vmatrix} * & \cdots & b_1 & \cdots & * \\ \vdots & & \vdots & & \vdots \\ * & \cdots & b_n & \cdots & * \end{vmatrix}, \]
where the i th column of A is replaced with the column vector b. But this last formula is exactly Cramer’s
rule. ♠
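As a computational aside, Cramer's rule is short to implement; a minimal sketch, assuming NumPy, with the coefficient matrix and right-hand side taken from the worked solution that follows:

```python
# Sketch of Cramer's rule: x_i = det(A_i)/det(A), where A_i is A with its
# i-th column replaced by b.  Only valid when A is square and invertible.
import numpy as np

def cramer(A, b):
    A, b = np.asarray(A, dtype=float), np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                 # replace the i-th column by b
        x[i] = np.linalg.det(Ai) / d
    return x

A = [[1, 2, 1], [3, 2, 1], [1, 4, 1]]
b = [3, 5, 6]
print(cramer(A, b))                  # [1, 1.5, -1], matching the next solution
assert np.allclose(np.array(A) @ cramer(A, b), b)
```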
Solution. The matrices A1 , A2 , and A3 are obtained by respectively replacing the first, second, and third
column of A by b. We compute
\[ \det(A) = \begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 1 & 4 & 1 \end{vmatrix} = 4, \qquad \det(A_1) = \begin{vmatrix} 3 & 2 & 1 \\ 5 & 2 & 1 \\ 6 & 4 & 1 \end{vmatrix} = 4, \]
\[ \det(A_2) = \begin{vmatrix} 1 & 3 & 1 \\ 3 & 5 & 1 \\ 1 & 6 & 1 \end{vmatrix} = 6, \qquad \det(A_3) = \begin{vmatrix} 1 & 2 & 3 \\ 3 & 2 & 5 \\ 1 & 4 & 6 \end{vmatrix} = -4. \]
Then by Cramer's rule,
\[ x = \frac{\det(A_1)}{\det(A)} = \frac{4}{4} = 1, \qquad y = \frac{\det(A_2)}{\det(A)} = \frac{6}{4} = \frac{3}{2}, \qquad z = \frac{\det(A_3)}{\det(A)} = \frac{-4}{4} = -1. \]
♠
Solution. We are asked to find the value of z in the solution. By Cramer's rule, we have
\[ z = \frac{\begin{vmatrix} 1 & 0 & 1 \\ 0 & e^t\cos t & -t \\ 0 & -e^t\sin t & t^2 \end{vmatrix}}{\begin{vmatrix} 1 & 0 & 0 \\ 0 & e^t\cos t & e^t\sin t \\ 0 & -e^t\sin t & e^t\cos t \end{vmatrix}} = \frac{e^t(t^2\cos t - t\sin t)}{e^{2t}} = e^{-t}(t^2\cos t - t\sin t). \]
♠
Exercises
Exercise 7.7.1 True or false? “Cramer’s rule is useful for finding solutions to systems of linear equations
in which there is an infinite set of solutions.”
x + 2y = 1
2x − y = 2
x + 2y + z = 3
2x − y − z = 1
x+z = 1
Outcomes
A. Determine whether a vector is an eigenvector of a matrix.
When we multiply a square matrix A by a non-zero vector v, we obtain another vector Av. Most of the
time, the vectors Av and v are unrelated; they could point in completely different directions. However,
sometimes it can happen that Av is a scalar multiple of v. In that case, v is called an eigenvector of A. We
will see later in this chapter that we can learn a lot about the matrix A by considering its eigenvectors.
Av = λ v.
Solution. We compute
\[ Av_1 = \begin{bmatrix} 4 \\ -2 \\ -4 \end{bmatrix}, \qquad Av_2 = \begin{bmatrix} 0 \\ 3 \\ 3 \end{bmatrix}, \qquad Av_3 = \begin{bmatrix} 3 \\ 4 \\ 3 \end{bmatrix}, \qquad Av_4 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \]
• We see that Av1 is a scalar multiple of v1 , namely Av1 = 2v1 . Therefore, v1 is an eigenvector of A
with corresponding eigenvalue λ = 2.
• Similarly, Av2 = 3v2 , so v2 is an eigenvector of A with corresponding eigenvalue λ = 3.
• On the other hand, Av3 is not a scalar multiple of v3 . Hence, v3 is not an eigenvector of A.
• Finally, although Av4 is a scalar multiple of v4 , the zero vector is not considered an eigenvector.
♠
Solution. We have to solve the equation Av = 2v. We can use algebra to rewrite this as
\[ Av = 2v \iff Av - 2v = 0 \iff (A - 2I)v = 0 \iff \begin{bmatrix} 0 & 0 & 0 \\ -1 & 1 & 1 \\ 2 & -2 & -2 \end{bmatrix}v = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \]
This is a homogeneous system of equations with general solution
\[ v = s\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \]
where s and t are parameters. These (except the zero vector) are exactly the eigenvectors corresponding to
the eigenvalue λ = 2. ♠
As the last example shows, the eigenvectors for a given eigenvalue λ , plus the zero vector, form a subspace
of Rn . This is called the eigenspace of λ .
Eλ = {v | Av = λ v} .
It is a subspace of Rn .
Instead of finding all eigenvectors for a given eigenvalue, it is often sufficient to find a basis for the
eigenspace. We also sometimes call the basis vectors of the eigenspace basic eigenvectors.
Exercises
Which of the following vectors are eigenvectors of A? Find the corresponding eigenvalues.
\[ v_1 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \qquad v_3 = \begin{bmatrix} 2 \\ 2 \\ -1 \end{bmatrix}, \qquad v_4 = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}. \]
Exercise 8.1.8 Let A be an invertible n × n-matrix, and assume λ is an eigenvalue of A. Show that λ ≠ 0 and that λ^{-1} is an eigenvalue of A^{-1}.
Exercise 8.1.9 If A is an n × n-matrix and c is a non-zero constant, compare the eigenvalues of A and cA.
Exercise 8.1.10 Let A, B be invertible n × n-matrices which commute. That is, AB = BA. Suppose v is an
eigenvector of B. Show that then Av must also be an eigenvector for B.
Exercise 8.1.11 Suppose A is an n × n-matrix and it satisfies Am = A for some m a positive integer larger
than 1. Show that if λ is an eigenvalue of A then λ equals either 0, 1, or −1.
8.2 Finding eigenvalues

Outcomes
A. Find the characteristic polynomial, eigenvalues, and eigenvectors of a matrix.
In the previous section, we saw how to find the eigenvectors corresponding to a given eigenvalue λ , if
λ is already known. But we have not yet seen how to find the eigenvalues of a matrix. However, the
calculations in Examples 8.3 and 8.5 suggest a way forward. We can see that the following are equivalent:
1. λ is an eigenvalue of A.

2. Av = λv, for some non-zero vector v.

3. The system (A − λI)v = 0 has a non-trivial solution.
Indeed, the equivalence between 1 and 2 is just the definition of an eigenvalue, and the equivalence between
2 and 3 is just algebra. By Corollary 7.22, we know that the system (A − λ I)v = 0 has a non-trivial solution
if and only if det(A − λ I) = 0. Therefore, we have proved the following theorem:
Let A be a square matrix. Then λ is an eigenvalue of A if and only if

det(A − λI) = 0.
Therefore, λ is an eigenvalue if and only if λ 2 + λ − 6 = 0. We can find the roots of this equation using
the quadratic formula, or equivalently, by factoring the left-hand side:
λ² + λ − 6 = 0 ⇐⇒ (λ + 3)(λ − 2) = 0.

Therefore, the eigenvalues of A are λ = −3 and λ = 2. ♠
Solution. We have
det(A − λI) = \begin{vmatrix} -\lambda & -1 \\ 1 & -\lambda \end{vmatrix} = λ² + 1.
Since λ 2 + 1 = 0 does not have any solutions in the real numbers, the matrix A has no real eigenvalues.
(However, if we were working over the field of complex numbers rather than real numbers, this matrix
would have eigenvalues λ = ±i). ♠
As the examples show, the quantity det(A − λ I) is always a polynomial in the variable λ . A polynomial
is an expression of the form
p(λ ) = an λ n + an−1 λ n−1 + . . . + a1 λ + a0 ,
where a0 , . . . , an are constants called the coefficients of the polynomial. The polynomial det(A − λ I) has
a special name:
The characteristic polynomial of a square matrix A is the polynomial

p(λ) = det(A − λI).
This suggests the following general procedure for finding the eigenvalues and eigenvectors of a matrix A:

1. Compute the characteristic polynomial p(λ) = det(A − λI).

2. Find the eigenvalues of A, which are the roots of the characteristic polynomial.

3. For each eigenvalue λ, find a basis for the eigenvectors by solving the homogeneous system

(A − λI)v = 0.
To double-check your work, make sure that Av = λ v for each eigenvalue λ and associated eigen-
vector v.
and the eigenvalues of A are λ = 1, λ = −1, and λ = 4. We now find the eigenvectors for each eigenvalue.
You can verify that the roots of this polynomial are λ1 = 0, λ2 = 2, λ3 = 4. Notice that while eigenvectors
can never equal 0, it is possible to have an eigenvalue equal to 0. Now we will find the basic eigenvectors.
• For λ1 = 0: We must solve the equation (A − 0I)v = 0. This equation becomes Av = 0. We write
the augmented matrix for this system and reduce to echelon form:
\begin{bmatrix} 2 & 2 & -2 & 0 \\ 1 & 3 & -1 & 0 \\ -1 & 1 & 1 & 0 \end{bmatrix}
\;\simeq\; \ldots \;\simeq\;
\begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
Thus we have found the eigenvectors v1 for λ1 , v2 for λ2 , and v3 for λ3 . We can double-check our answers
by checking the equation Av = λ v in each case:
Av_1 = \begin{bmatrix} 2 & 2 & -2 \\ 1 & 3 & -1 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = 0v_1,
Av_2 = \begin{bmatrix} 2 & 2 & -2 \\ 1 & 3 & -1 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \\ 2 \end{bmatrix} = 2v_2,

Av_3 = \begin{bmatrix} 2 & 2 & -2 \\ 1 & 3 & -1 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \\ 0 \end{bmatrix} = 4v_3.
Exercises
Exercise 8.2.2 Find the characteristic polynomial, eigenvalues, and basic eigenvectors of the matrix
\begin{bmatrix} 9 & 10 \\ -5 & -6 \end{bmatrix}.
Exercise 8.2.3 Find the characteristic polynomial, eigenvalues, and basic eigenvectors of the matrix
\begin{bmatrix} 0 & 3 & -1 \\ -2 & 4 & -2 \\ 2 & -3 & 3 \end{bmatrix}.
One eigenvalue is 1.
Exercise 8.2.4 Find the characteristic polynomial, eigenvalues, and basic eigenvectors of the matrix
\begin{bmatrix} 3 & 0 & -2 \\ -2 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}.
One eigenvalue is 3.
Exercise 8.2.5 Find the characteristic polynomial, eigenvalues, and basic eigenvectors of the matrix
\begin{bmatrix} 9 & 2 & 8 \\ 2 & -6 & -2 \\ -8 & 2 & -5 \end{bmatrix}.
Exercise 8.2.7 Find the eigenvalues and eigenvectors of the following triangular matrix:
\begin{bmatrix} 3 & 2 & 2 \\ 0 & 1 & -2 \\ 0 & 0 & -1 \end{bmatrix}.
8.3 Geometric interpretation of eigenvectors

Outcomes
A. Visualize the effect of a linear transformation by considering its eigenvectors and eigenvalues.
[Figure: before-and-after picture of the letter “F” under the linear transformation T, showing the standard basis vectors e1, e2 and their images; for example, T(e1) = (2, 1).]
Although we can see from this picture that the letter “F” is being distorted somehow, it is perhaps not very
obvious what exactly this linear transformation does.
We can get a much better idea by computing the eigenvectors and eigenvalues of A. A short calculation
shows that the basic eigenvectors are
v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix},
with corresponding eigenvalues λ1 = 3 and λ2 = 1. Consider the effect of the linear transformation T on
the eigenvectors:
T(v_1) = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \end{bmatrix} = 3v_1,
T(v_2) = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} = v_2.
So each eigenvector is mapped to a scalar multiple of itself. This gives us a hint for how to draw a more
useful before-and-after picture. Rather than tracking the movement of the standard basis vectors e1 and
e2, let us track the movement of the eigenvectors v1 and v2 instead:
[Figure: before-and-after picture tracking the eigenvectors: v1 is mapped to T(v1) = 3v1, while v2 is mapped to T(v2) = v2.]
Thus, the linear transformation described by the matrix A is revealed to be just a scaling by a factor of 3
along the direction of
v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
In summary, the geometric meaning of an eigenvector is that it is mapped to a multiple of itself. Thus,
when viewed from the point of view of its action on the eigenvectors, a linear transformation behaves like
a scaling of each eigenvector. We can say that each eigenvector describes a direction of scaling, and each corresponding eigenvalue gives the (positive or negative) scaling factor in that direction.
Solution. The characteristic polynomial is λ 2 − 1, and so the eigenvalues are λ1 = 1 and λ2 = −1. By
solving each equation (A − λ I)v = 0, we find that the corresponding basic eigenvectors are
v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}
(check this!). We get the following before-and-after picture:
[Figure: before-and-after picture: v1 is mapped to T(v1) = v1, while v2 is mapped to T(v2) = −v2.]
Solution. The characteristic polynomial is (−2 − λ )(1 − λ ), and therefore, the eigenvalues are λ1 = −2
and λ2 = 1. We find that the corresponding basic eigenvectors are
v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad\text{and}\quad v_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},
[Figure: before-and-after picture: v2 is mapped to T(v2) = v2, while v1 is mapped to T(v1) = −2v1.]
This particular linear transformation keeps the vector v2 fixed, while scaling by a factor of −2 in the
direction of v1 . It could be described as a kind of slanted reflection with scaling. ♠
Exercises
Exercise 8.3.1 For each of the following matrices, find the eigenvectors and eigenvalues. Use this to
visualize the linear transformation T : R2 → R2 that is described by the matrix.
A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 0 \\ 2 & -1 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}, \quad
D = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad
E = \begin{bmatrix} 1 & 0 \\ -1 & 2 \end{bmatrix}.
Exercise 8.3.2 If A is the matrix of a linear transformation that rotates all vectors in R2 by 60◦ , explain
why A cannot have any real eigenvalues. Is there an angle such that rotation by this angle would have a
real eigenvalue? What eigenvalues would be obtainable in this way?
Exercise 8.3.3 Let T be the linear transformation that reflects vectors about the x-axis. Find a matrix for
T and then find its eigenvalues and eigenvectors.
Exercise 8.3.4 Let T be the linear transformation that reflects vectors about the line x = y. Find a matrix
of T and then find eigenvalues and eigenvectors.
Exercise 8.3.5 Let T be the linear transformation that reflects all vectors in R3 about the xy-plane. Find
a matrix for T and then obtain its eigenvalues and eigenvectors.
8.4 Diagonalization
Outcomes
A. Compute sums, products, and powers of diagonal matrices.
A square matrix D is called a diagonal matrix if all entries except those on the main diagonal are zero.
Such matrices look like the following:
\begin{bmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_{nn} \end{bmatrix}.
Diagonal matrices are particularly easy to work with. For example, the sum of two diagonal matrices is
diagonal. Also, the product of two diagonal matrices is diagonal, and is computed by taking the product
of corresponding diagonal entries.
Solution. We have
A + B = \begin{bmatrix} 2+1 & 0 & 0 \\ 0 & 3-2 & 0 \\ 0 & 0 & 4+2 \end{bmatrix} = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 6 \end{bmatrix},

AB = \begin{bmatrix} 2\cdot 1 & 0 & 0 \\ 0 & 3\cdot(-2) & 0 \\ 0 & 0 & 4\cdot 2 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & -6 & 0 \\ 0 & 0 & 8 \end{bmatrix},

and

A^4 = \begin{bmatrix} 2^4 & 0 & 0 \\ 0 & 3^4 & 0 \\ 0 & 0 & 4^4 \end{bmatrix} = \begin{bmatrix} 16 & 0 & 0 \\ 0 & 81 & 0 \\ 0 & 0 & 256 \end{bmatrix}.
Notice that all operations are computed componentwise on the diagonal. Therefore, multiplication of
diagonal matrices is much simpler than multiplication of general matrices. ♠
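The same computations are easy to reproduce in NumPy (a small sketch, not part of the text), using the diagonal matrices from this example:

```python
import numpy as np

A = np.diag([2, 3, 4])
B = np.diag([1, -2, 2])

print(A + B)                          # diagonal entries 3, 1, 6
print(A @ B)                          # diagonal entries 2, -6, 8
print(np.linalg.matrix_power(A, 4))   # diagonal entries 16, 81, 256
```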
One of the most important problem solving techniques in linear algebra is diagonalization. In a nutshell,
the point of diagonalization is to simplify a problem by replacing an arbitrary matrix by a diagonal matrix.
We say that two square matrices A and B are similar if there exists an invertible matrix P such that
P−1 AP = B. A matrix is diagonalizable if it is similar to a diagonal matrix. This is summarized in the
following definition.
The key connection between diagonalizability, eigenvectors, and eigenvalues is the following theorem.
An n × n-matrix A is diagonalizable if and only if A has n linearly independent eigenvectors.

Moreover, in this case, let P be the invertible matrix whose columns are n linearly independent
eigenvectors of A, and let D be the diagonal matrix whose diagonal entries are the corresponding
eigenvalues. Then P−1 AP = D.
Proof. Assume that A has n linearly independent eigenvectors v1 , . . . , vn . Let λ1 , . . ., λn be the correspond-
ing eigenvalues, so that
Avi = λi vi (8.1)
for all i = 1, . . ., n. Let P be the matrix that has v1 , . . . , vn as its columns. Then P is invertible because
v1 , . . . , vn are linearly independent. Let D be the diagonal matrix that has λ1 , . . . , λn as its diagonal entries.
By the column method of matrix multiplication, the i th column of AP is Avi . Also by the column method
of matrix multiplication, the i th column of PD is λi vi . Therefore, by (8.1), the matrices AP and PD have
the same columns, i.e.,
AP = PD.
It follows that P−1 AP = D, as desired.
Conversely, assume that A is diagonalizable. Then there exists an invertible matrix P and a diagonal
matrix D such that P−1 AP = D, or equivalently, AP = PD. Let v1 , . . . , vn be the columns of P and let
λ1 , . . . , λn be the diagonal entries of D. Again we find that the i th column of AP is Avi and the i th column
of PD is λi vi , and therefore Avi = λi vi holds for all i. It follows that v1 , . . . , vn are eigenvectors of A. Since
P is invertible, v1 , . . . , vn are linearly independent, so A has n linearly independent eigenvectors. ♠
Solution. By Theorem 8.21, we use the eigenvectors of A as the columns of P and the corresponding eigen-
values as the diagonal entries of D. We already found the eigenvectors and -values of A in Example 8.13.
They were
v_1 = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}, \quad
v_2 = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}, \quad\text{and}\quad
v_3 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},
with corresponding eigenvalues λ1 = 1, λ2 = −1, and λ3 = 4. Therefore we can use
P = \begin{bmatrix} -1 & -1 & 0 \\ 1 & 0 & 1 \\ 1 & 2 & 0 \end{bmatrix} \quad\text{and}\quad
D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 4 \end{bmatrix}.
To double-check that P−1 AP is indeed equal to D, we first compute the inverse of P:
P^{-1} = \begin{bmatrix} -2 & 0 & -1 \\ 1 & 0 & 1 \\ 2 & 1 & 1 \end{bmatrix}.
Then
P^{-1}AP = \begin{bmatrix} -2 & 0 & -1 \\ 1 & 0 & 1 \\ 2 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 3 & 0 & 2 \\ 6 & 4 & 3 \\ -4 & 0 & -3 \end{bmatrix}
\begin{bmatrix} -1 & -1 & 0 \\ 1 & 0 & 1 \\ 1 & 2 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 4 \end{bmatrix} = D.
Alternatively, we could have checked that AP = PD, which would not have required computing P−1 . ♠
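Either check is easy to carry out numerically. The following NumPy sketch (not part of the text) verifies both P⁻¹AP = D and AP = PD for the matrices of this example:

```python
import numpy as np

A = np.array([[ 3, 0,  2],
              [ 6, 4,  3],
              [-4, 0, -3]], dtype=float)
P = np.array([[-1, -1, 0],
              [ 1,  0, 1],
              [ 1,  2, 0]], dtype=float)
D = np.diag([1.0, -1.0, 4.0])

assert np.allclose(np.linalg.inv(P) @ A @ P, D)   # P^{-1} A P = D
assert np.allclose(A @ P, P @ D)                  # equivalent check without computing P^{-1}
```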
Notice that the eigenvalues on the main diagonal of D must be in the same order as the corresponding
eigenvectors in P. Since the eigenvectors v1 and v2 are both for the eigenvalue λ = 2, the entry 2 appears
twice in the matrix D. ♠
The following example shows that not all matrices are diagonalizable.
Solution. Through the usual procedure, we find that the characteristic polynomial is (1− λ )2 , and therefore
the only eigenvalue is λ = 1. To find the eigenvectors, we solve the equation (A − I)v = 0:
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
The general solution is
v = t \begin{bmatrix} 1 \\ 0 \end{bmatrix}.
Because the solution space is 1-dimensional, there is only one basic eigenvector:
v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.
Since the matrix A has only one basic eigenvector, we cannot find two linearly independent eigenvectors.
Therefore, by Theorem 8.21, A cannot be diagonalized. ♠
Exercises
Exercise 8.4.1 Let D = \begin{bmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} and E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 3 \end{bmatrix}. Find D + E, DE, and D^7.
8.5 Application: Matrix powers

Outcomes
A. Use diagonalization to raise a matrix to a high power.
Suppose we have a matrix A and we want to find A50 . One could try to multiply A with itself 50 times,
but this is a lot of work (try it!). However, diagonalization allows us to compute high powers of a matrix
relatively easily. Suppose A is diagonalizable, so that P−1 AP = D. We can rearrange this equation to write
A = PDP−1 . Now, consider A2 . Since A = PDP−1 , it follows that
A2 = (PDP−1 )2 = PDP−1 PDP−1 = PD2 P−1 .
Similarly,
A3 = (PDP−1 )3 = PDP−1 PDP−1 PDP−1 = PD3 P−1 .
In general,
An = (PDP−1 )n = PDn P−1 .
Therefore, we have reduced the problem to finding Dn . But as we saw in Example 8.19, computing a
power of a diagonal matrix is easy. To compute Dn , we only need to raise every entry on the diagonal to
the power of n. Through this method, we can compute large powers of matrices.
Solution. First, we will diagonalize A. Following the usual steps, we find that the eigenvalues are λ = 1
and λ = 2. The basic eigenvectors corresponding to λ = 1 are
v_1 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \quad\text{and}\quad v_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix},
and the basic eigenvector corresponding to λ = 2 is
v_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}.
Now we construct P by using the basic eigenvectors of A as the columns of P. Thus
P = \begin{bmatrix} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}.
The inverse of P is
P^{-1} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{bmatrix}.
Then
P^{-1}AP = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 0 \\ -1 & -1 & 1 \end{bmatrix}
\begin{bmatrix} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} = D.
Now it follows by rearranging the equation that A = PDP−1 , and therefore, as noted above,
A^{50} = PD^{50}P^{-1}
= \begin{bmatrix} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1^{50} & 0 & 0 \\ 0 & 1^{50} & 0 \\ 0 & 0 & 2^{50} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{bmatrix}
= \begin{bmatrix} 2^{50} & -1+2^{50} & 0 \\ 0 & 1 & 0 \\ 1-2^{50} & 1-2^{50} & 1 \end{bmatrix}. ♠
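As a numerical cross-check (a small sketch, not part of the text), we can compare the diagonalization formula with a direct matrix power; 2⁵⁰ still fits exactly in 64-bit integers, so both computations agree up to rounding:

```python
import numpy as np

A = np.array([[ 2,  1, 0],
              [ 0,  1, 0],
              [-1, -1, 1]], dtype=np.int64)

direct = np.linalg.matrix_power(A, 50)        # repeated squaring

P = np.array([[0, -1, -1],
              [0,  1,  0],
              [1,  0,  1]], dtype=float)
D50 = np.diag([1.0, 1.0, 2.0**50])
via_diagonalization = P @ D50 @ np.linalg.inv(P)

assert np.allclose(direct, via_diagonalization)
print(direct[0, 0])                           # 1125899906842624 = 2**50
```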
Thus, through diagonalization, we have efficiently computed a high power of A. The following example
shows that we can also use the same technique for finding a square root of a matrix.
We can equivalently write A = PDP−1 . Finding a square root of a diagonal matrix is easy:
D^{\frac{1}{2}} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & 2 \end{bmatrix}.
If we now define B = PD^{1/2}P^{-1}, we clearly have B² = PD^{1/2}P^{-1}PD^{1/2}P^{-1} = PDP^{-1} = A. So the desired
square root of A is
B = PD^{\frac{1}{2}}P^{-1}
= \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ -1 & -1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & 2 \end{bmatrix}
\begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 1 \\ 1-\sqrt{2} & 1+\sqrt{2} & 1 \\ -1+\sqrt{2} & 1-\sqrt{2} & 1 \end{bmatrix}.
Finally, we verify that we have computed B correctly by squaring it and double-checking that we really
get A.
B^2 = \begin{bmatrix} 1 & 1 & 1 \\ 1-\sqrt{2} & 1+\sqrt{2} & 1 \\ -1+\sqrt{2} & 1-\sqrt{2} & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 \\ 1-\sqrt{2} & 1+\sqrt{2} & 1 \\ -1+\sqrt{2} & 1-\sqrt{2} & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 3 & 3 \\ -1 & 5 & 3 \\ 1 & -1 & 1 \end{bmatrix} = A.
We note that the square root of a matrix is not unique. In fact, D has 8 different square roots, all of the
form
\begin{bmatrix} \pm 1 & 0 & 0 \\ 0 & \pm\sqrt{2} & 0 \\ 0 & 0 & \pm 2 \end{bmatrix}.
It follows that A has 8 different square roots as well. We leave it as an exercise to compute them all. ♠
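Here too, a quick numerical check is possible (a sketch, not part of the text), computing B = PD^{1/2}P⁻¹ for this example and confirming that B² reproduces A:

```python
import numpy as np

A = np.array([[ 1,  3, 3],
              [-1,  5, 3],
              [ 1, -1, 1]], dtype=float)
P = np.array([[ 1,  0, 1],
              [ 1,  1, 1],
              [-1, -1, 0]], dtype=float)
D_half = np.diag([1.0, np.sqrt(2.0), 2.0])   # a square root of D = diag(1, 2, 4)

B = P @ D_half @ np.linalg.inv(P)
assert np.allclose(B @ B, A)                 # B is indeed a square root of A
```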
The same method can also be used to compute other powers of a matrix, for example a cube root.
Exercises
Exercise 8.5.1 Let A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}. Find A^{10} by diagonalization.
Exercise 8.5.2 Let A = \begin{bmatrix} 1 & 4 & 1 \\ 0 & 2 & 5 \\ 0 & 0 & 5 \end{bmatrix}. Find A^{50} by diagonalization.
Exercise 8.5.3 Let A = \begin{bmatrix} 1 & -2 & -1 \\ 2 & -1 & 1 \\ -2 & 3 & 1 \end{bmatrix}. Find A^{100} by diagonalization.
Exercise 8.5.4 Let A = \begin{bmatrix} -5 & -6 \\ 9 & 10 \end{bmatrix}. Find a square root of A, i.e., find a matrix B such that B^2 = A.
Exercise 8.5.5 Let A = \begin{bmatrix} -2 & 0 & 6 \\ -3 & 1 & 6 \\ -3 & 0 & 7 \end{bmatrix}. Find a square root of A.
8.6 Application: Solving recurrences

Outcomes
A. Solve a linear recurrence relation using diagonalization.
0, 1, 1, 2, 3, 5, 8, 13, 21, . . .
The first two Fibonacci numbers are 0 and 1. Every subsequent Fibonacci number is the sum of the
previous two numbers. For example, 0 + 1 = 1, 1 + 1 = 2, 1 + 2 = 3, 2 + 3 = 5, and so on. Thus, if we
write Fn for the n th Fibonacci number, then the Fibonacci sequence is given by the following conditions:
F0 = 0,
F1 = 1,
Fn+2 = Fn + Fn+1 , for all n ≥ 0.
The condition Fn+2 = Fn +Fn+1 is known as a recurrence relation, or simply as a recurrence, because we
compute each member of the sequence from previous members (the word “recurrence” comes from Latin
“recurrere”, which means “to go back”). The conditions F0 = 0 and F1 = 1 are known as the base cases of
the recurrence. Note that we start counting from zero, i.e., we call F0 = 0 the “zeroth Fibonacci number”,
F1 = 1 the “first Fibonacci number”, and so on. Counting from zero will help simplify our calculations
later.
Solution. To compute the 10 th Fibonacci number using the recurrence, we have to compute all the previous
Fibonacci numbers as well.
F0 = 0,
F1 = 1,
F2 = F0 + F1 = 0+1 = 1,
F3 = F1 + F2 = 1+1 = 2,
F4 = F2 + F3 = 1+2 = 3,
F5 = F3 + F4 = 2+3 = 5,
F6 = F4 + F5 = 3 + 5 = 8,
F7 = F5 + F6 = 5 + 8 = 13,
F8 = F6 + F7 = 8 + 13 = 21,
F9 = F7 + F8 = 13 + 21 = 34,
F10 = F8 + F9 = 21 + 34 = 55.
Therefore, to compute vn+1 = (Fn+1, Fn+2), we only need to know vn = (Fn, Fn+1) (and not vn−1). Let
A = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}.
Since vn is obtained from v0 by applying the matrix A n times, we have vn = An v0 for all n ≥ 0. We can
therefore get a formula for vn , and thus for Fn , by diagonalizing the matrix A.
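Before diagonalizing, here is a quick numerical illustration of the relation vₙ = Aⁿv₀ (a Python sketch, not part of the text):

```python
import numpy as np

A = np.array([[0, 1],
              [1, 1]], dtype=np.int64)
v0 = np.array([0, 1], dtype=np.int64)        # v0 = (F0, F1)

v10 = np.linalg.matrix_power(A, 10) @ v0
print(v10)                                   # [55 89], i.e. (F10, F11)
```

For very large n the entries overflow 64-bit integers, which is one motivation for the closed formula derived below.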
The eigenvalues are the roots of the characteristic polynomial. We find them by using the quadratic
formula. The eigenvalues are

\lambda_1 = \frac{1+\sqrt{5}}{2} \quad\text{and}\quad \lambda_2 = \frac{1-\sqrt{5}}{2}.
1 − λ1 = λ2 . (8.2)
Solution. Since
v_n = \begin{bmatrix} F_n \\ F_{n+1} \end{bmatrix},
we know that the n th Fibonacci number is the first component of vn , i.e.,
F_n = \begin{bmatrix} 1 & 0 \end{bmatrix} v_n.
Solution. Note that this calculation requires about 25 digits of precision to give the correct result. We have
F_{100} = \frac{1}{\sqrt{5}}\left(\lambda_1^{100} - \lambda_2^{100}\right)
        = \frac{1}{\sqrt{5}}\left(1.6180339887498948482045868\ldots^{100} - (-0.6180339887498948482045868\ldots)^{100}\right)
        = 354224848179261915075.
♠
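The remark about precision can be seen concretely in a short Python sketch (not part of the text): exact integer arithmetic reproduces F₁₀₀, while the closed formula evaluated in ordinary double precision is no longer exact.

```python
from math import sqrt

# Exact value, by iterating the recurrence with Python's arbitrary-precision integers.
a, b = 0, 1                     # (F0, F1)
for _ in range(100):
    a, b = b, a + b             # (Fn, Fn+1) -> (Fn+1, Fn+2)
print(a)                        # 354224848179261915075

# The closed formula in double precision: close, but not exact.
lam1 = (1 + sqrt(5)) / 2
lam2 = (1 - sqrt(5)) / 2
print(round((lam1 ** 100 - lam2 ** 100) / sqrt(5)))
```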
We can use the same method to solve other linear recurrences. Here is another example:
b0 = 1,
b1 = 2,
bn+2 = 6bn + bn+1 , for all n ≥ 0.
Find the first 5 members of this sequence. Then solve the recurrence and find b20 .
b0 = 1,
b1 = 2,
b2 = 6b0 + b1 = 6 + 2 = 8,
b3 = 6b1 + b2 = 12 + 8 = 20,
b4 = 6b2 + b3 = 48 + 20 = 68.
To solve the recurrence, let
w_n = \begin{bmatrix} b_n \\ b_{n+1} \end{bmatrix},
so that for all n ≥ 0,
w_{n+1} = \begin{bmatrix} b_{n+1} \\ b_{n+2} \end{bmatrix}
        = \begin{bmatrix} b_{n+1} \\ 6b_n + b_{n+1} \end{bmatrix}
        = \begin{bmatrix} 0 & 1 \\ 6 & 1 \end{bmatrix} \begin{bmatrix} b_n \\ b_{n+1} \end{bmatrix}
        = \begin{bmatrix} 0 & 1 \\ 6 & 1 \end{bmatrix} w_n.
We then diagonalize the matrix
B = \begin{bmatrix} 0 & 1 \\ 6 & 1 \end{bmatrix}.
The characteristic polynomial is
p(λ ) = λ 2 − λ − 6,
with roots λ = −2 and λ = 3. The respective eigenvectors are
v = \begin{bmatrix} 1 \\ -2 \end{bmatrix} \quad\text{for the eigenvalue } λ = -2,

u = \begin{bmatrix} 1 \\ 3 \end{bmatrix} \quad\text{for the eigenvalue } λ = 3.
Finally, we calculate b20. Writing the initial vector w_0 = (1, 2) as a linear combination of the eigenvectors, we find

w_0 = \frac{1}{5}v + \frac{4}{5}u,

and therefore

w_n = B^n w_0 = \frac{1}{5}(-2)^n v + \frac{4}{5}\,3^n u.

Taking the first component, we obtain the closed formula

b_n = \frac{1}{5}(-2)^n + \frac{4}{5}\,3^n.

In particular, b20 = (2^{20} + 4·3^{20})/5 = 2789637236. ♠
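A short Python check of this closed formula against the recurrence itself (a sketch, not part of the text):

```python
def b_direct(n):
    """Compute b_n directly from the recurrence b_{n+2} = 6*b_n + b_{n+1}."""
    b = [1, 2]
    while len(b) <= n:
        b.append(6 * b[-2] + b[-1])
    return b[n]

def b_closed(n):
    """Closed formula b_n = (1/5)*(-2)^n + (4/5)*3^n, in exact integer arithmetic."""
    return ((-2) ** n + 4 * 3 ** n) // 5

assert all(b_direct(n) == b_closed(n) for n in range(30))
print(b_closed(20))   # 2789637236
```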
Exercises
Exercise 8.6.1 Consider the sequence defined by the following recurrence:

a0 = 0,
a1 = 1,
an+2 = 2an+1 + 3an , for all n ≥ 0.
Find the first 5 members of this sequence. Then solve the recurrence and find a20 .
Exercise 8.6.2 Consider the sequence defined by the following recurrence:

b0 = 1,
b1 = 2,
b2 = 3,
bn+3 = 2bn+2 + bn+1 − 2bn , for all n ≥ 0.
Find the first 5 members of this sequence. Then solve the recurrence and find b20 .
Exercise 8.6.3 Consider two sequences of numbers c0 , c1 , c2 , . . . and d0 , d1 , d2 , . . ., defined by the following
mutual recurrence relation.
c0 = 0, d0 = 1
cn+1 = 4cn − 3dn , dn+1 = 2cn − dn , for all n ≥ 0.
Find the first 5 members of both sequences. Then solve the recurrence. Hint: first find a matrix A such that
\begin{bmatrix} c_{n+1} \\ d_{n+1} \end{bmatrix} = A \begin{bmatrix} c_n \\ d_n \end{bmatrix}.
8.7 Application: Systems of linear differential equations

Outcomes
A. Solve a system of second order linear differential equations.
Recall from calculus that if y = f (x) is a function, then y′ = f ′ (x) is its derivative and y′′ = f ′′ (x) is its
second derivative. For example,
y = sin(x),
y′ = cos(x),
y′′ = − sin(x).
Note that if y = sin(x), the second derivative y′′ is exactly the negative of y, i.e.,
y′′ = −y.
This last equation is called a differential equation. Unlike an ordinary equation, which is about an
unknown number, a differential equation is about an unknown function. Typically, a differential equation
mentions the function and one or more of its derivatives. A differential equation that only mentions the
first derivative y′ is called a first-order differential equation. A differential equation that also mentions
the second derivative y′′ is called a second-order differential equation. If the equation is a linear function
of y and its derivatives, it is called a linear differential equation.
We say that the function y = sin(x) is a solution of the differential equation y′′ = −y. It is not the only
solution. Another solution is y = cos(x), because in that case, y′′ = − cos(x), and therefore y′′ = −y. From
calculus, we know that the general solution to the differential equation y′′ = −y is given by
y = a sin(x) + b cos(x),
where a and b are any real numbers, i.e., parameters. Using the terminology of linear algebra, we can say
that the general solution of the equation y′′ = −y is a linear combination of the basic solutions y = sin(x)
and y = cos(x). More generally, we have the following theorem from calculus:
Proof. By taking derivatives, it is easy to check that each of the six functions is a solution of the corresponding differential equation. For example, for y = sin(√q x), we have y′ = √q cos(√q x) and y′′ = −q sin(√q x). Therefore, y′′ = −qy.
Note that we can obtain the general solution of each of the differential equations as a linear combination
of the basic solutions. Thus, the general solution of y′′ = −qy is
y = a sin(√q x) + b cos(√q x),

where a and b are arbitrary constants.
In the same way that a system of linear equations consists of several linear equations about several vari-
ables, a system of differential equations consists of several differential equations about several unknown
functions and their derivatives. For example, the following is a system of second order linear differential
equations:
y′′ = 4y − 3z,
z′′ = 6y − 5z.
In reading these equations, it is important to understand that we are looking for two unknown functions
y = f (x) and z = g(x) such that their second derivatives satisfy both of the equations y′′ = 4y − 3z and
z′′ = 6y − 5z. The reason that this is in principle a difficult problem is that the equation for y′′ mentions
not only y, but also z, and the equation for z′′ mentions not only z, but also y. Therefore, it is not possible
to solve this system one function at a time. We say that the variables y and z are coupled.
The following example shows how we can use diagonalization to decouple the variables in a system
of differential equations. This makes it possible to solve the equations.
y′′ = 4y − 3z,
z′′ = 6y − 5z.
Our next step is to diagonalize the matrix A. Following the usual diagonalization procedure, we find that
A = PDP−1 , where
P = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}.
With this, our equation takes the form v′′ = PDP−1 v, which we can also write as P−1 v′′ = DP−1 v. We
now introduce a change of variables. Let w = P−1 v. Then our system of differential equations can be
written as
w′′ = Dw. (8.5)
Note that the equation (8.5) is of exactly the same form as the equation (8.4), but with the crucial difference
that the matrix in (8.5) is diagonal. Let us give a name to the components of w:
w = \begin{bmatrix} u \\ v \end{bmatrix}.
or equivalently,
u′′ = u,
v′′ = −2v.
Note that the variables u and v are not coupled! This happened because the matrix D is diagonal. We can
therefore use Theorem 8.32 to solve the equations for u and for v separately. By Theorem 8.32, the general
solution for the equation u′′ = u is
u = aex + be−x ,
and the general solution for the equation v′′ = −2v is
v = c sin(√2 x) + d cos(√2 x).
Here, a, b, c, and d are parameters. Therefore, the general solution for (8.5) is
w = \begin{bmatrix} u \\ v \end{bmatrix}
  = \begin{bmatrix} ae^x + be^{-x} \\ c\sin(\sqrt{2}\,x) + d\cos(\sqrt{2}\,x) \end{bmatrix}
  = a\begin{bmatrix} e^x \\ 0 \end{bmatrix}
  + b\begin{bmatrix} e^{-x} \\ 0 \end{bmatrix}
  + c\begin{bmatrix} 0 \\ \sin(\sqrt{2}\,x) \end{bmatrix}
  + d\begin{bmatrix} 0 \\ \cos(\sqrt{2}\,x) \end{bmatrix}.
We have therefore found the four basic solutions of (8.5). But what about our original equation (8.4)? We
can undo our change of variables. Since w = P−1 v, we have v = Pw. Therefore, the general solution to
our original system of differential equations is
v = Pw = aP\begin{bmatrix} e^x \\ 0 \end{bmatrix}
       + bP\begin{bmatrix} e^{-x} \\ 0 \end{bmatrix}
       + cP\begin{bmatrix} 0 \\ \sin(\sqrt{2}\,x) \end{bmatrix}
       + dP\begin{bmatrix} 0 \\ \cos(\sqrt{2}\,x) \end{bmatrix}
  = ae^x\begin{bmatrix} 1 \\ 1 \end{bmatrix}
  + be^{-x}\begin{bmatrix} 1 \\ 1 \end{bmatrix}
  + c\sin(\sqrt{2}\,x)\begin{bmatrix} 1 \\ 2 \end{bmatrix}
  + d\cos(\sqrt{2}\,x)\begin{bmatrix} 1 \\ 2 \end{bmatrix}.
♠
One of the reasons that differential equations are important is that the laws of nature often take the form
of differential equations. For example, Newton’s second law of motion asserts that the acceleration of an
object is equal to the total force on the object divided by the mass of the object. In physics, it is common
to use t instead of x for the independent variable, and x instead of y for the dependent variable, so that we
write x = f (t) instead of y = f (x). If x is the position of the object at time t, then the object’s acceleration
is x′′ , and Newton’s second law takes the form
x′′ = \frac{F}{m}.
This is a differential equation. In the following example, we will need another law of physics, namely
Hooke’s law about the force exerted by a spring. A spring is an object made from an elastic material
(often in the shape of a coil), which returns to its original shape after being stretched or compressed.
Hooke’s law states that the force exerted by a spring to both of its ends is equal to
F = kx.
Here x is the extension of the spring, i.e., the change in length of the spring, relative to its relaxed (natural) length. Also, k is a constant called the spring constant, measured in units of N/m (Newtons per meter). Of
course the direction of the force on one end of the spring is the opposite of the direction on the other
end. Hooke’s law is not a differential equation, because it does not mention any derivatives. It is just an
ordinary equation. Nevertheless, both the force F and the extension x can vary with time, i.e., they can
both be functions of t.
[Figure: three train cars on a track, with masses 1 kg, 2 kg, and 1 kg, connected by two springs, each with spring constant k.]
Solution. Let us start by defining appropriate coordinates. Let x be the position of the first car, y the
position of the second car, and z the position of the third car, measured in meters from left to right, relative
to each car’s natural resting position. The x-, y-, and z-axes are shown in the following picture. All three
axes are parallel to the train tracks, but they have their origins in different places. The coordinates are
chosen so that when x = 0, y = 0, and z = 0, then all three cars are at rest and both springs are in their relaxed state.
Then the extension of the left spring is y − x, and therefore the left spring’s contracting force is
F1 = k(y − x).
Similarly, the extension of the right spring is z − y, and therefore its contracting force is
F2 = k(z − y).
The total force acting on the left car is F1 , the total force acting on the middle car is F2 − F1 , and the total
force acting on the right car is −F2 . By Newton’s second law, the acceleration of each car is given by
x′′ = F_1/m_1, y′′ = (F_2 − F_1)/m_2, and z′′ = −F_2/m_3. We therefore have the following equations of motion:
x′′ = \frac{k}{m_1}(y − x),

y′′ = \frac{k}{m_2}(x − 2y + z),

z′′ = \frac{k}{m_3}(y − z).
Let us ignore the physical units and plug in the masses m1 = 1, m2 = 2, and m3 = 1 and the spring constant
k = 2. Then the equations of motion are:
x′′ = 2(y − x),
y′′ = x − 2y + z,
z′′ = 2(y − z),
or equivalently in matrix form:
\begin{bmatrix} x'' \\ y'' \\ z'' \end{bmatrix} = \begin{bmatrix} -2 & 2 & 0 \\ 1 & -2 & 1 \\ 0 & 2 & -2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}.
We can also write this as v′′ = Av, where
v = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} -2 & 2 & 0 \\ 1 & -2 & 1 \\ 0 & 2 & -2 \end{bmatrix}.
To solve the equation, we diagonalize the matrix A. Using the usual method for diagonalization, we find
that A = PDP−1 , where
P = \begin{bmatrix} -1 & 1 & 1 \\ 0 & -1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
The equation v′′ = Av then becomes v′′ = PDP−1 v, or equivalently P−1 v′′ = DP−1 v. We then diagonalize
the equation by performing the change of variables w = P−1 v. The equation becomes
w′′ = Dw.
or equivalently, writing u, v, and w for the three components of w,
u′′ = −2u,
v′′ = −4v,
w′′ = 0.
Since the variables are now decoupled, we can solve each differential equation individually.
• Solutions for λ = −2: By Theorem 8.32, the basic solutions for u′′ = −2u are
u = sin(√2 t) and u = cos(√2 t).
For example, the first of these basic solutions, written in the coordinates x, y, and z, gives
x = −sin(√2 t),
y = 0,
z = sin(√2 t).
This corresponds to a periodic oscillation of the train where the middle car is stationary, and the left car moves left when the right car moves right. Each oscillation takes 2π/√2 ≈ 4.4 seconds. Here is a “movie” of the motion:
[Figure: “movie” of this mode at times t = 0 s, 1.1 s, 2.2 s, 3.3 s, and 4.4 s; the middle car stays in place while the outer cars oscillate in opposite directions.]
The other basic solution, with cos instead of sin, is the same motion, just starting at a different offset
in time. Note how the eigenvector of A,
\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix},
describes the relative motion of the three cars, i.e., the first and last cars are moving in opposite directions, whereas the middle car is stationary. The corresponding eigenvalue λ = −2 determines the frequency. The frequency, which is √2/2π oscillations per second, is also called an eigenfrequency or resonance frequency of the system.
• Solutions for λ = −4: By Theorem 8.32, the basic solutions for v′′ = −4v are

v = sin(2t) and v = cos(2t).
This corresponds to a periodic oscillation of the train where the outer cars move right at the same
time that the middle car moves left. Each oscillation takes 2π /2 ≈ 3.14 seconds. Here is a “movie”
of the motion:
[Figure: “movie” of this mode at times t = 0 s, 0.79 s, 1.57 s, 2.36 s, and 3.14 s; the outer cars move together while the middle car moves in the opposite direction.]
As before, the other basic solution, using cos instead of sin, is the same motion, but shifted in time.
Also, the eigenvector
\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}
describes the relative motion of the three cars; here the first and last car move in the same direction
while the middle car moves in the opposite direction. The eigenfrequency of this oscillation, at
2/2π oscillations per second, is slightly higher than the first one, due to the larger magnitude of the
eigenvalue λ = −4.
• Solutions for λ = 0: The last eigenvalue is zero. By Theorem 8.32, the general solution of w′′ = 0
is
w = a + bt,
where a and b are arbitrary constants. This translates into the following solution for w:
w = (a + bt) \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix},
and by the change of variables v = Pw, we get
v = (a + bt) \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.
Writing this in terms of the coordinates x, y, and z, we get
x = a + bt,
y = a + bt,
z = a + bt.
This simply describes a linear motion: the three cars are moving down the track at constant speed b
from some initial starting position a.
[Figure: the third mode at times t = 0 s, 1 s, 2 s, and 3 s; all three cars move down the track together at constant speed.]
What we have described here are the basic solutions of the system. The solutions corresponding to each of
the eigenvalues are also called modes of the system. Thus, the train has three modes, each corresponding
to a particular eigenvalue of the matrix A. In the first mode, the middle car is stationary and the other two
cars oscillate in opposite directions. In the second mode, the middle car oscillates in the opposite direction
of the two outer cars. The third mode is a linear movement along the track.
As always, the general solution is a linear combination of basic solutions; for example, the cars might
be oscillating in both the first and second modes at their respective frequencies, while also moving down
the tracks. ♠
Exercises
Exercise 8.7.1 Solve the following system of second order linear differential equations:
y′′ = −5y − 6z,
z′′ = 3y + 4z.
Exercise 8.7.2 Solve the following system of second order linear differential equations:
y′′ = 4y − 3z,
z′′ = 2y − z.
Exercise 8.7.3 Solve the following system of second order linear differential equations:
Exercise 8.7.4 Solve the following system of first order linear differential equations.
y′ = 4y + 6z,
z′ = −3y − 5z.
Hint: The method is similar to that of second-order equations. Use the fact, known from calculus, that the
equation f ′ = k f has basic solution f (x) = ekx . Here k is any constant (positive, negative, or zero).
Exercise 8.7.5 Consider three coupled train cars as in Example 8.34, except that all three cars have mass 1 kg and both springs have spring constant k = 1 N/m. Find and solve the equations of motion.
8.8 Application: The matrix exponential

Outcomes
From calculus, recall that a function is called analytic if it can be defined by a power series. For example:
e^x = 1 + x + \frac{1}{2}x^2 + \frac{1}{3!}x^3 + \frac{1}{4!}x^4 + \ldots

\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \frac{1}{7!}x^7 \pm \ldots

\cos x = 1 - \frac{1}{2}x^2 + \frac{1}{4!}x^4 - \frac{1}{6!}x^6 \pm \ldots
We know from calculus that the above power series converge for all real numbers x. Since it makes sense
to compute the n th power of a square matrix, in principle it also makes sense to plug a matrix into a power
series. For a square matrix A, we can define
e^A = I + A + \frac{1}{2}A^2 + \frac{1}{3!}A^3 + \frac{1}{4!}A^4 + \ldots

\sin A = A - \frac{1}{3!}A^3 + \frac{1}{5!}A^5 - \frac{1}{7!}A^7 \pm \ldots

\cos A = I - \frac{1}{2}A^2 + \frac{1}{4!}A^4 - \frac{1}{6!}A^6 \pm \ldots
The goal of this section is to investigate whether these power series converge, and if yes, how to compute
the sum of the series. We begin with the case of a diagonal matrix.
Solution. By definition,
e^D = I + D + \frac{1}{2}D^2 + \frac{1}{3!}D^3 + \ldots
    = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
    + \begin{bmatrix} x & 0 \\ 0 & y \end{bmatrix}
    + \frac{1}{2}\begin{bmatrix} x^2 & 0 \\ 0 & y^2 \end{bmatrix}
    + \frac{1}{3!}\begin{bmatrix} x^3 & 0 \\ 0 & y^3 \end{bmatrix}
    + \ldots
    = \begin{bmatrix} 1 + x + \frac{1}{2}x^2 + \frac{1}{3!}x^3 + \ldots & 0 \\ 0 & 1 + y + \frac{1}{2}y^2 + \frac{1}{3!}y^3 + \ldots \end{bmatrix}
    = \begin{bmatrix} e^x & 0 \\ 0 & e^y \end{bmatrix}.
Therefore, the exponential of a diagonal matrix is computed by taking the exponential of each diagonal
entry. Note that this proves, in particular, that the sum converges. ♠
The same argument also works for applying other analytic functions to diagonal matrices, for example:
\sin\begin{bmatrix} x & 0 \\ 0 & y \end{bmatrix} = \begin{bmatrix} \sin x & 0 \\ 0 & \sin y \end{bmatrix},
\qquad
\cos\begin{bmatrix} x & 0 \\ 0 & y \end{bmatrix} = \begin{bmatrix} \cos x & 0 \\ 0 & \cos y \end{bmatrix}.
But how can we compute the matrix exponential of a non-diagonal matrix? This can be done by diagonal-
ization. The following theorem shows how:
eA = PeD P−1 ,
sin A = P(sinD)P−1 ,
cos A = P(cos D)P−1 .
Proof. We have
e^A = I + A + \frac{1}{2}A^2 + \frac{1}{3!}A^3 + \ldots
    = I + PDP^{-1} + \frac{1}{2}(PDP^{-1})^2 + \frac{1}{3!}(PDP^{-1})^3 + \ldots
    = PIP^{-1} + PDP^{-1} + \frac{1}{2}PD^2P^{-1} + \frac{1}{3!}PD^3P^{-1} + \ldots
    = P\left(I + D + \frac{1}{2}D^2 + \frac{1}{3!}D^3 + \ldots\right)P^{-1}
    = Pe^D P^{-1}.
The proof for sin and cos is similar. Indeed, the same method works for any analytic function. ♠
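For a concrete illustration (a sketch, not part of the text), we can compute e^A for the diagonalizable matrix from Section 8.4 and compare it with a truncated power series:

```python
import numpy as np
from math import factorial

# Reusing A = P D P^{-1} from the diagonalization example in Section 8.4.
A = np.array([[ 3, 0,  2],
              [ 6, 4,  3],
              [-4, 0, -3]], dtype=float)
P = np.array([[-1, -1, 0],
              [ 1,  0, 1],
              [ 1,  2, 0]], dtype=float)
D = np.diag([1.0, -1.0, 4.0])

# e^A via the theorem: exponentiate the diagonal entries, then conjugate by P.
expA = P @ np.diag(np.exp(np.diag(D))) @ np.linalg.inv(P)

# Sanity check: truncated power series I + A + A^2/2! + ... + A^30/30!.
series = sum(np.linalg.matrix_power(A, k) / factorial(k) for k in range(31))
assert np.allclose(expA, series)
```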
Solution. We first diagonalize A. Following the usual method, we find that the eigenvalues are λ = −3
and λ = 2, with corresponding eigenvectors
\begin{bmatrix} 2 \\ 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
Exercises
Exercise 8.8.2 The cube root is an analytic function. Compute the cube root of the matrix
A = \begin{bmatrix} 22 & -21 \\ 14 & -13 \end{bmatrix}.
Exercise 8.8.3 Use matrix exponentials to find the solution to the first-order linear differential equation
\begin{bmatrix} x \\ y \end{bmatrix}' = \begin{bmatrix} 0 & -1 \\ 6 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}.
8.9 Properties of eigenvectors and eigenvalues

Outcomes
A. Know that eigenvectors corresponding to distinct eigenvalues are linearly independent.
C. Determine whether a matrix is diagonalizable from the geometric multiplicities of its eigen-
values.
In this section, we state some useful properties of eigenvectors and eigenvalues. The first question we
consider is whether eigenvectors for different eigenvalues are linearly independent. This is indeed the
case, as the following proposition shows:
Proof. Suppose, for the sake of obtaining a contradiction, that v1, . . ., vk are linearly dependent. Let m be the smallest index such that vm is redundant, i.e., such that vm is a linear combination of previous vectors.
Say
vm = a1 v1 + . . . + am−1 vm−1 . (8.6)
Multiplying the equation by A, we get
Proof. Each of the n eigenvalues has an eigenvector, and by Proposition 8.38, they are linearly independent.
Then A is diagonalizable by Theorem 8.21. ♠
The next issue we consider is that of “repeated” eigenvalues. There are two senses in which an eigenvalue
can occur “more than once”. The first is if the eigenvalue appears as a repeated root of the characteristic
polynomial. For example, if the characteristic polynomial is p(λ ) = (1 − λ )(1 − λ )(3 − λ ), then we say
that the root λ = 1 appears with multiplicity two, and the root λ = 3 appears with multiplicity one. We
call this the algebraic multiplicity of the eigenvalue.
The second sense in which an eigenvalue can occur “more than once” is when an eigenvalue has more
than one linearly independent eigenvector. In other words, when the eigenspace has dimension greater
than 1. We call this the geometric multiplicity of the eigenvalue.
The following definition summarizes these concepts.
One would hope that the algebraic and geometric multiplicities are always equal. Unfortunately, this is
not the case, as the following example shows.
This system has rank 4, and the only basic solution is [1, 0, 0, 0, 0]T . Thus, the eigenspace E3 is 1-
dimensional, and the geometric multiplicity of λ = 3 is 1. A similar calculation show that λ = 4 has
geometric multiplicity 2 and λ = 5 has geometric multiplicity 1. The information is summarized in the
following table:
Eigenvalue                λ = 3   λ = 4   λ = 5
Algebraic multiplicity      2       2       1
Geometric multiplicity      1       2       1
♠
In the example, the geometric multiplicity is either smaller than or equal to the algebraic multiplicity. The
following proposition states that this is always the case.
Proof. It is clear that m ≥ 1, because each eigenvalue, by definition, must have at least one associated
eigenvector. Therefore, the eigenspace is at least 1-dimensional. We must show m ≤ k. Assume that A is
an n × n-matrix. By assumption, the geometric multiplicity of λ̂ is m, so the eigenspace Eλ̂ has dimension
m. So there exist m linearly independent eigenvectors v1 , . . . , vm for the eigenvalue λ̂ . Extend v1 , . . . , vm to
a basis v1 , . . . , vn of Rn , and let P be the invertible matrix that has v1 , . . . , vn as its columns. Let B = P−1 AP.
Since the first m columns of P are eigenvectors of A for the eigenvalue λ̂, it follows that B is of the form

\begin{bmatrix}
\hat{\lambda} & 0 & \cdots & 0 & * & \cdots & * \\
0 & \hat{\lambda} & \cdots & 0 & * & \cdots & * \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \hat{\lambda} & * & \cdots & * \\
0 & 0 & \cdots & 0 & * & \cdots & * \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & * & \cdots & *
\end{bmatrix},
i.e., the first m columns of B are like those of a diagonal matrix. Then from the cofactor method for
computing determinants, we know that det(B − λ I) contains the factor (λ̂ − λ )m . But since A and B are
similar matrices, they have the same characteristic polynomial. Therefore, det(A − λ I) also has (λ̂ − λ )m
as a factor. It follows, by definition of algebraic multiplicity, that m ≤ k, as desired. ♠
We know from Theorem 8.21 that an n × n-matrix is diagonalizable if and only if it has n linearly inde-
pendent eigenvectors. We can re-state this in terms of geometric multiplicity as follows.
Proof. By Proposition 8.38, eigenvectors corresponding to different eigenvalues are linearly independent.
Therefore, by taking a basis of each eigenspace, we can obtain exactly as many linearly independent eigen-
vectors as the sum of the dimensions of all the eigenspaces, i.e., the sum of the geometric multiplicities of
all eigenvalues. By Theorem 8.21, A is diagonalizable if and only if this number is n. ♠
Exercises
Exercise 8.9.1 Find the algebraic and geometric multiplicity of each eigenvalue of the following matrices.
Which of the matrices are diagonalizable?
(a) A = \begin{bmatrix} 1 & 1 \\ -1 & 3 \end{bmatrix}, \quad
(b) B = \begin{bmatrix} -7 & 8 \\ -4 & 5 \end{bmatrix},

(c) C = \begin{bmatrix} 2 & -1 & -1 \\ 0 & 4 & 2 \\ 0 & -1 & 1 \end{bmatrix}, \quad
(d) D = \begin{bmatrix} 3 & 1 & -1 \\ 0 & 2 & 1 \\ 0 & -1 & 4 \end{bmatrix}, \quad
(e) E = \begin{bmatrix} -2 & 0 & 1 \\ -1 & -1 & 1 \\ -2 & 1 & 0 \end{bmatrix}.
(a) A is a 3 × 3-matrix with eigenvalues −1 and 3. The eigenvalue −1 has algebraic multiplicity 2 and
geometric multiplicity 1. The eigenvalue 3 has algebraic and geometric multiplicity 1.
(b) B is a 4 × 4-matrix with eigenvalues 2 and −2. The eigenvalue 2 has algebraic and geometric
multiplicity 1. The eigenvalue −2 has algebraic and geometric multiplicity 3.
(c) C is a 5 × 5-matrix with eigenvalues 1 and 3. The eigenvalue 1 has algebraic and geometric multi-
plicity 2, and the eigenvalue 3 has algebraic and geometric multiplicity 1.
8.10 The Cayley-Hamilton Theorem

Outcomes
In this section, we will consider the so-called Cayley-Hamilton theorem. It states that every square
matrix is a root of its own characteristic polynomial. We use the following notation. If

p(λ) = a_nλ^n + a_{n−1}λ^{n−1} + . . . + a_1λ + a_0

is a polynomial and A is a square matrix, then p(A) denotes the matrix

p(A) = a_nA^n + a_{n−1}A^{n−1} + . . . + a_1A^1 + a_0A^0.
The explanation for the last term is that A0 is interpreted as I, the identity matrix.
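As a quick numerical illustration (a sketch, not part of the text), we can verify the theorem for a small matrix with NumPy; np.poly returns the coefficients of the characteristic polynomial in the convention det(λI − A), which differs from det(A − λI) by at most a sign and therefore does not affect the check.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])                 # an arbitrary 2 x 2 example

coeffs = np.poly(A)                        # [1, -5, -2], i.e. lambda^2 - 5*lambda - 2
n = A.shape[0]

# Evaluate p(A) = A^2 - 5A - 2I; Cayley-Hamilton says this is the zero matrix.
p_of_A = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))
print(np.round(p_of_A, 10))                # [[0. 0.] [0. 0.]]
```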
The remainder of this section is devoted to the proof of the Cayley-Hamilton theorem. Readers who are
not interested in the proof can skip this material. We begin with a lemma:
Lemma 8.46: Suppose A0, A1, . . ., Am are n × n-matrices and that, for every scalar λ with |λ| sufficiently large,

A0 + A1λ + . . . + Amλ^m = 0.

Then each Ai = 0.
Proof. Multiplying both sides of the equation by λ^{−m} gives

A0λ^{−m} + A1λ^{−m+1} + . . . + Am−1λ^{−1} + Am = 0.

Now let |λ| → ∞ to obtain Am = 0. Multiplying by λ and letting |λ| → ∞ then gives Am−1 = 0. Continuing in this way, we obtain Ai = 0 for all i. ♠
The following is a simple consequence of the lemma.
Corollary 8.47:
Let Ai and Bi be n × n-matrices and suppose that, for every scalar λ with |λ| sufficiently large,

A0 + A1λ + . . . + Amλ^m = B0 + B1λ + . . . + Bmλ^m.

Then Ai = Bi for all i. Consequently, λ may be replaced by any n × n-matrix C, i.e.,

A0 + A1C + . . . + AmC^m = B0 + B1C + . . . + BmC^m.
Proof. Subtracting the right-hand side from the left-hand side and using Lemma 8.46, we get that Ai = Bi
for all i. But then the conclusion immediately follows. ♠
With this preparation, it is now relatively easy to prove the Cayley-Hamilton theorem.
Proof of the Cayley-Hamilton Theorem. Let A be an n × n-matrix, and let p(λ ) = det(A − λ I) be its
characteristic polynomial. Let adj(A − λ I) be the adjugate of the matrix A − λ I (see Section 7.6 for the
definition of the adjugate). Since each of the entries of the adjugate is a cofactor of A − λ I, the entries are
polynomials in λ of degree at most n − 1. Therefore, the adjugate can be written in the form

adj(A − λI) = C0 + C1λ + . . . + Cn−1λ^{n−1},

where C0, . . ., Cn−1 are n × n-matrices. Recall from Section 7.6 that adj(M)M = det(M)I for every square matrix M. Applying this to M = A − λI, we obtain

(C0 + C1λ + . . . + Cn−1λ^{n−1})(A − λI) = det(A − λI)I = p(λ)I.
Since this equation holds for all λ , Corollary 8.47 may be used. Therefore, if λ is replaced with A, the
two sides will be equal. Thus

(C0 + C1A + . . . + Cn−1A^{n−1})(A − A) = p(A).

Since A − A = 0, the left-hand side is the zero matrix, and therefore p(A) = 0, as claimed. ♠
Exercises
Exercise 8.10.3
8.11 Complex eigenvalues

Outcomes
A. Find the complex eigenvalues and eigenvectors of a matrix.
Some matrices have no real eigenvalues. For example, we saw in Section 8.2 that the matrix

A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}

has characteristic polynomial λ² + 1. Since the equation λ² + 1 = 0 does not have any roots in the real numbers, there are no real eigenvalues.
On the other hand, the fundamental theorem of algebra tells us that over the complex numbers, every
non-constant polynomial has a root. In fact, every polynomial of degree n factors into n linear factors.
Therefore, every matrix has at least one eigenvalue over the complex numbers. Some matrices are diago-
nalizable over the complex numbers but not over the real numbers. An introduction to complex numbers
and the fundamental theorem of algebra can be found in Appendix A.
Solution. The characteristic polynomial is λ 2 + 1. This has no roots in the real numbers, but it has two
roots λ = i and λ = −i in the complex numbers. To find the eigenvectors for λ = i, we solve (A −iI)v = 0:
\begin{bmatrix} -i & -1 & 0 \\ 1 & -i & 0 \end{bmatrix}
\;\overset{R_1 \leftrightarrow R_2}{\simeq}\;
\begin{bmatrix} 1 & -i & 0 \\ -i & -1 & 0 \end{bmatrix}
\;\overset{R_2 \leftarrow R_2 + iR_1}{\simeq}\;
\begin{bmatrix} 1 & -i & 0 \\ 0 & 0 & 0 \end{bmatrix}.
Thus, the basic eigenvector for λ = i is
v_1 = \begin{bmatrix} i \\ 1 \end{bmatrix}.
Similarly, to find the eigenvectors for λ = −i, we solve (A + iI)v = 0:
\begin{bmatrix} i & -1 & 0 \\ 1 & i & 0 \end{bmatrix}
\;\overset{R_1 \leftrightarrow R_2}{\simeq}\;
\begin{bmatrix} 1 & i & 0 \\ i & -1 & 0 \end{bmatrix}
\;\overset{R_2 \leftarrow R_2 - iR_1}{\simeq}\;
\begin{bmatrix} 1 & i & 0 \\ 0 & 0 & 0 \end{bmatrix}.
Thus, the basic eigenvector for λ = −i is
v_2 = \begin{bmatrix} -i \\ 1 \end{bmatrix}.
Since we have found two linearly independent eigenvectors, the matrix A is diagonalizable. We have
A = PDP−1 , where
P = \begin{bmatrix} i & -i \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}.
♠
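Numerical libraries handle this case automatically by working over the complex numbers. A small NumPy sketch (not part of the text) for the matrix of this example:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                          # [0.+1.j  0.-1.j], i.e. i and -i

for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)      # each column is an eigenvector
```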
In Example 8.48, the complex eigenvalues were i and −i. In Example 8.49, the complex eigenvalues were
1 + i and 1 − i. In Example 8.50, the complex eigenvalues were 1 + 2i and 1 − 2i, and there was also a real
eigenvalue of −1. Is it a coincidence that the complex eigenvalues always come in conjugate pairs? The
following proposition states that this is always the case.
It is important to note that even over the complex numbers, not all matrices are diagonalizable. On the one
hand, the characteristic polynomial of an n × n-matrix always factors into n linear factors over the complex
numbers. Therefore, the sum of the algebraic multiplicities of the eigenvalues is always n. However, it
can still happen that the geometric multiplicity of some eigenvalue is less than its algebraic multiplicity.
In that case, the matrix is not diagonalizable, even over the complex numbers. We have:
Proof. Let A be an n × n-matrix. By the fundamental theorem of algebra, the characteristic polynomial
factors into n linear factors. Therefore, the sum of the algebraic multiplicities of all the eigenvalues is n.
We know by Proposition 8.42 that the geometric multiplicity of each eigenvalue less than or equal to its
algebraic multiplicity. If the geometric multiplicity of each eigenvalue is equal to its algebraic multiplicity,
then the sum of the geometric multiplicities is n, and therefore A is diagonalizable by Proposition 8.43.
On the other hand, if the geometric multiplicity of some eigenvalue is less than its algebraic multiplicity,
then the sum of the geometric multiplicities is less than n, and A is not diagonalizable. ♠
Solution. The characteristic polynomial is (1 − λ )2 , and therefore the only eigenvalue is λ = 1, with
algebraic multiplicity 2. On the other hand, the eigenspace for λ = 1 is 1-dimensional:
the system

\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} v = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
has a 1-dimensional solution space. Therefore, we can find only one basic eigenvector, and the matrix is
not diagonalizable. ♠
To finish up this chapter, we will consider an application of complex eigenvalues. We will solve a recur-
rence as in Section 8.6. But this time, although the recurrence relation only uses real numbers, complex
numbers will be required to solve it.
f0 = 1,
f1 = 3,
fn+2 = 2 fn+1 − 2 fn , for all n ≥ 0.
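Before solving the recurrence by hand, here is a quick numerical preview (a Python sketch, not part of the text): the companion matrix of this recurrence has complex eigenvalues, yet it generates a perfectly real sequence.

```python
import numpy as np

# Companion matrix: (f_n, f_{n+1}) is mapped to (f_{n+1}, 2*f_{n+1} - 2*f_n).
A = np.array([[ 0, 1],
              [-2, 2]], dtype=float)
print(np.linalg.eigvals(A))          # [1.+1.j  1.-1.j]

v = np.array([1, 3], dtype=float)    # (f_0, f_1)
for n in range(6):
    print(n, int(round(v[0])))       # 1, 3, 4, 2, -4, -12
    v = A @ v
```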
Exercises
Exercise 8.11.1 Find the (real or complex) eigenvalues and eigenvectors of the following matrices. Diag-
onalize each matrix if possible.
A = \begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix}, \quad
B = \begin{bmatrix} -1 & -2 \\ 1 & 1 \end{bmatrix}, \quad
C = \begin{bmatrix} 2 & -1 & 3 \\ 0 & 1 & 1 \\ -1 & 1 & 0 \end{bmatrix}, \quad
D = \begin{bmatrix} 1 & 1 & 1 \\ -1 & 2 & 0 \\ 1 & -1 & 1 \end{bmatrix}.
Exercise 8.11.2 I know a certain real 2 × 2-matrix A. My matrix has complex eigenvalue λ = 1 + 2i and corresponding eigenvector

v = \begin{bmatrix} 1 \\ i \end{bmatrix}.
(b) Diagonalize A.
9.1 Definition of vector spaces

Outcomes
A. Develop the concept of a vector space through axioms.
B. Use the vector space axioms to determine if a set and its operations constitute a vector space.
The above definition is concerned with two operations: vector addition, denoted by v + w, and scalar
multiplication, denoted by kv or sometimes k · v. In the law of additive inverses, we have written −u for
(−1)u. Often, the scalars will be real numbers, but it is also possible to use scalars from a different field
K. We also use the term K-vector space to refer to a vector space over a field K. When K = R, we also
speak of a real vector space, and when K = C, we speak of a complex vector space. If the field is clear
from the context, we often don’t mention it at all, and just speak of a “vector space”. The elements of a
vector space are called vectors.
Our first example of a vector space is of course Rn .
Proof. Properties (A1)–(A4) hold by Proposition 2.8, and properties (SM1)–(SM4) hold by Proposi-
tion 2.11. ♠
We now consider some other examples of vector spaces.
Proof. To show that P2 is a vector space, we verify the 8 vector space axioms. Let
p(x) = a2 x2 + a1 x + a0 ,
q(x) = b2 x2 + b1 x + b0 ,
r(x) = c2x2 + c1x + c0 be arbitrary polynomials in P2, and let k, ℓ be scalars.
(A3) To prove the existence of an additive unit, let 0(x) = 0x2 + 0x + 0, the so-called zero polynomial.
Then
p(x) + 0(x) = (a2 x2 + a1 x + a0 ) + (0x2 + 0x + 0)
= (a2 + 0)x2 + (a1 + 0)x + (a0 + 0)
= a2 x2 + a1 x + a0
= p(x).
k(ℓp(x)) = k(ℓ(a2x2 + a1 x + a0 ))
= k(ℓa2 x2 + ℓa1 x + ℓa0 )
= kℓa2 x2 + kℓa1 x + kℓa0
= (kℓ)(a2x2 + a1 x + a0 )
= (kℓ)p(x).
1p(x) = 1(a2 x2 + a1 x + a0 )
= 1a2 x2 + 1a1 x + 1a0
= a2 x2 + a1 x + a0
= p(x).
Since the operations of addition and scalar multiplication on P2 satisfy the 8 vector space axioms, P2 is a
vector space. ♠
Our next example of a vector space is the set of all n × m-matrices.
Proof. The properties (A1)–(A4) hold by Proposition 4.11, and the properties (SM1)–(SM4) hold by
Proposition 4.14. ♠
We now examine an example of a set that does not satisfy all of the above axioms, and is therefore not a
vector space.
Solution. In order to show that V is not a vector space, it suffices to find one of the 8 axioms that is not
satisfied. We will begin by examining the axioms for addition until one is found which does not hold. In
fact, for this example, the very first axiom fails. Let
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad
B = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}.
(k f )(x) = k( f (x)).
Proof. To verify that FuncX,K is a vector space, we must prove the 8 axioms of vector spaces. Let f , g, h
be functions in FuncX,K , and let k, ℓ be scalars. Recall that two functions f , g are equal if for all x ∈ X , we
have f (x) = g(x).
and so (k + ℓ) f = k f + ℓ f .
so (kℓ f ) = k(ℓ f ).
(SM4) Finally, we prove the rule for multiplication by one. For all x ∈ X , we have
and therefore 1 f = f .
It follows that FuncX,K satisfies all the required axioms and is a vector space. ♠
For the next two examples of vector spaces, we leave the proofs as an exercise.
(a0 , a1 , a2 , a3 . . .),
where ai ∈ K for all i. We also use the notation (ai )i∈N , or occasionally (ai ), to denote such a
sequence. Let SeqK be the set of sequences of elements of K . We add two sequences by adding
their i th elements:
(ai )i∈N + (bi )i∈N = (ai + bi )i∈N .
We scale a sequence by scaling each of its elements:
We conclude this section by deriving some initial consequences of the vector space axioms.
(b) Additive inverses are unique. In other words, whenever u + v = 0, then v = −u.
Proof. We prove the first three properties, and leave the last one as an exercise. Assume V is any vector
space over a field K.
To show that the additive unit is unique, suppose that u and v are vectors such that

u + v = u.
Applying the law (A1) (commutative law) to the left-hand side, we have
v + u = u.
(v + u) + (−u) = u + (−u).
Applying the law (A2) (associative law) to the left-hand side, we have
v + (u + (−u)) = u + (−u).
Applying the law (A4) (additive inverse law) to both sides of the equation, we have
v + 0 = 0.
Applying the law (A3) (additive unit law) to the left-hand side, we have
v = 0.
This proves that whenever u + v = u, then v = 0, or in other words, v = 0 is the only element acting
as an additive unit.
Next, to show that additive inverses are unique, suppose that u and v are vectors such that

u + v = 0.
Applying the law (A1) (commutative law) to the left-hand side, we have
v + u = 0.
Exercises
Exercise 9.1.1 Consider the set R2 with the following non-standard addition operation ⊕:
(a, b) ⊕ (c, d) = (a + d, b + c).
Scalar multiplication is defined in the usual way. Is this a vector space? Explain why or why not.
Scalar multiplication is defined in the usual way. Is this a vector space? Explain why or why not.
Vector addition is defined as usual. Is this a vector space? Explain why or why not.
Scalar multiplication is defined as usual. Is this a vector space? Explain why or why not.
Exercise 9.1.5 Prove that the set SeqK from Example 9.7 is a vector space. Hint: this is a special case of
Example 9.6, if you realize that a sequence (ai )i∈N is the same thing as a function a : N → K.
Exercise 9.1.6 Prove that the set P from Example 9.8 is a vector space.
Exercise 9.1.7 Let V be the set of functions defined on a set X that have values in a vector space W . Is
this a vector space? Explain.
Exercise 9.1.8 Consider the set R2 with the following non-standard operations of addition and scalar
multiplication:
(a, b) ⊕ (c, d) = (a + c − 1, b + d − 1),
k ⊙ (c, d) = (kc + (1 − k), kd + (1 − k)).
Show that R2 is a vector space with these operations. Hint: the zero vector is not (0, 0), but (1, 1).
Exercise 9.1.9 Consider the set R of real numbers. Addition of real numbers is defined in the usual way,
and scalar multiplication is just multiplication of one real number by another. In other words, x + y means
to add the two numbers and xy means to multiply them. Show that R, with these operations, is a real vector
space.
Exercise 9.1.10 Let K = Q be the field of rational numbers, and let V be the set of real numbers of the form a + b√2, where a and b are rational numbers. Show that with the usual operations, V is a Q-vector space.
Exercise 9.1.11 Let K = Q be the field of rational numbers, and let V = R be the set of real numbers.
Show that with the usual operations, V is a Q-vector space.
Exercise 9.1.12 Let P3 be the set of all polynomials of degree 3 or less. That is, these are of the form
ax3 + bx2 + cx + d. Addition and scalar multiplication of polynomials are defined as usual. Show that P3
is a vector space.
Exercise 9.1.13 Let X = {1, 2, . . . , n}, and consider the space FuncX,R of real-valued functions defined
on X . Explain how FuncX,R can be considered as Rn .
9.2 Linear combinations, span, and linear independence

Outcomes
A. Determine if a vector is within a given span.
In this section, we will again explore concepts introduced earlier in terms of Rn and extend them to apply
to abstract vector spaces.
We can now revisit many of the concepts first introduced in Chapter 2 in the context of general vec-
tor spaces. We will look at linear combinations, span, and linear independence in this section, and at
subspaces, bases, and dimension in the next section.
A vector v is called a linear combination of the vectors u1, . . ., un if there exist scalars a1, . . ., an such that

v = a1u1 + . . . + anun.
or equivalently,
\begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} a+b & c-d \\ c+d & a-b \end{bmatrix}.
This yields a system of four equations in four variables:
a+b = 1,
c+d = −1,
c−d = 3,
a−b = 2.
We can easily solve the system of equations to find the unique solution a = 32 , b = − 12 , c = 1, d = −2.
Therefore
\begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}
= \frac{3}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
- \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}
+ 1\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
- 2\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.
♠
Solution. Note that q2 (x) = (x + 1)2 = x2 + 2x + 1 and q3 (x) = (x + 2)2 = x2 + 4x + 4. We must find
coefficients a, b, c such that p(x) = aq1 (x) + bq2 (x) + cq3 (x), or equivalently,
Since two polynomials are equal if and only if each corresponding coefficient is equal, this yields a system
of three equations in three variables
a + b + c = 7,
2b + 4c = 4,
b + 4c = −3.
We can easily solve this system of equations and find that the unique solution is a = 52 , b = 7, c = − 52 .
Therefore
p(x) = \frac{5}{2}\,q_1(x) + 7\,q_2(x) - \frac{5}{2}\,q_3(x).
♠
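The 3 × 3 system for the coefficients a, b, c is also easy to solve numerically (a sketch, not part of the text):

```python
import numpy as np

# a + b + c = 7,  2b + 4c = 4,  b + 4c = -3  (the system from the example above)
M = np.array([[1, 1, 1],
              [0, 2, 4],
              [0, 1, 4]], dtype=float)
rhs = np.array([7, 4, -3], dtype=float)

print(np.linalg.solve(M, rhs))   # [ 2.5  7.  -2.5], i.e. a = 5/2, b = 7, c = -5/2
```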
As in Chapter 2, the span of a set of vectors is defined as the set of all of its linear combinations. We
generalize the concept of span to consider spans of arbitrary (possibly finite, possibly infinite) sets of
vectors.
It is important not to misunderstand this definition. Even when the set S is infinite, each individual element
v ∈ span S is a linear combination of only finitely many elements u1 , . . . , uk of S. The definition does not
talk about infinite linear combinations
a1 u1 + a2 u2 + a3 u3 + . . .
Indeed, such infinite sums do not typically exist. However, different elements v, w ∈ span S can be linear
combinations of a different (finite) number of vectors of S. For example, it is possible that v is a linear
combination of 10 elements of S, and w is a linear combination of 100 elements of S.
e0 = (1, 0, 0, 0, 0, . . .),
e1 = (0, 1, 0, 0, 0, . . .),
e2 = (0, 0, 1, 0, 0, . . .),
and so on. Let S = {e_k | k ∈ N}. Which of the following sequences are in span S?
Solution.
(d) The sequence k is not in span S, for the same reason. We would need to add infinitely many se-
quences of the form ek to get a sequence that contains infinitely many non-zero elements. However,
this is not permitted by the definition of span. ♠
Solution. The answer is yes, because we found in Example 9.12 that p(x) = \frac{5}{2}x^2 + 7(x+1)^2 - \frac{5}{2}(x+2)^2. ♠
We say that a set of vectors S is a spanning set for V if V = span S.
Since the system has rank 3, it has a solution. Therefore, p(x) ∈ span S. Since p(x) was an arbitrary
element of P2 , it follows that S is a spanning set for P2 . ♠
To define the concept of linear independence in a general vector space, it will be convenient to base our
definition on the “alternative” characterization of Theorem 5.12. Here too, we generalize the definition to
an arbitrary (finite or infinite) set of vectors.
Solution. According to the definition of linear independence, we must solve the equation
If there is a non-trivial solution, the polynomials are linearly dependent. If there is only the trivial solution,
they are linearly independent. We first rearrange the left-hand side to collect equal powers of x:
Since the system has rank 3, there are no free variables. The only solution is a = b = c = 0, and the
polynomials are linearly independent. ♠
e0 = (1, 0, 0, 0, 0, . . .),
e1 = (0, 1, 0, 0, 0, . . .),
e2 = (0, 0, 1, 0, 0, . . .),
and so on. Let S = {e_0, e_1, e_2, . . .}. This is an infinite subset of SeqK. Show that S is linearly
independent.
Solution. Since S is an infinite set, we have to show that every finite subset of S is linearly independent.
So consider a finite subset

{e_{k_1}, e_{k_2}, . . ., e_{k_n}} ⊆ S

and assume that

a_1e_{k_1} + a_2e_{k_2} + . . . + a_ne_{k_n} = 0. (9.1)
We have to show that a_1 = . . . = a_n = 0. Consider some index i ∈ {1, . . ., n}. Then the k_i th element of a_1e_{k_1} + . . . + a_ne_{k_n} is equal to a_i by the left-hand side of (9.1), but it is also equal to 0 by the right-hand side of (9.1). It follows that a_i = 0 for all i ∈ {1, . . ., n}, and therefore {e_{k_1}, e_{k_2}, . . ., e_{k_n}} is linearly independent. Since {e_{k_1}, e_{k_2}, . . ., e_{k_n}} was an arbitrary finite subset of S, it follows, by definition, that S is linearly independent.
♠
Notice that this equation has non-trivial solutions, for example a = 2, b = 3 and c = −1. Therefore the
matrices are linearly dependent. ♠
Solution. Assume A sin x + B cos x = 0. Note that this is an equality of functions, which means that it
is true for all x. In particular, substituting x = 0 into the equation, and using the fact that sin 0 = 0 and
cos 0 = 1, we have
0 = A sin 0 + B cos 0 = A · 0 + B · 1 = B,
and therefore B = 0. On the other hand, substituting x = π/2 into the equation, and using the fact that sin(π/2) = 1 and cos(π/2) = 0, we have

0 = A sin(π/2) + B cos(π/2) = A · 1 + B · 0 = A,
and therefore A = 0. Therefore, the equation A sin x + B cos x = 0 only has the trivial solution A = B = 0,
and it follows that sin x and cos x are linearly independent. ♠
The properties of linear independence that were discussed in Chapter 2 remain true in the general setting
of vector spaces. For example, the first two parts of Proposition 5.16 apply without change. (The third
part specifically mentions Rn , but can be generalized to any vector space of dimension n). We also have
the usual characterization of linear dependence in terms of redundant vectors:
u j = a1 u1 + a2 u2 + . . . + a j−1 u j−1 ,
for some j.
Proof. Suppose that the vectors are linearly dependent. Then the equation b1 u1 + . . . + bk uk = 0 has a
non-trivial solution for some k. In other words, there exist scalars b1 , . . . , bk , not all equal to zero, such that
b1 u1 + . . . + bk uk = 0. Let j be the largest index such that bj ≠ 0. Then b1 u1 + . . . + bj uj = 0. Dividing by
bj and solving for uj , we have
uj = −(b1 /bj )u1 − . . . − (bj−1 /bj )uj−1 ,
so uj can be written as a linear combination of earlier vectors as claimed. ♠
Solution. A polynomial of degree n cannot be a linear combination of polynomials of degree less than
n. Therefore, none of the polynomials p1 (x), . . ., pk (x) can be written as a linear combination of earlier
polynomials. By Proposition 9.22, p1 (x), . . . , pk (x) are linearly independent. ♠
Theorems 5.18 and 5.19 also remain true in the setting of general vector spaces. The original proofs can be
used without change. Thus, if u1 , . . . , uk are linearly independent, then every vector v ∈ span {u1 , . . . , uk }
can be uniquely written as a linear combination of u1 , . . . , uk . Also, given any finite set of vectors, we can
find a subset of the vectors that is linearly independent and has the same span.
We finish this section with a useful observation about linear independence. Namely, if we are given a
linearly independent set of vectors and one more vector that is not in their span, then we can add that
vector to the set and the resulting set is still linearly independent.
{u1 , . . . , uk , v}
Proof. Assume, on the contrary, that the set were linearly dependent. Then by Proposition 9.22, one of
the vectors can be written as a linear combination of earlier vectors. This vector cannot be one of the ui ,
because u1 , . . . , uk are linearly independent. It also cannot be v, because v ∉ span {u1 , . . . , uk }. Therefore,
our assumption cannot be true, and the set is linearly independent. ♠
Exercises
Exercise 9.2.1 Let V be a vector space and suppose {u1 , . . . , uk } is a set of vectors in V . Show that 0 is in
span {u1 , . . . , uk }.
Exercise 9.2.2 Determine whether p(x) = 4x2 − x is in span {x2 + x, x2 − 1, −x + 2}.
Exercise 9.2.3 Determine whether p(x) = −x2 + x + 2 is in span {x2 + x + 1, 2x2 + x}.
Exercise 9.2.4
(a) Write
        A = [ 1 3 ]
            [ 0 0 ]
    as a linear combination of
        [ 1 0 ]   [ 0 1 ]   [ 1 0 ]   [ 0 1 ]
        [ 0 1 ] , [ 1 0 ] , [ 1 1 ] , [ 1 1 ] .
(b) Show that the above set of four matrices is a spanning set for M2,2 , the vector space of all 2 × 2-
matrices.
Exercise 9.2.5 Let K be a field, and consider the vector space SeqK of infinite sequences of scalars. A
sequence a = (ai )i∈N is called finitely supported if all but finitely many elements of the sequence are zero.
In other words, a is finitely supported if there exists some N ∈ N such that ak = 0 for all k ≥ N. Let
e0 , e1 , e2 , . . . be the sequences from Example 9.14. Show that a ∈ span {e0 , e1 , e2 , . . .} if and only if a is
finitely supported.
Exercise 9.2.6 For each of the following sets of polynomials, determine whether the set is linearly inde-
pendent. If it is linearly dependent, write one polynomial as a linear combination of the other polynomials
in the set.
(a) {x + 1, x2 + 2, x2 − x − 3}.
(b) {x2 + x, −2x2 − 4x − 6, 2x − 2}.
Exercise 9.2.7 Determine whether each of the following sets of matrices is linearly independent. If it is
linearly dependent, write one matrix as a linear combination of the other matrices in the set.
(a) [ 1 2 ]   [ −7  2 ]   [ 4 0 ]
    [ 0 1 ] , [ −2 −3 ] , [ 1 2 ] .
(b) [ 1 0 ]   [ 0 1 ]   [ 1 0 ]   [ 0 0 ]
    [ 0 1 ] , [ 0 1 ] , [ 1 0 ] , [ 1 1 ] .
is an invertible matrix.
Exercise 9.2.9 Assume u, v, w are linearly independent elements of some vector space V . Consider the
set of vectors
R = {2u − w, w + v, 3v + (1/2)u} .
Determine whether R is linearly independent.
9.3 Subspaces
Outcomes
A. Determine whether a set of vectors is a subspace of a given vector space.
• R3 itself.
It follows that W contains 0, and is closed under addition and scalar multiplication. Therefore, W is a
subspace of V . ♠
1. The zero function, defined by f (x) = 0 for all x, is differentiable. In fact, its derivative is f ′ (x) = 0.
It follows that W contains 0, and is closed under addition and scalar multiplication. Therefore, W is a
subspace of V . ♠
In other words, W is the set of all sequences satisfying the recurrence relation an+2 = an + an+1 .
Then W is a subspace of V .
Proof. Before we prove that W is a subspace, let us first consider an example. The following sequences
are elements of W , because they both satisfy the recurrence:
which again satisfies the recurrence. Therefore, the set W is closed under the addition of these particular
sequences a and b. We now prove the properties in general.
1. Let z be the zero sequence, defined by zn = 0 for all n. Then z satisfies the recurrence relation, since
for all n ≥ 0, zn+2 = 0 = zn + zn+1 . Therefore z ∈ W .
2. To show that W is closed under addition, consider any two sequences a, b ∈ W , and let c = a + b.
Then for all n ≥ 0,
cn+2 = an+2 + bn+2 = (an + an+1 ) + (bn + bn+1 ) = (an + bn ) + (an+1 + bn+1 ) = cn + cn+1 ,
so c satisfies the recurrence. It follows that c ∈ W , and therefore W is closed under addition.
3. To show that W is closed under scalar multiplication, consider any k ∈ R and a ∈ W , and let c = ka.
Then for all n ≥ 0,
cn+2 = kan+2 = k(an + an+1 ) = kan + kan+1 = cn + cn+1 ,
so c satisfies the recurrence. It follows that c ∈ W , and therefore W is closed under scalar multiplication. ♠
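As a quick sanity check of the closure properties, the following Python sketch generates two elements of W from arbitrary seed values, adds them term by term, and verifies (on a finite prefix, which is all a computer can check) that the sum again satisfies the recurrence.

    def extend(a0, a1, n):
        # First n terms of the sequence determined by a_{k+2} = a_k + a_{k+1}.
        seq = [a0, a1]
        while len(seq) < n:
            seq.append(seq[-2] + seq[-1])
        return seq

    a = extend(1, 2, 12)
    b = extend(3, -1, 12)
    c = [x + y for x, y in zip(a, b)]   # the term-by-term sum a + b

    print(all(c[k + 2] == c[k] + c[k + 1] for k in range(len(c) - 2)))   # True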
f ′′ = − f
is a differential equation. The functions f (x) = sin x, f (x) = cos x, and f (x) = 0 are examples of
solutions of this differential equation. Let W be the set of all functions that are solutions of the
differential equation f ′′ = − f . Then W is a subspace of V .
Proof.
1. The zero function f (x) = 0 is a solution of the differential equation, and therefore an element of W .
2. To show that W is closed under addition, let f , g ∈ W and consider h = f + g. Then f ′′ = − f and
g′′ = −g, and therefore h′′ = f ′′ + g′′ = − f + (−g) = −h. Therefore, h ∈ W , and W is closed under
addition.
3. To show that W is closed under scalar multiplication, let k ∈ R and f ∈ W , and consider h = k f .
Then f ′′ = − f , and therefore h′′ = k f ′′ = k(− f ) = −h. It follows that h ∈ W , and therefore W is
closed under scalar multiplication.
W = {p ∈ P2 | p(r) = 0} .
2. To show that W is closed under addition, assume p, q ∈ W , and let s = p + q. Then p(r) = 0 and
q(r) = 0, therefore s(r) = p(r) + q(r) = 0. It follows that s ∈ W .
3. To show that W is closed under scalar multiplication, assume p(x) ∈ W and let k be a scalar. Then
(kp)(r) = k(p(r)) = k · 0 = 0, and therefore kp ∈ W .
Proof. Clearly {0} contains 0, and is closed under addition and scalar multiplication because 0 + 0 = 0
and k0 = 0 for all k. Similarly, V contains 0 and is closed under addition and scalar multiplication, because
addition and scalar multiplication are operations on V . Therefore, both {0} and V are subspaces of V . ♠
The interest of subspaces lies in the fact that they are vector spaces in their own right, as stated in the
following proposition.
Proof. Since W is a subspace, it is closed under addition and scalar multiplication. This ensures that
addition and scalar multiplication are well-defined operations on W . The axioms (A1), (A2), (A4), and
(SM1)–(SM4) all obviously hold in W , because they hold in V (two elements of W are equal in W if and
only if they are equal in V ). The axiom (A3) holds because 0 ∈ W . ♠
Proof.
(a) To show that span S is a subspace, first note that 0 ∈ span S, because 0 is the empty linear combi-
nation. Also, if v, u ∈ span S, then by definition of span, there exist v1 , . . . , vk , u1 , . . . , uℓ ∈ S and
a1 , . . . , ak , b1 , . . . , bℓ ∈ K such that
v = a1 v1 + . . . + ak vk ,
u = b1 u 1 + . . . + bℓ u ℓ .
Then
v + u = a1 v1 + . . . + ak vk + b1 u1 + . . . + bℓ uℓ ,
and therefore v + u ∈ span S. It follows that span S is closed under addition. The proof for scalar
multiplication is similar. Finally, every v ∈ S is trivially a linear combination of itself, v = 1v, and
therefore S ⊆ span S.
(b) Consider any other subspace W of V such that S ⊆ W . To show that span S ⊆ W , consider an
arbitrary element v ∈ span S. By definition of span, there exist v1 , . . . , vk ∈ S and a1 , . . . , ak ∈ K such
that v = a1 v1 + . . . + ak vk . By assumption, v1 , . . . , vk ∈ W . Since W is closed under addition and
scalar multiplication, it follows that v ∈ W . Since v was an arbitrary element of span S, it follows
that span S ⊆ W .
♠
While the last proposition looks technical, it can actually be useful for proving that two sets of vectors
span the same subspace. The following is an example of this.
Solution. To show that two sets are equal, we must show that each is a subset of the other. So we will
show span S ⊆ span T and span T ⊆ span S. By Proposition 9.34, it is sufficient to show that S ⊆ span T
and T ⊆ span S, i.e., we must show that every element of S is a linear combination of elements of T and
vice versa.
1. S ⊆ span T . We have
Since each element of S is an element of span T , it follows that S ⊆ span T . By Proposition 9.34,
this implies that span S ⊆ span T .
2. T ⊆ span S. We have
Since each element of T is an element of span S, it follows that T ⊆ span S. By Proposition 9.34,
this implies that span T ⊆ span S.
Exercises
Exercise 9.3.1 Consider the set of symmetric n × n-matrices, i.e., matrices satisfying A = AT . Show that
this set of symmetric matrices is a subspace of Mn,n , the vector space of n × n-matrices.
Exercise 9.3.2 Consider the set of all vectors [x, y]T ∈ R2 such that x + y ≥ 0. Is this a subspace of R2 ?
Exercise 9.3.3 Consider the set of all vectors [x, y]T ∈ R2 such that xy = 0. Is this a subspace of R2 ?
Exercise 9.3.4 Let V be the set of those polynomials ax2 + bx + c ∈ P2 such that a + b + c = 0. Is V a
subspace of P2 ? Explain.
Exercise 9.3.5 Let U ,W be subspaces of a vector space V and consider U + W defined as the set of all
vectors that can be written of the form u + w, where u ∈ U and w ∈ W . Show that U +W is a subspace of
V.
Exercise 9.3.6 Let U ,W be subspaces of a vector space V . Then U ∩W consists of all vectors which are
in both U and W . Show that U ∩W is a subspace of V .
Exercise 9.3.7 Let U ,W be subspaces of a vector space V . Then U ∪W consists of all vectors which are
in either U or W . Show that U ∪W is not necessarily a subspace of V by giving an example where U ∪W
fails to be a subspace.
Exercise 9.3.8 Let U = { [x, y, z]T ∈ R3 | |x| ≤ 4 }. Is U a subspace of R3 ?
Exercise 9.3.13 Let W be the subset of SeqR consisting of all sequences that are alternating, i.e., where
ai ≥ 0 for even i and ai ≤ 0 for odd i, or vice versa. Is W a subspace of SeqR ?
Exercise 9.3.14 Let W be the subset of SeqK consisting of all sequences that satisfy the recurrence relation
an+3 = an+2 + 2an+1 − an . Is W a subspace of SeqK ?
Exercise 9.3.15 A sequence (ai )i∈N is called periodic if there exists some k > 0 such that for all i, ai+k = ai .
The number k is called a period of the sequence. For example, the following is a periodic sequence with
period 3:
(1, 5, −7, 1, 5, −7, 1, 5, −7, 1, 5, −7, 1, . . .).
(a) Show that the set of all periodic sequences of a fixed period k forms a subspace of SeqK .
(b) More difficult: show that the set of all periodic sequences of all periods forms a subspace of SeqK .
Exercise 9.3.16 A function f : R → R is called symmetric if f (x) = f (−x) for all x ∈ R. Let W be the
subset of FuncR,R consisting of all symmetric functions. Is W a subspace of FuncR,R ?
Exercise 9.3.17 A function f : R → R is said to vanish at infinity if limx→∞ f (x) = 0 and limx→−∞ f (x) =
0. Let W be the subset of FuncR,R consisting of all functions that vanish at infinity. Prove that W is a
subspace of FuncR,R .
Exercise 9.3.18 Show that the sets S = {x + 2, (x + 2)2 } and T = {x2 − 4, x2 + x − 2} span the same
subspace of P2 .
Exercise 9.3.19 Show that the sets
    S = { [ 1 0 ] , [ 0 1 ] }     and     T = { [ 1  1 ] , [  1 1 ] }
          [ 0 1 ]   [ 1 0 ]                     [ 1 −1 ]   [ −1 1 ]
span the same subspace of M2,2 .
9.4 Basis and dimension
Outcomes
A. Find a basis of a given vector space.
2. B is linearly independent.
Unlike Rn , a vector space like P2 does not necessarily have a “standard” basis. One basis might be
useful for one application, and another basis for a different application.
Proof. It is easy to verify that each set of vectors is linearly independent and spanning. See Examples 9.16,
9.18, and 9.23 for similar calculations. ♠
Proof. The polynomials 1, x, x2 , x3 , x4 , . . . are linearly independent by Proposition 9.22. Namely, if they
were linearly dependent, then one of the polynomials could be written as a linear combination of earlier
ones. However, this is not possible because a polynomial of degree n cannot be a linear combination of
polynomials of degree less than n.
To show that the polynomials 1, x, x2 , x3 , x4 , . . . are a spanning set, consider an arbitrary element p(x)
of P. Then by definition, p(x) is of the form
p(x) = an xn + an−1 xn−1 + . . . + a1 x + a0 ,
for some n ≥ 0 and a0 , . . . , an ∈ K. But then p(x) is a linear combination of 1, x, . . . , xn , i.e., it is in the span
of {1, x, x2 , x3 , x4 , . . .}. ♠
e0 = (1, 0, 0, 0, 0, . . .),
e1 = (0, 1, 0, 0, 0, . . .),
e2 = (0, 0, 1, 0, 0, . . .),
The following theorem ensures that every vector space has a basis. We will not prove this theorem, because
when the spaces are infinite-dimensional, the proof uses mathematics that is beyond the scope of this book.
The proof uses a reasoning principle called the axiom of choice, which allows us to prove the existence
of a basis even in cases where we cannot find an actual concrete example of a basis. For example, it is
not possible to give a specific example of a basis for the space SeqK , even though the following theorem
guarantees that such a basis exists.
The Exchange Lemma, which we proved in the context of Rn in Section 5.4, is true in general vector
spaces.
The proof is exactly the same as that of Lemma 5.37, so we do not repeat it here. As in Section 5.4, an
important consequence of the Exchange Lemma is that any two bases of a vector space have the same size.
Proof. We first show that B1 and B2 are either both finite or both infinite. Assume one of them, say B1 ,
is finite and contains s vectors. Since B1 is spanning and B2 is linearly independent, it follows from the
Exchange Lemma that B2 cannot contain more than s vectors, and in particular, B2 must be finite. So the
sets are either both finite or both infinite. If they are both finite, say of size s and r, then by the Exchange
Lemma, we have s ≤ r and r ≤ s, hence r = s. ♠
This allows us to define the dimension of a vector space.
Note that the dimension is well-defined by Theorems 9.40 and 9.42, since these theorems ensure that
every vector space has a basis (and therefore a dimension), and that any two bases are of the same size
(and therefore a vector space cannot have more than one dimension).
We now calculate the dimensions of some vector spaces we encountered in Sections 9.1 and 9.3.
• The space P2 has dimension 3. We found several bases for this space in Example 9.37.
• The space Mm,n has dimension mn. A possible basis consists of all the matrices that contain a single
1 and zeros everywhere else.
• The space SeqK is infinite-dimensional. We found an infinite linearly independent set in Exam-
ple 9.19, showing that the space cannot be finite-dimensional.
• The space P is infinite-dimensional. We found a basis for this space in Example 9.38.
• The subspace of FuncR,R consisting of the differentiable functions is infinite-dimensional. Again,
the set {1, x, x2 , x3 , . . .} is an infinite linearly independent set in this space.
Solution. The space is 2-dimensional. The easiest way to see this is to observe that a sequence a ∈ W is
determined by its first two elements. We can say that the first two elements of the sequence are parameters,
and all the other elements are then computed by the recurrence relation. Specifically, suppose a0 = x and
a1 = y. Using the recurrence relation to compute the remaining elements, we have
a = (x, y, x + y, x + 2y, 2x + 3y, 3x + 5y, . . .).
Since this is the general form of the elements of W , and since the two sequences starting with 1, 0 and 0, 1
are clearly linearly independent, it follows that
{(1, 0, 1, 1, 2, 3, 5, . . .), (0, 1, 1, 2, 3, 5, 8, . . .)}
is a basis of W . ♠
Solution. From calculus, we know that the general solution of the differential equation f ′′ = − f is
f (x) = A sin x + B cos x,
where A, B are constants. We also know, from Example 9.21, that sin x and cos x are linearly independent.
It follows that {sin x, cos x} is a basis for the solution space. The solution space is therefore 2-dimensional.
♠
We conclude this section by stating two properties of bases that generalize Theorem 5.19 and Lemma 5.44:
every linearly independent set can be extended to a basis by adding 0 or more vectors, and every spanning
set can be reduced to a basis by removing 0 or more vectors.
Solution. We can obtain a basis of M2,2 by adding two more linearly independent matrices
    [ 0 0 ]       [ 0 0 ]
    [ 1 0 ]  and  [ 0 1 ] .
♠
Shrink S to a basis of P2 .
Solution. We use a version of the casting-out method. We examine each element of S from left to right and
cast out the elements that are linear combinations of previous elements. Clearly the first two elements, 1
and x, are linearly independent. The next element, 2x + 1, is redundant because it is a linear combination
of 1 and x. The next element x2 +1 is linearly independent of 1 and x. The final element x2 +2 is redundant
because it is a linear combination of 1 and x2 + 1. Therefore, the following subset of S is a basis of P2 :
B = {1, x, x2 + 1}. ♠
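The casting-out method is easy to mechanize: keep a vector only if appending its coefficient vector increases the rank of the matrix of vectors kept so far. The Python sketch below applies this to the coefficient vectors of 1, x, 2x + 1, x2 + 1, x2 + 2, read off from the solution above; it is an illustration, not part of the text's own algorithms.

    import numpy as np

    # Coefficient vectors (constant, x, x^2) of S = {1, x, 2x+1, x^2+1, x^2+2}.
    S = [np.array([1, 0, 0]),   # 1
         np.array([0, 1, 0]),   # x
         np.array([1, 2, 0]),   # 2x + 1
         np.array([1, 0, 1]),   # x^2 + 1
         np.array([2, 0, 1])]   # x^2 + 2

    basis = []
    for v in S:
        if np.linalg.matrix_rank(np.column_stack(basis + [v])) > len(basis):
            basis.append(v)     # v is not a linear combination of the vectors kept so far

    print(len(basis))           # 3; the kept vectors correspond to {1, x, x^2 + 1}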
Exercises
Exercise 9.4.1 Let P3 be the vector space of polynomials of degree at most 3. Determine which of the
following are bases for this vector space.
(a) {x3 + 1, x2 + x, 2x3 + x2 , 2x3 − x2 − 3x + 1}.
(b) {x + 1, x3 + x2 + 2x, x2 + x, x3 + x2 + x}.
Exercise 9.4.2 Determine whether the following is a basis for P2 , the vector space of polynomials of
degree at most 2:
{x2 + x + 1, 2x2 + 2x + 1, x + 1}.
Exercise 9.4.5 Extend the following linearly independent set of polynomials to a basis of P3 :
{x3 + x2 − x − 1, 3x3 + 2x2 + 2x − 1}.
Exercise 9.4.6 Let V be a 5-dimensional vector space. If you have 5 linearly independent vectors in V ,
can you conclude that the vectors span V ?
Exercise 9.4.7 Let V be a 5-dimensional vector space. If you have 6 vectors in V , is it possible that they
are linearly independent? Explain.
Exercise 9.4.8 Find a basis for the vector space of symmetric 3 × 3-matrices, i.e., matrices satisfying
A = AT . What is the dimension of this space?
Exercise 9.4.9 Let W be the subspace of P3 (over the field R) consisting of all polynomials p(x) that
satisfy p(3) = 0. Find a basis for W . What is the dimension of W ?
Exercise 9.4.10 Find a basis for
    U = { A ∈ M2,2 | A [  1 1 ] = [  1 0 ] A } .
                       [ −1 0 ]   [ −1 1 ]
What is the dimension of U?
Exercise 9.4.11
(a) Let k be a positive integer, let Wk ⊆ SeqK be the subspace consisting of all sequences that are
periodic with period k (see Exercise 9.3.15). Find a basis for Wk . What is its dimension?
(b) More difficult: find a basis for the infinite-dimensional vector space consisting of all periodic se-
quences of all periods.
Exercise 9.4.12 Let K = Q, the field of rational numbers. Consider vectors of the form a + b√2, where
a, b are rational numbers. Show that this collection of vectors is a vector space over Q and give a basis
for this vector space. What is its dimension?
9.5 Application: Error correcting codes
Outcomes
A. Determine the block length, message length, Hamming distance, and rate of a code.
Binary codes
When transmitting or storing information on digital media, the information is usually encoded as a se-
quence of bits, i.e., 0s and 1s. For example, in the ASCII code, each symbol is encoded as a sequence of
8 bits. The letter “A” is encoded as 01000001, the letter “B” is encoded as 01000010, and so on. We can
think of a sequence of n bits as an n-dimensional column vector over the field Z2 , i.e., as an element of
Zn2 . For our purposes, it is convenient to continue writing bit sequences horizontally, but we will consider
this to be merely an alternate notation for a column vector.
One issue with digital data is that the data can sometimes be corrupted. DVDs may be scratched,
magnetic storage may depolarize, and data sent by radio transmission may be subject to interference. This
can result in errors in the data, such as some 0s being changed to 1s or vice versa. One of the ways to deal
with such errors is to introduce redundancy in the way the data is encoded.
A very simple example of redundant encoding is the so-called 3-repetition code. This simply means
to repeat each bit 3 times. Thus, the bit 0 is encoded as 000 and the bit 1 as 111. For example, the bit
string 100101 is encoded as 111000000111000111. With this encoding, single bit errors are easy to detect
and correct. Assuming that at most one error occurs within each 3-bit block, the blocks can be decoded by
“majority decision”. Namely, if the bits in a block are not the same, we assume that the error occurred in
the bit that is in the minority. The following table shows the decoding scheme:
(c) What is the rate of this code? Is it better or worse than the rate of the 3-repetition code?
Solution. (a) We divide the message into blocks of length 2: 01 11 10. Then we encode each block
separately: 00111 11011 11100. (b) We divide the message into blocks of length 5: 11100 00111 11011.
Then we decode each block separately: 10 01 11. (c) The rate is 2/5 = 0.4. It is slightly higher, and therefore
better, than the rate of the 3-repetition code, which is 1/3 ≈ 0.33. ♠
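The 3-repetition code is simple enough to implement in a few lines. The following Python sketch encodes a bit string and decodes by majority decision within each 3-bit block; the sample messages are made up for illustration.

    def encode_rep3(bits):
        # Repeat each bit three times.
        return "".join(b * 3 for b in bits)

    def decode_rep3(code):
        # Majority decision within each 3-bit block.
        blocks = [code[i:i + 3] for i in range(0, len(code), 3)]
        return "".join("1" if block.count("1") >= 2 else "0" for block in blocks)

    print(encode_rep3("100101"))        # 111000000111000111
    print(decode_rep3("110000010111"))  # 1001 (one single-bit error per block corrected)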
The error correction capabilities of a code depend on a property called the Hamming distance of the code,
which we now define.
• Let v be a vector in Zn2 . The Hamming weight of v, denoted W (v), is the number of compo-
nents of v that are equal to 1.
• Let v, w be two vectors in Zn2 . The Hamming distance between v and w, denoted D(v, w), is
the number of components where v and w differ. We can also express this as the Hamming
weight of v − w, i.e., D(v, w) = W (v − w).
• Finally, we say that the Hamming distance of a code is equal to the smallest Hamming
distance between any two code blocks.
A code with block length n, message length k, and Hamming distance d is also called an (n, k, d)-
code.
Solution. The vectors 00111 and 11100 differ in 4 places, so their Hamming distance is D(00111, 11100) =
4. This is also equal to the Hamming weight of 00111 − 11100 = 11011.
The 3-repetition code has only two code blocks: 000 and 111. Since their Hamming distance is 3, the
Hamming distance of the code is also 3.
To calculate the Hamming distance of the code from Example 9.52, we calculate the Hamming distance
between all pairs of code blocks:
D(00000, 00111) = 3,
D(00000, 11100) = 3,
D(00000, 11011) = 4,
D(00111, 11100) = 4,
D(00111, 11011) = 3,
D(11100, 11011) = 3.
Since the smallest distance between any two code blocks is 3, the Hamming distance of the code is 3. ♠
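Computing the Hamming distance of a small code is easy to automate. The following Python sketch recomputes the distance of the four-block code above by taking the minimum Hamming distance over all pairs of code blocks.

    def hamming_distance(v, w):
        # Number of positions in which the bit strings v and w differ.
        return sum(a != b for a, b in zip(v, w))

    code = ["00000", "00111", "11100", "11011"]
    d = min(hamming_distance(v, w)
            for i, v in enumerate(code) for w in code[i + 1:])
    print(d)   # 3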
The significance of a code’s Hamming distance is explained by the following definition and proposition.
• We say that a code C is m-error correcting if for all valid code blocks v, whenever w is
obtained from v by introducing up to m bit errors, then v is the only valid code block within
Hamming distance m of w.
• m-error detecting if m ≤ d − 1;
• m-error correcting if 2m ≤ d − 1.
Proof. Assume up to m errors have happened. In other words, let v be a valid code block, and let w be
the code block obtained from v by introducing up to m errors. To prove the first claim, assume m ≤ d − 1.
Then D(v, w) ≤ m ≤ d − 1. Since d is the minimum distance between any two valid code blocks, w cannot
be a valid code block. Hence, the errors can be detected. To prove the second claim, assume 2m ≤ d − 1.
Then D(v, w) ≤ m. Assume that there exists another valid code block u within Hamming distance m of w,
i.e., assume D(w, u) ≤ m. Then
D(v, u) ≤ D(v, w) + D(w, u) ≤ m + m = 2m ≤ d − 1.
Therefore, the Hamming distance of v and u is at most d − 1, contradicting the assumption that the code
has Hamming distance d. Hence, there is no such code block u, and the errors can be corrected. ♠
Solution. By Proposition 9.56, a code with Hamming distance 3 can detect up to 2 errors and correct up
to 1 error. The answers for other Hamming distances are summarized in the following table:
♠
Solution. Since the code of Example 9.52 has Hamming distance 3, it can correct up to 1 error per code
block, so we are able to decode the message uniquely. For each code block, we must find the unique valid
code block that is within Hamming distance 1 or less.
Received Error Corrected Decoded
11101 bit 5 11100 10
01000 bit 2 00000 00
11011 none 11011 11
11111 bit 3 11011 11
00011 bit 3 00111 01
The decoded message is 10 00 11 11 01. ♠
We can say that a “good” error correcting code is one that has a high rate (of message length divided by
block length) and a large Hamming distance.
Linear codes
To construct a code of message length n, we need to specify a set of 2^n code blocks. If n is large, it is
not really feasible to write down a list of all the code blocks, and to check all their Hamming distances by
hand. For example, a code of message length n = 10 requires 2^10 = 1024 code blocks, and we need to check
more than half a million Hamming distances. Instead, we will focus on a particular class of codes that is
much easier to describe. These are the linear codes.
The advantage of a linear code is that to specify the code blocks, we only need to list k basis elements,
rather than all 2^k elements of the code.
Solution. Consider all linear combinations of 11100 and 00111 (with scalars in Z2 ):
0(11100) + 0(00111) = 00000,
0(11100) + 1(00111) = 00111,
1(11100) + 0(00111) = 11100,
1(11100) + 1(00111) = 11011.
This shows that the code of Example 9.52 is span {11100, 00111}, and hence a subspace of Z52 . A basis
for the code is {11100, 00111} ♠
From Section 5.5, we know that every subspace of Zn2 , and therefore every linear code, is the column space
of some matrix G, and also the null space of some matrix H. Such matrices are called a generator matrix
and a check matrix for the code, respectively.
• A generator matrix for the code is an n × k-matrix G such that C is the column space of G.
• A check matrix for the code is an (n − k) × n-matrix H such that C is the null space of H .
Solution. We already found in Example 9.60 that {11100, 00111} is a basis for this code. We can obtain
a generator matrix by using the basis vectors as columns:
        [ 1 0 ]
        [ 1 0 ]
    G = [ 1 1 ] .
        [ 0 1 ]
        [ 0 1 ]
To find a check matrix, assume that [a, b, c, d, e] is a row of the check matrix. Then for every code block
v ∈ C, we must have [a, b, c, d, e] v = 0. Since the code blocks are spanned by 11100 and 00111, it suffices
to consider the two equations
    [ a b c d e ] · [ 1 1 1 0 0 ]T = 1a + 1b + 1c + 0d + 0e = 0
and
    [ a b c d e ] · [ 0 0 1 1 1 ]T = 0a + 0b + 1c + 1d + 1e = 0.
Solving this system of equations, we find that the following is a basis for the solution space:
    [ 1 1 0 0 0 ] ,  [ 1 0 1 1 0 ] ,  [ 1 0 1 0 1 ] .
We can use these basic solutions as the rows of the check matrix:
        [ 1 1 0 0 0 ]
    H = [ 1 0 1 1 0 ] .
        [ 1 0 1 0 1 ]
♠
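We can confirm that this H really is a check matrix by verifying that Hv = 0 over Z2 for every code block v; the following Python sketch does exactly that, using the four code blocks found earlier.

    H = [[1, 1, 0, 0, 0],
         [1, 0, 1, 1, 0],
         [1, 0, 1, 0, 1]]
    code = ["00000", "00111", "11100", "11011"]

    def mat_vec_mod2(M, v):
        # Matrix-vector product over the field Z_2.
        return [sum(m * b for m, b in zip(row, v)) % 2 for row in M]

    for block in code:
        print(block, mat_vec_mod2(H, [int(b) for b in block]))   # always [0, 0, 0]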
Let C be a linear code with generator matrix G. Since C is the column space of the generator matrix, there
is a simple encoding method: we can simply multiply the generator matrix G by a message block to obtain
the corresponding code block. In other words, if u ∈ Zk2 is a message block, we can use v = Gu ∈ Zn2 as
the code block.
Solution. The message blocks are u1 = [0, 1]T , u2 = [1, 1]T , and u3 = [1, 0]T . We obtain the corresponding
code blocks by multiplication with the generator matrix:
    v1 = Gu1 = [ 0 0 1 1 1 ]T ,   v2 = Gu2 = [ 1 1 0 1 1 ]T ,   v3 = Gu3 = [ 1 1 1 0 0 ]T .
Therefore the encoded message is 00111 11011 11100. Note that this is the same answer as in Exam-
ple 9.52(a). ♠
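Encoding is just a matrix-vector multiplication over Z2. The following Python sketch reproduces the computation above using the generator matrix G found earlier.

    G = [[1, 0],
         [1, 0],
         [1, 1],
         [0, 1],
         [0, 1]]

    def encode_block(G, u):
        # v = Gu over Z_2, returned as a bit string.
        return "".join(str(sum(g * x for g, x in zip(row, u)) % 2) for row in G)

    message = "011110"
    blocks = [[int(message[i]), int(message[i + 1])] for i in range(0, len(message), 2)]
    print([encode_block(G, u) for u in blocks])   # ['00111', '11011', '11100']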
Let C be a linear code with check matrix H. By definition, C is the null space of the check matrix. This
means that a vector v ∈ Zn2 is a valid code block (i.e., a code block without errors) if and only if Hv = 0.
So we can easily use the check matrix to determine whether a code block contains errors or not.
However, something even better is true. The value of Hv, when it is not zero, tells us not only that
an error has occurred, but also which error has occurred! To see why, consider a code block v′ possibly
containing errors. Then v′ = v + e, where v is a valid code block and e is an error pattern. The error
pattern is the vector e that has a 1 in every component in which an error occurred, and a 0 everywhere else.
Then we have
Hv′ = H(v + e) = Hv + He = 0 + He = He.
Therefore, the value of Hv′ only depends on the error pattern, and not on v. The value s = Hv′ = He
is called the syndrome of the error. A code is error correcting, for a class of error patterns, if and only
if each such error pattern has a different syndrome. In that case, we can simply make a table of all the
syndromes and corresponding error patterns, as an efficient method for correcting errors. Such a table is
called a syndrome table and the corresponding decoding method is called syndrome decoding.
Make a syndrome table for all single-bit errors. Then use your syndrome table to decode the mes-
sage 11101 01000 11011 11111 00011.
Solution. Since we are only interested in single-bit errors, there are six error patterns to consider: 00000
(no error), 10000, 01000, 00100, 00010, and 00001. For each of these error patterns e, we compute the
corresponding syndrome He. For example,
    H(0, 0, 0, 0, 0)T = (0, 0, 0)T ,    H(1, 0, 0, 0, 0)T = (1, 1, 1)T ,    H(0, 1, 0, 0, 0)T = (1, 0, 0)T ,
and so on. We put this information in a table (writing the vectors horizontally as usual in this section).
This is the syndrome table:
Error pattern e Syndrome He
00000 000
10000 111
01000 100
00100 011
00010 010
00001 001
To decode the message 11101 01000 11011 11111 00011, we first calculate the syndrome of each code
block by multiplying it by the check matrix. The syndrome table then tells us the corresponding error
pattern, which we can use to correct the code block.
For example, consider the first code block 11101. Multiplying by H, we get the syndrome 001:
    H(1, 1, 1, 0, 1)T = (0, 0, 1)T .
The syndrome table then tells us that the corresponding error pattern is 00001. Therefore, the corrected
code block is 11101 + 00001 = 11100. The decoded block is 10. We proceed in the same way for all the
code blocks:
Received code block v′ Syndrome Hv′ Error pattern e Corrected code block v = v′ + e Decoded
11101 001 00001 11100 10
01000 100 01000 00000 00
11011 000 00000 11011 11
11111 011 00100 11011 11
00011 011 00100 00111 01
Therefore, the decoded message is 10 00 11 11 01. Note that this is the same answer we got in Exam-
ple 9.58. But the syndrome table gives a more systematic method of finding the error patterns, which we
previously had to do by guessing, or by comparing to all possible code blocks. ♠
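Syndrome decoding is also easy to mechanize. The Python sketch below builds the syndrome table for all single-bit error patterns from the check matrix H and then corrects the received blocks from the example above.

    H = [[1, 1, 0, 0, 0],
         [1, 0, 1, 1, 0],
         [1, 0, 1, 0, 1]]

    def syndrome(v):
        return tuple(sum(h * b for h, b in zip(row, v)) % 2 for row in H)

    # Syndrome table: the zero pattern plus every single-bit error pattern.
    table = {syndrome([0] * 5): [0] * 5}
    for i in range(5):
        e = [0] * 5
        e[i] = 1
        table[syndrome(e)] = e

    def correct(block):
        v = [int(b) for b in block]
        e = table[syndrome(v)]                        # look up the error pattern
        return "".join(str((a + b) % 2) for a, b in zip(v, e))

    received = ["11101", "01000", "11011", "11111", "00011"]
    print([correct(b) for b in received])
    # ['11100', '00000', '11011', '11011', '00111'], decoded message 10 00 11 11 01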
Since the syndrome 101 is not in our syndrome table, it is not the syndrome of any single-bit error. There-
fore, the code block 10010 must contain more than one error. Since our code is only 1-error correcting,
this error cannot be corrected. (In fact, there are two valid code words within Hamming distance 2, namely
00000 and 11011). ♠
Hamming codes
Proof. The syndrome of the zero error pattern is always H0 = 0. Let ei be the error pattern containing a
single bit error in the i th component. Then its syndrome is Hei , which is the i th column of H.
If H has distinct, non-zero columns, then each possible single bit error has a different syndrome.
Therefore, the errors can be corrected.
Conversely, if H has a column that is zero, or two columns that are equal, then the corresponding error
patterns have the same syndrome, and therefore cannot be corrected. ♠
This theorem was discovered by Richard Hamming in 1950. Hamming then realized that the best single-
error correcting binary codes can be constructed by letting the check matrix have all possible non-zero
columns. The resulting codes are called Hamming codes.
From the size of the check matrix, we know that the code length is n = 2^r − 1. Since the check matrix
has rank r, its null space has dimension k = n − r. Thus, the r th Hamming code is an (n, k, 3)-code, where
n = 2^r − 1 and k = 2^r − 1 − r.
It is customary to order the columns of the check matrix so that the last r columns are the standard
basis vectors. In other words, the check matrix is usually taken to be of the form
    H = [ A  I ]
where I is the r × r-identity matrix and A is the r × k-matrix consisting of the remaining columns of H. In
that case, we can take the n × k-matrix
    G = [ I ]
        [ A ]
as the generator matrix, where I is the k × k-identity matrix. Specifically, for this choice of G, we have
HG = 0, ensuring that the column space of G is contained in the null space of H. Moreover, since
rank G = k, the column space of G is k-dimensional, and therefore equal to the null space of H, so that G
is indeed a correct generator matrix for the code.
Solution. The check matrix must be a 3 × 7-matrix whose columns are all the possible non-zero vectors
of length 3. Moreover, we will follow the convention of using the standard basis vectors as the last 3
columns. Such a matrix is
        [ 1 1 0 1 1 0 0 ]
    H = [ 1 0 1 1 0 1 0 ] .
        [ 0 1 1 1 0 0 1 ]
To encode the message 0101 1110 0111, we multiply the generator matrix by each code block:
    G(0, 1, 0, 1)T = (0, 1, 0, 1, 0, 1, 0)T ,    G(1, 1, 1, 0)T = (1, 1, 1, 0, 0, 0, 0)T ,    G(0, 1, 1, 1)T = (0, 1, 1, 1, 0, 0, 1)T .
Thus, the encoded message is 0101010 1110000 0111001. Note that the fact that the top part of the
generator matrix G is the identity matrix has the pleasant effect that the first k bits of each code block are
the corresponding message block. This makes decoding especially convenient (after any errors have been
corrected first). ♠
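The construction of the Hamming code for r = 3 can also be carried out in code. The Python sketch below builds the generator matrix G = [I; A] from the matrix A used above and re-encodes the three message blocks.

    # A is the non-identity part of the check matrix H = [A I] chosen above.
    A = [[1, 1, 0, 1],
         [1, 0, 1, 1],
         [0, 1, 1, 1]]

    # Generator matrix G = [I; A]: a 7 x 4 matrix whose top part is the 4 x 4 identity.
    G = [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]] + A

    def encode(u):
        # v = Gu over Z_2, returned as a bit string.
        return "".join(str(sum(g * x for g, x in zip(row, u)) % 2) for row in G)

    for block in ["0101", "1110", "0111"]:
        print(encode([int(b) for b in block]))   # 0101010, 1110000, 0111001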
Solution. We do not need to make a syndrome table, because the syndromes are exactly the columns of the
check matrix H. More precisely, the i th column of the check matrix is the syndrome of the error pattern
containing a single-bit error in the i th bit. We use the check matrix
        [ 1 1 0 1 1 0 0 ]
    H = [ 1 0 1 1 0 1 0 ] .
        [ 0 1 1 1 0 0 1 ]
From these syndromes, we can read off the error locations and correct the errors:
Received code block Syndrome Error position Error pattern Corrected code block Decoded
1011101 111 bit 4 0001000 1010101 1010
1101001 101 bit 2 0100000 1001001 1001
0001101 010 bit 6 0000010 0001111 0001
1110000 000 none 0000000 1110000 1110
Solution. The check matrix and generator matrix for the Hamming code with r = 2 are
    H = [ 1 1 0 ]               [ 1 ]
        [ 1 0 1 ]    and    G = [ 1 ] .
                                [ 1 ]
The code has block size n = 3 and message length k = 1. The encoding function is
    G(0) = (0, 0, 0)T    and    G(1) = (1, 1, 1)T .
Of course, since the codes are only 1-error correcting, one cannot increase the block size indefinitely,
or else the probability of having two or more errors in a block becomes too large. There exist more
sophisticated error correcting codes with larger Hamming distances, which can correct many errors per
code block. You might learn more about such codes in a course on applied modern algebra.
Exercises
Exercise 9.5.1 Encode the message 0110 using the 3-repetition code. Decode the message 100 011 000
110 101 110 after correcting single-bit errors.
Exercise 9.5.2 Use the code of Example 9.52 to encode the message 01 11 10 00 01. Use the syndrome
method to correct all single-bit errors in the message 10100 00001 11001. What is the decoded message?
Is this a linear code? What are the message length and block length of this code? What is its Hamming
distance? How many errors per code block can this code detect? How many can it correct?
Find a generator matrix for this code (hint: the columns of G should form a basis for the null space of H).
List all possible code blocks. What is the Hamming distance of the code? Make a syndrome table for this
code.
to encode the message 111 110 101 001. Find a check matrix for this code and make a syndrome ta-
ble. Is the code 1-error correcting? Use the syndrome method to correct and decode the message
111000 101111 100111 110100.
Exercise 9.5.6 Construct check and generator matrices of a Hamming code for r = 4. What is the block
length and message length of this code? Encode the message 00110011000 10100000001. Decode the
message 001000010010000 010000001000001 after correcting single-bit errors.
10. Linear transformation of vector spaces
10.1 Definition and examples
Outcomes
A. Understand the definition of a linear transformation in the context of vector spaces.
2. T preserves scalar multiplication, i.e., for all v ∈ V and k ∈ K , we have T (kv) = kT (v).
A linear transformation is also sometimes called a linear function or a linear map. A linear
transformation T : V → V (i.e., when V = W ) is also sometimes called an operator.
Our first example of a linear transformation is a matrix transformation. We have already seen this in
Section 6.2.
There are many interesting examples of linear transformations on vector spaces other than Rn . We will
consider a few such examples.
Solution.
(b) First, we note that if p(x) is a polynomial of degree at most n, then its derivative p′ (x) is a polynomial
of degree at most n − 1. Therefore, the derivative operator D is a well-defined function from Pn to
Pn−1 . To show that it preserves addition, consider any two polynomials p(x), q(x) ∈ Pn . From
calculus, we know that the derivative of p(x) + q(x) is p′ (x) + q′ (x). Therefore,
D(p(x) + q(x)) = (p(x) + q(x))′ = p′ (x) + q′ (x) = D(p(x)) + D(q(x)),
and so D preserves addition. To show that it preserves scalar multiplication, consider p(x) ∈ Pn and
k ∈ R. From calculus, we know that the derivative of kp(x) is kp′ (x), and therefore
D(kp(x)) = (kp(x))′ = kp′ (x) = kD(p(x)).
Hence, D preserves scalar multiplication. It follows that D is a linear transformation.
♠
It is important to understand that we are not claiming that the derivative p′ (x) of a polynomial p(x) is a
linear function. It is of course a polynomial. Rather, what the above example shows is that the act of
taking the derivative is a linear operation, i.e., the derivative of a sum is the sum of the derivatives, and the
derivative of a constant times a function is a constant times the derivative.
Solution. Every element of P3 is of the form p(x) = ax3 + bx2 + cx + d. We can write the equation
p(x) = x3 + D(p(x)) as
(ax3 + bx2 + cx + d) = x3 + (3ax2 + 2bx + c).
For the left-hand side and right-hand side to be equal, we must have a = 1, b = 3a, c = 2b, and d = c. This
yields the unique solution (a, b, c, d) = (1, 3, 6, 6), or p(x) = x3 + 3x2 + 6x + 6. ♠
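The coefficient comparison can also be phrased as a small linear system and solved numerically; the following Python sketch (using numpy) is just a check of the computation above.

    import numpy as np

    # Unknowns (a, b, c, d) for p(x) = ax^3 + bx^2 + cx + d, with
    # a = 1, b = 3a, c = 2b, d = c, written as M [a, b, c, d]^T = rhs.
    M = np.array([[ 1,  0,  0, 0],
                  [-3,  1,  0, 0],
                  [ 0, -2,  1, 0],
                  [ 0,  0, -1, 1]], dtype=float)
    rhs = np.array([1, 0, 0, 0], dtype=float)

    print(np.linalg.solve(M, rhs))   # [1. 3. 6. 6.], i.e. p(x) = x^3 + 3x^2 + 6x + 6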
The function unshift : SeqK → SeqK is defined by shifting the entire sequence to the right and
adding 0 as the new first element:
unshift(a0 , a1 , a2 , . . .) = (0, a0 , a1 , a2 , . . .).
Solution.
(a) We have
shift(1, 2, 3, 4, . . .) = (2, 3, 4, 5, . . .),
unshift(shift(1, 1, 1, 1, . . .)) = unshift(1, 1, 1, 1, . . .) = (0, 1, 1, 1, . . .),
shift(unshift(1, 1, 1, 1, . . .)) = shift(0, 1, 1, 1, . . .) = (1, 1, 1, 1, . . .).
(b) To show that shift is a linear transformation, we show that it preserves addition and scalar multipli-
cation. Let a = (a0 , a1 , a2 , . . .) and b = (b0 , b1 , b2 , . . .). Then
shift(a + b) = shift(a0 + b0 , a1 + b1 , a2 + b2 , . . .)
= (a1 + b1 , a2 + b2 , a3 + b3 , . . .)
= (a1 , a2 , a3 , . . .) + (b1 , b2 , b3 , . . .)
= shift(a) + shift(b),
shift(ka) = shift(ka0 , ka1 , ka2 , . . .)
= (ka1 , ka2 , ka3 , . . .)
= k(a1 , a2 , a3 , . . .)
= k shift(a).
Therefore, shift is a linear transformation. The proof for unshift is similar.
shift(shift(a)) = shift(a) + a.
a2 = a1 + a0 ,
a3 = a2 + a1 ,
a4 = a3 + a2 ,
and so on. In other words, a is a solution if and only if an+2 = an+1 + an holds for all n ≥ 0. This is nothing
but the recurrence relation of Example 9.29. We already calculated the general solution in Example 9.44.
The general solution is
a = (x, y, x + y, x + 2y, 2x + 3y, 3x + 5y, . . .),
and a basis for the solution space is
{(1, 0, 1, 1, 2, 3, 5, . . .), (0, 1, 1, 2, 3, 5, 8, . . .)}.
♠
We conclude this section by stating some elementary properties of linear transformations. “Elementary”
means that these properties follow directly from the definition, i.e., from the fact that linear transformations
preserve addition and scalar multiplication.
Proof. To prove the first property, let k = 0 in the equation T (kv) = kT (v). Since 0v = 0 and 0T (v) = 0
by Proposition 9.9, we therefore have T (0) = 0. Similarly, to prove the second property, let k = −1 in the
equation T (kv) = kT (v). Finally, the third property is a direct consequence of the fact that T preserves
addition and scalar multiplication:
T (av + bw) = T (av) + T (bw) = aT (v) + bT (w). ♠
Exercises
Exercise 10.1.1 Consider the following functions T : R3 → R2 . For each of these functions T , show that
it is not linear either by showing that T does not preserve addition, or by showing that it does not preserve
scalar multiplication, or by showing that it does not preserve the zero vector.
(a) T ([x, y, z]T ) = [x + 2y + 3z + 1, 2y − 3x + z]T .
(b) T ([x, y, z]T ) = [x + 2y2 + 3z, 2y + 3x + z]T .
(c) T ([x, y, z]T ) = [sin x + 2y + 3z, 2y + 3x + z]T .
Exercise 10.1.2 Consider the following functions T : R3 → R2 . For each function T , show that T is a
linear transformation. Do this by showing that T is a matrix transformation, i.e., find a matrix A such that
T (v) = Av.
(a) T ([x, y, z]T ) = [x + 2y + 3z, 2y − 3x + z]T .
(b) T ([x, y, z]T ) = [7x + 2y + z, 3x − 11y + 2z]T .
(c) T ([x, y, z]T ) = [3x + 2y + z, x + 2y + 6z]T .
(d) T ([x, y, z]T ) = [2y − 5x + z, x + y + z]T .
Exercise 10.1.4 Recall the vector space P of polynomials with coefficients in a field K. Consider the
function M : P → P defined by M(p(x)) = xp(x).
(a) Compute M(x3 ), M(2x2 + x), and M(ax2 + bx + c).
(b) Show that M is a linear transformation.
Exercise 10.1.5 Recall the vector space P of polynomials with coefficients in a field K. Consider the
function S : P → P defined by S(p(x)) = p(x + 1).
(a) Compute S(x3 ), S(2x2 + x), and S(ax2 + bx + c).
(b) Show that S is a linear transformation.
Exercise 10.1.6 Consider the shift function shift : SeqK → SeqK from Example 10.5. Find a basis for the
solution space of each of the following equations:
(a) shift(a) = a.
(b) shift(a) = −a.
(c) shift(shift(a)) = shift(a) + 2a.
10.2 The algebra of linear transformations
Outcomes
A. Use algebraic properties of linear transformations to manipulate expressions.
Two linear transformations are considered to be equal if they act in the same way on all vectors. This is
the content of the following definition.
We now consider several operations on linear transformations. These include addition and scalar multipli-
cation of linear transformations, as well as the zero transformation.
(b) If T , S : V → W are linear transformations, then their sum T + S : V → W is the linear trans-
formation defined by
(T + S)(v) = T (v) + S(v)
for all v ∈ V .
Proof. To show that T + S is linear, we must verify that it preserves addition and scalar multiplication. Let
v, u ∈ V . Then we have
(T + S)(v + u) = T (v + u) + S(v + u) by definition of T + S,
= (T (v) + T (u)) + (S(v) + S(u)) by linearity of T and S,
= (T (v) + S(v)) + (T (u) + S(u)) by the associative and commutative laws of vectors,
= (T + S)(v) + (T + S)(u) by definition of T + S.
Therefore, T + S preserves addition. Similarly, for any k ∈ K and v ∈ V , we have (T + S)(kv) = T (kv) +
S(kv) = kT (v) + kS(v) = k(T (v) + S(v)) = k(T + S)(v), so T + S preserves scalar multiplication, and hence
T + S is a linear transformation. The proof that kT is a linear transformation is left as an exercise. ♠
These operations satisfy the following properties:
But these 8 properties are just the vector space laws (A1)–(A4) and (SM1)–(SM4)! Therefore, the set of
linear transformations from V to W , with the above operations of addition and scalar multiplication, forms
a vector space.
Another important operation is the composition of linear transformations. We have already encountered
this in Definition 6.16 for the case of Rn . Here, we generalize it to linear transformations of arbitrary
vector spaces. We also consider the identity transformation on a vector space, which forms the unit for
composition.
(T ◦ S)(v) = T (S(v))
for all v ∈ V .
T ◦ S = 1V and S ◦ T = 1U .
Proof. Suppose both S and S′ are inverses of T . Consider S ◦ T ◦ S′ . Since S and T are inverses, this is
equal to S′ , but since T and S′ are inverses, it is also equal to S. Therefore, S = S′ . ♠
(T ◦ S)−1 = S−1 ◦ T −1 .
• 1−1 = 1.
Exercises
Exercise 10.2.1 Suppose V and W are vector spaces, T , S : V → W and R, Q : W → W are linear trans-
formations, and k is a scalar. Which of the following equalities are valid?
(a) R ◦ (T + kS) = R ◦ T + kR ◦ S.
(b) (R + Q) ◦ (R + Q) = R ◦ R + 2R ◦ Q + Q ◦ Q.
(c) (R + Q) ◦ (T + S) = R ◦ T + R ◦ S + Q ◦ T + Q ◦ S.
Exercise 10.2.2 Finish the proof of Proposition 10.11, i.e., prove that if T : V → W is a linear transfor-
mation and k a scalar, then kT : V → W is a linear transformation.
10.3 Linear transformations defined on a basis
Outcomes
A. Check whether two linear transformations are equal by considering their action on a spanning
set.
Recall that, by definition, two linear transformations S, T : V → W are equal if and only if for all v ∈ V , we
have S(v) = T (v). However, this is not a very practical way of checking whether S = T , as it theoretically
requires checking S(v) = T (v) for each one of infinitely many vectors v. The following proposition states
that it is sufficient to check the actions of S and T on a spanning set of vectors.
Proof. Assume that S(v) = T (v) holds for all v ∈ X . To show that S = T , let u ∈ V be an arbitrary vector.
Since X is a spanning set, we can write u = a1 v1 + . . . + an vn , for some v1 , . . . , vn ∈ X and a1 , . . . , an ∈ K.
By assumption, S(vi ) = T (vi ) for all i, because vi ∈ X . Then we have
S(u) = S(a1v1 + . . . + an vn )
= a1 S(v1 ) + . . . + an S(vn )
= a1 T (v1 ) + . . . + an T (vn )
= T (a1 v1 + . . . + an vn )
= T (u).
Proof. To show that such a linear transformation T exists, we first define a function T : V → W as follows.
Given any v ∈ V , there exists a unique set of coordinates a1 , . . . , an ∈ K such that
v = a1 v1 + . . . + an vn .
Then define
T (v) = a1 w1 + . . . + an wn .
This defines a function T : V → W . Next, we must check that T is linear. To show that T preserves
addition, consider v, v′ ∈ V , with v = a1 v1 + . . . + an vn and v′ = b1 v1 + . . . + bn vn . Then
T (v + v′ ) = T ((a1 + b1 )v1 + . . . + (an + bn )vn )
= (a1 + b1 )w1 + . . . + (an + bn )wn
= (a1 w1 + . . . + an wn ) + (b1 w1 + . . . + bn wn )
= T (v) + T (v′ ).
A similar calculation shows that T preserves scalar multiplication. It follows that T is linear. Next, we must show that T satisfies
the condition of the theorem, i.e., that T (vi ) = wi for each i. But this is clearly the case, because then
ai = 1 and aj = 0 for all j ≠ i. We have shown that there exists a linear function T satisfying all of
the conditions required by the theorem.
Finally, the only thing left to show is uniqueness. But this follows from Proposition 10.19. Namely, if
T ′ is another linear transformation such that T ′ (vi ) = wi for all i, then T and T ′ agree on v1 , . . . , vn , which
is a basis and hence a spanning set. By Proposition 10.19, T = T ′ . ♠
Find T (4x).
Solution. Let v1 = x2 , v2 = (x + 1)2 , v3 = (x + 2)2 , and let
    w1 = [ 1 1 ]      w2 = [ 0 1 ]      w3 = [ 0 0 ]
         [ 0 0 ] ,         [ 0 1 ] ,         [ 1 1 ] .
We must first find a, b, c such that 4x = av1 + bv2 + cv3 . We do this by solving a system of equations, using
the same method as in Example 9.12. We find that a = −3, b = 4, and c = −1. Therefore
    T (4x) = T (−3v1 + 4v2 − v3 ) = −3w1 + 4w2 − w3 = [ −3  1 ]
                                                      [ −1  3 ] .
♠
Exercises
Exercise 10.3.2 Let vectors v1 , . . . , vn ∈ Rn and w1 , . . . , wn ∈ Rm be given. Let A be the matrix whose
columns are v1 , . . . , vn , and assume that A−1 exists. Show that there exists a linear transformation T such
that T (vi ) = wi for i = 1, . . . , n.
10.4 The matrix of a linear transformation
Outcomes
A. Find the matrix of a linear transformation with respect to general bases in vector spaces.
In Section 6.2, we saw that linear transformations T : Rn → Rm are in one-to-one correspondence with
m ×n-matrices. Here, we will generalize this correspondence to arbitrary finite-dimensional vector spaces.
There is an important difference, however. While Rn comes with a natural coordinate system (i.e., every
vector in Rn has a first component, second component, and so on), there is no distinguished coordinate
system on an arbitrary vector space. To define the matrix of a linear transformation T : V → W , we must
first choose a basis, or equivalently a coordinate system, for V and for W . Different choices of basis will
give rise to different matrices.
Let V be a vector space with basis B = {v1 , . . . , vn }. Recall from Section 5.4.3 that the coordinates of
a vector v with respect to the basis B are the unique scalars a1 , . . . , an such that
v = a1 v1 + . . . + an vn .
As before, we write
    [v]B = [a1 , a2 , . . . , an ]T
to denote the coordinates of v with respect to the basis B. We will now see how to use bases and coordinates
to encode any linear map between finite-dimensional vector spaces as a matrix.
Proof. Let B = {v1 , . . . , vn } and C = {w1 , . . . , wm }. By Theorem 10.20, the linear transformation T is
completely determined by the images of the basis vectors, T (v1 ), . . ., T (vn ) ∈ W . Since {w1 , . . . , wm } is a
basis of W , we can write each T (vi ) as a linear combination of w1 , . . . , wm :
T (v1 ) = a11 w1 + a21 w2 + . . . + am1 wm ,
T (v2 ) = a12 w1 + a22 w2 + . . . + am2 wm ,
···
T (vn ) = a1n w1 + a2n w2 + . . . + amn wm .
Let
        [ a11  a12  · · ·  a1n ]
    A = [ a21  a22  · · ·  a2n ] .
        [  ...  ...  . . .  ... ]
        [ am1  am2  · · ·  amn ]
Then A is an m × n-matrix. We must prove that it has the desired property, i.e., that A [v]B = [T v]C , for
all v ∈ V . Since both the left-hand side and the right-hand side are linear functions of v, it suffices to
check that this property holds for basis vectors. Consider, therefore, one of the basis vectors vi . Note that
vi = 0v1 + 0v2 + . . . + 1vi + . . . + 0vn . Therefore, the coordinates of vi with respect to the basis B are
    [vi ]B = [0, . . . , 0, 1, 0, . . . , 0]T = ei ,
where ei is the usual i th standard basis vector. On the other hand, since T (vi ) = a1i w1 + . . . + ami wm , we
have
    [T (vi )]C = [a1i , a2i , . . . , ami ]T = Aei .
Here, in the last equation, we have used the fact that Aei is the same thing as the i th column of A. We
therefore have [T (vi )]C = Aei = A [vi ]B , as desired. ♠
Solution. We first find the images of each basis vector of the basis B, and we write each of them as a linear
combination of basis vectors from the basis C. Let us denote the basis vectors of B as v1 = 1, v2 = x,
v3 = x2 , and v4 = x3 , and the basis vectors of C as w1 = 1, w2 = x, and w3 = x2 . We have
D(v1 ) = D(1) = 0 = 0w1 + 0w2 + 0w3 ,
D(v2 ) = D(x) = 1 = 1w1 + 0w2 + 0w3 ,
D(v3 ) = D(x2 ) = 2x = 0w1 + 2w2 + 0w3 ,
D(v4 ) = D(x3 ) = 3x2 = 0w1 + 0w2 + 3w3 .
Therefore, we have
                 [ 0 1 0 0 ]
    A = [D]C,B = [ 0 0 2 0 ] .
                 [ 0 0 0 3 ]
♠
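It is easy to verify the defining property A [v]B = [T (v)]C of this matrix numerically. In the sketch below, the test polynomial is an arbitrary choice; its coordinate vector with respect to B = {1, x, x2, x3} is multiplied by the matrix found above, giving the coordinates of its derivative with respect to C = {1, x, x2}.

    import numpy as np

    A = np.array([[0, 1, 0, 0],
                  [0, 0, 2, 0],
                  [0, 0, 0, 3]])

    # p(x) = 5 + 4x - 2x^2 + 7x^3, written in coordinates with respect to B.
    p_B = np.array([5, 4, -2, 7])

    print(A @ p_B)   # [ 4 -4 21], the coordinates of p'(x) = 4 - 4x + 21x^2 with respect to C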
Solution. This is the same linear transformation as in the previous example, but we are given different
bases. Let us denote the basis vectors of B′ as v1 = 1, v2 = x + 1, v3 = x2 + x + 1, and v4 = x3 + x2 + x + 1,
and the basis vectors of C′ as w1 = 1, w2 = x − 1, and w3 = x2 − 1. We must write each D(vi ) as a linear
combination of w1 , w2 , and w3 , which requires solving a system of linear equations for each of them. We
have:
D(v1 ) = D(1) = 0 = 0w1 + 0w2 + 0w3 ,
D(v2 ) = D(x + 1) = 1 = 1w1 + 0w2 + 0w3 ,
D(v3 ) = D(x2 + x + 1) = 2x + 1 = 3w1 + 2w2 + 0w3 ,
D(v4 ) = D(x3 + x2 + x + 1) = 3x2 + 2x + 1 = 6w1 + 2w2 + 3w3 .
Therefore, the matrix is
                [ 0 1 3 6 ]
    [D]C′ ,B′ = [ 0 0 2 2 ] .
                [ 0 0 0 3 ]
♠
The last two examples illustrate that a linear transformation can have many different matrices, because the
matrix depends not only on the linear transformation, but also on the given bases. The art of linear algebra
often lies in choosing “convenient” bases for a given application. Often, a “convenient” basis is one that
gives rise to simple matrices, for example, matrices containing many zeros, or matrices that are diagonal.
(b) Find a basis C′ of M2,2 such that [T ]C′,B is the identity matrix.
Therefore,
              [ 1 0  0  1 ]
    [T ]C,B = [ 0 1 −1  0 ] .
              [ 0 1  1  0 ]
              [ 1 0  0 −1 ]
(b) Since the matrices T (x3 ), T (x2 ), T (x), and T (1) form a basis of M2,2 , we can take C′ to consist of
these four matrices, i.e.,
          [ 1 0 ]   [ 0 1 ]   [ 0 −1 ]   [ 1  0 ]
    C′ = { [ 0 1 ] , [ 1 0 ] , [ 1  0 ] , [ 0 −1 ] } .
Then
                [ 1 0 0 0 ]
    [T ]C′ ,B = [ 0 1 0 0 ]
                [ 0 0 1 0 ]
                [ 0 0 0 1 ]
is the identity matrix. ♠
(d) [0]C,B = 0.
Exercises
Exercise 10.4.1 Let B = { [2, −1]T , [3, 2]T } be a basis of R2 and let x = [5, −7]T be a vector in R2 . Find
[x]B .
Exercise 10.4.2 Let B = { [1, −1, 2]T , [2, 1, 2]T , [−1, 0, 2]T } be a basis of R3 and let x = [5, −1, 4]T be a
vector in R3 . Find [x]B .
Exercise 10.4.6 Consider the linear transformation T : P3 → P3 given by T (p(x)) = p(x+1). Find [T ]B,B ,
where B = {1, x, x2 , x3 }.
Exercise 10.4.7 Let v = [1, −2, 3]T and consider the linear function T (w) = projv (w). Find the matrix of
T with respect to the standard basis of R3 .
Exercise 10.4.8 Let v = [1, −2, 3]T and consider the linear function T (w) = projv (w). Find the matrix of
T with respect to the basis
B = {v1 , v2 , v3 } = { [1, −2, 3]T , [2, 1, 0]T , [3, 0, 1]T } .
Exercise 10.4.9 Suppose that V and W are finite-dimensional vector spaces with bases B and C, respec-
tively. Let T : V → W be a linear transformation such that
T (vi ) = wi
for i = 1, . . . , n. Let M be the matrix whose columns are [v1 ]B , . . . , [vn ]B , and let N be the matrix whose
columns are [w1 ]C , . . . , [wn ]C . Suppose that M is invertible. Show that [T ]C,B = NM −1 .
11. Inner product spaces
One structure that we often use in Rn , but which is missing from abstract vector spaces, is the dot product.
In Chapter 2, we saw how to use dot products to compute the length of a vector, the angle between two
vectors, and to decide when two vectors are orthogonal. We also used dot products to define the projection
of one vector onto another, and to find shortest distances between various objects (such as points and
planes, two lines, etc).
In this chapter, we will consider inner product spaces. An inner product space is essentially an abstract
vector space that has been equipped with an operation that works “like” the dot product.
11.1 Real inner product spaces
Outcomes
A. Check whether an operation is an inner product.
B. Give an example of a vector space on which more than one inner product can be defined.
C. Calculate the inner product of vectors in various examples of inner product spaces.
D. Calculate the norm of a vector and the angle between two vectors.
E. Use the Cauchy-Schwarz inequality and the triangle inequality to reason about the size of
inner products and norms of vectors.
3. The positive definite property: ⟨u, u⟩ ≥ 0, and moreover, ⟨u, u⟩ = 0 if and only if u = 0.
    ⟨u, v⟩ = u · v = u1 v1 + . . . + un vn
Proof. We must show the three properties are satisfied. For symmetry, note that a scalar is equal to its own
transpose. Therefore
⟨u, v⟩ = uT Av = (uT Av)T = vT AT u.
But this is equal to ⟨v, u⟩ because A is a symmetric matrix, i.e., AT = A. For linearity, note that, by
properties of matrix multiplication,
⟨u, kv + ℓw⟩ = uT A(kv + ℓw) = k uT Av + ℓ uT Aw = k⟨u, v⟩ + ℓ⟨u, w⟩.
For the positive definite property, we have
⟨u, u⟩ = u1^2 + 2u1 u2 + 2u2^2 = (u1^2 + 2u1 u2 + u2^2 ) + u2^2 = (u1 + u2 )^2 + u2^2 ≥ 0,
because the sum of two squares is always non-negative. Moreover, if equality holds in the last equation,
then we must have u1 + u2 = 0 and u2 = 0, and this is only possible if u = 0. ♠
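The expansion above corresponds to the symmetric matrix A = [1 1; 1 2] (an assumption read off from the formula ⟨u, u⟩ = u1^2 + 2u1 u2 + 2u2^2). The following Python sketch evaluates this inner product on sample vectors and illustrates symmetry and positivity.

    import numpy as np

    A = np.array([[1, 1],
                  [1, 2]])   # symmetric matrix consistent with <u,u> = u1^2 + 2 u1 u2 + 2 u2^2

    def inner(u, v):
        # <u, v> = u^T A v
        return u @ A @ v

    u = np.array([1, -2])
    v = np.array([3, 1])
    print(inner(u, v), inner(v, u))   # both -6: symmetry
    print(inner(u, u))                # 5 > 0: positive for u != 0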
[a, b] = {x | a ≤ x ≤ b}
be the closed interval. Let V = C[a, b] be the vector space of all continuous functions f : [a, b] → R.
Given two functions f , g ∈ C[a, b], define
    ⟨ f , g⟩ = ∫_a^b f (x)g(x) dx.
Proof. This follows from well-known properties of integrals. For symmetry, we have
    ⟨ f , g⟩ = ∫_a^b f (x)g(x) dx = ∫_a^b g(x) f (x) dx = ⟨g, f ⟩.
This is ≥ 0 because a < b and the integral over a non-negative function is non-negative. Moreover, since f
is continuous, we know from calculus that the last integral can only be equal to 0 if f is the constant zero
function. ♠
These are called the square summable sequences. (One needs to do a little bit of work to show that
it is indeed a vector space; in particular, to show that the sum of two square summable sequences is
square summable.) On this space, we can define an inner product as follows:
ha, bi = a0 b0 + a1 b1 + a2 b2 + . . . .
The details will be worked out in Exercise 11.1.6. This inner product space is called (real) Hilbert
space.
We now look at some properties of inner products. The first thing to note is that while the axioms require
linearity in the right component, symmetry ensures that linearity in the left component also holds.
♠
Another important property of inner products is the Cauchy-Schwarz inequality. We have already encoun-
tered this in the context of the dot product in Section 2.6.
Proof. The proof is almost identical to that of Proposition 2.27. First note that if u = 0, then both sides of
(11.1) are equal to zero, and so there is nothing to show. Therefore, we will assume in what follows that
u ≠ 0. Define a function of t ∈ R by f(t) = ⟨tu + v, tu + v⟩.
Then by the positive definite property, we know that f(t) ≥ 0 for all t ∈ R. Also, from linearity and symmetry, we have
f(t) = ⟨u, u⟩t² + 2⟨u, v⟩t + ⟨v, v⟩.
This means the graph of y = f (t) is a parabola which opens upwards and is never negative. It follows
that this function has at most one root. From the quadratic formula, we know that a quadratic function
at 2 + bt + c has one or zero roots if and only if b2 − 4ac ≤ 0. Applying this reasoning to the function f (t),
we obtain
(2⟨u, v⟩)² − 4⟨u, u⟩⟨v, v⟩ ≤ 0,
which is equivalent to hu, vi2 ≤ hu, ui · hv, vi. ♠
Finally, we can use the inner product to define the norm of a vector.
Solution. In the vector space C[−1, 1], the inner product is defined as
⟨f, g⟩ = ∫_{−1}^{1} f(x) g(x) dx.
We therefore have
⟨f, f⟩ = ∫_{−1}^{1} f(x)² dx = ∫_{−1}^{1} x⁴ dx = [x⁵/5]_{−1}^{1} = 2/5.
Therefore, ‖f‖ = √⟨f, f⟩ = √(2/5). ♠
A vector u in an inner product space is called normalized or a unit vector if kuk = 1. We note that if v is
any non-zero vector in an inner product space, then
1
u= v
kvk
is normalized.
By the Cauchy-Schwarz inequality, we have |⟨u, v⟩| ≤ ‖u‖ ‖v‖.
Proof. The Cauchy-Schwarz inequality states that ⟨u, v⟩² ≤ ‖u‖² ‖v‖². The claim follows by taking the square root of both sides of the inequality. ♠
ku + vk ≤ kuk + kvk .
Proof. The proof is essentially the same as that of Proposition 2.28. We have
ku + vk2 = hu + v, u + vi
= hu, ui + hu, vi + hv, ui + hv, vi
= ‖u‖² + 2⟨u, v⟩ + ‖v‖²
≤ ‖u‖² + 2‖u‖ ‖v‖ + ‖v‖² = (‖u‖ + ‖v‖)².
The claim follows by taking square roots of both sides. ♠
For non-zero vectors u and v, the Cauchy-Schwarz inequality implies that
|⟨u, v⟩| / (‖u‖ ‖v‖) ≤ 1,
and therefore,
−1 ≤ ⟨u, v⟩ / (‖u‖ ‖v‖) ≤ 1.
This ensures that the following definition is well-defined.
Solution. We have
⟨1, 1⟩ = ∫_{−1}^{1} 1 · 1 dx = 2,
⟨x², x²⟩ = ∫_{−1}^{1} x² · x² dx = 2/5,
⟨1, x²⟩ = ∫_{−1}^{1} 1 · x² dx = 2/3.
Therefore
cos θ = ⟨1, x²⟩ / (‖1‖ ‖x²‖) = (2/3) / (√2 · √(2/5)) = √5/3.
The angle θ is cos⁻¹(√5/3), which is approximately 0.7297 radians or 41.81 degrees.
♠
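The same computation can be reproduced numerically; the following sketch (NumPy/SciPy) approximates the integrals rather than evaluating them exactly.

```python
import numpy as np
from scipy.integrate import quad

def inner(f, g):
    # Inner product on C[-1, 1]
    value, _ = quad(lambda x: f(x) * g(x), -1.0, 1.0)
    return value

one = lambda x: 1.0
xsq = lambda x: x**2

cos_theta = inner(one, xsq) / (np.sqrt(inner(one, one)) * np.sqrt(inner(xsq, xsq)))
theta = np.arccos(cos_theta)
print(cos_theta)                   # approximately 0.745 = sqrt(5)/3
print(theta, np.degrees(theta))    # approximately 0.7297 radians, 41.81 degrees
```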
Exercises
Exercise 11.1.1 For each matrix A, determine whether the formula hu, vi = uT Av determines an inner
product on R2 .
(a) A = [[1, 0], [0, 2]],  (b) A = [[3, 1], [1, 3]],  (c) A = [[1, 1], [0, 2]],  (d) A = [[1, 0], [0, −1]],  (e) A = [[1, 1], [1, 1]].
Exercise 11.1.2 Consider the inner product space C[0, 1] as in Example 11.4. Compute the following
inner products:
(a) h1, xi, (b) hx, x2 i, (c) h1 + x, 2 + x2 i.
Exercise 11.1.3 Consider the inner product space C[0, 1] as in Example 11.4. Compute the following
norms:
(a) ‖1‖,  (b) ‖x‖,  (c) ‖x² + 1‖.
Exercise 11.1.4 For u, v vectors in R3 , define the product u ∗ v = u1 v1 + 2u2 v2 + 3u3 v3 . Show that
|u ∗ v| ≤ (u ∗ u)1/2 (v ∗ v)1/2 .
Hint: first show that the operation hu, vi = u ∗ v is an inner product on R3 , then use Proposition 11.11.
Exercise 11.1.5 In C[−1, 1], find (a) the angle between x and x2 , (b) the angle between x and x3 .
Exercise 11.1.6 . In this exercise, we will work out the details of Example 11.6. We must show that HilbR
is a vector space. We will do this by showing that it is a subspace of SeqR . Further, we must show that the
inner product is well-defined. This requires some knowledge of convergent series from calculus.
(a) Assume a = (a0 , a1 , . . .) and b = (b0 , b1 , . . .) are square summable sequences. Show that the series
a0 b0 + a1 b1 + a2 b2 + … converges to a real number. Hint: consider the series |a0 b0| + |a1 b1| + …
and use the Cauchy-Schwarz inequality and the absolute convergence test.
(b) Using the result of part (a), show that HilbR is a subspace of SeqR .
11.2 Orthogonality
Outcomes
A. Determine whether two vectors in an inner product space are orthogonal.
Two vectors u and v in an inner product space are called orthogonal, in symbols u ⊥ v, if ⟨u, v⟩ = 0.
We note that the zero vector 0 is orthogonal to all vectors, because hu, 0i = 0 follows from the linearity
of the inner product. We also note that u ⊥ v if and only if v ⊥ u; this follows from the symmetry of the
inner product.
Proof. We clearly have 0 ∈ S⊥, because 0 is orthogonal to all vectors. To show that S⊥ is closed under
addition, assume v, v′ ∈ S⊥. We have to show v + v′ ∈ S⊥ . So take an arbitrary w ∈ S. Then we have
hv + v′ , wi = hv, wi + hv′ , wi = 0 + 0 = 0.
Therefore, v + v′ ∈ S⊥ . Finally, to show that S⊥ is closed under scalar multiplication, assume v ∈ S⊥ and
k ∈ R. We have to show kv ∈ S⊥. So take an arbitrary w ∈ S. Then we have
⟨kv, w⟩ = k⟨v, w⟩ = k · 0 = 0. Therefore, kv ∈ S⊥. ♠
Solution. We have to compute the set of all polynomials of the form p(x) = ax³ + bx² + cx + d that are orthogonal to x². So let us compute the inner product:
⟨p(x), x²⟩ = ∫_{−1}^{1} (ax³ + bx² + cx + d) x² dx
= ∫_{−1}^{1} (ax⁵ + bx⁴ + cx³ + dx²) dx
= 0a + (2/5)b + 0c + (2/3)d.
Setting this equal to 0, we see that ⟨p(x), x²⟩ = 0 if and only if (2/5)b + (2/3)d = 0, or equivalently, 3b + 5d = 0. The basic solutions are
[a, b, c, d]^T = [1, 0, 0, 0]^T,  [0, 5, 0, −3]^T,  [0, 0, 1, 0]^T,
i.e., the orthogonal complement is spanned by the polynomials x³, 5x² − 3, and x.
♠
We now consider orthogonal sets and bases.
The interest of orthogonal and orthonormal sets of vectors lies, among other things, in the fact that they
are automatically linearly independent.
Proof. Assume {u1 , . . . , uk } is an orthogonal set. To show that u1 , . . . , uk are linearly independent, assume
a1 u1 + . . . + ak uk = 0.
g0 (x) = 1
f1 (x) = sin x
g1 (x) = cos x
f2 (x) = sin 2x
g2 (x) = cos 2x
.
..
fk (x) = sin kx
gk (x) = cos kx
.
..
Proof. We have to check that any two functions are orthogonal to each other. This is true because of
trigonometric identities. We have the following trigonometric formulas:
sin α sin β = (1/2)(cos(α − β) − cos(α + β)),
cos α cos β = (1/2)(cos(α − β) + cos(α + β)),
sin α cos β = (1/2)(sin(α − β) + sin(α + β)).
Using these, we can compute the relevant inner products. Assume i ≠ j. Then
⟨g_i, g_j⟩ = ∫_0^{2π} cos(ix) cos(jx) dx
= (1/2) ∫_0^{2π} (cos((i − j)x) + cos((i + j)x)) dx
= (1/2) [ (1/(i − j)) sin((i − j)x) + (1/(i + j)) sin((i + j)x) ]_0^{2π}
= 0,
and therefore gi ⊥ g j . The proofs of fi ⊥ f j and fi ⊥ g j are similar. ♠
We note that, by Proposition 11.20, every orthogonal (or orthonormal) basis is automatically linearly
independent, and therefore an actual basis of W .
So why are we interested in orthogonal and orthonormal bases? The main reason is that finding coordinates
is much easier when the bases are orthogonal (and even better when they are orthonormal). We have the
following property:
v = a1 u1 + … + an un,
where
a_i = ⟨u_i, v⟩ / ⟨u_i, u_i⟩.
In this situation, the coordinates a1, …, an are also called the Fourier coefficients of v (with respect to the orthogonal basis B). In case B is an orthonormal basis, the formula is even simpler. In that case:
a_i = ⟨u_i, v⟩.
Proof. Since B is a basis of W , we know that there exist coefficients a1 , . . . , an such that v = a1 u1 + . . . +
an un . It remains to verify that the coefficients satisfy the required formula. This is a simple calculation.
We have
hui , vi = hui , a1 u1 + . . . + an un i
= a1 hui , u1 i + . . . + ai hui , ui i + . . . + an hui , un i
= 0 + . . . + ai hui , ui i + . . . + 0
= ai ⟨ui, ui⟩.
Dividing both sides by ⟨ui, ui⟩ yields the desired formula for ai. ♠
v = a1 u1 + a2 u2 + a3 u3 .
a1 = ⟨u1, v⟩ / ⟨u1, u1⟩ = ⟨u1, v⟩ / ‖u1‖² = 3/1 = 3,
a2 = ⟨u2, v⟩ / ⟨u2, u2⟩ = ⟨u2, v⟩ / ‖u2‖² = −1/2 = −0.5,
a3 = ⟨u3, v⟩ / ⟨u3, u3⟩ = ⟨u3, v⟩ / ‖u3‖² = 2/4 = 0.5.
♠
Proposition 11.24 shows that if {u1 , . . . , un } is an orthogonal basis, then we can solve v = a1 u1 +. . .+an un
without having to solve a system of equations. While this is a useful thing to be able to do, perhaps it is
merely a convenience (solving a system of equations would also be fine). However, there is another very
useful property of Fourier coefficients. The coefficient ai only depends on the basis vector ui , and not on
any of the other basis vectors. This is useful in situations where only part of an orthogonal basis is known.
We can calculate the corresponding coefficients without having to know the rest of the basis.
0 3
but it is not known what u3 and u4 are. Find the first two coordinates of the vector
1
0
v= 2
a1 = ⟨u1, v⟩ / ⟨u1, u1⟩ = −1/6  and  a2 = ⟨u2, v⟩ / ⟨u2, u2⟩ = 4/14.
A word of warning is in order: Fourier coefficients do not work when the basis is not orthogonal. Consider
the following picture, where u1 and u2 are not orthogonal.
[Figure: a vector v expressed in the non-orthogonal basis u1, u2; the shaded parallelogram gives the coordinates (2, 1), while the dashed lines give the orthogonal projections of v onto u1 and u2.]
The coordinates of v with respect to u1 , u2 are (2, 1), because v = 2u1 + 1u2 , as indicated by the shaded
parallelogram. On the other hand, the formula for the Fourier coefficients is concerned with the orthogonal
projections of v onto u1 and u2 , as indicated by the dashed lines. It yields the coefficients
a1 = ⟨u1, v⟩ / ⟨u1, u1⟩ ≈ 2.5,
a2 = ⟨u2, v⟩ / ⟨u2, u2⟩ ≈ 1.8.
These are not the same as the coordinates of v, because the dashed lines are orthogonal to u1 and u2 ,
instead of parallel to them. In summary, the Fourier coefficients of a vector are equal to its coordinates
only if the basis is orthogonal.
When u1 , . . . , uk are orthogonal, we have a convenient formula for the norm of a1 u1 + . . . + ak uk :
Proof. We have
‖a1 u1 + … + ak uk‖² = ⟨a1 u1 + … + ak uk, a1 u1 + … + ak uk⟩
= a1²⟨u1, u1⟩ + a1 a2⟨u1, u2⟩ + … + a1 ak⟨u1, uk⟩
+ a2 a1⟨u2, u1⟩ + a2²⟨u2, u2⟩ + … + a2 ak⟨u2, uk⟩
+ …
+ ak a1⟨uk, u1⟩ + ak a2⟨uk, u2⟩ + … + ak²⟨uk, uk⟩
= a1²‖u1‖² + … + ak²‖uk‖².
Here we have used the fact that ⟨ui, uj⟩ = 0 when i ≠ j. ♠
We conclude this section by making a connection between arbitrary inner products and dot products. Once
an orthonormal basis has been chosen on an inner product space, computing inner products is essentially
the same as computing dot products. The following proposition makes this more precise.
⟨v, w⟩ = [v]_B · [w]_B.
Proof. Write v = a1 u1 + … + an un and w = b1 u1 + … + bn un, so that [v]_B = [a1, …, an]^T and [w]_B = [b1, …, bn]^T. Then
⟨v, w⟩ = ⟨a1 u1 + … + an un, b1 u1 + … + bn un⟩
= a1 b1⟨u1, u1⟩ + a1 b2⟨u1, u2⟩ + … + a1 bn⟨u1, un⟩
+ a2 b1⟨u2, u1⟩ + a2 b2⟨u2, u2⟩ + … + a2 bn⟨u2, un⟩
+ …
+ an b1⟨un, u1⟩ + an b2⟨un, u2⟩ + … + an bn⟨un, un⟩
= a1 b1 + a2 b2 + … + an bn
= [v]_B · [w]_B.
Here we have used the fact that hui , u j i = 1 when i = j and hui , u j i = 0 otherwise, which holds because B
is orthonormal. ♠
Exercises
Exercise 11.2.1 Let A = [[1, 1, 0], [1, 2, 0], [0, 0, 2]], and consider R3 with the inner product given by ⟨u, v⟩ = u^T A v. Which of the following vectors are orthogonal to each other?
u1 = [1, 1, 1]^T,  u2 = [−1, 2, −2]^T,  u3 = [7, −5, −2]^T,  u4 = [10, −2, −7]^T.
Exercise 11.2.2 On C[−1, 1], which of the following functions are orthogonal to each other?
Exercise 11.2.3 Consider the inner product space P3 of polynomials of degree at most 3, with the inner
product defined by
Z 1
h f , gi = f (x)g(x) dx.
−1
(a) Find the orthogonal complement of span{x², x}.
Exercise 11.2.4 Consider R3 as an inner product space with the usual dot product. For each of the
following bases of R3 , state whether it is orthonormal, orthogonal, or neither.
(a) { [1, 0, 0]^T, [0, 1, 0]^T, [0, 0, 1]^T }.
(b) { [1, 0, 1]^T, [0, 1, 1]^T, [0, 0, 1]^T }.
(c) { [1, 0, 2]^T, [0, 1, 0]^T, [−2, 0, 1]^T }.
(d) { [3/5, 4/5, 0]^T, [0, 0, −1]^T, [4/5, −3/5, 0]^T }.
Exercise 11.2.6 Suppose B = {u1 , u2 , u3 } is an orthogonal basis of R3 . We have been told that
u1 = [1, 1, 0]^T,
but it is not known what u2 and u3 are. Find the first coordinate of the vector
v = [1, 0, 2]^T.
11.3 The Gram-Schmidt orthogonalization procedure

Outcomes
A. Use the Gram-Schmidt procedure to find an orthogonal basis of a subspace of an inner product
space.
Although we have already seen some potential uses for orthogonal bases, we have not yet seen very many
examples of such bases. In this section, we will look at the Gram-Schmidt orthogonalization procedure, a
method for turning any basis into an orthogonal one.
The basic idea is very simple: if two vectors v1 , v2 are not orthogonal, then we can make them orthog-
onal by replacing v2 by a vector of the form u2 = v2 − tv1 , for a suitable parameter t.
[Figure: the vectors v1 and v2, and the orthogonalized vector u2 = v2 − t v1.]
But what is the correct value of t? It turns out that this value is uniquely determined by the requirement
that v1 and u2 must be orthogonal. We calculate
hv1 , u2 i = hv1 , v2 − tv1 i = hv1 , v2 i − thv1 , v1 i.
Setting this equal to 0 yields the unique solution
t = ⟨v1, v2⟩ / ⟨v1, v1⟩.
Note that this is exactly the same thing as the Fourier coefficient of v2 in the direction of v1 . The following
proposition summarizes what we have found so far. For consistency with our later notation, we also
rename the first basis vector v1 to u1 .
u1 = v1,
u2 = v2 − (⟨u1, v2⟩ / ⟨u1, u1⟩) u1.
Solution. Let v1 = [1, 1, 0]^T and v2 = [0, 1, 1]^T. We calculate
u1 = v1 = [1, 1, 0]^T,
u2 = v2 − (⟨u1, v2⟩ / ⟨u1, u1⟩) u1 = [0, 1, 1]^T − (1/2)[1, 1, 0]^T = [−1/2, 1/2, 1]^T.
Therefore the desired orthogonal basis is { [1, 1, 0]^T, [−1/2, 1/2, 1]^T }. ♠
The procedure for finding an orthogonal basis of a k-dimensional space is very similar. We adjust each
basis vector vi by subtracting a suitable linear combination of previous orthogonal basis vectors.
u1 = v1,
u2 = v2 − (⟨u1, v2⟩ / ⟨u1, u1⟩) u1,
u3 = v3 − (⟨u1, v3⟩ / ⟨u1, u1⟩) u1 − (⟨u2, v3⟩ / ⟨u2, u2⟩) u2,
⋮
uk = vk − (⟨u1, vk⟩ / ⟨u1, u1⟩) u1 − (⟨u2, vk⟩ / ⟨u2, u2⟩) u2 − … − (⟨uk−1, vk⟩ / ⟨uk−1, uk−1⟩) uk−1.
Proof. First, it is clear that {v1 , . . . , vk } and {u1 , . . . , uk } span the same subspace, as each vi is a linear
combination of u1 , . . . , ui and conversely, each ui is a linear combination of v1 , . . . , vi . So the only thing
we must check is that {u1 , . . . , uk } is an orthogonal set. In other words, we must show that hu j , ui i = 0 for
all j < i. We prove this by induction on i, i.e., we assume it is already true for all pairs of indices smaller
than i. To show hu j , ui i = 0, we calculate:
⟨uj, ui⟩ = ⟨uj, vi − (⟨u1, vi⟩/⟨u1, u1⟩) u1 − … − (⟨uj, vi⟩/⟨uj, uj⟩) uj − … − (⟨ui−1, vi⟩/⟨ui−1, ui−1⟩) ui−1⟩
= ⟨uj, vi⟩ − (⟨u1, vi⟩/⟨u1, u1⟩)⟨uj, u1⟩ − … − (⟨uj, vi⟩/⟨uj, uj⟩)⟨uj, uj⟩ − … − (⟨ui−1, vi⟩/⟨ui−1, ui−1⟩)⟨uj, ui−1⟩
= ⟨uj, vi⟩ − 0 − … − (⟨uj, vi⟩/⟨uj, uj⟩)⟨uj, uj⟩ − … − 0
= ⟨uj, vi⟩ − ⟨uj, vi⟩
= 0.
Solution. Let
v1 = [1, 1, 1, 1]^T,  v2 = [1, 1, 1, 0]^T,  and  v3 = [1, 1, 0, 0]^T.
We calculate
u1 = v1 = [1, 1, 1, 1]^T,
u2 = v2 − (⟨u1, v2⟩ / ⟨u1, u1⟩) u1 = [1, 1, 1, 0]^T − (3/4)[1, 1, 1, 1]^T = [1/4, 1/4, 1/4, −3/4]^T,
u3 = v3 − (⟨u1, v3⟩ / ⟨u1, u1⟩) u1 − (⟨u2, v3⟩ / ⟨u2, u2⟩) u2
= [1, 1, 0, 0]^T − (2/4)[1, 1, 1, 1]^T − ((1/2)/(3/4))[1/4, 1/4, 1/4, −3/4]^T = [1/3, 1/3, −2/3, 0]^T.
Therefore the orthogonal basis is {u1, u2, u3} = { [1, 1, 1, 1]^T, [1/4, 1/4, 1/4, −3/4]^T, [1/3, 1/3, −2/3, 0]^T }. ♠
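The procedure translates directly into code. The following Python sketch implements Gram-Schmidt with respect to the dot product and reproduces the basis found in this example; it assumes the input vectors are linearly independent.

```python
import numpy as np

def gram_schmidt(vectors):
    """Apply the Gram-Schmidt procedure (dot product) to a list of linearly
    independent vectors; returns an orthogonal list of vectors."""
    basis = []
    for v in vectors:
        u = v.astype(float)
        for b in basis:
            # Subtract the Fourier coefficient of v in the direction of b
            u = u - (b @ v) / (b @ b) * b
        basis.append(u)
    return basis

v1 = np.array([1, 1, 1, 1])
v2 = np.array([1, 1, 1, 0])
v3 = np.array([1, 1, 0, 0])

for u in gram_schmidt([v1, v2, v3]):
    print(u)
# [1. 1. 1. 1.]
# [ 0.25  0.25  0.25 -0.75]
# [ 0.333...  0.333... -0.666...  0. ]
```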
The Gram-Schmidt procedure is sensitive to reordering the vectors. For example, if we order the original
basis vectors in Example 11.32 in the opposite order, we end up with a different orthogonal basis at the
end. Sometimes this can simplify the calculations, as the following example shows.
Solution. Note that these are the same basis vectors as in Example 11.32, but listed in a different order.
Let
v1 = [1, 1, 0, 0]^T,  v2 = [1, 1, 1, 0]^T,  and  v3 = [1, 1, 1, 1]^T.
We calculate
u1 = v1 = [1, 1, 0, 0]^T,
u2 = v2 − (⟨u1, v2⟩ / ⟨u1, u1⟩) u1 = [1, 1, 1, 0]^T − (2/2)[1, 1, 0, 0]^T = [0, 0, 1, 0]^T,
u3 = v3 − (⟨u1, v3⟩ / ⟨u1, u1⟩) u1 − (⟨u2, v3⟩ / ⟨u2, u2⟩) u2
= [1, 1, 1, 1]^T − (2/2)[1, 1, 0, 0]^T − (1/1)[0, 0, 1, 0]^T = [0, 0, 0, 1]^T.
This time, we end up with the orthogonal basis {u1, u2, u3} = { [1, 1, 0, 0]^T, [0, 0, 1, 0]^T, [0, 0, 0, 1]^T }. ♠
In the next example, we will consider Rn , but with a non-standard inner product.
Apply the Gram-Schmidt procedure to v1 , v2 , v3 to find an orthogonal basis {u1 , u2 , u3 } for R3 with
respect to the above inner product.
Next, we calculate
Therefore,
u2 = v2 − (⟨u1, v2⟩ / ⟨u1, u1⟩) u1 = [0, 1, 0]^T − (2/1)[1, 0, 0]^T = [−2, 1, 0]^T.
Finally, we calculate
Therefore,
u3 = v3 − (⟨u1, v3⟩ / ⟨u1, u1⟩) u1 − (⟨u2, v3⟩ / ⟨u2, u2⟩) u2 = [0, 0, 1]^T − (−2/1)[1, 0, 0]^T − (3/2)[−2, 1, 0]^T = [5, −3/2, 1]^T.
Note that it is not orthogonal with respect to the dot product, but with respect to the inner product defined
above. ♠
Applying the Gram-Schmidt procedure to the polynomials v1 = 1, v2 = x, v3 = x², v4 = x³ in the inner product space C[−1, 1], we start with
u1 = v1 = 1.
Then
u2 = v2 − (⟨u1, v2⟩ / ⟨u1, u1⟩) u1 = x − (0/2) · 1 = x.
Then
u3 = v3 − (⟨u1, v3⟩ / ⟨u1, u1⟩) u1 − (⟨u2, v3⟩ / ⟨u2, u2⟩) u2 = x² − ((2/3)/2) · 1 − 0 · x = x² − 1/3.
Next, we compute
⟨u3, v4⟩ = ∫_{−1}^{1} (x² − 1/3) x³ dx = [x⁶/6 − x⁴/12]_{−1}^{1} = 0,
⟨u3, u3⟩ = ∫_{−1}^{1} (x² − 1/3)² dx = [x⁵/5 − (2/9)x³ + (1/9)x]_{−1}^{1} = 8/45.
Then
u4 = v4 − (⟨u1, v4⟩ / ⟨u1, u1⟩) u1 − (⟨u2, v4⟩ / ⟨u2, u2⟩) u2 − (⟨u3, v4⟩ / ⟨u3, u3⟩) u3
= x³ − (0/2) · 1 − ((2/5)/(2/3)) x − (0/(8/45)) (x² − 1/3) = x³ − (3/5)x.
Thus, we obtain the orthogonal basis {u1, u2, u3, u4} = {1, x, x² − 1/3, x³ − (3/5)x}. ♠
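The same computation can be carried out symbolically. The following sketch uses SymPy to apply the Gram-Schmidt procedure to 1, x, x², x³, x⁴ with the C[−1, 1] inner product; it reproduces the polynomials above (and the next one in the sequence).

```python
import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    # Inner product on C[-1, 1]: <f, g> = integral_{-1}^{1} f(x) g(x) dx
    return sp.integrate(f * g, (x, -1, 1))

def gram_schmidt(functions):
    basis = []
    for v in functions:
        u = v
        for b in basis:
            u = sp.expand(u - inner(b, v) / inner(b, b) * b)
        basis.append(u)
    return basis

print(gram_schmidt([1, x, x**2, x**3, x**4]))
# [1, x, x**2 - 1/3, x**3 - 3*x/5, x**4 - 6*x**2/7 + 3/35]
```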
The orthogonal polynomials from Example 11.35 are known (up to scalar multiples) as Legendre poly-
nomials. We can continue in the same fashion applying the Gram-Schmidt procedure to the polynomials
1, x, x2 , x3 , x4 , x5 , x6 , . . . to get an infinite sequence of orthogonal polynomials. The first few elements of
this sequence are:
p0(x) = 1,
p1(x) = x,
p2(x) = x² − 1/3,
p3(x) = x³ − (3/5)x,
p4(x) = x⁴ − (6/7)x² + 3/35,
p5(x) = x⁵ − (10/9)x³ + (5/21)x,
p6(x) = x⁶ − (15/11)x⁴ + (5/11)x² − 5/231,
p7(x) = x⁷ − (21/13)x⁵ + (105/143)x³ − (35/429)x,
p8(x) = x⁸ − (28/15)x⁶ + (14/13)x⁴ − (28/143)x² + 7/1287.
[Figure: graphs of the Legendre polynomials p2(x) = x² − 1/3, p3(x) = x³ − (3/5)x, p4(x), p5(x), and p6(x) on the interval [−1, 1].]
The Gram-Schmidt procedure yields an orthogonal basis. If we want to compute an orthonormal basis,
we also have to normalize each basis vector. Since normalization usually involves dividing by a square
root, it is best to do this at the end, i.e., after the entire Gram-Schmidt procedure is complete, rather than
normalizing each ui immediately after it is found. Note that the Gram-Schmidt procedure itself does not
involve computing any square roots.
for this space. So all that is left to do is to normalize each vector. The orthonormal basis is
{ u1/‖u1‖, u2/‖u2‖, u3/‖u3‖ } = { (1/2)[1, 1, 1, 1]^T, (2/√3)[1/4, 1/4, 1/4, −3/4]^T, (√3/√2)[1/3, 1/3, −2/3, 0]^T }.
Alternatively, we could have also normalized the orthogonal basis we found in Example 11.33. In that
case, we obtain the orthonormal basis
{ (1/√2)[1, 1, 0, 0]^T, [0, 0, 1, 0]^T, [0, 0, 0, 1]^T }.
♠
Solution. In Example 11.35, we found the orthogonal basis {u1 , u2 , u3 , u4 } = 1, x, x2 − 13 , x3 − 35 x . We
also computed
⟨u1, u1⟩ = 2,  ⟨u2, u2⟩ = 2/3,  and  ⟨u3, u3⟩ = 8/45.
We also need to compute ⟨u4, u4⟩:
⟨u4, u4⟩ = ∫_{−1}^{1} (x³ − (3/5)x)² dx = ∫_{−1}^{1} (x⁶ − (6/5)x⁴ + (9/25)x²) dx = [x⁷/7 − (6/25)x⁵ + (3/25)x³]_{−1}^{1} = 8/175.
Therefore, the orthonormal basis is:
{ u1/‖u1‖, u2/‖u2‖, u3/‖u3‖, u4/‖u4‖ } = { 1/√2, √(3/2) x, √(45/8) (x² − 1/3), √(175/8) (x³ − (3/5)x) }.
♠
♠
We finish this section by remarking that the formula
(⟨u, v⟩ / ⟨u, u⟩) u
is exactly what we called the projection of v onto u in Section 2.6.5, except that we have generalized this concept from Rn to an arbitrary inner product space. We can define
proj_u(v) = (⟨u, v⟩ / ⟨u, u⟩) u.
With this definition, the Gram-Schmidt procedure can also be expressed more succinctly as follows.
u1 = v1 ,
u2 = v2 − proju1 (v2 ),
u3 = v3 − proju1 (v3 ) − proju2 (v3 ),
..
.
uk = vk − proju1 (vk ) − proju2 (vk ) − . . . − projuk−1 (vk ).
Exercises
Exercise 11.3.1 In R3 with the usual dot product, find an orthogonal basis for
span{ [1, 2, 3]^T, [2, 6, 0]^T }.
Exercise 11.3.2 In R4 with the usual dot product, find an orthogonal basis for
span{ [0, 1, 1, 0]^T, [3, 0, 2, −1]^T }.
Exercise 11.3.3 In R4 with the usual dot product, find an orthogonal basis for
1 1 2
0 3 4
span
1 , 1 , 2 .
0 −1 2
1 2 3
and let W = span {v1 , v2 , v3 }. Apply the Gram-Schmidt procedure to v1 , v2 , v3 to find an orthogonal basis
{u1 , u2 , u3 } for W with respect to the above inner product.
Exercise 11.3.6 Consider the inner product space C[0, 2], with the inner product given by
⟨f, g⟩ = ∫_0^2 f(x) g(x) dx.
Use the Gram-Schmidt procedure to find an orthogonal basis for span 1, x, x2 .
Exercise 11.3.7 Find an orthonormal basis for the subspace of R3 from Exercise 11.3.1.
11.4 Orthogonal projections and Fourier series

Outcomes
A. Compute the orthogonal projection of a vector onto a subspace.
We now reconsider a problem that we briefly encountered, only in the context of R3 , in Section 3.2:
how to find the shortest distance between a point and a subspace. The method we used in Section 3.2
(see Example 3.26) relies on the existence of normal vectors and does not generalize beyond R3 . The
following proposition gives a much better method for solving this problem, provided that we have an
orthogonal basis of the subspace.
[Figure: the orthogonal projection v′ of a vector v onto the subspace W spanned by u1 and u2.]
The vector v′ is called the orthogonal projection of v onto W . We also say that v′ is the best
approximation of v in W .
Here, a1, …, ak+1 are the fixed coordinates of v, and x1, …, xk depend on w. Therefore, ‖v − w‖ takes its minimum value exactly when xi = ai for all i = 1, …, k, i.e., when w = v′.
This is what had to be shown. Finally, to show that v − v′ is orthogonal to W , note that v − v′ = ak+1 uk+1 ,
which is orthogonal to each u1 , . . . , uk , and therefore to W . ♠
Solution. We calculate hu1 , vi = 9, hu1 , u1 i = 6, hu2 , vi = 12, and hu2 , u2 i = 12. Therefore, by Proposi-
tion 11.38, the desired vector is
v′ = (⟨u1, v⟩ / ⟨u1, u1⟩) u1 + (⟨u2, v⟩ / ⟨u2, u2⟩) u2 = (9/6)[1, 1, 2, 0]^T + (12/12)[−1, −1, 1, 3]^T = [1/2, 1/2, 4, 3]^T.
♠
Note how straightforward the calculations in this example are. All we had to do is calculate a few inner
products. It is not even necessary to solve a system of equations. Such is the power of orthogonal bases.
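Computationally, the projection formula is a one-liner. The following NumPy sketch uses hypothetical vectors u1, u2, v (not the ones from this example) to illustrate it; the residual v − v′ comes out orthogonal to the subspace, as the proposition predicts.

```python
import numpy as np

def project(v, orthogonal_basis):
    """Orthogonal projection of v onto the span of an orthogonal basis."""
    return sum((u @ v) / (u @ u) * u for u in orthogonal_basis)

# Hypothetical data: u1 and u2 are orthogonal, v is an arbitrary vector.
u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([1.0, -1.0, 0.0])
v  = np.array([2.0, 3.0, 4.0])

v_best = project(v, [u1, u2])
print(v_best)                                # [2. 3. 0.]
print((v - v_best) @ u1, (v - v_best) @ u2)  # 0.0 0.0: the error is orthogonal to W
```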
The next example shows that we can use exactly the same method to find approximations of functions.
Consider the function f ∈ C[−1, 1] given by f(x) = 1 − |x|. Find the closest approximation to f by a polynomial of degree at most 2. Graph both f and the approximating polynomial.
Solution. Let W = span 1, x, x2 be the subspace of V consisting of polynomials of degree at most 2.
What we are looking for is an element g ∈ W such that k f − gk is as small as possible. We can solve this
problem using Proposition 11.38.
First we need an orthogonal basis for W. We found such an orthogonal basis in Example 11.35, namely the Legendre polynomials p0(x) = 1, p1(x) = x, and p2(x) = x² − 1/3. By Proposition 11.38, the desired approximation g ∈ W is given by:
g = (⟨p0, f⟩ / ⟨p0, p0⟩) p0 + (⟨p1, f⟩ / ⟨p1, p1⟩) p1 + (⟨p2, f⟩ / ⟨p2, p2⟩) p2.
We already computed the inner products ⟨p0, p0⟩ = 2, ⟨p1, p1⟩ = 2/3, and ⟨p2, p2⟩ = 8/45 in Example 11.35.
We calculate the remaining inner products:
⟨p0, f⟩ = ∫_{−1}^{1} 1 · f(x) dx = ∫_{−1}^{0} 1 · (1 + x) dx + ∫_0^{1} 1 · (1 − x) dx = 1/2 + 1/2 = 1,
⟨p1, f⟩ = ∫_{−1}^{1} x · f(x) dx = ∫_{−1}^{0} x · (1 + x) dx + ∫_0^{1} x · (1 − x) dx = −1/6 + 1/6 = 0,
⟨p2, f⟩ = ∫_{−1}^{1} (x² − 1/3) · f(x) dx = ∫_{−1}^{0} (x² − 1/3)(1 + x) dx + ∫_0^{1} (x² − 1/3)(1 − x) dx = −1/12 − 1/12 = −1/6.
Therefore,
g = (⟨p0, f⟩ / ⟨p0, p0⟩) p0 + (⟨p1, f⟩ / ⟨p1, p1⟩) p1 + (⟨p2, f⟩ / ⟨p2, p2⟩) p2
= (1/2) p0 + (0/(2/3)) p1 − ((1/6)/(8/45)) p2
= 1/2 − (15/16)(x² − 1/3)
= 13/16 − (15/16) x².
The following graph shows the function f (x) as well as the polynomial g(x):
[Figure: the graphs of f(x) = 1 − |x| and g(x) = 13/16 − (15/16)x² on the interval [−1, 1].]
♠
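The coefficients in this example can also be approximated numerically. The following sketch (SciPy) recomputes the best quadratic approximation of f(x) = 1 − |x| and recovers g(x) = 13/16 − (15/16)x² up to numerical error.

```python
from scipy.integrate import quad

def inner(f, g):
    value, _ = quad(lambda x: f(x) * g(x), -1.0, 1.0)
    return value

f = lambda x: 1.0 - abs(x)

# Orthogonal basis of the polynomials of degree <= 2 (Legendre polynomials)
p = [lambda x: 1.0, lambda x: x, lambda x: x**2 - 1.0/3.0]

coeffs = [inner(q, f) / inner(q, q) for q in p]
print(coeffs)        # approximately [0.5, 0.0, -0.9375], and -15/16 = -0.9375

g = lambda x: sum(c * q(x) for c, q in zip(coeffs, p))
print(g(0.0))        # approximately 0.8125 = 13/16
```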
When approximating a function f by a polynomial, as in the last example, there is no need to stop with
polynomials of degree 2. We can also ask what is the best approximation of f by a polynomial of degree
3, of degree 4, of degree 5, and so on. By increasing the degree of the polynomials, we get better and
better approximations to f . This leads us to the concept of a generalized Fourier series.
v1 = (⟨u1, v⟩ / ⟨u1, u1⟩) u1,
v2 = (⟨u1, v⟩ / ⟨u1, u1⟩) u1 + (⟨u2, v⟩ / ⟨u2, u2⟩) u2,
v3 = (⟨u1, v⟩ / ⟨u1, u1⟩) u1 + (⟨u2, v⟩ / ⟨u2, u2⟩) u2 + (⟨u3, v⟩ / ⟨u3, u3⟩) u3,
v4 = (⟨u1, v⟩ / ⟨u1, u1⟩) u1 + (⟨u2, v⟩ / ⟨u2, u2⟩) u2 + (⟨u3, v⟩ / ⟨u3, u3⟩) u3 + (⟨u4, v⟩ / ⟨u4, u4⟩) u4,
⋮
By Proposition 11.38, we know that each vi is the best approximation of v in the subspace span {u1 , . . . , ui }.
In particular, kv − vi+1 k ≤ kv − vi k, so each vi is potentially a better approximation of v than the previous
one. In a course on analysis1 , you will learn that in many situations, the sequence v1 , v2 , v3 , . . . can be
shown to converge to v.
Solution. We use the Legendre polynomials p0 , . . . , p8 from Section 11.3 and calculate the relevant inner
products, each of which requires solving an integral. As these integrals get a bit complicated, it is best to
use a computer algebra system to compute them.
⟨p0, f⟩ = 1,        ⟨p0, p0⟩ = 2,
⟨p1, f⟩ = 0,        ⟨p1, p1⟩ = 2/3,
⟨p2, f⟩ = −1/6,     ⟨p2, p2⟩ = 8/45,
⟨p3, f⟩ = 0,        ⟨p3, p3⟩ = 8/175,
⟨p4, f⟩ = 1/105,    ⟨p4, p4⟩ = 128/11025,
⟨p5, f⟩ = 0,        ⟨p5, p5⟩ = 128/43659,
⟨p6, f⟩ = −1/924,   ⟨p6, p6⟩ = 512/693693,
⟨p7, f⟩ = 0,        ⟨p7, p7⟩ = 512/2760615,
⟨p8, f⟩ = 1/6435,   ⟨p8, p8⟩ = 32768/703956825.
[Figure: the function f and its approximations f0, f2, f4, f6, f8 by polynomials of increasing degree on [−1, 1].]
♠
[Figure: the function f and its approximations f0, f1, f3, f5, f7 on [−1, 1].]
Comparing the last two examples, we see that the function in Example 11.43 is much harder to ap-
proximate well by polynomials. This is due to the discontinuity in the latter function. Nevertheless, the
sequence of approximations eventually converges to f .
by polynomials up to degree 7.
[Figure: the function f and its approximations f0, f1, f3, f5, f7 by polynomials up to degree 7 on [−1, 1].]
So far, we have worked with Legendre polynomials, but there are of course other examples of or-
thogonal sets of functions. An important such set is given by sine and cosine waves. We will see that
every periodic function can be decomposed into sine and cosine waves of varying frequencies. This was
Fourier’s original discovery, and the corresponding series are just known as Fourier series (i.e., not “gen-
eralized”).
u0 = 1
u1 = sin x
u2 = cos x
u3 = sin 2x
u4 = cos 2x
u5 = sin 3x
u6 = cos 3x
...
Consider the function f (x) = x − π, where x ∈ [0, 2π ]. Find its Fourier series.
Solution. Following Definition 11.41, we must calculate a number of inner products. We have hu0 , u0 i =
2π . Also, for all i ≥ 1, we have hui , ui i = π . We note the following antiderivatives, for k ≥ 1:
∫ x sin(kx) dx = −(x/k) cos(kx) + (1/k²) sin(kx),   ∫ x cos(kx) dx = (x/k) sin(kx) + (1/k²) cos(kx).
Using these formulas, we can compute the following inner products quite easily:
⟨u0, f⟩ = ∫_0^{2π} (x − π) dx = 0,
⟨u1, f⟩ = ∫_0^{2π} (x − π) sin x dx = −2π,
⟨u2, f⟩ = ∫_0^{2π} (x − π) cos x dx = 0,
⟨u3, f⟩ = ∫_0^{2π} (x − π) sin 2x dx = −2π/2,
⟨u4, f⟩ = ∫_0^{2π} (x − π) cos 2x dx = 0,
⟨u5, f⟩ = ∫_0^{2π} (x − π) sin 3x dx = −2π/3,
⟨u6, f⟩ = ∫_0^{2π} (x − π) cos 3x dx = 0,
and so on. The partial sums of the Fourier series are therefore
f1 = −2 sin x,
f3 = −2 sin x − (2/2) sin 2x,
f5 = −2 sin x − (2/2) sin 2x − (2/3) sin 3x,
f7 = −2 sin x − (2/2) sin 2x − (2/3) sin 3x − (2/4) sin 4x,
f9 = −2 sin x − (2/2) sin 2x − (2/3) sin 3x − (2/4) sin 4x − (2/5) sin 5x,
and so on. ♠
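The Fourier coefficients in this example can be checked numerically; the following sketch approximates the integrals with scipy.integrate.quad and recovers the coefficients −2/k of sin(kx) and 0 of cos(kx).

```python
import numpy as np
from scipy.integrate import quad

def inner(f, g):
    value, _ = quad(lambda x: f(x) * g(x), 0.0, 2.0 * np.pi)
    return value

f = lambda x: x - np.pi

# Constant term: <1, f> / <1, 1> with <1, 1> = 2 pi
print(round(inner(lambda x: 1.0, f) / (2 * np.pi), 6))   # 0.0

# Coefficients of sin(kx) and cos(kx): <u, f> / <u, u> with <u, u> = pi
for k in range(1, 5):
    a = inner(lambda x, k=k: np.sin(k * x), f) / np.pi
    b = inner(lambda x, k=k: np.cos(k * x), f) / np.pi
    print(k, round(a, 6), round(b, 6))
# 1 -2.0 0.0
# 2 -1.0 0.0
# 3 -0.666667 0.0
# 4 -0.5 0.0   (up to rounding)
```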
The following graph illustrates the successive approximations of this Fourier series.
[Figure: the function f and the partial sums f1, f3, f5, f7 of its Fourier series, plotted over several periods.]
Note that, although the function f is defined on the interval [0, 2π ], we have extended it periodically for
[2π , 4π ], [4π , 6π ], and so on. This makes sense because all of the orthogonal functions 1, sin x, cos x,
sin 2x, and so on, have period 2π .
We can also illustrate the same information differently, by showing the individual sine waves making
up the wave form of the function f (x). In the context of audio signals, these sine waves are also called the
harmonics of the signal.
[Figure: the waveform of f decomposed into its harmonics −(2/1) sin x, −(2/2) sin 2x, −(2/3) sin 3x, −(2/4) sin 4x, ….]
[Figure: the function f and the partial sums f2, f4, f6, f8 of its Fourier series, plotted over several periods.]
Note how rapidly this Fourier series converges to f . The following image shows the function f (extended
periodically outside the interval [0, 2π ]) as a sum of its harmonics. Compared to Example 11.45, we see
that the higher harmonics have much smaller amplitudes, which explains the rapid convergence.
[Figure: the waveform of f decomposed into its harmonics −(4/π) cos x, −(4/(9π)) cos 3x, −(4/(25π)) cos 5x, −(4/(49π)) cos 7x, ….]
[Figure: the function f and the partial sums f1, f3, f5, f7 of its Fourier series, plotted over several periods.]
Once again, here is an image showing the function f as a sum of its harmonics:
[Figure: the waveform of f decomposed into its harmonics (4/π) sin x, (4/(3π)) sin 3x, (4/(5π)) sin 5x, (4/(7π)) sin 7x, ….]
Watch the video at http://y2u.be/3IAMpH4xF9Q for a demonstration of what the functions from Exam-
ples 11.45–11.47, and their harmonics, sound like as audio signals.
Exercises
Note that u1 and u2 are orthogonal. Find the best approximation of v in span {u1 , u2 }.
Note that u1 , u2 , and u3 are orthogonal. Find the best approximation of v in span {u1 , u2 , u3 }.
Exercise 11.4.3 In the inner product space V = C[−1, 1], consider the function f ∈ V given by
f(x) = 1 if x < 0,  and  f(x) = 1 − x if x ≥ 0.
Find the closest approximation to f by a polynomial of degree at most 0, 1, 2, 3, and 4. Graph both f and
the approximating polynomials.
Exercise 11.4.4 In the inner product space C[−π , π ], consider the orthogonal set of functions from Ex-
ample 11.21: {1, sin x, cos x, sin 2x, cos 2x, sin 3x, cos 3x, . . .}. Let f (x) = x2 , where x ∈ [−π , π ]. Find the
Fourier series of f .
11.5 Application: Least squares approximations and curve fitting

Outcomes
A. Find least squares approximations for a system of equations.
B. Find best fit lines and parabolas for a set of data points.
In this section, we will consider the problem of finding approximate solutions to a system of linear equa-
tions Av = b. This can be useful when the system is inconsistent, but we would still like to find a “best”
answer. For example, consider the following system of equations.
This system has the solution (x, y) = (1, 1), so it is clearly consistent. Now imagine that we introduce
some small inaccuracies into the equations. The inaccuracies might perhaps be due to round-off errors, or
due to measurement errors if the coefficients are obtained from experimental data. We might end up with
the following system of equations:
Except for a tiny error in one of the coefficients, this is the same system of equations as before. We would
expect that such a small error does not affect the result much. However, this last system of equations
is inconsistent; it has no solutions at all. You can see this by observing that (x, y) = (1, 1) is still the
unique solution to the last two equations; substituting this into the first equation, we get 0.3000001 = 0.3,
which almost, but not exactly, true. What we would like to find in a situation like this is an “approximate
solution”, i.e., numbers x and y such that each of the three equations “almost” holds. We can formulate
the problem more precisely as follows:
kAv − bk
is as small as possible. We call such a vector v a least squares approximation for the system of
equations.
To see why this is called a “least squares” approximation, consider a system of equations
a11 x1 + · · · + a1n xn = b1 ,
a21 x1 + · · · + a2n xn = b2 ,
···
am1 x1 + · · · + amn xn = bm .
Then
Av − b = [a11 x1 + · · · + a1n xn − b1, …, am1 x1 + · · · + amn xn − bm]^T,
and therefore
‖Av − b‖² = (a11 x1 + · · · + a1n xn − b1)² + … + (am1 x1 + · · · + amn xn − bm)².
Therefore, minimizing ‖Av − b‖ is the same as minimizing the sum of the squares of the errors of all the
equations, where the error of each equation is defined to be the difference between its left-hand side and
right-hand side.
AT Av = AT b.
Proof. Let a1 , . . . an be the columns of the matrix A. Recall that span {a1 , . . . an } is called the column space
of A, which we write as col(A). If v = [x1, …, xn]^T is any vector, then by the definition of matrix multiplication, we have
Av = x1 a1 + … + xn an.
Therefore, a vector is of the form Av if and only if it is an element of col(A). In particular, the equation
Av = b has a solution if and only if b is an element of the column space of A. For v to be a least squares
approximation, we want kAv − bk to be as small as possible. This means that we are looking for the
element of col(A) that is closest to b.
[Figure: the vector b, its closest point Av in col(A), and the residual Av − b, which is orthogonal to col(A).]
From Proposition 11.38, we know that this happens when Av − b is orthogonal to col(A). Since col(A) =
span {a1 , . . . an }, this is equivalent to saying that Av − b is orthogonal to each of the vectors a1 , . . . an .
Therefore,
aTi (Av − b) = 0
for i = 1, . . . , n. Since aT1 , . . . aTn are the rows of the matrix AT , this system of n equations is equivalent to
the single equation
AT (Av − b) = 0,
or equivalently,
AT Av = AT b.
2x + 2y + 2z = 1,
x − y − z = −2,
−x − y + 2z = 4,
2x + 2y − z = −8.
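As a numerical sketch of this computation, the normal equations AᵀAv = Aᵀb for this system can be solved with NumPy; np.linalg.lstsq computes the same least squares solution directly.

```python
import numpy as np

# Coefficient matrix and right-hand side of the system above
A = np.array([[ 2.0,  2.0,  2.0],
              [ 1.0, -1.0, -1.0],
              [-1.0, -1.0,  2.0],
              [ 2.0,  2.0, -1.0]])
b = np.array([1.0, -2.0, 4.0, -8.0])

# Solve the normal equations A^T A v = A^T b
v = np.linalg.solve(A.T @ A, A.T @ b)
print(v)                                   # [-1. -1.  2.]

# Same answer via the built-in least squares solver
v2, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(v, v2))                  # True
```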
Solution. We first write down a system of equations that expresses the relationship y = a + bx for each of
the seven data points (x, y):
a + 17b = 320,
a + 22b = 570,
a + 26b = 850,
a + 20b = 470,
a + 24b = 750,
a + 23b = 620,
a + 23b = 680.
This is a system of seven equations in two variables (the variables are a and b). This system is likely
inconsistent, because there are more equations than variables, and also because it is unlikely that the
relationship between temperature and sales is exactly (as opposed to approximately) linear. Instead, we
find the least squares approximation for the system of equations. We let
A = [[1, 17], [1, 22], [1, 26], [1, 20], [1, 24], [1, 23], [1, 23]]  and  b = [320, 570, 850, 470, 750, 620, 680]^T.
[Figure: scatter plot of sales versus temperature, together with the best fit line.]
The following table compares the observed data to the computed linear regression (sorted by increasing
temperature). It also shows the error for each data point.
Temperature Actual sales Best linear fit Error
17 320 300 −20
20 470 480 +10
22 570 600 +30
23 620 660 +40
23 680 660 −20
24 750 720 −30
26 850 840 −10
The sum of the squares of the errors is 202 + 102 + 302 + 402 + 202 + 302 + 102 = 4400. ♠
Solution. We are looking for an equation of the form y = a + bx + cx2 . Substituting each of the seven data
points into the equation, we obtain 7 equations in the unknowns a, b, and c:
a + 3b + 9c = 23.5,
a + 4b + 16c = 13.5,
a + 5b + 25c = 12.5,
a + 6b + 36c = 5.5,
a + 7b + 49c = 9.0,
a + 8b + 64c = 8.0,
a + 9b + 81c = 19.0.
We write this in matrix form as Av = b, where
A = [[1, 3, 9], [1, 4, 16], [1, 5, 25], [1, 6, 36], [1, 7, 49], [1, 8, 64], [1, 9, 81]]  and  b = [23.5, 13.5, 12.5, 5.5, 9.0, 8.0, 19.0]^T.
To find the least squares approximation, we calculate
A^T A = [[7, 42, 280], [42, 280, 2016], [280, 2016, 15316]]  and  A^T b = [91, 518, 3430]^T,
and solve the system of equations A^T A v = A^T b, i.e.,
[[7, 42, 280], [42, 280, 2016], [280, 2016, 15316]] [a, b, c]^T = [91, 518, 3430]^T.
After doing some row operations, we find that the unique solution is (a, b, c) = (67, −19, 1.5). Therefore,
the desired quadratic approximation is y = 67 − 19x + 1.5x2 . The following plot shows this function along
with the original data points.
[Figure: the data points and the best fit parabola y = 67 − 19x + 1.5x².]
The following table shows the original data, the values of the best fit parabola, and the error for each data
point.
x Actual y Best fit parabola Error
3 23.5 23.5 0.0
4 13.5 15.0 +1.5
5 12.5 9.5 −3.0
6 5.5 7.0 +1.5
7 9.0 7.5 −1.5
8 8.0 11.0 +3.0
9 19.0 17.5 −1.5
The sum of the squares of the errors is 02 + 1.52 + 32 + 1.52 + 1.52 + 32 + 1.52 = 27. ♠
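The same fit can be reproduced numerically. The following NumPy sketch sets up the design matrix for y = a + bx + cx², solves the normal equations, and recovers the coefficients and the error sum found above.

```python
import numpy as np

xs = np.array([3, 4, 5, 6, 7, 8, 9], dtype=float)
ys = np.array([23.5, 13.5, 12.5, 5.5, 9.0, 8.0, 19.0])

# Design matrix with columns 1, x, x^2 for the model y = a + b x + c x^2
A = np.column_stack([np.ones_like(xs), xs, xs**2])

coeffs = np.linalg.solve(A.T @ A, A.T @ ys)   # normal equations
print(coeffs)                                  # approximately [67. -19.  1.5]

residuals = A @ coeffs - ys
print(np.sum(residuals**2))                    # approximately 27.0
```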
Exercises
Exercise 11.5.1 Find the least squares approximation for the system of equations
x + 2y + 2z = 5,
x + y − z = 11,
x + 2y − z = −18,
2x − y + 2z = 0.
Exercise 11.5.2 Find the least squares approximation for the system of equations
[[−1, 2, 1], [−1, 0, −1], [2, 0, 2], [0, 0, 2], [1, 2, 2]] [x, y, z]^T = [1, 1, 3, −2, 4]^T.
Exercise 11.5.3 Consider the points (x1 , y1 ) = (−1, 0), (x2 , y2 ) = (0, 3), (x3 , y3 ) = (1, 3), (x4 , y4 ) = (2, 5),
(x5 , y5 ) = (3, 9). Find the least squares line for these points.
Exercise 11.5.4 Consider the points (x1 , y1 ) = (−1, 4), (x2 , y2 ) = (0, −2), (x3 , y3 ) = (1, 4), (x4 , y4 ) =
(2, 2). Find the least squares parabola for these points.
11.6 Orthogonal functions and orthogonal matrices

Outcomes
A. Determine whether a linear transformation is an isometry and/or orthogonal.
In particular, if T is an isometry, we have ‖T(v)‖ = √⟨T(v), T(v)⟩ = √⟨v, v⟩ = ‖v‖, so isometries preserve
the norm. (This is in fact where the name “isometry” comes from: from Greek “isos”, meaning “equal”, and
“metron”, meaning “measure”.) Conversely, any norm-preserving linear function is an isometry, as the
following proposition shows.
Therefore, T is an isometry. ♠
We also note that if T is an isometry and u ⊥ v, then T (u) ⊥ T (v), because hT (u), T (v)i = hu, vi = 0.
So isometries preserve right angles. In fact, isometries also preserve arbitrary angles, due to the formula
cos θ = ⟨u, v⟩ / (‖u‖ ‖v‖) = ⟨T(u), T(v)⟩ / (‖T(u)‖ ‖T(v)‖).
[Figure: the effect of the linear transformations T1, T2, and T3 on a plane figure; T3 is not orthogonal.]
Proof. (a) Recall from Proposition 11.28 that for all u, v ∈ V, we have ⟨u, v⟩ = [u]_B · [v]_B. Consider two vectors u, v ∈ V and let t = [u]_B and s = [v]_B. We have:
Solution. We have
A^T A = [[0, 1], [−1, 0]] [[0, −1], [1, 0]] = [[1, 0], [0, 1]] = I,
and therefore A defines an isometry and (since it is also a square matrix) an orthogonal transformation. We have
B^T B = [[1, 1], [1, 0]] [[1, 1], [1, 0]] = [[2, 1], [1, 1]] ≠ I,
so the linear transformation defined by B is neither an isometry nor orthogonal. A similar pair of calcula-
tions shows that CT C = I and DT D = I, so both C and D are isometries. Since C is a square matrix, it is
also orthogonal. D is not a square matrix, and therefore not orthogonal. ♠
The following proposition gives several equivalent ways of checking whether an n × n-matrix P is orthog-
onal.
(a) P is orthogonal.
(b) P^T P = I.
(c) P is invertible and P⁻¹ = P^T.
(d) P P^T = I.
(e) P^T is orthogonal.
(f) The columns of P form an orthonormal set of vectors.
(g) The rows of P form an orthonormal set of vectors.
Proof. The equivalence (a) ⇔ (b) is just the definition of orthogonality, bearing in mind that P is assumed
to be a square matrix. The equivalences (b) ⇔ (c) ⇔ (d) follow from Theorem 4.64, since a square matrix
is invertible if and only if it is left invertible if and only if it is right invertible. The equivalence (d) ⇔
(e) is again just the definition of orthogonality, this time applied to PT . To show (b) ⇔ (f), let a1 , . . . , an
be the columns of P, and let e1 , . . . , en be the standard basis vectors of Rn . Then we have PT P = I if and
only if for all i, j, the (i, j)-entry of P^T P is equal to the (i, j)-entry of I. But the (i, j)-entry of P^T P is e_i^T P^T P e_j = a_i^T a_j = a_i · a_j. On the other hand, the (i, j)-entry of I is 1 if i = j and 0 if i ≠ j. It follows that P^T P = I if and only if for all i, j,
a_i · a_j = 1 when i = j, and a_i · a_j = 0 otherwise.
But this is exactly what it means for a1 , . . . , an to form an orthonormal set. The proof of (d) ⇔ (g) is
completely analogous, using rows instead of columns. ♠
Solution. By Proposition 11.59, it suffices to check whether the columns of each matrix are orthonormal.
This is the case for A and B. For example, the columns of B are
b1 = (1/3)[2, −1, 2]^T,  b2 = (1/3)[−1, 2, 2]^T,  and  b3 = (1/3)[2, 2, −1]^T,
and we have
b1 · b1 = (1/9)(4 + 1 + 4) = 1,   b2 · b2 = (1/9)(1 + 4 + 4) = 1,   b3 · b3 = (1/9)(4 + 4 + 1) = 1,
b1 · b2 = (1/9)(−2 − 2 + 4) = 0,  b1 · b3 = (1/9)(4 − 2 − 2) = 0,   b2 · b3 = (1/9)(−2 + 4 − 2) = 0.
So the columns of B are orthonormal. The columns of C are orthogonal, but not normalized, so C is not an
orthogonal matrix. The columns of D are normalized, but not orthogonal, so D is not an orthogonal matrix
either. ♠
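Checking orthogonality amounts to testing PᵀP = I, which is easy to do numerically. The following sketch uses the matrix B from this example together with one additional hypothetical test matrix.

```python
import numpy as np

def is_orthogonal(P, tol=1e-12):
    """A square matrix is orthogonal when its columns are orthonormal,
    i.e. when P^T P is the identity matrix."""
    P = np.asarray(P, dtype=float)
    return P.shape[0] == P.shape[1] and np.allclose(P.T @ P, np.eye(P.shape[1]), atol=tol)

B = np.array([[ 2, -1,  2],
              [-1,  2,  2],
              [ 2,  2, -1]]) / 3.0

# Hypothetical test matrix: columns are orthogonal but not normalized
M = np.array([[0.0, 0.0, 2.0],
              [0.0, 2.0, 0.0],
              [2.0, 0.0, 0.0]])

print(is_orthogonal(B))   # True
print(is_orthogonal(M))   # False
```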
Exercises
Exercise 11.6.2 Which of the following matrices define orthogonal transformations? Which ones define
isometries?
1 2
1 0 1 1 1 1 1 −2 1
A= , B= , C= √ , D= 2 2 .
0 −1 2 −1 1 5 2 1 4
2 −1
Exercise 11.6.3 Determine which of the following matrices are orthogonal by checking whether the
columns form an orthonormal set.
A = (1/3)[[1, 2, 2], [2, 1, −2], [−2, 2, −1]],   B = [[0, 0, 2], [0, 2, 0], [2, 0, 0]],   C = (1/3)[[2, −1, 2], [−1, 2, −2], [2, −2, 1]],   D = (1/5)[[3, 4, 0], [4, −3, 0], [0, 0, 5]].
11.7 Diagonalization of symmetric matrices

Outcomes
A. Compute the real eigenvalues of a symmetric matrix.
In Chapter 8, we saw that some matrices are diagonalizable, and others are not. In this section, we will see
that the theory of diagonalization is much nicer when the matrix to be diagonalized is symmetric. Recall
that a matrix A is symmetric if A = AT . We begin with some observations about the eigenvalues and
eigenvectors of a symmetric matrix.
Proof. (a) Suppose λ is a (possibly complex) eigenvalue of A, with (possibly complex) eigenvector v. We will show that λ is in fact real. Let λ̄ be the complex conjugate of λ, and let v̄ be the complex conjugate of v, i.e., the vector obtained from v by taking the complex conjugate of each of its entries. Since v is an eigenvector for eigenvalue λ of A, we have
v̄^T A v = λ v̄^T v.   (11.2)
Taking the complex conjugate of both sides of the equation, and using the fact that Ā = A (since A is real), we get
v^T A v̄ = λ̄ v^T v̄.   (11.3)
Then, taking the transpose of both sides of the equation, and using the fact that A^T = A (since A is symmetric), we have
v̄^T A v = λ̄ v̄^T v.   (11.4)
Comparing equations (11.2) and (11.4), we find that λ v̄^T v = λ̄ v̄^T v. Since v̄^T v is a non-zero scalar, it follows that λ = λ̄, i.e., λ is real.
(b) Suppose v is an eigenvector for eigenvalue λ, w is an eigenvector for eigenvalue µ, and λ ≠ µ. We evaluate v^T A w in two different ways. On the one hand, we have
v^T A w = v^T (A w) = v^T (µ w) = µ v^T w.
On the other hand, since A is symmetric,
v^T A w = (A^T v)^T w = (A v)^T w = (λ v)^T w = λ v^T w.
Therefore µ v^T w = λ v^T w. Since λ ≠ µ, it follows that v^T w = 0, i.e., v and w are orthogonal. ♠
Recall that an n × n-matrix A is diagonalizable if there exists an invertible matrix P and a diagonal matrix
D such that D = P−1 AP. We say that A is orthogonally diagonalizable if P can, moreover, be chosen
to be orthogonal. Orthogonal diagonalizability is a convenient property, because when P is orthogonal,
then P−1 = PT , and we can interchangeably write D = P−1 AP or D = PT AP. The following is the main
theorem about the diagonalization of real symmetric matrices.
Proof. By induction on the size of the matrix. For a 1 × 1-matrix, there is nothing to show, as it is already
diagonal. Now consider a real symmetric n × n-matrix A with n ≥ 2. By the fundamental theorem of
algebra, the characteristic polynomial has at least one root, so that A has at least one (possibly complex)
eigenvalue λ . By Proposition 11.62(a), λ is real. Since we can solve the system of equations (A− λ I)v = 0
over the real numbers, there exists a real eigenvector v for the eigenvalue λ . We can assume without loss of
generality that v is normalized, because otherwise we could replace v by (1/‖v‖) v. By the Gram-Schmidt method, we
can find an orthonormal basis {u1 , . . . , un } of Rn such that u1 = v. Let Q be the orthogonal matrix that has
u1 , . . . , un as its columns, and consider B = Q−1 AQ. Since Be1 = Q−1 AQe1 = Q−1 Au1 = λ Q−1 u1 = λ e1 ,
the matrix B is of the form
B = [[λ, b12, …, b1n], [0, b22, …, b2n], …, [0, bn2, …, bnn]],
i.e., the first column of B is [λ, 0, …, 0]^T.
Moreover, since B = Q⁻¹ A Q = Q^T A Q, the matrix B is symmetric. Therefore b12, …, b1n = 0, and B has the block form
B = [[λ, 0], [0, C]],
where C is a symmetric matrix of dimension n − 1. By induction hypothesis, C is orthogonally diagonalizable, i.e., there exists an orthogonal matrix R such that R⁻¹ C R = D is diagonal. Let
S = [[1, 0], [0, R]]  (in block form).
Therefore, E is diagonal. Let P = QS. Then P is orthogonal by Proposition 11.61(1). Moreover, we have
Solution. We proceed in much the same way as in Chapter 8, except that at a crucial moment, we ensure
that P is orthogonal. We start by calculating the characteristic polynomial of A:
det(A − λI) = det([[3 − λ, 2], [2, 6 − λ]]) = (3 − λ)(6 − λ) − 4 = λ² − 9λ + 14.
We find the roots using the quadratic formula. The roots of the characteristic polynomial, and therefore the eigenvalues of A, are λ = 7 and λ = 2. For the eigenvalue λ = 7, we find the eigenvector
v = [1, 2]^T,
Solution. Again, we start by calculating the characteristic polynomial and its roots:
det(A − λI) = det([[3 − λ, 1, −2], [1, 3 − λ, 2], [−2, 2, −λ]]) = −λ³ + 6λ² − 32.
The roots are λ = 4 and λ = −2. To find the eigenvectors for λ = 4, we solve the system of equations
(A − 4I) v = [[−1, 1, −2], [1, −1, 2], [−2, 2, −4]] v = 0.
The solution space is 2-dimensional, with basis
v1 = [1, 1, 0]^T  and  v2 = [0, 2, 1]^T.
We now have an orthogonal basis {u1 , u2 , v3 } of eigenvectors of the matrix A. We normalize the three
vectors and use them as the columns of P:
P = [[1/√2, −1/√3, 1/√6], [1/√2, 1/√3, −1/√6], [0, 1/√3, 2/√6]].
It seems that inverting P will not be all that easy, but in fact, since P is orthogonal, the inverse is just the transpose:
P⁻¹ = P^T = [[1/√2, 1/√2, 0], [−1/√3, 1/√3, 1/√3], [1/√6, −1/√6, 2/√6]].
We have
P⁻¹ A P = D = [[4, 0, 0], [0, 4, 0], [0, 0, −2]].
♠
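Numerically, orthogonal diagonalization of a symmetric matrix is provided by numpy.linalg.eigh. The following sketch applies it to the matrix from this example; note that eigh returns the eigenvalues in ascending order, so its P and D may differ from the ones above by a permutation and by signs of columns.

```python
import numpy as np

# The symmetric matrix from the example above
A = np.array([[ 3.0, 1.0, -2.0],
              [ 1.0, 3.0,  2.0],
              [-2.0, 2.0,  0.0]])

eigenvalues, P = np.linalg.eigh(A)
print(eigenvalues)                                        # approximately [-2.  4.  4.]
print(np.allclose(P.T @ P, np.eye(3)))                    # True: P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(eigenvalues)))     # True: P^T A P = D
```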
Exercises
Exercise 11.7.1 For each of the following symmetric matrices, find the eigenvalues, an orthonormal basis
for each eigenspace, and then orthogonally diagonalize the matrix.
(a) A = [[−1, 2], [2, 2]],   (b) A = [[3, 1], [1, 3]],   (c) A = [[1, 1, 0], [1, 2, 1], [0, 1, 1]],   (d) A = [[2, 1, 2], [1, 2, 2], [2, 2, 5]].
Exercise 11.7.2 Prove the converse of Theorem 11.63: if a matrix A is orthogonally diagonalizable, then
A is symmetric.
11.8 Positive semidefinite and positive definite matrices

Outcomes
A. Determine whether a matrix is positive semidefinite and/or positive definite, either directly or
by looking at the eigenvalues.
B. Determine whether a matrix is positive semidefinite and/or positive definite using Descartes’
rule of signs.
In Example 11.3, we saw that it is sometimes possible to define an inner product on Rn from an n × n-
matrix A by the formula
hu, vi = uT Av.
We will now explore in more detail under what conditions this formula defines an inner product.
• A is called symmetric if A = AT .
Equivalently, the positive definite property can also be stated as follows: A is positive definite if it is symmetric and for all v ∈ Rn with v ≠ 0, we have v^T A v > 0.
Solution. The matrix A is positive definite. To see why, consider any vector v = [x, y]^T. Then
v^T A v = [x, y] [[2, −1], [−1, 1]] [x, y]^T = 2x² − 2xy + y² = x² + (x − y)² ≥ 0.
This inequality implies that A is positive semidefinite. Moreover, v^T A v = 0 if and only if x = 0 and x − y = 0, which implies x = y = 0. So v = 0 is the only solution of v^T A v = 0, and A is positive definite.
The matrix B is not positive semidefinite (and therefore not positive definite either). For example, consider v = [1, −1]^T. Then
v^T B v = [1, −1] [[1, 2], [2, 1]] [1, −1]^T = 1 − 2 − 2 + 1 = −2 < 0,
showing that B is not positive semidefinite.
The matrix C is positive semidefinite, but not positive definite. To see why, consider v = [x, y]^T. Then
v^T C v = [x, y] [[1, −1], [−1, 1]] [x, y]^T = x² − 2xy + y² = (x − y)² ≥ 0.
Therefore, C is positive semidefinite. However, it is not positive definite because, for example, for v = [1, 1]^T, we have v ≠ 0 but v^T C v = 0.
The matrix D is not symmetric, and therefore neither positive semidefinite nor positive definite. ♠
The interest of positive definite matrices lies in the following proposition.
hu, vi = uT Av
defines an inner product on Rn if and only if A is positive definite. Moreover, every inner product
on Rn arises from some positive definite matrix A in this way.
Proof. Define hu, vi = uT Av. We will check each of the three properties of an inner product from Defini-
tion 11.1. Linearity holds for all matrices A, because
hu, kvi = uT A(kv) = k(uT Av) = khu, vi
and
hu, v + v′ i = uT A(v + v′ ) = uT Av + uT Av′ = hu, vi + hu, v′ i.
For symmetry, first observe that for all u, v,
⟨v, u⟩ = v^T A u = (v^T A u)^T = u^T A^T v.
Therefore ⟨u, v⟩ = ⟨v, u⟩ if and only if u^T A v = u^T A^T v. This holds for all u, v if and only if A = A^T.
Therefore, symmetry holds if and only if A is symmetric. Finally, the positive definite property of the
inner product holds, by definition, if and only if A is positive definite.
To prove the second part, consider any inner product
hu, vi on Rn . Let e1 , . . . , en be the standard basis
vectors, and define ai j = hei , e j i. Then A = ai j is a matrix. We claim that hu, vi = uT Av for all u, v.
Indeed, let u = u1 e1 + . . . + un en and v = v1 e1 + . . . + vn en . Then
hu, vi = hu1 e1 + . . . + un en , v1 e1 + . . . + vn en i
= u1 v1 he1 , e1 i + u1 v2 he1 , e2 i + . . . + un vn hen , en i
= u1 v1 a11 + u1 v2 a12 + . . . + un vn ann
= uT Av.
Proof. By Theorem 11.63, we know that A is orthogonally diagonalizable, i.e., D = PT AP, where P is an
orthogonal matrix and D = diag(λ1, …, λn) is the diagonal matrix with the eigenvalues of A on the diagonal.
Let v = [x1 , . . . , xn ]T be any vector, and let w = Pv. Then wT Aw = vT PT APv = vT Dv, so that A is pos-
itive (semi)definite if and only if D is positive (semi)definite. Also, we have vT Dv = λ1 x21 + . . . + λn x2n .
Therefore vT Dv ≥ 0 for all v if and only if λ1 , . . . , λn ≥ 0. Moreover, vT Dv > 0 for all v 6= 0 if and only if
λ1 , . . . , λn > 0, as claimed. ♠
The eigenvalues are λ1 = 0, λ2 = 5, and λ3 = 6. Therefore, the matrix A is positive semidefinite, but not
positive definite. ♠
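Proposition 11.69 suggests an easy numerical test: compute the eigenvalues and check their signs. The following sketch does this for the matrix of this example (the same matrix reappears in Example 11.72 below); the tolerance guards against rounding error.

```python
import numpy as np

def classify(A, tol=1e-10):
    """Classify a symmetric matrix via the signs of its eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(A)
    if np.all(eigenvalues > tol):
        return "positive definite"
    if np.all(eigenvalues > -tol):
        return "positive semidefinite"
    return "neither"

A = np.array([[5.0, 0.0, 2.0],
              [0.0, 5.0, 1.0],
              [2.0, 1.0, 1.0]])
print(np.linalg.eigvalsh(A))   # approximately [0, 5, 6]
print(classify(A))             # positive semidefinite
```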
While Proposition 11.69 gives us a method to determine whether a matrix A is positive definite (or semidef-
inite), it still requires finding all of the eigenvalues of A, i.e., to factor the characteristic polynomial. This
can be a difficult calculation, especially if the degree of the polynomial is large or if the roots are not inte-
gers. Fortunately, there is a better way to determine whether a matrix is positive definite (or semidefinite)
by directly looking at the characteristic polynomial, without having to calculate its roots. The method was
found by René Descartes in 1637 and is called Descartes’ rule of signs. The general form of Descartes’
rule of signs is actually more complicated than what we consider here. We state a version that has been
specialized to characteristic polynomials of symmetric matrices.
Let a0 , . . . , an be a sequence of real numbers. We say that a0 , . . . , an have strongly alternating signs
if a0 > 0, a1 < 0, a2 > 0, and so on (i.e., ai > 0 when i is even, and ai < 0 when i is odd). We say that
a0 , . . . , an have weakly alternating signs if a0 ≥ 0, a1 ≤ 0, a2 ≥ 0, and so on (i.e., ai ≥ 0 when i is even,
and ai ≤ 0 when i is odd).
Proof. We first prove a general fact about polynomials. Suppose d1 , . . . , dn are real numbers and
Proof: To prove the first claim, assume d1 , . . . , dn > 0. It is easy to see that by multiplying out (x +
d1 )(x + d2 ) · · · (x + dn ), we can only obtain positive coefficients, so b0 , . . . , bn > 0. Conversely, assume
b0 , . . . , bn > 0. Then for every x ≥ 0, we clearly have bn xn + bn−1 xn−1 + . . . + b1 x + b0 > 0, and therefore
no such x ≥ 0 can be a root of this polynomial. In other words, all of the roots must be negative. Since the
roots are −d1 , . . . , −dn , it follows that d1 , . . . , dn > 0. The proof of the second claim (using “≥” instead of
“>”) is similar.
We are now ready to prove Proposition 11.71. By Theorem 11.63, we know that A is diagonalizable,
i.e., A = PDP−1 for some real diagonal matrix D. Note that A and D have the same characteristic polyno-
mial. If d1 , . . . , dn are the diagonal entries of D, the characteristic polynomial can therefore be written in
two different ways:
We have: A is positive definite if and only if d1 , . . . , dn > 0, if and only if a0 , −a1 , a2 , −a3 , . . . > 0, if and
only if a0 , . . . , an are strongly alternating. Moreover, A is positive semidefinite if and only if d1 , . . . , dn ≥ 0,
if and only if a0 , −a1 , a2 , −a3 , . . . ≥ 0, if and only if a0 , . . . , an are weakly alternating. ♠
Example 11.72: Using Descartes’ rule of signs to check if a matrix is positive definite
Use Descartes’ rule of signs to check whether the matrix
A = [[5, 0, 2], [0, 5, 1], [2, 1, 1]]
is positive definite and/or positive semidefinite.
Solution. We already found the characteristic polynomial in Example 11.70: it is −λ 3 + 11λ 2 − 30λ . The
coefficients are a0 = 0, a1 = −30, a2 = 11, and a3 = −1. Note that we have included all of the coefficients,
even ones that are zero. Since the signs are weakly alternating, but not strongly alternating, the matrix is
positive semidefinite, but not positive definite. ♠
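Descartes' rule of signs is easy to automate once the characteristic polynomial is available. The following SymPy sketch computes the coefficients a0, …, an of det(A − λI) exactly and checks whether they alternate strongly or weakly; it reproduces the conclusion of this example.

```python
import sympy as sp

lam = sp.symbols('lam')

def descartes_signs(A):
    """Return the coefficients a0, ..., an of det(A - lam*I), together with
    flags for strongly and weakly alternating signs (Proposition 11.71)."""
    n = A.shape[0]
    p = sp.expand((A - lam * sp.eye(n)).det())
    a = [p.coeff(lam, i) for i in range(n + 1)]
    strongly = all((-1) ** i * c > 0 for i, c in enumerate(a))
    weakly = all((-1) ** i * c >= 0 for i, c in enumerate(a))
    return a, strongly, weakly

A = sp.Matrix([[5, 0, 2],
               [0, 5, 1],
               [2, 1, 1]])
print(descartes_signs(A))
# ([0, -30, 11, -1], False, True): positive semidefinite but not positive definite
```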
Example 11.73: Using Descartes’ rule of signs to check if a matrix is positive definite
Use Descartes’ rule of signs to determine which of the following matrices are positive definite
and/or positive semidefinite.
A = [[2, 1, 2], [1, 1, 0], [2, 0, 2]],   B = [[2, 1, −2], [1, 1, −1], [−2, −1, 2]],   C = [[2, −1, 2], [−1, 2, 0], [2, 0, 3]].
Exercises
Exercise 11.8.1 Determine by direct calculation (i.e., without calculating the characteristic polynomial
or the eigenvalues) which of the following matrices are positive definite, positive semidefinite, or neither.
A = [[1, 1], [1, 1]],   B = [[1, −2], [−2, 5]],   C = [[2, 3], [1, 3]],   D = [[2, 1, 1], [1, 1, 0], [1, 0, 1]],   E = [[1, 0, 0], [0, 2, 0], [0, 0, −1]].
Exercise 11.8.2 Calculate the eigenvalues of each symmetric matrix, then determine for each matrix
whether it is positive definite, positive semidefinite, or neither.
A = [[2, 2], [2, 5]],   B = [[4, −6], [−6, 9]],   C = [[2, 1, 0], [1, 1, −1], [0, −1, 2]],   D = [[3, −4, 2], [−4, 4, 0], [2, 0, 4]].
is positive definite, positive semidefinite, or neither.
Exercise 11.8.3 Which of the following formulas define an inner product on R3 ? Here, u = [u1 , u2 , u3 ]T
and v = [v1 , v2 , v3 ]T .
Exercise 11.8.4 Use Descartes’ rule of signs to determine which of the following matrices are positive
definite and/or positive semidefinite.
A = [[2, 2], [2, 1]],   B = [[2, −1, 0], [−1, 1, 0], [0, 0, 1]],   C = [[2, 1, −1, 1], [1, 1, 0, 0], [−1, 0, 2, −1], [1, 0, −1, 1]].
11.9 Application: Simplification of quadratic forms

Outcomes
A. Determine whether or not a function of several variables is a quadratic form.
In this section, we will explore an application of the diagonalization of symmetric matrices, namely, the
simplification of quadratic forms. Quadratic forms are special kinds of functions that arise, for example,
in calculus when we approximate some quantity up to terms of second order.
The numbers q1 , . . . , qn , q12 , . . . , qn−1,n , which may be positive, negative, or zero, are called the
coefficients of the quadratic form.
Solution. The function f is a quadratic form. The function g is not a quadratic form, because the constant
term, +3, is not of degree 2. It should be either a coefficient times the square of a variable, or a coefficient
times the product of two variables. The function h is a quadratic form. We can simplify it to h(x, y, z) =
x2 + 2xy + y2 − z2 . ♠
f (x1 , . . . , xn ) = vT Av
is a quadratic form in n variables. Conversely, every quadratic form in n variables can be uniquely
written in this way. We call this the matrix form of the quadratic form.
Solution. We have
f(x, y, z) = [x, y, z] [[1, 2, 4], [2, 0, −1], [4, −1, −2]] [x, y, z]^T = x² − 2z² + 4xy + 8xz − 2yz.
Note that there is a term 2xy and a term 2yx, which together yield 4xy. Similarly the term 4xz and 4zx are
combined into 8xz, and the terms −yz and −zy are combined into −2yz. ♠
in matrix form.
Solution. The matrix of the quadratic form ax² + by² + cz² + dxy + exz + fyz is
A = [[a, d/2, e/2], [d/2, b, f/2], [e/2, f/2, c]].
♠
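Building the matrix of a quadratic form follows the same pattern in code: the coefficient of x_i² goes on the diagonal, and the coefficient of x_i x_j is split evenly between the (i, j) and (j, i) entries. The following sketch is a small illustration using the quadratic form from the earlier example; the function name and input format are ad hoc.

```python
import numpy as np

def quadratic_form_matrix(diag, off_diag):
    """diag[i] is the coefficient of x_i^2; off_diag[(i, j)] is the
    coefficient of x_i x_j for i < j."""
    A = np.diag(np.array(diag, dtype=float))
    for (i, j), q in off_diag.items():
        A[i, j] = A[j, i] = q / 2.0      # split the coefficient of x_i x_j
    return A

# The quadratic form x^2 - 2z^2 + 4xy + 8xz - 2yz from the earlier example
A = quadratic_form_matrix([1, 0, -2], {(0, 1): 4, (0, 2): 8, (1, 2): -2})
print(A)
# [[ 1.  2.  4.]
#  [ 2.  0. -1.]
#  [ 4. -1. -2.]]

v = np.array([1.0, 2.0, 3.0])
print(v @ A @ v)   # 3.0 = 1 - 18 + 8 + 24 - 12, the value of the form at (1, 2, 3)
```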
We will now turn to the question of how to simplify quadratic forms. The primary tool we have for doing
so is a change of variables. This means replacing the variables x1 , . . . , xn by new variables y1 , . . . , yn that
are linear combinations of x1 , . . . , xn .
x = u
y = v−w
z = w
Solution. We have
3x2 + y2 + 2xy + 2xz + 2yz
= 3u2 + (v − w)2 + 2u(v − w) + 2uw + 2(v − w)w
= 3u2 + v2 − w2 + 2uv.
♠
The simplest kind of quadratic form is one that involves only squared variables, and no products of two
different variables. We call such quadratic forms diagonal.
f (x1 , . . . , xn ) = vT Av,
is diagonal. Let w = [y1 , . . . , yn ]T be a new set of variables such that v = Pw. Then
We orthogonally diagonalize the matrix A. In Example 11.64, we found that P−1 AP = D, where
P = [[1/√2, −1/√3, 1/√6], [1/√2, 1/√3, −1/√6], [0, 1/√3, 2/√6]]  and  D = [[4, 0, 0], [0, 4, 0], [0, 0, −2]].
Next, we turn our attention to the task of sketching the solutions of quadratic equations in 2 or more
variables.
(a) x2 + y2 = 1,
(b) x2 + 2y2 = 1.
[Figure: the unit circle x² + y² = 1.]
(b) The curve x² + 2y² = 1 is the same, except that it has been stretched by a factor of 1/√2 in the y-direction. In other words, it is an ellipse with x-intercepts ±1 and y-intercepts ±1/√2.
[Figure: the ellipse x² + 2y² = 1, with x-intercepts ±1 and y-intercepts ±1/√2.]
♠
In general, for a, b > 0, the curve ax² + by² = 1 is an ellipse with x-intercepts ±1/√a and y-intercepts ±1/√b. Similarly, for a, b, c > 0, the equation ax² + by² + cz² = 1 describes a 3-dimensional ellipsoid with x-intercepts ±1/√a, y-intercepts ±1/√b, and z-intercepts ±1/√c:
[Figure: an ellipsoid with intercepts ±1/√a, ±1/√b, and ±1/√c on the x-, y-, and z-axes.]
Each ellipse or ellipsoid has a set of principal axes, which are its axes of symmetry. When the quadratic
forms are diagonal, as in the above examples, the principal axes are the standard coordinate axes. When
the quadratic forms are not diagonal, the principal axes are the eigenvectors of the matrix A. This is the
content of the following proposition.
vᵀAv = 1
form an n-dimensional ellipsoid whose principal axes are parallel to u1 , . . . , un and whose ui-
intercepts are ±1/√λi.
Proof. Let P be the orthogonal matrix whose columns are u1 , . . . , un . From the proof of Proposition 11.81,
we know that the equation vT Av = 1 is equivalent to λ1 y21 + . . . + λn y2n = 1, where y1 , . . . , yn are variables
such that
v = P [y1 , . . . , yn ]ᵀ.          (11.5)
Since A is positive definite, we have λ1 , . . . , λn > 0 by Proposition 11.69. We therefore know that in
the (y1 , . . . , yn )-coordinate system, the equation λ1 y1² + . . . + λn yn² = 1 describes an ellipsoid whose y1-
intercepts are ±1/√λ1, whose y2-intercepts are ±1/√λ2, and so on. The only thing that remains to do
is to figure out the direction of the coordinate axes. The y1 -axis points in the direction of the point with
coordinates (y1 , . . . , yn ) = (1, 0, . . ., 0). Using the change of variables formula (11.5), we find that this
corresponds to the first column of P, i.e., v = u1 . Similarly, the y2 -axis points in the direction of u2 , and
so on. ♠
Solution. The matrix for the quadratic form 3x² + 4xy + 6y² is
A = [ 3  2 ]
    [ 2  6 ].
Its normalized eigenvectors are u1 = (1/√5) [1, 2]ᵀ and u2 = (1/√5) [−2, 1]ᵀ, with respective eigenvalues
λ1 = 7 and λ2 = 2. Therefore, by Proposition 11.85, the curve 3x² + 4xy + 6y² = 1 is an ellipse with
principal axes u1 and u2 , and with respective intercepts ±1/√7 and ±1/√2.
[Figure: the ellipse 3x² + 4xy + 6y² = 1, with principal axes along u1 and u2 and intercepts ±1/√7 and ±1/√2.]
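The eigenvalue computation behind this example is easy to reproduce in Python (a small illustrative check, not part of the text):

```python
import numpy as np

# Matrix of the quadratic form 3x^2 + 4xy + 6y^2.
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
eigenvalues, eigenvectors = np.linalg.eigh(A)   # eigh: symmetric matrices, ascending order
print(eigenvalues)                   # [2. 7.]
print(eigenvectors)                  # columns: principal axes for eigenvalues 2 and 7
print(1 / np.sqrt(eigenvalues))      # intercepts 1/sqrt(2) and 1/sqrt(7)
```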
Exercises
Exercise 11.9.2 Find the coefficients of the quadratic form f (x, y, z) = vᵀAv, where
A = [ 0   1   2 ]
    [ 1  −3  −4 ]
    [ 2  −4   5 ].
Exercise 11.9.4 Apply the change of variables x = u + 2w, y = v, z = w − v to the quadratic form
x2 + 7y2 + 5z2 − 4xy − 4xz + 11yz.
Exercise 11.9.5 Perform a change of variables so that the quadratic form f (x, y) = 3x2 − 2xy + 3y2 be-
comes diagonal.
Exercise 11.9.7 Find the principal axes of the following curves, and sketch them:
(a) x² + (1/2)y² = 1,
Exercise 11.9.8 Find the principal axes of the ellipsoid 2x² + 2y² + 3z² + 2xz − 2yz = 1.
Outcomes
A. Compute dot products in Cn .
B. Use properties of the complex dot product to prove equalities and inequalities.
G. Use the Gram-Schmidt procedure to find an orthogonal basis of a subspace of a complex inner
product space.
So far, in this chapter, the field K was always R, the set of real numbers. The reason we have not con-
sidered inner products over other fields K is that the positive definite property requires hu, ui ≥ 0, and the
requirement that a scalar is “greater or equal to 0” does not make sense if K is, say, the field of integers
modulo p.
In this section, we will consider inner product spaces over the complex numbers. It turns out that the
theory of complex inner product spaces is similar, but not completely identical, to that of real inner product
spaces. To explain the difference, consider the definition of the dot product. In Rn , the dot product of two
vectors
v = [x1 , . . . , xn ]ᵀ    and    w = [y1 , . . . , yn ]ᵀ
is defined to be
v · w = x1 y1 + . . . + xn yn .
One of the most important properties of the dot product is positivity: for all v, we have
v · v = x1² + . . . + xn² ≥ 0.
The reason positivity holds is that the square of a real number is always greater or equal to 0. If we blindly
replaced x1 , . . . , xn by complex numbers and kept the same definition of dot product, positivity would no
longer hold. This is because for a complex number z, it is not in general true that z2 ≥ 0. In fact, z2 may
not be a real number, and even in cases where z2 is real, it may not be positive. For example, if z = i, then
z2 = −1.
Fortunately, all is not lost: the complex numbers actually do have a useful positivity property. Namely,
if z = a + bi is a complex number and z̄ = a − bi is its complex conjugate, then
z̄z = (a − bi)(a + bi) = a² + b² ≥ 0.
So instead of squaring a complex number, we should multiply it by its conjugate. With this in mind, we
arrive at the following definition of dot product on Cn :
v · w = v̄1 w1 + . . . + v̄n wn .
The complex dot product satisfies properties that are similar to, but not exactly the same as, the properties
satisfied by the real dot product.
• Conjugate symmetry: u · v = \overline{v · u}.
We note that the complex dot product can be equivalently expressed as a matrix product, namely
v · w = [ v̄1 · · · v̄n ] [w1 , . . . , wn ]ᵀ = v̄ᵀw.
Here, v̄ denotes the complex conjugate of a vector (i.e., taking the complex conjugate of each component
of a vector), and (−)ᵀ denotes the transpose as usual. As a matter of fact, when working with complex
vectors and matrices, it turns out that we should almost always take the complex conjugate at the same
time as taking the transpose. For this reason, we introduce a special name and notation for the conjugate
transpose of a vector or matrix.
v · w = v∗w.
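As a quick illustration (not part of the text), numpy's `vdot` uses the same convention of conjugating the first argument; here u = [1, i]ᵀ and v = [−i, 1]ᵀ are the vectors that reappear in the example further below:

```python
import numpy as np

u = np.array([1, 1j])
v = np.array([-1j, 1])

print(np.vdot(u, u))             # (2+0j): always real and >= 0 (positivity)
print(np.vdot(u, v))             # (-2j): conjugate-linear in the first argument
print(np.conj(np.vdot(v, u)))    # equals np.vdot(u, v) (conjugate symmetry)
print(np.conj(u) @ v)            # the same product written as conj(u)^T v, i.e. u* v
```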
We are now ready to state the definition of a complex inner product, which is a generalization of the
complex dot product.
Note that conjugate symmetry implies that hu, ui is a real number for every vector u. Namely, let z = hu, ui.
Then by conjugate symmetry, we have
z̄ = \overline{hu, ui} = hu, ui = z.
Since z is equal to its own conjugate, it must be a real number. Therefore, the positive definite property
makes sense: when we require that hu, ui ≥ 0, we are talking about a real number that is greater than or
equal to 0. (It would not in general make sense to ask whether a complex number is greater than or equal
to 0).
The space Cn with the complex dot product is evidently an example of a complex inner product space.
Here is another example:
Armed with this definition of complex inner products, we can now pretty much redo everything we did for
real inner products in the complex case. The only thing we have to be careful about is to put the complex
conjugate operation in the correct places.
• The norm of a vector in a complex inner product space is defined to be kuk = √hu, ui. This
definition makes sense because hu, ui ≥ 0.
• The Cauchy-Schwarz inequality |hu, vi| ≤ kuk kvk and the triangle inequality ku + vk ≤ kuk +
kvk hold in complex inner product spaces.
• Two vectors u, v in a complex inner product space are called orthogonal, in symbols u ⊥ v, if
hu, vi = 0.
• A set of vectors {u1 , . . . , un } is called an orthogonal set if the vectors are non-zero and pairwise
orthogonal, and an orthonormal set if the vectors are moreover normalized.
• If u1 , . . . , uk are orthogonal, then ka1 u1 + . . . + ak uk k² = |a1|² ku1k² + . . . + |ak|² kukk². The absolute
value signs are necessary because āi ai = |ai|².
Solution. We have
hu, vi = u∗v = [ 1  −i ] [ −i ]  =  −i − i  =  −2i ≠ 0,
                         [  1 ]
so u and v are not orthogonal. We have
hu, wi = u∗w = [ 1  −i ] [ i ]  =  i − i  =  0,
                         [ 1 ]
so u and w are orthogonal. Note that it is crucial here that we did not forget to take the complex conjugate
of u, or else we would have gotten a different answer. ♠
In some formulas, we must be careful about whether we use hv, wi or hw, vi. Although this did not make
any difference in the case of real inner products, it does make a difference for complex inner products,
because in general, hv, wi ≠ hw, vi. In particular, we have to be careful about this in the formulas for
Fourier coefficients, projections, and the Gram-Schmidt procedure.
v = a1 u1 + . . . + an un .
ai = hv, ui i / hui , ui i      or      ai = hui , vi / hui , ui i ?
v = a1 u1 + . . . + an un ,
then
ai = hui , vi / hui , ui i      and      āi = hv, ui i / hui , ui i .
a1 = hu1 , vi / hu1 , u1 i = hu1 , vi / ku1 k² = i/1 = i,
a2 = hu2 , vi / hu2 , u2 i = hu2 , vi / ku2 k² = −2/5,
a3 = hu3 , vi / hu3 , u3 i = hu3 , vi / ku3 k² = (1 − 2i)/4 = 1/4 − (1/2)i.
♠
The Gram-Schmidt orthogonalization procedure works without changes in complex inner product spaces,
as long as we are careful not to confuse hui , v j i with hv j , ui i.
u1 = v1 ,
u2 = v2 − (hu1 , v2 i / hu1 , u1 i) u1 ,
u3 = v3 − (hu1 , v3 i / hu1 , u1 i) u1 − (hu2 , v3 i / hu2 , u2 i) u2 ,
⋮
uk = vk − (hu1 , vk i / hu1 , u1 i) u1 − (hu2 , vk i / hu2 , u2 i) u2 − . . . − (huk−1 , vk i / huk−1 , uk−1 i) uk−1 .
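The procedure translates directly into numpy (a sketch, not the book's code); `np.vdot(a, b)` computes the inner product with the conjugate on the left, matching the convention used here. The vectors v1, v2 below are arbitrary illustrative inputs, not the ones from the example that follows:

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal basis for the span of the given complex vectors."""
    basis = []
    for v in vectors:
        u = v.astype(complex)
        for b in basis:
            u = u - (np.vdot(b, v) / np.vdot(b, b)) * b   # subtract the projection onto b
        basis.append(u)
    return basis

v1 = np.array([1, 1j, 0])
v2 = np.array([1, 0, 1])
u1, u2 = gram_schmidt([v1, v2])
print(np.vdot(u1, u2))   # approximately 0: u1 and u2 are orthogonal
```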
Use the Gram-Schmidt procedure to find an orthogonal basis for span {v1 , v2 }.
The desired orthogonal basis is {u1 , u2 }. We double-check that u1 and u2 are indeed orthogonal:
hu1 , u2 i = u1∗ u2 = [ 1 − i   1   −i ] [   1    ]  =  (1 − i)1 + 1(−1 − i) + (−i)(−2)  =  0.
                                         [ −1 − i ]
                                         [  −2    ]
♠
Orthogonal projections also work in complex inner product spaces.
Solution. First notice that hu1 , u2 i = 0, so that u1 and u2 are orthogonal. Therefore, the best approximation
is given by
v′ = (hu1 , vi / hu1 , u1 i) u1 + (hu2 , vi / hu2 , u2 i) u2 .
We calculate the relevant inner products:
hu1 , vi = [ 1, −i, 0 ] [ 0, 2, 4 ]ᵀ = −2i,
hu2 , vi = [ −i, 1, 1 ] [ 0, 2, 4 ]ᵀ = 6,
hu1 , u1 i = [ 1, −i, 0 ] [ 1, i, 0 ]ᵀ = 2,
hu2 , u2 i = [ −i, 1, 1 ] [ i, 1, 1 ]ᵀ = 3.
Therefore,
v′ = (hu1 , vi / hu1 , u1 i) u1 + (hu2 , vi / hu2 , u2 i) u2 = (−2i/2) u1 + (6/3) u2 = −i [ 1, i, 0 ]ᵀ + 2 [ i, 1, 1 ]ᵀ = [ i, 3, 2 ]ᵀ.
To double-check the answer, we can check that v − v′ is indeed orthogonal to u1 and u2 . We have
v − v′ = [ −i, −1, 2 ]ᵀ
and
hu1 , v − v′ i = [ 1, −i, 0 ] [ −i, −1, 2 ]ᵀ = 0,
hu2 , v − v′ i = [ −i, 1, 1 ] [ −i, −1, 2 ]ᵀ = 0.
♠
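The same computation can be reproduced in a few lines of numpy (illustrative only), using the vectors of this example:

```python
import numpy as np

u1 = np.array([1, 1j, 0])
u2 = np.array([1j, 1, 1])
v  = np.array([0, 2, 4])

# Best approximation of v in span{u1, u2}.
vp = (np.vdot(u1, v) / np.vdot(u1, u1)) * u1 + (np.vdot(u2, v) / np.vdot(u2, u2)) * u2
print(vp)                      # [i, 3, 2]
print(np.vdot(u1, v - vp))     # 0: the error is orthogonal to u1
print(np.vdot(u2, v - vp))     # 0: and to u2
```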
We finish this section with some remarks on the differences between the notations used in mathematics and
in physics. Complex inner product spaces are very important in physics because they are the foundation
of quantum mechanics. The adjoint of a matrix A is usually denoted A∗ in mathematics and A† in physics.
In quantum mechanics, a column vector v is often written |vi, and the corresponding row vector v∗ is then
written as hv|. This is the so-called Dirac notation. With this convention, an inner product v∗ w is hv||wi,
which is usually written as hv|wi. Also, the matrix vw∗ is written |vihw| and is called an outer product. In
mathematics, it is customary for inner products to be linear in the left component and antilinear in the right
component. In physics, it is customary to use the opposite convention, i.e., inner products are antilinear in
the left component and linear in the right component. In this book, we have used the physics convention
of antilinearity in the left component, because it is the better convention.
Exercises
Exercise 11.10.3 Suppose u, v ∈ Cn and u · u = 1, v · v = 2, and u · v = 2i. Then compute:
(a) (2u + v) · (u − v).
Exercise 11.10.5 Suppose u and v are vectors in a complex inner product space such that hu, ui = 3,
hv, vi = 2, and hu, vi = i + 1. Then compute:
Exercise 11.10.6 In C[0, 2π ], consider the functions f (x) = sin x+i cos x, g(x) = 1, and h(x) = x. Compute
the following:
(a) h f , gi, (b) h f , hi, (c) h f , f i, (d) k f k .
Exercise 11.10.8 Suppose that B = {u1 , u2 , u3 } is an orthogonal basis for a complex inner product space
V , such that ku1 k = √2, ku2 k = 2, and ku3 k = √3. Moreover, suppose that v ∈ V is a vector such that
hu1 , vi = 1, hu2 , vi = 3i, and hu3 , vi = 1 − i. Find the coordinates of v with respect to B.
Use the Gram-Schmidt procedure to find an orthogonal basis for span {v1 , v2 }. Then find an orthonormal
basis.
Outcomes
A. Determine whether a complex linear transformation is an isometry and/or unitary.
In the context of real inner product spaces, we studied orthogonal functions, orthogonal matrices, and
symmetric matrices. The corresponding concepts in the context of complex inner product spaces are
unitary functions, unitary matrices, and hermitian matrices. We will introduce these concepts in this
section. Since the proofs are similar to those in Section 11.6, we omit most of them.
Definition 11.100: Isometries and unitary maps of complex inner product spaces
Let V ,W be complex inner product spaces. A linear transformation T : V → W is called an isometry
if for all u, v ∈ V ,
hT (u), T (v)i = hu, vi.
An isometry that is also invertible is called a unitary transformation, or simply unitary.
In the case of real inner product spaces, we found in Proposition 11.56 that a square matrix P is the matrix
of an orthogonal transformation (with respect to orthonormal bases) if and only if PT P = I. In the complex
case, we have an analogous property, except that we must use the adjoint instead of the transpose.
This motivates the following definition. We therefore define a unitary matrix to be an n × n-matrix satis-
fying P∗ P = I.
(a) P is unitary.
(b) P∗ P = I .
(d) PP∗ = I .
(e) P∗ is unitary.
Solution. We have
A∗A = [ 1   0 ] [ 1  0 ]  =  [ 1  0 ]  =  I,
      [ 0  −i ] [ 0  i ]     [ 0  1 ]
so C is not unitary. Equivalently, we could have checked whether the columns of A, B, and C form
an orthonormal set of vectors (they do, in the case of A and B, but don’t, in the case of C. See also
Example 11.92). ♠
Of course, if P happens to be a matrix with real entries, then P is unitary if and only if it is orthogonal,
because in that case P∗ = P̄ᵀ = Pᵀ.
Recall that a matrix A is called symmetric if A = AT . In the complex world, we are more often
interested in the property A = A∗ . A matrix with this property is called hermitian (after the French
mathematician Charles Hermite, 1822–1901).
Solution. The matrix A is symmetric and also hermitian. The matrix B is symmetric but not hermitian. In
fact, we have
B∗ = [  1   −2i ]  ≠  B.
     [ −2i   1  ]
The matrix C is hermitian but not symmetric. The matrix D is symmetric but not hermitian. ♠
A matrix A = [ai j ] is hermitian if and only if ai j = ā ji , for all i, j. In particular, the diagonal entries of a
hermitian matrix are always real, and the off-diagonal entries come in complex conjugate pairs. If all of
the entries in a matrix A are real, then A is hermitian if and only if it is symmetric.
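These definitions are easy to check mechanically; the following small helpers are an illustrative sketch (not from the text), applied to an arbitrary hermitian matrix chosen for the example:

```python
import numpy as np

def adjoint(A):
    return A.conj().T                 # the conjugate transpose A*

def is_hermitian(A):
    return np.allclose(A, adjoint(A))

def is_unitary(A):
    return np.allclose(adjoint(A) @ A, np.eye(A.shape[0]))

C = np.array([[2, 1 - 1j],
              [1 + 1j, 3]])
print(is_hermitian(C))            # True: real diagonal, conjugate off-diagonal pair
print(np.linalg.eigvalsh(C))      # eigenvalues of a hermitian matrix are real
```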
Hermitian matrices are of interest, among other things, because their eigenvalues are always real.
Moreover, eigenvectors for distinct eigenvalues are orthogonal. The following proposition is analogous to
Proposition 11.62.
Proof. (a) Suppose λ is an eigenvalue of A, with eigenvector v. We will evaluate v∗Av in two different
ways:
v∗Av = v∗(Av) = v∗(λ v) = λ v∗v,
v∗Av = (v∗A)v = (v∗A∗)v = (Av)∗v = (λ v)∗v = λ̄ v∗v.
Since v ≠ 0, we have v∗v ≠ 0, and therefore λ = λ̄, i.e., λ is real.
(b) Now suppose µ is another eigenvalue of A, with eigenvector w, where λ ≠ µ. We evaluate v∗Aw in two
different ways:
v∗Aw = v∗(Aw) = v∗(µ w) = µ v∗w,
v∗Aw = (v∗A)w = (A∗v)∗w = (Av)∗w = (λ v)∗w = λ̄ v∗w = λ v∗w,
where the last step uses that λ is real by part (a). Therefore µ v∗w = λ v∗w, and since λ ≠ µ, it follows
that v∗w = 0, i.e., v and w are orthogonal. ♠
Its roots are λ1 = 2 and λ2 = 7. For the eigenvalue λ1 = 2, we find the normalized eigenvector
v1 = (1/√5) [ 2i ]
            [ 1  ].
For the eigenvalue λ2 = 7, we find the normalized eigenvector
v2 = (1/√5) [ 1  ]
            [ 2i ].
Note that these eigenvectors are orthogonal to each other, confirming Proposition 11.107. We therefore
have D = P⁻¹AP, where
D = [ 2  0 ]     and     P = (1/√5) [ 2i   1 ]
    [ 0  7 ]                        [ 1   2i ].
Note that P is unitary and D is real diagonal. ♠
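A quick numerical check of this example (illustrative, not from the text): reconstructing A = PDP∗ from the P and D above, numpy confirms that A is hermitian and recovers the real eigenvalues.

```python
import numpy as np

P = (1 / np.sqrt(5)) * np.array([[2j, 1],
                                 [1, 2j]])
D = np.diag([2.0, 7.0])

assert np.allclose(P.conj().T @ P, np.eye(2))   # P is unitary
A = P @ D @ P.conj().T                          # the hermitian matrix of the example
assert np.allclose(A, A.conj().T)               # A = A*
print(np.linalg.eigvalsh(A))                    # [2. 7.]
```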
Exercises
Exercise 11.11.4 Assume A and B are unitary 3 × 3-matrices. Then which of the following matrices are
unitary?
(a) AB (b) A + B (c) iA (d) − A (e) 2A ( f ) B−1 (g) AT (h) A∗ (i) A2 ( j) ABA−1
Exercise 11.11.5 Assume A and B are 3 × 3-matrices, and assume A is unitary and B is invertible. Then
which of the following matrices are unitary?
Exercise 11.11.6 Which of the following matrices are hermitian? Which ones are symmetric?
A = [ 0  1 ],   B = [ i   2 ],   C = [ 0   i ],   D = [   1     1 + i ].
    [ 1  0 ]        [ 2  −i ]        [ −i  0 ]        [ 1 − i     2   ]
Exercise 11.11.7 Which of the following matrices are hermitian? Which ones are symmetric?
A = [   0     −2   1 + i ],   B = [    3        i     −1 + 2i ],   C = [ i  i  i ].
    [  −2      3     i   ]        [   −i        0     −1 + i  ]        [ i  i  i ]
    [ 1 − i    i     2   ]        [ −1 − 2i  −1 − i      2    ]        [ i  i  i ]
Exercise 11.11.8 Assume A and B are hermitian 3 × 3-matrices, and A is invertible. Then which of the
following matrices are hermitian?
(a) AB (b) A + B (c) iA (d) − A (e) 2A ( f ) A∗ BA (g) A−1 BA (h) BB∗ (i) A2
Exercise 11.11.9 Unitarily diagonalize the hermitian matrix
A = [   2     1 + i ]
    [ 1 − i     3   ].
Exercise 11.11.10 Unitarily diagonalize the hermitian matrix
A = [    0      −1 + 2i ]
    [ −1 − 2i      4    ].
Exercise 11.11.11 Unitarily diagonalize the hermitian matrix
A = [   2     1 + i   1 − i ]
    [ 1 − i     3      −2i  ]
    [ 1 + i    2i       3   ].
Outcomes
A. Compute the principal components of a matrix A.
C. Find the k-dimensional subspace that best approximates a given collection of data points.
D. Find the k-dimensional affine subspace that best approximates a given collection of data
points.
E. Compute the total squared distance of the data points to the best fit subspace (or best fit affine
subspace).
In this section, we will explore an application of the diagonalization of symmetric matrices called princi-
pal component analysis. Imagine we are given a collection of data points such as the following:
[Figure: a collection of data points in the plane, lying close to a line through the origin.]          (11.6)
Although these points are spread out in two dimensions, they seem to be located pretty close to a 1-
dimensional subspace. Probably the best way to interpret this particular data set is to think of the points
as being “essentially” on a line, up to some small random errors.
More generally, suppose we are given a collection of data points in n-dimensional space, and we
are looking for a k-dimensional subspace that all data points are close to. This is an important way to
make sense of high-dimensional data. For example, it would be very difficult to visualize data in a 100-
dimensional space. However, if we knew that the data points lie very close to a 2-dimensional subspace,
then we could project all of the points to the subspace to obtain a 2-dimensional image of the data.
To state the problem more precisely, let us introduce the following notation. If W is a subspace of Rn
and v ∈ Rn is a vector, let us write d(v,W ) for the shortest distance from v to W (i.e., the distance from v
to the point of W that is closest to v).
The following proposition gives us a method for solving the subspace fitting problem. It turns out that
the key ingredient in solving this problem is the diagonalization of symmetric matrices. The method was
discovered by Gale Young in 1937.
1. Let A be the m × n-matrix whose rows are vT1 , . . . , vTm . (Or equivalently, AT is the n × m-matrix
whose columns are v1 , . . . , vm .)
2. Compute the n × n-matrix B = AᵀA. It is symmetric and positive semidefinite.
3. By Proposition 11.69, all eigenvalues of B are real and non-negative. Let λ1 , . . . , λn be the
eigenvalues of B, listed according to their multiplicity and in decreasing order, i.e., so that
λ1 ≥ λ2 ≥ . . . ≥ λn ≥ 0. Let u1 , . . . , un be the corresponding eigenvectors.
4. Then W = span {u1 , . . . , uk } is the solution to the subspace fitting problem. Moreover, the
total squared distance of the points to this subspace is
D = λk+1 + . . . + λn .
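The procedure translates into a few lines of numpy; the following is an illustrative sketch (not the book's code), with a usage line based on the data of the first example below:

```python
import numpy as np

def best_subspace(points, k):
    """Orthonormal basis of the best-fit k-dimensional subspace, and the
    total squared distance of the points to that subspace."""
    A = np.array(points, dtype=float)               # m x n, one point per row
    B = A.T @ A                                     # symmetric, positive semidefinite
    eigenvalues, eigenvectors = np.linalg.eigh(B)   # ascending order
    order = np.argsort(eigenvalues)[::-1]           # re-sort into decreasing order
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    basis = eigenvectors[:, :k]                     # u1, ..., uk as columns
    distance = eigenvalues[k:].sum()                # lambda_{k+1} + ... + lambda_n
    return basis, distance

# Data of the example below; the total squared distance comes out to 27.
points = [(2, -3), (-1, 0), (2, 3), (-6, -7), (6, 11), (0, -1), (1, 6), (-2, -3), (-7, -6)]
W, dist = best_subspace(points, 1)
```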
Find the 1-dimensional subspace that best approximates this collection of points. What is the total
squared distance of the points to the subspace?
1. We have
Aᵀ = [  2  −1   2  −6   6   0   1  −2  −7 ]
     [ −3   0   3  −7  11  −1   6  −3  −6 ].
2. We calculate
B = AᵀA = [ 135  162 ]
          [ 162  270 ].
3. The eigenvalues of B are λ1 = 378 and λ2 = 27, with u1 = [2, 3]ᵀ an eigenvector for λ1.
4. The desired subspace W is spanned by the eigenvector corresponding to the largest eigenvalue, i.e.,
W = span {u1 }. The total squared distance is λ2 = 27.
The space W is shown in the following illustration, along with the original points:
[Figure: the nine data points and the best-fit line W = span {u1 }.]
Of course, the example was rigged to ensure that the eigenvalues are integers. In real life, the entries of A
and B, as well as the eigenvalues and components of the eigenvectors are usually arbitrary real numbers.
♠
(a) Find the 1-dimensional subspace that best approximates this collection of points.
(b) Find the 2-dimensional subspace that best approximates this collection of points.
(c) What is the 3-dimensional subspace that best approximates this collection of points?
In each case, what is the total squared distance of the points to the subspace?
Solution. Again, we follow the steps from Proposition 11.111. We can do the calculations for parts (a),
(b), and (c) at the same time.
1. We have
Aᵀ = [ −7   0   2  10  −2  −8   5  −6   9  −2 ]
     [  4   3  −5  −4   5  −1   4   9  −6  −7 ]
     [  5   3  −4   1   4  −5   2   6   3  −8 ].
2. We calculate
B = AᵀA = [  367  −154   16 ]
          [ −154   274  170 ]
          [   16   170  205 ].
3. The eigenvalues of B are λ1 = 513, λ2 = 306, and λ3 = 27, with corresponding eigenvectors
u1 = [ −2 ],     u2 = [ 2 ],     and     u3 = [ −1 ]
     [  2 ]           [ 1 ]                   [ −2 ]
     [  1 ]           [ 2 ]                   [  2 ].
For part (a), the desired 1-dimensional subspace is spanned by the eigenvector corresponding to the
largest eigenvalue, i.e., it is span {u1 }. The total squared distance is λ2 + λ3 = 306 + 27 = 333.
For part (b), the desired 2-dimensional subspace is spanned by the eigenvectors corresponding to the
two largest eigenvalues, i.e., it is span {u1 , u2 }. The total squared distance is λ3 = 27.
Finally, in part (c), the desired 3-dimensional subspace is spanned by all three eigenvectors; it is of
course R3 itself, since it is the only 3-dimensional subspace. The total squared distance is 0, since all
points lie in the subspace. ♠
The vectors u1 , . . . , un that appear in the solution of the subspace fitting problem are called the principal
components of the matrix A.
The first principal component u1 gives the direction in which the rows of A show the most variability.
The second principal component u2 gives the direction in which the rows of A show the most remaining
variability that is orthogonal to u1 . The third principal component u3 gives the direction of most variability
that is orthogonal to u1 and u2 , and so on.
In the particular case where n = 2 and k = 1, we are looking for a 1-dimensional subspace, i.e., a line
through the origin, which best fits the given 2-dimensional data, as in the illustration (11.6) above or as in
Example 11.112. On its face, the subspace fitting problem in this case seems similar to the linear curve
fitting problem we solved in Section 11.5. However, there is a subtle but important difference: in linear
curve fitting, we were seeking to minimize the distances of the points from the line in the y-direction,
whereas in subspace fitting, we are seeking to minimize the distances of the points from the subspace in
the direction perpendicular to the subspace. The following pair of pictures illustrates the difference:
[Figure: two plots of the same data and line. Linear curve fitting minimizes the vertical distances of the points from the line; subspace fitting minimizes the perpendicular distances of the points from the subspace.]
Affine fitting
So far, we have been looking to approximate a given collection of points by a subspace, which necessarily
passes through the origin. But sometimes the points may not be near the origin, as in this example:
In this case, approximating the points by a subspace passing through the origin does not make much sense.
Instead, we should be looking for an affine subspace. An affine subspace is similar to a subspace, except
it does not necessarily contain the origin.
For example, in Chapter 3, we considered lines and planes in Rn that pass through a given point (not
necessarily the origin). These are examples of affine subspaces of Rn . The affine subspace fitting problem
is analogous to the subspace fitting problem:
It turns out that the optimal solution to the affine subspace fitting problem can be computed by first com-
puting the centroid of the points, shifting the whole problem so that the centroid is at the origin, and then
solving an ordinary subspace fitting problem.
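In code, the only change compared to the sketch given earlier is the centering step; the following is again an illustrative sketch, not the book's implementation:

```python
import numpy as np

def best_affine_subspace(points, k):
    """Centroid and orthonormal basis of the best-fit k-dimensional affine
    subspace (the subspace is centroid + span of the basis columns)."""
    A = np.array(points, dtype=float)
    centroid = A.mean(axis=0)
    W = A - centroid                                  # shift the centroid to the origin
    eigenvalues, eigenvectors = np.linalg.eigh(W.T @ W)
    order = np.argsort(eigenvalues)[::-1]             # decreasing eigenvalues
    return centroid, eigenvectors[:, order[:k]]
```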
Find the 1-dimensional affine subspace that best approximates this collection of points. What is the
total squared distance of the points to the subspace?
Solution. First, we compute the centroid of the points:
v = (1/9)(v1 + . . . + v9 ) = [ 5 ]
                              [ 4 ].
Next, we shift all vectors by −v to get a new collection of vectors w1 , . . . , w9 centered at the origin. For
example,
w1 = v1 − v = [ 10 ] − [ 5 ] = [   5 ],
              [ −6 ]   [ 4 ]   [ −10 ]
w2 = v2 − v = [  2 ] − [ 5 ] = [ −3 ],
              [ 10 ]   [ 4 ]   [  6 ]
and so on.
Next, we proceed as in Proposition 11.111 to find the best subspace fitting w1 , . . . , w9 . We have
Aᵀ = [   5  −3   0   3  −3  −2  −1   5  −4 ]
     [ −10   6  −5  −1   1  −1   7  −5   8 ]
and
B = AᵀA = [   98  −136 ]
          [ −136   302 ].
The eigenvalues of B are λ1 = 370 and λ2 = 30, with corresponding eigenvectors
u1 = [  1 ]     and     u2 = [ 2 ].
     [ −2 ]                  [ 1 ]
Thus, the best-fitting 1-dimensional subspace for w1 , . . . , w9 is W = span {u1 }, and the best-fitting 1-
dimensional affine subspace for v1 , . . . , v9 is
v + W = {v + w | w ∈ W } = { [ 5 ] + t [  1 ]   |   t ∈ R } .
                             [ 4 ]     [ −2 ]
Note that this is the equation of a line passing through the centroid v, and with direction vector u1 . The
points v1 , . . . , v9 , their centroid, and the affine subspace v +W are shown in the following illustration:
[Figure: the nine data points, their centroid, and the affine subspace v + W.]
The United States Senate votes on a lot of things: motions, resolutions, amendments, and bills, among
other things. Many of these votes are roll call votes, which means that the vote of every individual
senator is recorded (as opposed to a voice vote, where only the outcome is recorded). Roll call data
for the last 3 decades is publicly available and can be downloaded from the U.S. Senate website at
https://www.senate.gov/legislative/votes.htm .
We will now explore how to use linear algebra, and in particular principal component analysis, to
gain some useful information from the voting records.2 I have made a spreadsheet containing the votes
of 99 senators for the first 200 roll call votes of 2007. Each row in the spreadsheet corresponds to a
senator, listed in alphabetical order from Daniel Akaka of Hawaii to Ron Wyden of Oregon. I omitted
one senator who died during 2007. Each column of the spreadsheet corresponds to a vote. For exam-
ple, the first roll call vote of 2007 was on a resolution to honour President Gerald Ford (it passed 88
to 0). Each cell of the spreadsheet contains the number 1 if the senator voted “yes”, the number −1
if the senator voted “no”, and the number 0 if the senator did not vote. The spreadsheet is available
from https://www.mathstat.dal.ca/~selinger/linear-algebra/ under “Supplementary materials”. Here
are the first few rows and columns of the spreadsheet.
Akaka, Daniel (D) HI        1   1   1   1   1  −1   1  . . .
Alexander, Lamar (R) TN     0   1  −1   1  −1  −1   1  . . .
Allard, A. (R) CO           1   1  −1  −1  −1   1   1  . . .
Baucus, Max (D) MT          1   1   1   1   1  −1   1  . . .
Bayh, Evan (D) IN           1   1   1  −1   1  −1   1  . . .
Bennett, Robert (R) UT      1   1  −1   1   1  −1   1  . . .
...                        ...
The human mind is not very well equipped to deal with such massive amounts of data. Rather than
listing 122 motions that Senator X supported and 78 motions that she opposed, we like to come up with
abstractions, such as Senator X is “conservative”, “pro choice”, “pro business”, “hawkish”, etc. However,
the problem with abstractions is that they do not necessarily mean anything in the real world. In the real
world, a senator’s record is just a sequence of votes.
² This example was inspired by Examples 11.2.13 and 11.3.15 of “Coding the Matrix: Linear Algebra through Computer
Science Applications” by Philip N. Klein.
We will represent each senator by a vector in R200 , which corresponds to a row of the above table. For
example, to Senator Akaka, we associate the vector
[ 1, 1, 1, 1, 1, −1, 1, . . . ]ᵀ ∈ R200 .
Thus, we can represent each senator (or more precisely, each senator’s voting record) as a point in 200-
dimensional space. In this way, the voting data can be interpreted as 99 points in R200 .
Unfortunately, 200 dimensions are impossible to visualize. But what if the voting records of all the
senators lie on (or at least close to) a much-smaller-dimensional affine subspace? This is actually not an
unreasonable expectation; after all, there are probably only a handful of issues most senators care about.
For example, if a certain senator supports gun control, he will be likely to vote a certain way on measures
that affect gun control. If another senator supports the gun lobby, she is likely to vote the opposite way.
We can thus consider this as an instance of the affine subspace problem: we are looking for a low-
dimensional affine subspace that is close to all 99 points. Following the method of Proposition 11.118,
we first find the centroid of the points, and then we compute a certain 99 × 200-matrix A and a positive
semidefinite 200 × 200-matrix B = AT A. Using software, we can find the eigenvalues and -vectors of B.
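“Using software” can mean something as simple as the following sketch; the file name and the assumption of a purely numeric CSV are hypothetical, but the steps are exactly the ones just described:

```python
import numpy as np

# Hypothetical numeric export of the vote spreadsheet: 99 rows, 200 columns of -1/0/1.
votes = np.loadtxt("senate2007.csv", delimiter=",")

centroid = votes.mean(axis=0)
A = votes - centroid                              # shift the centroid to the origin
eigenvalues, eigenvectors = np.linalg.eigh(A.T @ A)
u1 = eigenvectors[:, -1]                          # eigenvector of the largest eigenvalue
coordinates = A @ u1                              # one number per senator: the projection
```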
The first few eigenvalues (in decreasing order) are:
All of the remaining eigenvalues are less than 200, and the sum of the remaining eigenvalues is λ6 + . . . +
λ200 = 3913.46. This means that the vast majority of the voting behavior of each senator is determined
by a single dimension, given by the eigenvector corresponding to the eigenvalue λ1 . In other words, there
is a 1-dimensional affine subspace that all 99 points are pretty close to. If we project each senator to this
affine subspace, we obtain the following picture:
[Figure: 2007 U.S. Senate voting data, projection to the first principal component. Each senator is placed at their coordinate along this axis, with the centroid marked.]
For convenience, Republican senators have been shown in red and Democratic and independent senators
in blue. Not all senators have been named, because in some areas they are clustered very densely. An
interpretation of the principal component then immediately suggests itself: it appears to be the “conser-
vative” vs. “liberal” axis. We can use this picture to assist in answering questions such as: “Which party
votes more uniformly?”, “Which state are the most liberal Republicans from?”, “Which state are the most
conservative Democrats from?”, “Was Obama really a radical?”, and “Was McCain really a maverick?”.
If we repeat the same calculation for the 2017 senate, we get the following picture:
[Figure: 2017 U.S. Senate voting data, projection to the first principal component, with the centroid marked.]
We can use this to help answer questions such as “Has the senate become more partisan between 2007
and 2017?”.
If we instead project the data onto the first two principal components, we get the following picture for
the 2007 data:
[Figure: 2007 U.S. Senate voting data, projection to the first two principal components, with the centroid marked.]
The picture clearly shows senators clustering in certain areas. We can use this to help answer certain ques-
tions, for example, “How different was Sanders’s voting record from Clinton’s?”. However, although the
2-dimensional picture seems to reveal more detail, its interpretation is less clear. While it seems obvious
that the horizontal axis corresponds to a conservative vs. liberal world view, it is much less obvious what
the political meaning of the vertical axis is. Maybe it is related to some issue that does not typically follow
party lines, such as North vs. South, rich states vs. poor states, pro-immigration vs. anti-immigration, and
so on. To find a convincing interpretation of the vertical axis, further investigation of the data would be
required (such as, looking at the actual content of the votes in question).
Finally, a word of caution. Whenever we use mathematics to try to draw real-world conclusions
from data, these conclusions should be taken with an extra-large grain of salt. People have an outsized
tendency to trust mathematics and to take its results as infallible. We therefore have a special responsibility
not to overstate any conclusions, and to point out potential pitfalls with the analysis. No matter how
wonderful principal component analysis is, we must keep in mind that we are still only looking
at a 2-dimensional projection of a 200-dimensional space. Therefore it is inevitable that lots of details
and nuances are lost. We could get a completely different picture by looking at a different 2-dimensional
projection.
To see how the data can sometimes be misleading, consider the question “How similar is Senator
Tim Johnson, Democrat of South Dakota, to Senators Olympia Snowe and Susan Collins of Maine?”.
In the 1-dimensional picture, it looked as if they were very similar. We could easily rationalize this by
pointing out that Johnson is the most conservative Democrat, and Snowe and Collins are the most liberal
Republicans. However, the 2-dimensional picture reveals an interesting nuance, which is that the voting
record of Johnson is not all that similar to those of Snowe and Collins. It is entirely possible that if we add
a third or fourth dimension to the picture, many more such details will emerge. In summary,
while principal component analysis is a useful tool, it is just one tool among many, and we always need to
exercise our best judgement in drawing conclusions from data.
Exercises
(a) Find the 1-dimensional subspace that best approximates this collection of points.
(b) Find the 2-dimensional subspace that best approximates this collection of points.
(c) What is the 3-dimensional subspace that best approximates this collection of points?
In each case, what is the total squared distance of the points to the subspace?
Compute the centroid, and then find the 1-dimensional affine subspace that best approximates this collec-
tion of points. What is the total squared distance of the points to the subspace?
Find the 1- and 2-dimensional affine subspaces that best approximate this collection of points. What is the
total squared distance of the points to each subspace?
A. Complex numbers
Throughout history, mankind has invented more and more complicated number systems in an effort to
make algebra easier.
• In the beginning, there were the natural numbers N = {1, 2, 3, . . .}. However, after a while, it
became a problem that certain equations, such as x + 5 = 3, do not have a solution in the natural
numbers.
• To solve this problem, zero and negative numbers were invented, resulting in the set of integers
Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}. In the integers, the equation x + 5 = 3 has a solution, namely
x = −2. But some other equations, such as 2x = 1, still do not have a solution in the integers.
• To solve this problem, the rational numbers Q were invented. In the rational numbers, the equation
2x = 1 has a solution, namely x = 1/2. However, some other equations, such as x² = 2, still do not
have a solution in the rational numbers.
• To solve this problem, the real numbers R were invented. In the real numbers, the equation x² = 2
has a solution, namely x = √2. However, some other equations, such as x² = −1, still do not have a
solution in the real numbers.
• To solve this problem, the complex numbers C were invented.
The purpose of this section is to summarize the most important facts about the complex numbers. We
will also see that the above process does not continue. In the complex numbers, all non-trivial polynomial
equations have a solution, and therefore, no additional numbers are “missing”. This property is called the
fundamental theorem of algebra. Gauss is usually credited with giving a proof of this theorem in 1797
but many others worked on it and the first completely correct proof was due to Argand in 1806.
Outcomes
A. Add, subtract, multiply, and divide complex numbers.
A complex number is an expression of the form
z = a + bi,
where a and b are real numbers. The set of all complex numbers is denoted C.
The form z = a + bi is called the standard form or Cartesian form of the complex number z. We refer to
a as the real part and to b as the imaginary part of z.
Addition, subtraction, and multiplication of complex numbers are defined in the obvious way, keep-
ing in mind that i² = −1. Namely, we have
(a + bi) + (c + di) = (a + c) + (b + d)i,
(a + bi) − (c + di) = (a − c) + (b − d)i,
(a + bi)(c + di) = (ac − bd) + (ad + bc)i.
Division of complex numbers is more complicated. We first note that it is easy to divide a complex number
by a real number. Namely,
a + bi a b
= + i.
r r r
But how can we divide by a complex number? We use the following trick. Let z = a + bi be a complex
number, and consider the product (a + bi)(a − bi). It is equal to
(a + bi)(a − bi) = a² − b²i² = a² + b².
Therefore, (a + bi)(a − bi) is always a real number, which is easy to divide by. We can therefore
compute the multiplicative inverse of a complex number z = a + bi as follows:
z⁻¹ = 1/z = 1/(a + bi) = (1/(a + bi)) · ((a − bi)/(a − bi)) = (a − bi)/(a² + b²).
You should verify that this is indeed the inverse, by multiplying 2 + 5i by 2/29 − (5/29)i and checking that the
answer is indeed 1.
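Python's built-in complex numbers can carry out this verification (an illustrative aside, not part of the text):

```python
z = 2 + 5j
w = z.conjugate() / (abs(z) ** 2)   # (a - bi) / (a^2 + b^2) = 2/29 - (5/29)i
print(w)                            # approximately 0.06897 - 0.17241j
print(z * w)                        # (1+0j), up to rounding
```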
Division of complex numbers can then be defined in terms of the multiplicative inverse, i.e., z/w = zw⁻¹.
The complex numbers form a field, i.e., they satisfy the nine field axioms. See Section 1.8 for the
definition of a field.
Another useful operation on complex numbers is the complex conjugate. Let z = a + bi be a complex
number. Then the conjugate of z, written z, is given by
z = a − bi.
The magnitude of a complex number z = a + bi is defined to be |z| = √(a² + b²). The magnitude is also
sometimes called the absolute value or the modulus of the complex number.
• \overline{3 + 5i} = 3 − 5i.
• \overline{i} = −i.
• \overline{7} = 7.
• |3 + 5i| = √(3² + 5²) = √34.
• |i| = 1.
• |−6| = 6.
Note that for a real number a, we have ā = a. Also, when a is real, the magnitude |a| = √(a²) is just the
usual absolute value of real numbers. The following two propositions list some basic properties of the
conjugate and of the magnitude.
• \overline{z ± w} = z̄ ± w̄.
• \overline{zw} = z̄ w̄.
• \overline{z⁻¹} = (z̄)⁻¹.
• \overline{z/w} = z̄ / w̄.
• \overline{z̄} = z.
• |z̄| = |z|.
Exercises
Exercise A.1.1 Let z = 2 + 7i and let w = 3 − 8i. Compute z + w, z − 2w, zw, and z/w.
Exercise A.1.4 Use the properties of complex numbers to prove that if z is a complex number, then there
exists a complex number w with |w| = 1 and wz = |z|.
Outcomes
A. View complex numbers as points in the plane.
B. Understand the geometric meaning of addition, subtraction, multiplication, and the complex
conjugate.
C. Understand the geometric meaning of the magnitude and argument of a complex number.
Just as a real number can be considered as a point on the line, a complex number z = a + bi can be
considered as a point (a, b) in the plane whose x coordinate is a and whose y coordinate is b. For example,
in the following picture, the complex number z = 3 + 2i can be represented as the point in the plane with
coordinates (3, 2).
[Figure: the complex number z = 3 + 2i plotted as the point (3, 2) in the plane, with its magnitude r and argument θ marked.]
The magnitude r = |z| of a complex number is its distance from the origin. We also define the argument
of z to be the angle θ between the x-axis and the line from the origin to z, counted positively in the
counterclockwise direction. The magnitude r and argument θ are shown in the above picture.
Addition of complex numbers is like vector addition. The effect of multiplying two complex numbers
is to multiply their magnitudes and add their arguments. For example, the following picture illustrates the
multiplication (3 + 2i)(1 + i) = 1 + 5i.
[Figure: the complex numbers 3 + 2i and 1 + i, and their product (3 + 2i)(1 + i) = 1 + 5i, plotted in the plane.]
To take the multiplicative inverse of a complex number, we take the reciprocal of the magnitude and negate
the argument.
[Figure: a complex number z with magnitude r and argument θ, and its inverse z⁻¹ with magnitude 1/r and argument −θ.]
The effect of taking the complex conjugate is to reflect the given complex number about the x axis (or
equivalently, keep the magnitude unchanged and negate the argument).
[Figure: a complex number z and its conjugate z̄, which is the reflection of z about the x-axis.]
Exercises
Exercise A.2.1 Draw the complex numbers z = 2 + i and w = −2 + 3i as points in the plane. Then use
the geometric interpretation to find z + w, z − w, zw, z−1 , z, and |z|.
Exercise A.2.2 Use the geometric interpretation to find a complex number z such that z2 = i. Can you
find two such numbers?
Exercise A.2.3 Use the geometric interpretation to find 3 different complex numbers z such that z3 = −1.
Hint: these numbers will lie on the unit circle.
Outcomes
A. Find the complex roots of a quadratic polynomial.
The complex numbers were invented so that equations such as z2 +1 = 0 would have solutions. In fact, this
equation has two complex solutions, namely z = i and z = −i. However, something much more general
(and surprising) is true: every non-trivial polynomial equation has a solution in the complex numbers. To
understand this statement, recall that a polynomial is an expression of the form
p(z) = an zⁿ + an−1 zⁿ⁻¹ + . . . + a1 z + a0 .
The constants a0 , . . . , an are called the coefficients of the polynomial. If an ≠ 0, we say that the polynomial
has degree n. A polynomial of degree 0 is of the form p(z) = a0 , and is
also called a constant polynomial. Recall that a root of a polynomial is a number z such that p(z) = 0.
also called a constant polynomial. Recall that a root of a polynomial is a number z such that p(z) = 0.
The fundamental theorem of algebra is the following: every non-constant polynomial with complex
coefficients has at least one complex root.
The proof of this theorem is beyond the scope of this book. Note that the theorem does not say that the
roots are always easy to find. To find the roots of a polynomial of degree 2, we can use the quadratic
formula. However, if the degree is greater than 2, we may sometimes have to use fancier methods, such
as Newton’s method from calculus, or even a computer algebra system, to locate the roots. We give some
examples.
Solution. By the intermediate value theorem of calculus, we know that a cubic polynomial with real
coefficients always has at least one real root. This is because p(z) goes to −∞ when z → −∞ and to ∞
when z → ∞. By trial and error, we find that z = 2 is a root of this polynomial. We can therefore factor out
(z − 2) from this polynomial:
p(z) = z³ − 4z² + 9z − 10 = (z − 2)(z² − 2z + 5).
Now we can use the quadratic formula to find the roots of z² − 2z + 5. We find
z = (2 ± √(−16))/2 = (2 ± 4i)/2 = 1 ± 2i.
Thus, the three complex roots of p(z) are z = 2, z = 1 + 2i, and z = 1 − 2i. ♠
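A quick numerical confirmation of these roots (illustrative only): `numpy.roots` takes the coefficients of p(z) in decreasing order of degree.

```python
import numpy as np

print(np.roots([1, -4, 9, -10]))   # approximately [1+2j, 1-2j, 2]
```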
The following proposition is an important and useful consequence of the fundamental theorem of algebra:
Proof. If n = 0, then p(z) = a and there is nothing to show. Otherwise, by the fundamental theorem of
algebra, p(z) has at least one complex root, say b1 . From calculus, we know that we can factor out (z − b1 )
from p(z), i.e., we can find a polynomial q(z) of degree n − 1 such that
p(z) = (z − b1 )q(z),
We can repeatedly apply the same procedure to q(z) until p(z) has been factored into linear factors. ♠
Solution. From Example A.11, we know that p(z) has three distinct roots b1 = 2, b2 = 1 + 2i, and b3 =
1 − 2i. We can therefore write
p(z) = a(z − b1 )(z − b2 )(z − b3).
Since the leading term is z³, we find that a = 1. Therefore
p(z) = (z − 2)(z − (1 + 2i))(z − (1 − 2i)).
Exercise A.3.2 Find the roots of p(z) = z3 + 5z2 + 4z − 10. Hint: one of the roots is z = 1.
Exercise A.3.4 Let p(x) = an xn + an−1 xn−1 + . . . + a1 x + a0 be a polynomial with real coefficients, i.e.,
such that all the ak are real numbers. Suppose that z is a root of p. Show that z is also a root of p.
B. Answers to selected exercises
1.1.1 x + 3y = 1, 4x − y = 3. Solution is: x = 10/13, y = 1/13.
1.1.2 3x + y = 3, x + 2y = 1. Solution is: [x = 1, y = 0].
1.1.4
1.2.2 (a) is a solution to equations 1 and 2, (b) is a solution to equations 2 and 3, (c) is a solution to
equations 1, 2, and 3, (d) is a solution to 1 and 3, and (e) is a solution to equations 1, 2, and 3. Only (c)
and (e) are solutions to the system of equations.
1.3.2 x + 3y = 1, 4x − y = 3. Solution is: (x, y) = (10/13, 1/13).
1.3.3 x + 2y = 1, 2x − y = 1, 4x + 3y = 3. Solution is: (x, y) = (3/5, 1/5).
1.3.4 No solution exists. The system x + y − 3z = 2, 2x + y + z = 1, 3x + 2y − 2z = 0 becomes, after
elementary operations, x + 4z = 0, y − 7z = 0, 0 = 1. Thus one of the equations says 0 = 1 in an
equivalent system of equations.
1.3.7 4g − I = 150, 4I − 17g = −660, 4g + s = 290, g + I + s − b = 0. Solution is: {g = 60, I = 90, b = 200, s = 50}.
1.4.6 These can have a solution. For example, x + y = 1, 2x + 2y = 2, 3x + 3y = 3 even has an infinite set
of solutions.
1.4.7 h = 4.
1.4.10 If h 6= 2 there will be a unique solution for any k. If h = 2 and k 6= 4, there are no solutions. If h = 2
and k = 4, then there are infinitely many solutions.
1.4.11 If h 6= 4, then there is exactly one solution. If h = 4 and k 6= 4, then there are no solutions. If h = 4
and k = 4, then there are infinitely many solutions.
1.4.12 There is no solution. The system is inconsistent. You can see this from the augmented matrix
[ 1 2 1 −1 2 ; 1 −1 1 1 1 ; 2 1 −1 0 1 ; 4 2 1 0 5 ], echelon form: [ 1 2 1 −1 2 ; 0 −3 0 2 −1 ; 0 0 −3 0 −2 ; 0 0 0 0 1 ].
(d) The echelon form is [ 1 0 0 0 9 3 ; 0 1 0 0 −4 0 ; 0 0 1 0 −7 −1 ; 0 0 0 1 6 1 ] and so x5 = t, x4 = 1 − 6t,
x3 = −1 + 7t, x2 = 4t, x1 = 3 − 9t.
(e) The echelon form is [ 1 0 2 0 −1/2 5/2 ; 0 1 0 0 1/2 3/2 ; 0 0 0 1 3/2 −1/2 ; 0 0 0 0 0 0 ]. Therefore, let
x5 = t, x3 = s. Then the other variables are given by x4 = −1/2 − (3/2)t, x2 = 3/2 − (1/2)t, x1 = 5/2 + (1/2)t − 2s.
1.4.25 The last column must not be a pivot column. The remaining columns must each be pivot columns.
1.4.26 You need
(1/4)(20 + 30 + w + x) − y = 0
(1/4)(y + 30 + 0 + z) − w = 0
(1/4)(20 + y + z + 10) − x = 0
(1/4)(x + w + 0 + 10) − z = 0.
Solution is: [x = 15, y = 20, z = 10, w = 15].
1.4.28 The rank is the number of pivot entries in the echelon form. There is at most one pivot entry in each
row and column. Therefore, the rank cannot be larger than the number of rows or the number of columns;
in other words, the rank is at most min(m, n).
1.4.29 (a) The echelon form has 4 non-zero rows and 6 columns, so there are 2 free variables, and the
system has infinitely many solutions.
(b) Such a system of equations does not exist. If you add in another column, the rank does
not get smaller.
(c) Such a system of equations does not exist, because the rank cannot equal 4 if there are only two
columns.
(d) The echelon form has 4 non-zero rows on the left-hand side, but 5 non-zero rows if we also include
the right-hand side. Therefore the system is inconsistent, i.e., it has no solutions.
(e) The echelon form has 2 non-zero rows, so there are 2 pivot variables and no free variables. There-
fore, the system has a unique solution.
1.4.30 These are not legitimate row operations. They do not preserve the solution set of the system.
1.5.7 The rank of the coefficient matrix is 3, so both systems have a unique solution. The solution of the
first system is (x, y, z) = (1, 0, 1), and the solution of the second system is (x, y, z) = (1, 1, 2).
1.6.2 The systems (b), (c), and (d) have non-trivial solutions, because the rank is less than the number of
variables.
1.6.3 The general solution is (x, y, z) = (1, 2, 4) + s(1, 0, 1) + t(0, 1, −1), or equivalently, (x, y, z) = (1 +
s, 2 + t, 4 + s − t), where s and t are parameters.
1 0 1 1
1.8.6 (a) Reduced echelon form: . General solution: x = 1 − t, y = 1 − it, z = t.
0 1 i 1
(b) Solution: x = 1, y = 1 + i, z = i.
1.11.2 We have
10I1 − 5I2 = 10
−5I1 + 16I2 − I3 = −12
−I2 + 11I3 = 0.
2.1.2 PQ = 0Q − 0P = [ 5 − 2, −2 − 0, 1 − (−4) ]ᵀ = [ 3, −2, 5 ]ᵀ,
QP = 0P − 0Q = [ 2 − 5, 0 − (−2), (−4) − 1 ]ᵀ = [ −3, 2, −5 ]ᵀ.
2.1.3 We need 5x − 3y = 2x − 2y and 4 = 2y. The unique solution is x = 2/3 and y = 2.
2.2.1 [ 1, 9, 0 ]ᵀ.
2.3.1 [Figure: the vectors u, v, 2v, −u, and −(1/2)v drawn in the plane.]
2.3.2 [ −55, 13, −21, 39 ]ᵀ.
2.3.3 (a) (k + ℓ)(u + v) = (k + ℓ)u + (k + ℓ)v = ku + kv + ℓu + ℓv. Here we used the distributive law
over vector addition in the first step, and the distributive law over scalar addition in the second step.
(b) We have 0u = (0 + 0)u = 0u + 0u by properties of scalars and by the distributive law, respectively.
Adding −(0u) to both sides of the equation, and using the additive unit law and associativity, we
have 0 = 0u.
(c) We have (−1)u = (−1)u + 0 = (−1)u + (u + (−u)) = ((−1)u + u) + (−u) = ((−1)u + 1u) +
(−u) = ((−1) + 1)u + (−u) = 0u + (−u) = 0 + (−u) = −u. Here, we have used the additive unit
law, the additive inverse law, the associative law, the rule for multiplication by 1, the distributive
law, properties of scalars, the property of part (b), and the additive unit law, respectively.
2.4.2 [ 4, 4, −3 ]ᵀ = 2 [ 3, 1, −1 ]ᵀ − [ 2, −2, 1 ]ᵀ.
2.5.7 This follows from the last property of Proposition 2.20: k−uk = k(−1)uk = |−1| kuk = kuk.
2.5.9 (1/kuk) u = (1/√5) [ 1, 2 ]ᵀ,    (1/kvk) v = (1/√17) [ −2, 3, 2 ]ᵀ,    (1/kwk) w = (1/6) [ 5, −3, 1, −1 ]ᵀ.
2.6.1 [ 1, 2, 3, 4 ]ᵀ · [ 2, 0, 1, 3 ]ᵀ = 17.
2.6.4 cos θ = ([ 3, −1, −1 ]ᵀ · [ 1, 4, 2 ]ᵀ) / (√(9 + 1 + 1) √(1 + 16 + 4)) = −3 / (√11 √21).
2.6.5 cos θ = −10 / (√(1 + 4 + 1) √(1 + 4 + 49)).
2.6.6 This formula says that u · v = kuk kvk cos θ where θ is the included angle between the two vectors.
Thus
|u · v| = kuk kvk |cos θ| ≤ kuk kvk,
and equality holds if and only if θ = 0 or π . This means that the two vectors either point in the same
direction or opposite directions. Hence one is a multiple of the other.
2.6.7 This triangle has sides PQ, QR, RP. The direction of the sides does not matter, since scalar multipli-
cation preserves orthogonality.
PQ = (5, −2, 1) − (2, 0, 3) = (3, −2, 4),
QR = (7, 5, 3) − (5, −2, 1) = (2, 7, 2),
RP = (2, 0, −3) − (7, 5, 3) = (−5, −5, −6).
2.6.8 (w · v / v · v) v = (−5/14) [ 1, 2, 3 ]ᵀ = [ −5/14, −5/7, −15/14 ]ᵀ.
2.6.9 (w · v / v · v) v = (−5/10) [ 1, 0, 3 ]ᵀ = [ −1/2, 0, −3/2 ]ᵀ.
2.6.10 (w · v / v · v) v = ([ 1, 2, 3, 0 ]ᵀ · [ 1, 2, −2, 1 ]ᵀ / (1 + 4 + 9)) [ 1, 2, 3, 0 ]ᵀ = (−1/14) [ 1, 2, 3, 0 ]ᵀ = [ −1/14, −1/7, −3/14, 0 ]ᵀ.
2.6.13 No, it does not. The vector 0 has no direction. The formula for proj0 (w) doesn’t make sense either.
2.6.14 a = [ −3/2, 3/2, −3 ]ᵀ,    b = [ 9/2, 1/2, −2 ]ᵀ.
2.6.15 We have
(u − (v · u / kvk²) v) · (u − (v · u / kvk²) v) = kuk² − 2 (v · u)² (1/kvk²) + (v · u)² (1/kvk²) ≥ 0,
and so
kuk² kvk² ≥ (v · u)².
We get equality exactly when u = proj_v (u) = (v · u / kvk²) v, or in other words, when u is a multiple of v.
2.6.16
2.6.17 u · (v − proj_u (v)) = u · v − (u · v / kuk²) (u · u) = u · v − u · v = 0. Therefore, we can write v = (v − proj_u (v)) +
proj_u (v). The first is orthogonal to u and the second is a multiple of u so is parallel to u.
2.7.2 If a 6= 0, then the condition says that ka × uk = kak sin θ = 0 for all angles θ . Hence a = 0.
2.7.3 [ 1, 2, 3 ]ᵀ × [ 3, −2, 1 ]ᵀ = [ 8, 8, −8 ]ᵀ. The area of the parallelogram is 8√3.
2.7.4 [ 1, 0, 3 ]ᵀ × [ 4, −2, 1 ]ᵀ = [ 6, 11, −2 ]ᵀ. The area of the parallelogram is √(36 + 121 + 4) = √161.
2.7.5 Let P = (−2, 3, 1), Q = (2, 1, 1), R = (1, 2, −1), and S = (5, 0, −1). We have
PQ × PR = [ 4, −2, 0 ]ᵀ × [ 3, −1, −2 ]ᵀ = [ 4, 8, 2 ]ᵀ.
The area of the parallelogram is kPQ × PRk = √(4² + 8² + 2²) = √84.
2.7.6 Let P = (1, 0, 3), Q = (4, 1, 0), and R = (−3, 1, 1). PQ × PR = [ 3, 1, −3 ]ᵀ × [ −4, 1, −2 ]ᵀ = [ 1, 18, 7 ]ᵀ. The
area of the triangle is (1/2) kPQ × PRk = (1/2) √(1 + 18² + 7²) = (1/2) √374.
2.7.7 Let P = (1, 2, 3), Q = (2, 3, 4), and R = (3, 4, 5). PQ × PR = [ 1, 1, 1 ]ᵀ × [ 2, 2, 2 ]ᵀ = [ 0, 0, 0 ]ᵀ. The area of
the triangle is 0. It means the three points are on a line.
2.7.8 (i × j) × j = k × j = −i. However, i × (j × j) = 0 and so the cross product is not associative. The
expression u × v × w has no meaning.
and
(u · w) v − (u · v) w = (u1 w1 + u2 w2 + u3 w3 ) [ v1 , v2 , v3 ]ᵀ − (u1 v1 + u2 v2 + u3 v3 ) [ w1 , w2 , w3 ]ᵀ
= [ u1 w1 v1 + u2 w2 v1 + u3 w3 v1 − u1 v1 w1 − u2 v2 w1 − u3 v3 w1 ,
    u1 w1 v2 + u2 w2 v2 + u3 w3 v2 − u1 v1 w2 − u2 v2 w2 − u3 v3 w2 ,
    u1 w1 v3 + u2 w2 v3 + u3 w3 v3 − u1 v1 w3 − u2 v2 w3 − u3 v3 w3 ]ᵀ.
We can see by careful inspection that these are equal.
2.7.11
u × (v × w) + v × (w × u) + w × (u × v)
= ((u · w) v − (u · v) w) + ((v · u) w − (v · w) u) + ((w · v) u − (w · u) v)
= ((u · w) v − (w · u) v) + ((v · u) w − (u · v) w) + ((w · v) u − (v · w) u)
= 0.
2.7.12 ([ 1, 1, 3 ]ᵀ × [ −7, −2, 2 ]ᵀ) · [ −5, −6, 3 ]ᵀ = [ 8, −23, 5 ]ᵀ · [ −5, −6, 3 ]ᵀ = 113.
2.7.13 (a) We have [u, v, w] = (u × v) · w = [ −2, 1, 2 ]ᵀ · [ 0, 0, 1 ]ᵀ = 2, so the box product is positive.
This means that the system of vectors u, v, w is right-handed.
(b) We have [u, v, w] = (u × v) · w = [ −2, −1, 1 ]ᵀ · [ 1, 1, 2 ]ᵀ = −1, so the box product is negative and the
system is left-handed.
(c) We have [u, v, w] = (u × v) · w = [ −1, −1, 1 ]ᵀ · [ 3, 1, 4 ]ᵀ = 0, so the box product is zero. This means
that the parallelepiped spanned by u, v, w has volume zero, i.e., the vectors are coplanar.
(d) We have [u, v, w] = (u × v) · w = [ 0, 0, 2 ]ᵀ · [ 2, 0, −1 ]ᵀ = −2, so the box product is negative and the
system is left-handed.
2.7.14 Yes. It will involve the sum of a product of integers and so it will be an integer.
2.7.15 It means that if you place them so that they all have their tails at the same point, the three will lie
in the same plane.
(v × w) × (w × z) = ((v × w) · z) w − ((v × w) · w) z.
Since v × w is orthogonal to w, their dot product (v × w) · w is zero, and therefore
(v × w) × (w × z) = ((v × w) · z) w.
But then
(u × v) · ((v × w) × (w × z)) = (u × v) · (((v × w) · z) w) = ((v × w) · z) ((u × v) · w).
2.7.18 We have
ku × vk² = kuk² kvk² sin² θ = kuk² kvk² (1 − cos² θ) = kuk² kvk² − kuk² kvk² cos² θ = kuk² kvk² − (u · v)²,
which implies the expression equals 0.
3.1.6 We have
[ x, y, z ]ᵀ = [ 1, 2, 0 ]ᵀ + (2 − s) [ 1, 0, 1 ]ᵀ = [ 3, 2, 2 ]ᵀ + s [ −1, 0, −1 ]ᵀ.
3.1.15 From trigonometry, we have the following properties of the cosine function:
• cos(π − θ ) = − cos θ .
The angle θ between the vectors satisfies
cos θ = (u · v) / (kuk kvk).
If 0 ≤ θ ≤ π/2, the answer is θ . If π/2 ≤ θ ≤ π , the answer is φ = π − θ . But in the last case, the dot product
is negative, and we have cos φ = − cos θ = |u · v| / (kuk kvk).
3.2.2 We have
[ x, y, z, w ]ᵀ = [ 1, 2, 0, 0 ]ᵀ + (1 − r1 ) [ 1, 0, 0, 1 ]ᵀ + (r1 + r2 ) [ −1, −1, 1, 0 ]ᵀ
               = [ 2, 2, 0, 1 ]ᵀ + r1 [ −2, −1, 1, −1 ]ᵀ + r2 [ −1, −1, 1, 0 ]ᵀ.
3.2.8 The general solution of ax + by + cz + dw = e involves at least three parameters. But the vector
equation of a plane only has two parameters, therefore ax + by + cz + dw = e does not describe a plane. (It
describes a 3-dimensional so-called hyperplane inside R4 ).
3.2.13 (a) If the dot product is negative, φ will be greater than π2 , and therefore θ will end up being
negative. We can fix this, similarly to Exercise 3.1.15, by taking the absolute value of the dot product, i.e.,
by solving
cos φ = |n · d| / (knk kdk).          (2.1)
(b) From trigonometry, we know that sin θ = sin(π/2 − φ ) = cos φ . Together with (2.1), this gives the desired
formula.
3.2.15 x · (a × b) = 0.
4.1.1 Yes, a 1×1-matrix is both a row vector and a column vector. However, a column vector of dimension
2 or greater can never be equal to a row vector, because one is an n × 1-matrix and the other is a 1 × n-
matrix.
4.1.2 x = 2, y = −1, z = 2.
4.1.3 2 × 3, 3 × 3, 4 × 2.
4.1.4 7.
4.2.3 The equation simplifies to X = B + B, so X = [ 0 6 0 ; 2 −2 2 ].
By definition of equality of matrices, this is equivalent to the system of four scalar equations x + 0y + 0z +
w = 1, 2x + y + 2z + 0w = 2, 3x + y − z + 0w = 6, 5x + y + 2z + w = 5. Solving the system of equations,
we find x = 1, y = 2, z = −1, w = 0.
4.3.3 0A = (0 + 0)A = 0A + 0A. Now add −(0A) to both sides. Then 0 = 0A.
4.4.1 (a) [ 1, −1 ]ᵀ, (b) [ 2, 0 ]ᵀ, (c) [ 4, −1 ]ᵀ, (d) [ 0, 0 ]ᵀ, (e) [ 1, 2 ]ᵀ.
4.4.3 By columns:
x1 [ a11 , a21 , a31 ]ᵀ + x2 [ a12 , a22 , a32 ]ᵀ + x3 [ a13 , a23 , a33 ]ᵀ = [ a11 x1 + a12 x2 + a13 x3 , a21 x1 + a22 x2 + a23 x3 , a31 x1 + a32 x2 + a33 x3 ]ᵀ.
4.4.5 A = [ 1 3 2 0 ; 1 0 2 0 ; 0 0 6 0 ; 1 3 0 1 ]
4.4.6 (a) [ −3 −6 −9 ; −6 −3 −21 ].
(b) [ 8 −5 3 ; −11 5 −4 ].
(c) Not possible.
(d) [ −3 3 4 ; 6 −1 7 ].
(e) Not possible.
4.4.9
[ 1 2 ; 3 4 ] [ 1 2 ; 3 k ] = [ 7  2k + 2 ; 15  4k + 6 ],
[ 1 2 ; 3 k ] [ 1 2 ; 3 4 ] = [ 7  10 ; 3k + 3  4k + 6 ].
4.4.10
[ 1 2 ; 3 4 ] [ 1 2 ; 1 k ] = [ 3  2k + 2 ; 7  4k + 6 ],
[ 1 2 ; 1 k ] [ 1 2 ; 3 4 ] = [ 7  10 ; 3k + 1  4k + 2 ].
However, 7 ≠ 3 and so there is no possible choice of k which will make these matrices commute.
4.4.12 [ −71 168 ; −112 265 ].
4.4.13 A = [ 1 −1 ; −1 1 ], B = [ 1 1 ; 1 1 ], C = [ 2 2 ; 2 2 ].
4.4.14 A = [ 1 −1 ; −1 1 ], B = [ 1 1 ; 1 1 ].
4.4.15 A = [ 0 1 0 ; 1 0 0 ; 0 0 0 ], B = [ 1 2 0 ; 3 4 0 ; 0 0 0 ]. Then AB = [ 3 4 0 ; 1 2 0 ; 0 0 0 ] and BA = [ 2 1 0 ; 4 3 0 ; 0 0 0 ].
4.4.16 A = [ 1 0 ; 0 −1 ].
4.5.8 (a) [ x, y, z ]ᵀ = [ 1, −2/3, 0 ]ᵀ.
(b) [ x, y, z ]ᵀ = [ −12, 1, 5 ]ᵀ.
(c) [ x, y, z ]ᵀ = [ 3c − 2a, (1/3)b − (2/3)c, a − c ]ᵀ.
4.5.10 Multiply on both sides on the left by A−1 . Thus 0 = A−1 0 = A−1 (AX ) = (A−1 A)X = IX = X .
4.5.11 (AB)B−1A−1 = A(BB−1 )A−1 = AA−1 = I and B−1 A−1 (AB) = B−1 (A−1 A)B = B−1 IB = B−1 B = I.
4.5.12 It is not possible, because in that case, we would have A = IA = (AB)A = A(BA) = A0 = 0, and
therefore AB = 0, contradicting AB = I.
4.5.15 A−1 A = AA−1 = I and so A is an inverse of A−1 . Since (A−1 )−1 , by definition, is also an inverse of
A−1 , we have (A−1 )−1 = A by uniqueness.
4.5.16 (a) B is a right inverse of A, (b) B is a left inverse of A, (c) B is both a right inverse and left inverse
of A, (d) B is neither a right inverse nor a left inverse of A.
4.5.17 A = [ 1 0 0 ; 0 1 0 ], B = [ 1 0 ; 0 1 ; 0 0 ], C = [ 1 0 ; 0 1 ; 1 1 ].
4.5.19 To solve for A, we invert both sides of the equation (A + B)−1 = CB−1 and use matrix algebra to
get A + B = (CB−1 )−1 = (B−1 )−1C−1 = BC−1 . Therefore, A = BC−1 − B.
To solve for B, we note that A = BC−1 − B = B(C−1 − I). Multiplying both sides of the equation on
the right by the inverse of C−1 − I, we get B = A(C−1 − I)−1 .
To solve for C, we take the original equation (A + B)−1 = CB−1 and right-multiply both sides of the
equation B. This yields C = (A + B)−1 B.
4.5.20 The matrix A is right invertible. Two possible right inverses are
[ 1 −2 ; 0 1 ; 0 0 ]    and    [ −2 1 ; 0 1 ; 1 −1 ].
The matrices B and C are not right invertible. The matrix D is right invertible with inverse
[ −3 2 ; 2 −1 ].
Since D is square, its right inverse is actually an inverse, and therefore unique.
4.6.1 (a) E = [ 0 1 ; 1 0 ], (b) E = [ 2 0 ; 0 1 ], (c) E = [ 1 0 ; 2 1 ].
4.7.1 X^T Y = \begin{bmatrix} 0 & -1 & -2 \\ 0 & -1 & -2 \\ 0 & 1 & 2 \end{bmatrix}, XY^T = 1.
4.7.2 (a) \begin{bmatrix} -3 & -9 & -3 \\ -6 & -6 & 3 \end{bmatrix}.
(b) \begin{bmatrix} 5 & -18 & 5 \\ -11 & 4 & 4 \end{bmatrix}.
(c) \begin{bmatrix} -7 & 1 & 5 \end{bmatrix}.
(d) \begin{bmatrix} 1 & 3 \\ 3 & 9 \end{bmatrix}.
(e) \begin{bmatrix} 13 & -16 & 1 \\ -16 & 29 & -8 \\ 1 & -8 & 5 \end{bmatrix}.
(f) \begin{bmatrix} 5 & 7 & -1 \\ 5 & 15 & 5 \end{bmatrix}.
(g) Not possible, because B is a 2 × 3-matrix and E is a 2 × 1-matrix, so the product BE is undefined.
4.7.4 We have A = AT = −A. Therefore, each entry ai j of A is equal to its own negation. This implies that
ai j = 0, and therefore A is the zero matrix.
4.7.5 A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A - A^T).
4.7.6 If A is antisymmetric then A = −AT . It follows that aii = −aii and so each aii = 0.
4.7.8 (Au) · v = (Au)^T v = (u^T A^T)v = u^T (A^T v) = u · (A^T v).
4.7.9 We need to show that (A^{-1})^T is the inverse of A^T. From properties of the transpose, A^T (A^{-1})^T = (A^{-1}A)^T = I^T = I and (A^{-1})^T A^T = (AA^{-1})^T = I^T = I. Therefore (A^T)^{-1} = (A^{-1})^T.
4.7.10 We have (A−1 )T = (AT )−1 = A−1 , and therefore A−1 is equal to its own transpose, hence symmet-
ric.
4.9.3 The first two plaintext blocks are (8, 5), (12, 12) and the first two ciphertext blocks are (20, 7), (22, 24).
Eve solves the equation
A^{-1} \begin{bmatrix} 20 & 22 \\ 7 & 24 \end{bmatrix} = \begin{bmatrix} 8 & 12 \\ 5 & 12 \end{bmatrix}
to find the secret decryption matrix
A^{-1} = \begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix}.
The plaintext is “Hello, password is kiwifruit”.
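As a quick numerical check of this decryption (the matrices are the ones quoted above; the modulus 29 is an assumption, chosen only because the block arithmetic is consistent modulo 29):

import numpy as np

# Hypothetical check of the decryption in 4.9.3; the modulus 29 is assumed.
A_inv = np.array([[3, 5], [1, 2]])
cipher = np.array([[20, 22], [7, 24]])
plain = np.array([[8, 12], [5, 12]])
print((A_inv @ cipher) % 29)                          # [[ 8 12] [ 5 12]]
print(np.array_equal((A_inv @ cipher) % 29, plain))   # True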
5.1.1 (a) x is not in the span. (b) y = 7u1 − 5u2 . (c) z = −3u1 + 4u2 .
5.1.6 (a) x = 3u2 . (b) y is not in the span. (c) z = 2u1 + 1u2 .
5.2.6 \begin{bmatrix} 2 & 1 & 3 & 3 \\ 0 & 3 & 3 & -3 \\ 3 & 5 & 8 & 1 \end{bmatrix} ≃ \begin{bmatrix} 1 & 4 & 5 & -2 \\ 0 & 1 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix}. Linearly independent subset: {u_1, u_2}.
5.2.10
\begin{bmatrix} 1 & 2 & 2 & 5 & 12 \\ 1 & 2 & 7 & 7 & 17 \\ -2 & -4 & -4 & -10 & -24 \end{bmatrix} ≃ \begin{bmatrix} 1 & 2 & 2 & 5 & 12 \\ 0 & 0 & 5 & 2 & 5 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
Linearly independent subset: {u1 , u3 }. Since the rank is 2, this is the smallest possible.
5.2.12 We write the vectors as the columns of a matrix and reduce to reduced echelon form over Z3 :
\begin{bmatrix} 2 & 1 & 2 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 2 & 2 & 1 \end{bmatrix} ≃ \dots ≃ \begin{bmatrix} 1 & 0 & 2 & 2 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
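The reduction over Z_3 can be double-checked with a small mod-p row-reduction routine; the sketch below is not the textbook's algorithm, just one way to verify the echelon form quoted above.

import numpy as np

def rref_mod_p(M, p):
    # Reduced row echelon form of an integer matrix over Z_p (p prime).
    M = np.array(M, dtype=int) % p
    rows, cols = M.shape
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i, c] != 0), None)
        if pivot is None:
            continue
        M[[r, pivot]] = M[[pivot, r]]
        M[r] = (M[r] * pow(int(M[r, c]), -1, p)) % p   # scale the pivot to 1
        for i in range(rows):
            if i != r and M[i, c] != 0:
                M[i] = (M[i] - M[i, c] * M[r]) % p     # clear the rest of the column
        r += 1
    return M

print(rref_mod_p([[2, 1, 2, 0], [1, 1, 0, 1], [0, 2, 2, 1]], 3))
# [[1 0 2 2]
#  [0 1 1 2]
#  [0 0 0 0]]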
5.2.13 From a(u + v) + b(2u + w) + c(w − 2v) = 0 we get (a + 2b)u + (a − 2c)v + (b + c)w = 0. Since
u, v, w are linearly independent, this last system has only the trivial solution, so a + 2b = 0, a − 2c = 0, and
b + c = 0. However, these three equations have non-trivial solutions, for example (a, b, c) = (2, −1, 1). So
the vectors u + v, 2u + w, and w − 2v are linearly dependent.
5.2.14 From a(u + v) + b(u + w) + c(w + v) = 0 we get (a + b)u + (a + c)v + (b + c)w = 0. Since u, v, w
are linearly independent, this last system has only the trivial solution, so a +b = 0, a +c = 0, and b +c = 0.
Solving, we find the unique solution (a, b, c) = (0, 0, 0). So the vectors u + v, u + w, and w + v are linearly
independent.
a1 Az1 + . . . + ak Azk = a1 w1 + . . . + ak wk = 0.
Since the wi are linearly independent, it follows that each ai = 0. Therefore the zi are linearly independent
as well.
5.3.1 (a) No. We have [1, 0, 0, 0]^T ∈ V_1, but 10 [1, 0, 0, 0]^T ∉ V_1.
(b) This is not a subspace. The vector [1, 1, 1, 1]^T is in V_2. However, (-1)[1, 1, 1, 1]^T is not.
(c) This is a subspace. It contains the zero vector and is closed with respect to vector addition and scalar
multiplication.
(d) This is not a subspace. The vector [0, 0, 1, 0]^T is in V_4. However, (-1)[0, 0, 1, 0]^T = [0, 0, -1, 0]^T is not.
(e) This is a subspace. It contains the zero vector and is closed with respect to vector addition and scalar
multiplication.
5.3.2 Yes, this is a subspace because it contains the zero vector and is closed with respect to vector addition
and scalar multiplication. For example, if u, v ∈ M, then w · u = 0 and w · v = 0, therefore w · (u + v) = 0,
therefore u + v ∈ M.
(c) V3 is a subspace.
(d) V4 is not a subspace. For example, it does not contain the zero vector.
5.3.5 Because 0 ∈ V and 0 ∈ W , we have 0 ∈ V ∩W . To show that V ∩W is closed under addition, assume
u, v ∈ V ∩W . Then u, v ∈ V , and since V is a subspace we have u + v ∈ V . Similarly u, v ∈ W , and since
W is a subspace we have u + v ∈ W . It follows that u + v is in both V and W , and therefore in V ∩ W .
To show that V ∩W is closed under scalar multiplication, assume u ∈ V ∩W and k ∈ R. Then u ∈ V , and
therefore ku ∈ V . Similarly ku ∈ W , and therefore also ku ∈ V ∩W .
5.4.11 (a) Basis {[2, 3, 1, 4]^T, [2, 2, 1, 0]^T}, dimension 2.
(b) Basis {[1, 3, 1]^T}, dimension 1.
(c) Basis {[2, 0, 1]^T, [3, 1, 0]^T}, dimension 2.
5.4.14 (a) No. For example, the vectors {0, 0, 0, 0, 0} are linearly dependent.
(b) No. For example, the vectors {0, 0, 0, 0} are linearly dependent.
(f) No, in fact no such set is a basis of R5 , since every basis of R5 consists of 5 vectors by Theorem 5.38.
(i) Yes, because a linearly independent set of vectors is a basis of the subspace it spans.
5.4.15 No. As a 5-dimensional space, R5 is spanned by 5 vectors. By the Exchange Lemma, any linearly
independent set can have size at most 5.
5.5.3 Rank 2, nullity 1, basis of column space {[2, 1, 3]^T, [1, 3, 2]^T}, basis of row space {[1, 2, 0], [0, 0, 1]},
basis of null space {[-2, 1, 0]^T}.
5.5.5 Let
A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}.
Since col(A) is the span of the columns of A, we have v ∈ col(A) if and only if there exist scalars u_1, ..., u_n
such that
v = u_1 \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix} + \dots + u_n \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix}.
But this equation is equivalent to
v = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} = Au.
Therefore, v ∈ col(A) if and only if v is of the form Au, for some u ∈ Rn . In other words, col(A) =
{Au | u ∈ Rn }.
5.5.6 The row space of A is the same as the column space of AT (except that it uses row vectors instead of
column vectors). Therefore, rank(A) = dim(row(A)) = dim(col(AT )) = rank(AT ).
5.5.7 From the theory of elementary matrices, we know that B can be written as a product of elementary
matrices B = E1 · · · Ek . It follows that BA = E1 · · · Ek A. Since each elementary matrix corresponds to an
elementary row operation, it follows that BA and A are row equivalent. Therefore, BA and A have the
same row space. It follows that rank(BA) = dim(row(BA)) = dim(row(A)) = rank(A). This proves the
first claim. To show the claim about AC, first note that by the above argument, rank(CT AT ) = rank(AT ),
because CT is invertible. Then rank(AC) = rank(A) follows by taking the transpose.
5.5.8 Let {w1 , . . . , wk } be a basis of col(B) ∩ null(A). Then for each i, we have wi ∈ col(B), and therefore
by Exercise 5.5.5, we can find zi such that wi = Bzi . Also let {u1 , . . . , ur } be a basis for null(B). Now
assume x ∈ null(AB). Then ABx = 0, and therefore Bx ∈ null(A) ∩ col(B). We therefore have
Bx = c1 w1 + . . . + ck wk = B(c1 z1 + . . . + ck zk ).
This implies
x − (c1 z1 + . . . + ck zk ) ∈ null(B)
and so it is of the form
x − (c1 z1 + . . . + ck zk ) = d1 u1 + . . . + dr ur .
It follows that
x ∈ span {z1 , . . . , zk , u1 , . . . , ur } .
Since we have shown that every element of null(AB) is in the span of these k + r vectors, it follows that
dim(null(AB)) ≤ k + r
 = dim(col(B) ∩ null(A)) + dim(null(B))
 ≤ dim(null(A)) + dim(null(B)),
which proves the claim.
6.1.1 T1 and T3 are linear, and T2 is not. The transformation T3 is called the zero transformation.
6.1.3 We have T (av + bw) = A(av + bw) = a(Av) + b(Aw) = aT (v) + bT (w) by properties of matrix
multiplication. Therefore, T is linear by Proposition 6.3.
6.1.4 We have
T(av + bw) = av + bw - \frac{u · (av + bw)}{‖u‖^2} u
 = \left( av - a\frac{u · v}{‖u‖^2} u \right) + \left( bw - b\frac{u · w}{‖u‖^2} u \right)
 = aT(v) + bT(w).
Therefore, T is linear.
6.1.5 If T were a linear transformation, it should satisfy T(0) = 0, but it does not. Also T(v + w) ≠
T(v) + T(w).
6.2.1 (a) The matrix of T is the elementary matrix that is like the identity matrix, except that the ( j, j)-
entry is b.
(b) The matrix of T is the elementary matrix that is like the identity matrix, except that the (i, j)-entry
is b.
(c) The matrix of T is the elementary matrix that switches the i th and the j th rows.
6.2.2 Since u1 , . . . , un is a basis of Rn , we know that A is invertible by Proposition 5.31. For each i =
1, . . . , n, since Aei = ui , we have A−1 ui = ei , and therefore
Now let w be some arbitrary vector in Rn . Since u1 , . . . , un is a basis, there exists a1 , . . . , an such that
w = a1 u1 + . . . + an un . Then
6.2.3
\begin{bmatrix} 5 & 1 & 5 \\ 1 & 1 & 3 \\ 3 & 5 & -2 \end{bmatrix} \begin{bmatrix} 1 & -1 & 0 \\ 2 & -1 & -1 \\ -6 & 5 & 2 \end{bmatrix}^{-1} = \begin{bmatrix} 5 & 1 & 5 \\ 1 & 1 & 3 \\ 3 & 5 & -2 \end{bmatrix} \begin{bmatrix} 3 & 2 & 1 \\ 2 & 2 & 1 \\ 4 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 37 & 17 & 11 \\ 17 & 7 & 5 \\ 11 & 14 & 6 \end{bmatrix}.
6.2.4
\begin{bmatrix} 1 & 2 & 6 \\ 3 & 4 & 1 \\ 1 & 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \\ -8 & 6 & 3 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 2 & 6 \\ 3 & 4 & 1 \\ 1 & 1 & -1 \end{bmatrix} \begin{bmatrix} 6 & 3 & 1 \\ 5 & 3 & 1 \\ 6 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 52 & 21 & 9 \\ 44 & 23 & 8 \\ 5 & 4 & 1 \end{bmatrix}.
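Both products can be checked numerically; a minimal numpy sketch for 6.2.3 (the matrices are the ones displayed above, and 6.2.4 is analogous):

import numpy as np

A = np.array([[5, 1, 5], [1, 1, 3], [3, 5, -2]])
B = np.array([[1, -1, 0], [2, -1, -1], [-6, 5, 2]])
print(np.round(A @ np.linalg.inv(B)))
# [[37. 17. 11.]
#  [17.  7.  5.]
#  [11. 14.  6.]]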
6.2.6 Recall that the desired matrix has i-th column equal to proj_u(e_i) = \frac{u · e_i}{‖u‖^2} u. Therefore, the matrix is
\frac{1}{14} \begin{bmatrix} 1 & -2 & 3 \\ -2 & 4 & -6 \\ 3 & -6 & 9 \end{bmatrix}.
6.2.7
\frac{1}{35} \begin{bmatrix} 1 & 5 & 3 \\ 5 & 25 & 15 \\ 3 & 15 & 9 \end{bmatrix}.
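Since the i-th column of the matrix of proj_u is (u · e_i/‖u‖^2)u, the whole matrix is the outer product uu^T divided by ‖u‖^2. The vectors u below are inferred from the two answers above (u = [1, -2, 3]^T and u = [1, 5, 3]^T) and are therefore assumptions.

import numpy as np

for u in (np.array([1, -2, 3]), np.array([1, 5, 3])):
    print(np.outer(u, u), "divided by", u @ u)
# prints [[1,-2,3],[-2,4,-6],[3,-6,9]] / 14 and [[1,5,3],[5,25,15],[3,15,9]] / 35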
6.3.1 \begin{bmatrix} \cos\frac{π}{3} & -\sin\frac{π}{3} \\ \sin\frac{π}{3} & \cos\frac{π}{3} \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2}\sqrt{3} \\ \frac{1}{2}\sqrt{3} & \frac{1}{2} \end{bmatrix}.
[Figure: reflection of v about the line spanned by u, showing v, v′ = proj_u(v), and T(v).]
We have
v′ = proj_u(v) = \frac{u · v}{‖u‖^2} u.
But since u is a unit vector, this simplifies to v′ = (u · v)u. From the above picture, we see that T(v) =
v + 2(v′ - v) = 2v′ - v = 2(u · v)u - v. To get the matrix of this linear transformation, we must compute
the image of the standard basis vectors:
T(e_1) = 2(u · e_1)u - e_1 = 2a \begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 2a^2 - 1 \\ 2ab \end{bmatrix},
T(e_2) = 2(u · e_2)u - e_2 = 2b \begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 2ab \\ 2b^2 - 1 \end{bmatrix}.
Therefore, the matrix of T is \begin{bmatrix} 2a^2 - 1 & 2ab \\ 2ab & 2b^2 - 1 \end{bmatrix}.
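A short numerical sketch of this formula: for a unit vector u = [a, b]^T, the reflection matrix is 2uu^T - I, whose columns are exactly T(e_1) and T(e_2) as computed above. The particular u and v below are only examples.

import numpy as np

a, b = np.cos(np.pi / 6), np.sin(np.pi / 6)   # example unit vector u = [a, b]
u = np.array([a, b])
R = 2 * np.outer(u, u) - np.eye(2)            # [[2a^2-1, 2ab], [2ab, 2b^2-1]]
v = np.array([1.0, 2.0])
print(np.allclose(R @ v, 2 * (u @ v) * u - v))   # True: both expressions for T(v) agree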
6.4.5
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} \cos\frac{π}{6} & -\sin\frac{π}{6} & 0 \\ \sin\frac{π}{6} & \cos\frac{π}{6} & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{2}\sqrt{3} & -\frac{1}{2} & 0 \\ \frac{1}{2} & \frac{1}{2}\sqrt{3} & 0 \\ 0 & 0 & -1 \end{bmatrix}.
6.4.6 (a) T(0) = T(0 · 0) = 0 T(0) = 0. (b) T(-v) = T((-1)v) = (-1)T(v) = -T(v). (c) T(a_1 v_1 + ... +
a_k v_k) = T(a_1 v_1) + ... + T(a_k v_k) = a_1 T(v_1) + ... + a_k T(v_k).
Now,
(S ∘ T)(v) = BAv = \begin{bmatrix} 2 & -4 \\ 10 & 8 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 8 \\ 12 \end{bmatrix}.
6.4.8 We have
(S ∘ T)(v) = S(T(v)) = B(T(v)) = \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} 2 \\ -3 \end{bmatrix} = \begin{bmatrix} -4 \\ -11 \end{bmatrix}.
6.4.9 The inverse of a reflection is a reflection, namely, itself. (For example, reflecting twice about the
x-axis returns each vector to its original position). The inverse of a rotation is a rotation by the same angle
in the opposite direction. The inverse of a shearing is a shearing in the opposite direction. The inverse of
a scaling by factor a is a scaling by factor 1/a.
7.2.4
\begin{vmatrix} 1 & 2 & 1 \\ 2 & 1 & 3 \\ 2 & 1 & 1 \end{vmatrix} = 6.
7.2.5
\begin{vmatrix} 2 & 3 & 1 & 1 \\ 4 & 3 & 1 & 2 \\ 1 & 1 & 0 & 1 \\ 3 & 2 & 1 & 2 \end{vmatrix} = 2.
7.2.6
\begin{vmatrix} 1 & 2 & -1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 2 & 1 & 3 \\ 1 & 4 & 0 & 2 \end{vmatrix} = 6.
7.2.7
\begin{vmatrix} 1 & 0 & 0 & 1 \\ 2 & 1 & 1 & 0 \\ 0 & 0 & 0 & 2 \\ 2 & 1 & 3 & 1 \end{vmatrix} = -4.
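A quick numerical check of the four determinants above:

import numpy as np

mats = {
    "7.2.4": [[1, 2, 1], [2, 1, 3], [2, 1, 1]],
    "7.2.5": [[2, 3, 1, 1], [4, 3, 1, 2], [1, 1, 0, 1], [3, 2, 1, 2]],
    "7.2.6": [[1, 2, -1, 1], [0, 1, 2, 1], [0, 2, 1, 3], [1, 4, 0, 2]],
    "7.2.7": [[1, 0, 0, 1], [2, 1, 1, 0], [0, 0, 0, 2], [2, 1, 3, 1]],
}
for name, M in mats.items():
    print(name, round(np.linalg.det(np.array(M))))   # 6, 2, 6, -4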
7.4.1 (a)
\begin{vmatrix} 1 & 2 & 1 \\ 2 & 3 & 2 \\ -4 & 1 & 2 \end{vmatrix} = -6.
(b)
\begin{vmatrix} 2 & 1 & 3 \\ 2 & 4 & 2 \\ 1 & 4 & -5 \end{vmatrix} = -32.
(c) One can row reduce this, using only row operations of the third kind, to
\begin{bmatrix} 1 & 2 & 1 & 2 \\ 0 & -5 & -5 & -3 \\ 0 & 0 & \frac{9}{2} & 5 \\ 0 & 0 & 0 & -\frac{63}{10} \end{bmatrix}.
(d) One can row reduce this, using only row operations of the third kind, to
\begin{bmatrix} 1 & 4 & 1 & 2 \\ 0 & -10 & -5 & -3 \\ 0 & 0 & \frac{19}{2} & 5 \\ 0 & 0 & 0 & -\frac{211}{20} \end{bmatrix}.
(c) The first row was added to the second row and det(B) = det(A).
7.5.2 By assumption, we can obtain a row of zeros by doing row operations. Row operations do not
change whether the determinant is zero, so the determinant must have been zero all along.
7.5.4 The matrix kI has k down the main diagonal and has determinant equal to k^n by Theorem 7.14.
Using Theorem 7.23, it follows that det(kA) = det(kIA) = det(kI) det(A) = k^n det(A).
7.5.5
\det\left( \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} -1 & 2 \\ -5 & 6 \end{bmatrix} \right) = -8,
\det \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \det \begin{bmatrix} -1 & 2 \\ -5 & 6 \end{bmatrix} = -2 × 4 = -8.
7.5.6 This is not true at all. Consider A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} and B = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}. Then det(A) = 1, det(B) = 1,
and det(A + B) = 0.
7.5.7 Since A^k = 0, we have det(A)^k = det(A^k) = det(0) = 0. Therefore, it must be the case that det(A) = 0.
7.5.8 If A is orthogonal, we have det(A)^2 = det(A^T) det(A) = det(A^T A) = det(I) = 1. Therefore the only
possible values for det(A) are ±1.
7.5.9 det(A) = det(P^{-1}BP) = det(P^{-1}) det(B) det(P) = \frac{1}{\det(P)} det(B) det(P) = det(B).
Therefore the determinant is 0 if a = 1, b = 1, or a = b. In all other cases, the determinant is non-zero and
the matrix is invertible.
7.5.11 This follows because det(ABC) = det(A) det(B) det(C) and if this product is non-zero, then each
determinant in the product is non-zero. Therefore, each of these matrices is invertible.
7.5.12 The given condition is what it takes for the determinant to be non-zero. Recall that the determinant
of an upper triangular matrix is just the product of the entries on the main diagonal. The inverse will
also be upper triangular; this can be seen by noting that every invertible upper triangular matrix can be written
as a product of upper triangular elementary matrices; the inverse of each such elementary matrix is upper
triangular, and therefore so is their product. The analogous statement about lower triangular matrices is
also true.
7.5.13 (a) False. Consider \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 1 \end{bmatrix}.
(b) True.
(c) False.
(d) False.
(e) True.
(f) True.
(g) True.
(h) True.
(i) True.
(j) True.
(b) det(B) = 7, so B is invertible. The inverse is
\frac{1}{7} \begin{bmatrix} 1 & 3 & -6 \\ -2 & 1 & 5 \\ 2 & -1 & 2 \end{bmatrix}^T = \begin{bmatrix} \frac{1}{7} & -\frac{2}{7} & \frac{2}{7} \\ \frac{3}{7} & \frac{1}{7} & -\frac{1}{7} \\ -\frac{6}{7} & \frac{5}{7} & \frac{2}{7} \end{bmatrix}.
(c) det(C) = 3, so C is invertible. The inverse is
\begin{bmatrix} 1 & 0 & -3 \\ -\frac{2}{3} & \frac{1}{3} & \frac{5}{3} \\ \frac{2}{3} & -\frac{1}{3} & -\frac{2}{3} \end{bmatrix}.
(b) det(B) = -15, so B is invertible. The inverse is
B^{-1} = \frac{1}{15} \begin{bmatrix} -1 & -1 & 4 \\ -4 & 11 & 1 \\ 8 & -7 & -2 \end{bmatrix}.
7.6.4 We have
det(A) = 36, cof(A) = \begin{bmatrix} 6 & 12 & 6 \\ 12 & 6 & -12 \\ -6 & 6 & 6 \end{bmatrix}, adj(A) = \begin{bmatrix} 6 & 12 & -6 \\ 12 & 6 & 6 \\ 6 & -12 & 6 \end{bmatrix},
and therefore
A^{-1} = \frac{1}{36} \begin{bmatrix} 6 & 12 & -6 \\ 12 & 6 & 6 \\ 6 & -12 & 6 \end{bmatrix} = \begin{bmatrix} \frac{1}{6} & \frac{1}{3} & -\frac{1}{6} \\ \frac{1}{3} & \frac{1}{6} & \frac{1}{6} \\ \frac{1}{6} & -\frac{1}{3} & \frac{1}{6} \end{bmatrix}.
7.6.5 We have
det(A) = 2, cof(A) = \begin{bmatrix} -9 & 11 & 8 \\ -2 & 2 & 2 \\ 5 & -5 & -4 \end{bmatrix}, adj(A) = \begin{bmatrix} -9 & -2 & 5 \\ 11 & 2 & -5 \\ 8 & 2 & -4 \end{bmatrix},
and therefore
A^{-1} = \frac{1}{2} \begin{bmatrix} -9 & -2 & 5 \\ 11 & 2 & -5 \\ 8 & 2 & -4 \end{bmatrix} = \begin{bmatrix} -\frac{9}{2} & -1 & \frac{5}{2} \\ \frac{11}{2} & 1 & -\frac{5}{2} \\ 4 & 1 & -2 \end{bmatrix}.
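Both answers use the adjugate formula A^{-1} = adj(A)/det(A). A minimal sympy sketch of this formula, applied to a made-up matrix (the exercise matrices are not restated in this answer key):

from sympy import Matrix

A = Matrix([[2, 1, 0], [1, 3, 1], [0, 1, 2]])   # example matrix only
print(A.det())                                   # 8
print(A.inv() == A.adjugate() / A.det())         # True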
7.6.6 No. It has non-zero determinant det(A) = cos^2 t + sin^2 t = 1 for all t, so it is invertible for all t.
7.6.7 det(A) = \begin{vmatrix} 1 & t & t^2 \\ 0 & 1 & 2t \\ t & 0 & 2 \end{vmatrix} = t^3 + 2, and so A has no inverse when t = -\sqrt[3]{2}.
7.6.8 Since the matrix A has two identical rows, we have det(A) = 0 for all t. So this matrix is non-
invertible for all t.
7.6.9
\det \begin{bmatrix} e^t & e^{-t}\cos t & e^{-t}\sin t \\ e^t & -e^{-t}\cos t - e^{-t}\sin t & -e^{-t}\sin t + e^{-t}\cos t \\ e^t & 2e^{-t}\sin t & -2e^{-t}\cos t \end{bmatrix} = 5e^{-t} ≠ 0,
and so this matrix is always invertible.
7.6.10
\det \begin{bmatrix} e^t & 0 & 0 \\ 0 & \cos t & \sin t \\ 0 & \cos t - \sin t & \cos t + \sin t \end{bmatrix} = e^t.
Hence the inverse is
e^{-t} \begin{bmatrix} 1 & 0 & 0 \\ 0 & e^t\cos t + e^t\sin t & -e^t(\cos t - \sin t) \\ 0 & -e^t\sin t & e^t\cos t \end{bmatrix}^T = \begin{bmatrix} e^{-t} & 0 & 0 \\ 0 & \cos t + \sin t & -\sin t \\ 0 & \sin t - \cos t & \cos t \end{bmatrix}.
7.6.11
\begin{bmatrix} e^t & \cos t & \sin t \\ e^t & -\sin t & \cos t \\ e^t & -\cos t & -\sin t \end{bmatrix}^{-1} = \begin{bmatrix} \frac{1}{2}e^{-t} & 0 & \frac{1}{2}e^{-t} \\ \frac{1}{2}\cos t + \frac{1}{2}\sin t & -\sin t & \frac{1}{2}\sin t - \frac{1}{2}\cos t \\ \frac{1}{2}\sin t - \frac{1}{2}\cos t & \cos t & -\frac{1}{2}\cos t - \frac{1}{2}\sin t \end{bmatrix}.
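The symbolic inverses in 7.6.10 and 7.6.11 can be confirmed with a computer algebra system; a sketch for the matrix of 7.6.10:

from sympy import symbols, exp, cos, sin, Matrix, simplify

t = symbols('t')
A = Matrix([[exp(t), 0, 0],
            [0, cos(t), sin(t)],
            [0, cos(t) - sin(t), cos(t) + sin(t)]])
print(simplify(A.det()))    # exp(t)
print(simplify(A.inv()))    # the inverse displayed above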
7.7.1 False. Cramer’s rule only works when the coefficient matrix is invertible. In these cases, the solution
is always unique.
8.1.8 We have λA^{-1}v = A^{-1}λv = A^{-1}Av = v. Since v ≠ 0, this implies λ ≠ 0. Moreover, it implies
A^{-1}v = λ^{-1}v. Thus, λ^{-1} is an eigenvalue of A^{-1}.
8.1.9 Say Av = λ v. Then cAv = cλ v and so the eigenvalues of cA are just cλ where λ is an eigenvalue of
A.
8.1.10 Suppose v is an eigenvector of B, i.e., Bv = λ v. Then BAv = ABv = Aλ v = λ Av, and therefore Av
is an eigenvector of B.
8.1.12 The formula follows from properties of matrix multiplication. However, this vector might not be
an eigenvector because it might equal 0 and eigenvectors cannot equal 0.
8.2.8 Yes, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} works.
8.3.2 A rotation by 60◦ cannot map any non-zero vector to a scalar multiple of itself. The only rotations
of R2 that have real eigenvalues are rotations by 180◦ and 0◦ . The former has eigenvalue −1, and the latter
has eigenvalue 1 (with all vectors being eigenvectors).
8.3.3 The matrix of T is \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}. The eigenvectors and eigenvalues are:
[0, 1]^T for eigenvalue -1, [1, 0]^T for eigenvalue 1.
8.3.4 The matrix of T is \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. The eigenvectors and eigenvalues are:
[1, -1]^T for eigenvalue -1, [1, 1]^T for eigenvalue 1.
8.3.5 The matrix of T is \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}. The eigenvectors and eigenvalues are:
[0, 0, 1]^T for eigenvalue -1, and [1, 0, 0]^T, [0, 1, 0]^T for eigenvalue 1.
8.4.3 The eigenvalues are -1 and 1. The eigenvectors corresponding to the eigenvalues are:
[10, -2, 3]^T for eigenvalue -1, [7, -2, 2]^T for eigenvalue 1.
Since there are only 2 linearly independent eigenvectors, this matrix is not diagonalizable.
8.5.5 A = \begin{bmatrix} 0 & 0 & 2 \\ -1 & 1 & 2 \\ -1 & 0 & 3 \end{bmatrix}.
8.9.1 (a) λ = 2 has algebraic multiplicity 2 and geometric multiplicity 1. Since the sum of the geometric
multiplicities of all eigenvalues is 1, the matrix is not diagonalizable.
(b) λ = 1 has algebraic and geometric multiplicity 1; λ = −3 has algebraic and geometric multiplicity
1. Since the sum of the geometric multiplicities is 2, the matrix is diagonalizable.
(c) λ = 2 has algebraic and geometric multiplicity 2, λ = 3 has algebraic and geometric multiplicity 1.
Since the sum of the geometric multiplicities is 3, the matrix is diagonalizable.
(d) λ = 3 has algebraic multiplicity 3 and geometric multiplicity 2. Since the sum of the geometric
multiplicities of all eigenvalues is 2, the matrix is not diagonalizable.
(e) λ = −1 has algebraic multiplicity 3 and geometric multiplicity 1. Since the sum of the geometric
multiplicities of all eigenvalues is 1, the matrix is not diagonalizable.
(b) Diagonalizable.
8.10.3 (a) The characteristic polynomial of A is a quadratic polynomial, and therefore it is of the form
p(λ) = λ^2 + bλ + c, for some b, c ∈ R. By the Cayley-Hamilton theorem, p(A) = 0, therefore
A^2 = -bA - cI. This proves that A^2 is a linear combination of A and I.
(b) A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.
(b) The characteristic polynomial is λ^2 + 1, the eigenvalues are λ_1 = i and λ_2 = -i, and the corre-
sponding basic eigenvectors are v_1 = [1 - i, -1]^T and v_2 = [1 + i, -1]^T, respectively. Therefore
B = PDP^{-1}, where
P = \begin{bmatrix} 1-i & 1+i \\ -1 & -1 \end{bmatrix} and D = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}.
(d) The characteristic polynomial is -λ^3 + 4λ^2 - 5λ + 2 = -(λ - 1)(λ - 1)(λ - 2). Therefore the
eigenvalues are λ = 1 with algebraic multiplicity 2 and λ = 2 with algebraic multiplicity 1. The
eigenspace for λ = 1 is 1-dimensional; it is spanned by [1, 1, −1]T . Since the geometric multiplicity
of the eigenvalue 1 is less than its algebraic multiplicity, the matrix is not diagonalizable, not even
over the complex numbers.
8.11.2 (a) By Proposition 8.51, the other eigenvalue is 1 - 2i with corresponding eigenvector [1, -i]^T.
(b) From the eigenvalues and eigenvectors, we know that A = PDP^{-1}, where
P = \begin{bmatrix} 1 & 1 \\ i & -i \end{bmatrix} and D = \begin{bmatrix} 1+2i & 0 \\ 0 & 1-2i \end{bmatrix}.
(c) A = PDP^{-1} = \begin{bmatrix} 1 & 2 \\ -2 & 1 \end{bmatrix}.
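A numerical check of part (c), using the complex P and D given above:

import numpy as np

P = np.array([[1, 1], [1j, -1j]])
D = np.diag([1 + 2j, 1 - 2j])
A = P @ D @ np.linalg.inv(P)
print(np.round(A.real, 10))   # [[ 1.  2.] [-2.  1.]]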
9.1.13 Let f (i) be the i th component of a vector x ∈ Rn . Thus a typical element in Rn is ( f (1), . . ., f (n)).
9.2.8 Let pi (x) denote the i th of these polynomials. Suppose C1 p1 (x) + . . . +C4 p4 (x) = 0. Then collecting
terms according to the exponent of x, we have
The matrix of coefficients is just the transpose of the above matrix. There exists a non-trivial solution if
and only if the determinant of this matrix equals 0.
If the only solution is the trivial solution, the set is linearly independent. We rewrite the equation as
follows.
(2a + \frac{1}{2}c)u + (b + 3c)v + (-a + b)w = 0.
Since u, v, w are linearly independent, the coefficients in the last equation must all equal 0. In other words:
2a + \frac{1}{2}c = 0,
b + 3c = 0,
-a + b = 0.
We solve and find that the unique solution is a = b = c = 0. Therefore, the set R is linearly independent.
c1 = 0, c2 = 0, c3 = 0, c4 = 0,
and therefore, the polynomials are linearly independent. Since there are 4 linearly independent
polynomials in a 4-dimensional space, they form a basis.
(b) Yes.
9.4.6 Yes, because the set of 5 linearly independent vectors can be extended to a basis B of V . But since
V is 5-dimensional, B has only 5 elements, which must be the original 5 vectors.
9.4.7 No. Since V has a spanning set of size 5, the 6 vectors cannot be linearly independent by the
Exchange Lemma.
(1, 0, 0, 1, 0, 0, 1, 0, 0, 1, . . .),
(0, 1, 0, 0, 1, 0, 0, 1, 0, 0, . . .),
(0, 0, 1, 0, 0, 1, 0, 0, 1, 0, . . .).
Therefore, W3 is a 3-dimensional space. For general k, the situation is analogous and the dimension
of Wk is k.
(b) Let U be the set of all periodic sequences of all periods. We can find an (infinite) spanning set for U
by taking all the basis vectors for all of the spaces Wk :
and so on. However, these sequences are not linearly independent. For example, we can obtain the
sequence (1, 0, 1, 0, 1, 0, 1, 0, . . .) of period 2 as a linear combination of two sequences of period 4,
namely (1, 0, 0, 0, 1, 0, 0, 0, . . .) and (0, 0, 1, 0, 0, 0, 1, 0, . . .). By Proposition 9.47, we know that it is
possible to shrink the above spanning set to a basis by removing certain sequences. How exactly to
do this is an interesting question. One way to construct a basis is to keep exactly those sequences of
period k that start with ℓ zeros, where gcd(ℓ, k) = 1. Proving that this really works is an interesting
project.
9.4.12 When we add two numbers of the form a + b√2, we get another number of the same form. When
we multiply a number of the form a + b√2 by a (rational) scalar, we get another number of the same form.
Also, 0 = 0 + 0√2 is of the required form. The 8 axioms of a vector space are satisfied because all of
them are laws of the arithmetic of real numbers. A basis is {1, √2}. By definition, the span of these gives
the collection of vectors. To prove that they are linearly independent, assume a + b√2 = 0, where a, b are
rational numbers. If b ≠ 0, then √2 = -a/b, which cannot happen because √2 is irrational. If a ≠ 0, then
1/√2 = -b/a, which again cannot happen since 1/√2 is irrational. Hence both a, b = 0. Therefore, 1 and √2 are
linearly independent over the rational numbers, and form a basis. The dimension of the space is 2.
9.5.6 The block length is n = 2^r - 1 = 15 and the message length is k = n - r = 11. The check matrix and
generator matrix are not unique, because there are different ways of ordering the columns of the check matrix.
10.1.6 (a) Let a = (a0 , a1 , a2 , . . .). Then the equation shift(a) = a is equivalent to
which translates to the recurrence an+1 = an . The only free variable is a0 , and the only solutions are
the constant sequences. They form a 1-dimensional space with basis {(1, 1, 1, . . .)}.
which translates to the recurrence an+1 = −an . The only free variable is a0 , and the solutions form
a 1-dimensional space with basis {(1, −1, 1, −1, . . .)}.
10.3.2 The matrix A is invertible if and only if its rank is n, which means that v1 , . . . , vn are linearly
independent and therefore a basis of Rn . The existence of T then follows from Theorem 10.20.
10.4.2 [x]_B = [2, 1, -1]^T.
10.4.4 M_{B_2 B_1} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}.
10.4.9 Since M is invertible, its columns [v1 ]B , . . . , [vn ]B are linearly independent and span Rn ; it follows
that v1 , . . . , vn is a basis of V . To show that [T ]C,B = NM −1 , it is sufficient to check that NM −1 [vi ]B = [wi ]C ,
for all i = 1, . . . , n. But by assumption, [vi ]B is the i th column of M, so that [vi ]B = Mei , where ei is the
i th basis vector. Therefore M −1 [vi ]B = ei . On the other hand, Nei is the i th column of N, i.e., [wi ]C . We
therefore have
NM −1 [vi ]B = Nei = [wi ]C ,
as desired.
11.1.1 (a) Yes, it is an inner product. Since A = A^T, symmetry and linearity follow as in Example 11.3.
For the positive definite property, note that ⟨u, u⟩ = u_1^2 + 2u_2^2 ≥ 0, and equality holds if and only if
u_1 = u_2 = 0.
(b) Yes, it is an inner product. Since A = A^T, symmetry and linearity follow as in Example 11.3. For
the positive definite property, we have ⟨u, u⟩ = 3u_1^2 + 2u_1u_2 + 3u_2^2 = (u_1 + u_2)^2 + 2u_1^2 + 2u_2^2 ≥ 0.
Equality holds if and only if u_1 + u_2, u_1, and u_2 are all zero, which is the case if and only if u = 0.
(c) No, it is not symmetric. For example, ⟨[1, 0]^T, [0, 1]^T⟩ = 1 but ⟨[0, 1]^T, [1, 0]^T⟩ = 0.
(d) No, it is not positive definite. For example, ⟨[0, 1]^T, [0, 1]^T⟩ = -1.
(e) No, it is not positive definite. For example, ⟨[1, -1]^T, [1, -1]^T⟩ = 0 although [1, -1]^T ≠ 0.
11.1.2 (a) ⟨1, x⟩ = \int_0^1 1 · x \, dx = \int_0^1 x \, dx = \frac{1}{2}.
(b) ⟨x, x^2⟩ = \int_0^1 x · x^2 \, dx = \int_0^1 x^3 \, dx = \frac{1}{4}.
(c) ⟨1 + x, 2 + x^2⟩ = \int_0^1 (1 + x)(2 + x^2) \, dx = \int_0^1 (2 + 2x + x^2 + x^3) \, dx = \frac{43}{12}.
11.1.3 (a) ‖1‖^2 = ⟨1, 1⟩ = \int_0^1 1 \, dx = 1, therefore ‖1‖ = 1.
(b) ‖x‖^2 = ⟨x, x⟩ = \int_0^1 x^2 \, dx = \frac{1}{3}, therefore ‖x‖ = \frac{1}{\sqrt{3}}.
(c) ‖x^2 + 1‖^2 = ⟨x^2 + 1, x^2 + 1⟩ = \int_0^1 (x^4 + 2x^2 + 1) \, dx = \frac{1}{5} + \frac{2}{3} + 1 = \frac{28}{15}, therefore ‖x^2 + 1‖ = \sqrt{\frac{28}{15}}.
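These integrals are easy to confirm symbolically; the inner product used in 11.1.2 and 11.1.3 is ⟨f, g⟩ = \int_0^1 f(x)g(x) dx.

from sympy import symbols, integrate

x = symbols('x')
ip = lambda f, g: integrate(f * g, (x, 0, 1))
print(ip(1, x), ip(x, x**2), ip(1 + x, 2 + x**2))   # 1/2, 1/4, 43/12
print(ip(x, x), ip(x**2 + 1, x**2 + 1))             # 1/3 and 28/15; their square roots are the norms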
11.1.4 The operation hu, vi = u ∗ v is an inner product on R3 . Symmetry and linearity are straightforward
to check, as is the positive definite property. Therefore, the claimed inequality holds by Proposition 2.27.
11.1.5 (a) We have ⟨x, x⟩ = \int_{-1}^1 x^2 \, dx = \frac{2}{3}, ⟨x^2, x^2⟩ = \int_{-1}^1 x^4 \, dx = \frac{2}{5}, and ⟨x, x^2⟩ = \int_{-1}^1 x^3 \, dx = 0. Therefore
cos θ = \frac{⟨x, x^2⟩}{‖x‖ ‖x^2‖} = 0.
Therefore, the angle θ is π/2 radians, or 90 degrees. In other words, x and x^2 are orthogonal in
C[-1, 1].
(b) We have ⟨x, x⟩ = \int_{-1}^1 x^2 \, dx = \frac{2}{3}, ⟨x^3, x^3⟩ = \int_{-1}^1 x^6 \, dx = \frac{2}{7}, and ⟨x, x^3⟩ = \int_{-1}^1 x^4 \, dx = \frac{2}{5}. Therefore
cos θ = \frac{⟨x, x^3⟩}{‖x‖ ‖x^3‖} = \frac{2/5}{\sqrt{2/3}\sqrt{2/7}} = \frac{\sqrt{21}}{5}.
The angle θ is cos^{-1}(\frac{\sqrt{21}}{5}), which is approximately 0.4115 radians or 23.58 degrees.
11.1.6 (a) By assumption, a and b are square summable. Let N = a_0^2 + a_1^2 + ... and M = b_0^2 + b_1^2 + .... By
the Cauchy-Schwarz inequality, for all n, we have
|a_0||b_0| + ... + |a_n||b_n| ≤ \sqrt{|a_0|^2 + ... + |a_n|^2} \sqrt{|b_0|^2 + ... + |b_n|^2} ≤ \sqrt{N}\sqrt{M}.
Therefore the series |a0 b0 | + |a1 b1 | + . . . is bounded. By the absolute convergence test from calculus, it
follows that the series a0 b0 + a1 b1 + . . . converges.
(b) It is clear that the zero sequence is square summable, and also that a scalar multiple of a square
summable sequence is square summable. Hence HilbR contains the zero vector and is closed under scalar
multiplication. To show that it is closed under addition, assume a, b ∈ HilbR , and let c = a + b. We must
show that c is square summable. But
c_0^2 + c_1^2 + ... = (a_0 + b_0)^2 + (a_1 + b_1)^2 + ... = (a_0^2 + a_1^2 + ...) + (b_0^2 + b_1^2 + ...) + (2a_0b_0 + 2a_1b_1 + ...).
The series a_0^2 + a_1^2 + ... and b_0^2 + b_1^2 + ... converge by assumption, and the series 2a_0b_0 + 2a_1b_1 + ...
converges by part (a). It follows that HilbR is closed under addition.
(c) Symmetry and linearity follow straightforwardly from properties of convergent series. For example,
⟨a, kb + ℓc⟩ = a_0(kb_0 + ℓc_0) + a_1(kb_1 + ℓc_1) + ...
 = k(a_0b_0 + a_1b_1 + ...) + ℓ(a_0c_0 + a_1c_1 + ...)
 = k⟨a, b⟩ + ℓ⟨a, c⟩.
As for the positive definite property, note that
⟨a, a⟩ = a_0^2 + a_1^2 + ... ≥ 0.
Moreover, since all terms in the series are ≥ 0, it follows that ⟨a, a⟩ = 0 if and only if a_i = 0 for all i.
11.2.1 We have ⟨u_1, u_2⟩ = u_1^T A u_2 = 0, ⟨u_1, u_3⟩ = u_1^T A u_3 = -5, ⟨u_1, u_4⟩ = u_1^T A u_4 = 0, ⟨u_2, u_3⟩ = u_2^T A u_3 =
0, ⟨u_2, u_4⟩ = u_2^T A u_4 = 32, and ⟨u_3, u_4⟩ = u_3^T A u_4 = 54. Therefore, u_1 ⊥ u_2, u_1 ⊥ u_4, and u_2 ⊥ u_3. None
of the other pairs of vectors are orthogonal.
11.2.2 f1 ⊥ f2 , f1 ⊥ f4 , f2 ⊥ f3 , and f3 ⊥ f4 .
11.2.3 (a) Let p(x) = ax^3 + bx^2 + cx + d. Then ⟨p(x), x^2⟩ = \frac{2}{5}b + \frac{2}{3}d and ⟨p(x), x⟩ = \frac{2}{5}a + \frac{2}{3}c. Therefore,
p(x) is in the orthogonal complement of {x^2, x} if and only if \frac{2}{5}b + \frac{2}{3}d = 0 and \frac{2}{5}a + \frac{2}{3}c = 0. It follows
that b = -\frac{5}{3}d and a = -\frac{5}{3}c. The general solution is p(x) = -\frac{5}{3}cx^3 - \frac{5}{3}dx^2 + cx + d. A basis for the
orthogonal complement is {-\frac{5}{3}x^3 + x, -\frac{5}{3}x^2 + 1}.
(b) A basis for the orthogonal complement of {x + 1} is {5x^3 - 1, 3x^2 - 1, 3x - 1}.
11.2.4 (a) Orthonormal (therefore also orthogonal). (b) Neither orthogonal nor orthonormal. (c) Orthog-
onal (not orthonormal). (d) Orthonormal (therefore also orthogonal).
11.2.5 v = \frac{1}{4}u_1 + \frac{2}{3}u_2 - \frac{4}{5}u_3.
11.2.6 We have v = a_1u_1 + a_2u_2 + a_3u_3, where a_1 = \frac{⟨u_1, v⟩}{⟨u_1, u_1⟩} = \frac{1}{2}. So the first coordinate is \frac{1}{2}.
11.3.1 u_1 = [1, 2, 3]^T, u_2 = [1, 4, -3]^T.
11.3.2 u_1 = [0, 1, 1, 0]^T, u_2 = [3, -1, 1, -1]^T.
11.3.3 u_1 = [1, 0, 1, 0]^T, u_2 = [0, 3, 0, -1]^T, u_3 = [0, 1, 0, 3]^T.
11.3.4 u_1 = [1, 0, 2]^T, u_2 = [1, 1, -1]^T, u_3 = [-1, 1, 0]^T.
11.3.5 u_1 = [1, -1, 1, 1]^T, u_2 = [1, 1, -1, 1]^T, u_3 = [-1, 1, 1, 0]^T.
11.3.6 u_1 = 1, u_2 = x - 1, u_3 = x^2 - 2x + \frac{2}{3}.
11.3.7 The orthonormal basis is \left\{ \frac{u_1}{‖u_1‖}, \frac{u_2}{‖u_2‖} \right\} = \left\{ \frac{1}{\sqrt{14}}[1, 2, 3]^T, \frac{1}{\sqrt{26}}[1, 4, -3]^T \right\}.
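For the standard dot product, the Gram-Schmidt procedure behind these answers can be sketched in a few lines. The starting vectors below are hypothetical (the exercise's vectors are not restated here); they are chosen so that the output reproduces the orthogonal basis of 11.3.1.

import numpy as np

def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        w = v - sum((v @ u) / (u @ u) * u for u in basis)   # subtract projections onto earlier vectors
        basis.append(w)
    return basis

u1, u2 = gram_schmidt([np.array([1.0, 2.0, 3.0]), np.array([2.0, 6.0, 0.0])])
print(u1, u2)                     # [1. 2. 3.] [ 1.  4. -3.]
print(u1 / np.linalg.norm(u1))    # (1/sqrt(14)) [1, 2, 3], as in 11.3.7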
11.4.1 The best approximation is v′ = 3u_1 - u_2 = [-2, 5, 2]^T.
11.4.2 The best approximation is v′ = u_1 + 2u_2 - u_3 = [1, 3, -2, 4]^T.
11.4.3 Let p0 , p1 , . . . be the Legendre polynomials from Section 11.3. The approximating polynomials
are:
f_0(x) = \frac{3}{4}p_0,
f_1(x) = \frac{3}{4}p_0 - \frac{1}{2}p_1,
f_2(x) = \frac{3}{4}p_0 - \frac{1}{2}p_1 - \frac{15}{32}p_2,
f_3(x) = \frac{3}{4}p_0 - \frac{1}{2}p_1 - \frac{15}{32}p_2 + 0p_3,
f_4(x) = \frac{3}{4}p_0 - \frac{1}{2}p_1 - \frac{15}{32}p_2 + 0p_3 + \frac{105}{256}p_4.
[Figure: graphs of f and the approximations f_0, f_1, f_2 = f_3, f_4 on the interval [-1, 1].]
11.4.4 \frac{π^2}{3} - \frac{4}{1}\cos x + \frac{4}{4}\cos 2x - \frac{4}{9}\cos 3x + \frac{4}{16}\cos 4x - \frac{4}{25}\cos 5x ± ....
11.5.3 y = 2 + 2x.
11.5.4 y = 1 − x + x2 .
11.6.2 A and C are orthogonal (and therefore isometries). D is an isometry but not orthogonal. B is neither
orthogonal nor an isometry.
11.7.2 Suppose A is orthogonally diagonalizable. Then A = PDP−1 , where D is diagonal and P is orthog-
onal. Since P is orthogonal, we have P−1 = PT , and therefore A = PDPT . Moreover, since D is diagonal,
we have D = DT . It follows that AT = (PDPT )T = (PT )T DT PT = PDPT = A, so A is symmetric.
11.8.1 (a) Positive semidefinite. (b) Positive definite. (c) Not symmetric (therefore neither). (d) Positive
semidefinite. (e) Neither.
11.8.2 (a) Eigenvalues: {1, 6}. Positive definite. (b) Eigenvalues: {0, 13}. Positive semidefinite. (c)
Eigenvalues: {0, 2, 3}. Positive semidefinite. (d) Eigenvalues: {−1, 4, 8}. Neither.
11.8.3 (a) No (neither positive definite nor semidefinite). (b) Yes. (c) No (positive semidefinite but not
definite). (d) Yes.
det(A - λI) = λ^2 - 3λ - 2,
det(B - λI) = -λ^3 + 4λ^2 - 4λ + 1,
det(C - λI) = λ^4 - 6λ^3 + 9λ^2 - 3λ + 0.
For A, the coefficients are not weakly alternating, so A is not positive semidefinite. For B, the coefficients
are strongly alternating, so B is positive definite. For C, the coefficients are weakly, but not strongly
alternating, so C is positive semidefinite, but not positive definite.
11.9.4 3u^2 + v^2 + vw + w^2.
[Figure: three sketches in the (x, y)-plane showing the curves together with their principal axes u_1 and u_2.]
11.9.8 The principal axes are \frac{1}{\sqrt{3}}[1, -1, -1]^T, \frac{1}{\sqrt{2}}[1, 1, 0]^T, \frac{1}{\sqrt{6}}[1, -1, 2]^T.
11.10.7 u1 ⊥ u2 , u1 ⊥ u4 , and u2 ⊥ u3 .
11.10.9 Orthogonal: [0, i, 2]^T, [1, 2, i]^T. Orthonormal: \frac{1}{\sqrt{5}}[0, i, 2]^T, \frac{1}{\sqrt{6}}[1, 2, i]^T.
A.1.1 z + w = 5 - i, z - 2w = -4 + 23i, zw = 62 + 5i, and \frac{w}{z} = -\frac{50}{53} - \frac{37}{53}i.
A.2.2 Since i has magnitude 1 and argument π/2 (or 90°), the number z must have magnitude 1 and
argument π/4. It therefore lies at 45° on the unit circle. The solution is z = \frac{1+i}{\sqrt{2}}. A second solution is
-z = -\frac{1+i}{\sqrt{2}}, whose argument is -3π/4 or -135°. Note that if we double this angle, we get -270°, which
is the same as +90°.
[Figure: the unit circle with the two square roots z = (1+i)/√2 and -z = -(1+i)/√2 of i marked.]
A.2.3 The three solutions can be found on the unit circle at 60°, 180°, and 300°. If we triple any of
these angles, we get 180° (up to multiples of 360°). Thus, the three cube roots of -1 are z = -1 and
z = 0.5 ± \sqrt{0.75}\,i.
[Figure: the unit circle with the three cube roots -1 and 0.5 ± √0.75 i marked.]
A.3.1 By the quadratic formula, z = \frac{6 ± \sqrt{-16}}{2} = 3 ± 2i.
A.3.3 By trial and error, we find the roots z = 1 and z = −2. Moreover, z = 1 is a double root. We have
z3 − 3z + 2 = (z − 1)(z − 1)(z + 2).
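Both root computations can be checked numerically. The quadratic in A.3.1 is assumed to be z^2 - 6z + 13, which is consistent with the discriminant -16 quoted above.

import numpy as np

print(np.roots([1, -6, 13]))    # [3.+2.j 3.-2.j]
print(np.roots([1, 0, -3, 2]))  # roots of z^3 - 3z + 2, approximately -2, 1, 1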