
Linear Algebra Using MATLAB

MATH 5331

May 12, 2010

Selected material from the text Linear Algebra and Differential Equations Using MATLAB by
Martin Golubitsky and Michael Dellnitz.
Contents

1 Preliminaries
  1.1 Vectors and Matrices
  1.2 MATLAB
  1.3 Special Kinds of Matrices
  1.4 The Geometry of Vector Operations

2 Solving Linear Equations
  2.1 Systems of Linear Equations and Matrices
  2.2 The Geometry of Low-Dimensional Solutions
  2.3 Gaussian Elimination
  2.4 Reduction to Echelon Form
  2.5 Linear Equations with Special Coefficients
  2.6 *Uniqueness of Reduced Echelon Form

3 Matrices and Linearity
  3.1 Matrix Multiplication of Vectors
  3.2 Matrix Mappings
  3.3 Linearity
  3.4 The Principle of Superposition
  3.5 Composition and Multiplication of Matrices
  3.6 Properties of Matrix Multiplication
  3.7 Solving Linear Systems and Inverses
  3.8 Determinants of 2 × 2 Matrices

4 Determinants and Eigenvalues
  4.1 Determinants
  4.2 Determinants, An Alternative Treatment
  4.3 Eigenvalues
  4.4 Eigenvalues and Eigenvectors
  4.5 *Markov Chains
  4.6 Appendix: Existence of Determinants

5 Vector Spaces
  5.1 Vector Spaces and Subspaces
  5.2 Construction of Subspaces
  5.3 Spanning Sets and MATLAB
  5.4 Linear Dependence and Linear Independence
  5.5 Dimension and Bases
  5.6 The Proof of the Main Theorem

6 Linear Maps and Changes of Coordinates
  6.1 Linear Mappings and Bases
  6.2 Row Rank Equals Column Rank
  6.3 Vectors and Matrices in Coordinates
  6.4 Matrices of Linear Maps on a Vector Space

7 Orthogonality
  7.1 Orthonormal Bases
  7.2 Least Squares Approximations
  7.3 Least Squares Fitting of Data
  7.4 Symmetric Matrices
  7.5 Orthogonal Matrices and QR Decompositions

8 Matrix Normal Forms
  8.1 Real Diagonalizable Matrices
  8.2 Simple Complex Eigenvalues
  8.3 Multiplicity and Generalized Eigenvectors
  8.4 The Jordan Normal Form Theorem
  8.5 *Appendix: Markov Matrix Theory
  8.6 *Appendix: Proof of Jordan Normal Form

MATLAB Commands
Chapter 1

Preliminaries

The subjects of linear algebra and differential equations involve manipulating vector equations. In this chapter we introduce our notation for vectors and matrices — and we introduce MATLAB, a computer program that is designed to perform vector manipulations in a natural way.

We begin, in Section 1.1, by defining vectors and matrices, and by explaining how to add and scalar multiply vectors and matrices. In Section 1.2 we explain how to enter vectors and matrices into MATLAB, and how to perform the operations of addition and scalar multiplication in MATLAB. There are many special types of matrices; these types are introduced in Section 1.3. In the concluding section, we introduce the geometric interpretations of vector addition and scalar multiplication; in addition we discuss the angle between vectors through the use of the dot product of two vectors.

1.1 Vectors and Matrices

In their elementary form, matrices and vectors are just lists of real numbers in different
formats. An n-vector is a list of n numbers (x1, x2, . . . , xn). We may write this vector as a
row vector as we have just done — or as a column vector
 
\[
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
\]
The set of all (real-valued) n-vectors is denoted by Rn ; so points in Rn are called vectors.
The sets Rn when n is small are very familiar sets. The set R1 = R is the real number
line, and the set R2 is the Cartesian plane. The set R3 consists of points or vectors in three
dimensional space.

An m × n matrix is a rectangular array of numbers with m rows and n columns. A
general 2 × 3 matrix has the form
\[
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}.
\]
We use the convention that matrix entries aij are indexed so that the first subscript i refers
to the row while the second subscript j refers to the column. So the entry a21 refers to the
matrix entry in the 2nd row, 1st column.

An m × n matrix A and an m′ × n′ matrix B are equal precisely when the sizes of the
matrices are equal (m = m′ and n = n′) and when each of the corresponding entries is
equal (aij = bij ).

There is some redundancy in the use of the terms “vector” and “matrix”. For example,
a row n-vector may be thought of as a 1 × n matrix, and a column n-vector may be thought
of as an n × 1 matrix. There are situations where matrix notation is preferable to vector
notation and vice-versa.

Addition and Scalar Multiplication of Vectors

There are two basic operations on vectors: addition and scalar multiplication. Let x =
(x1, . . . , xn ) and y = (y1, . . . , yn ) be n-vectors. Then

x + y = (x1 + y1 , . . . , xn + yn );

that is, vector addition is defined as componentwise addition.

Similarly, scalar multiplication is defined as componentwise multiplication. A scalar is
just a number. Initially, we use the term scalar to refer to a real number — but later on we
sometimes use the term scalar to refer to a complex number. Suppose r is a real number;
then the multiplication of a vector by the scalar r is defined as

rx = (rx1, . . . , rxn ).

Subtraction of vectors is defined simply as

x − y = (x1 − y1 , . . . , xn − yn ).

Formally, subtraction of vectors may also be defined as

x − y = x + (−1)y.

Division of a vector x by a scalar r is defined to be
\[
\frac{1}{r}x.
\]
The standard difficulties concerning division by zero still hold.
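
As a quick illustration (with vectors chosen arbitrarily), let x = (1, 2, 0) and y = (3, −1, 4). Then

x + y = (4, 1, 4),   2x = (2, 4, 0),   x − y = (−2, 3, −4),   (1/2)x = (0.5, 1, 0).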

Addition and Scalar Multiplication of Matrices

Similarly, we add two m × n matrices by adding corresponding entries, and we multiply a
scalar times a matrix by multiplying each entry of the matrix by that scalar. For example,
\[
\begin{pmatrix} 0 & 2 \\ 4 & 6 \end{pmatrix} + \begin{pmatrix} 1 & -3 \\ 1 & 4 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 5 & 10 \end{pmatrix}
\]
and
\[
4\begin{pmatrix} 2 & -4 \\ 3 & 1 \end{pmatrix} = \begin{pmatrix} 8 & -16 \\ 12 & 4 \end{pmatrix}.
\]
The main restriction on adding two matrices is that the matrices must be of the same size.
So you cannot add a 4 × 3 matrix to a 6 × 2 matrix — even though they both have twelve
entries.

Hand Exercises

In Exercises 1 – 3, let x = (2, 1, 3) and y = (1, 1, −1) and compute the given expression.

1. x + y.

2. 2x − 3y.

3. 4x.

4. Let A be the 3 × 4 matrix
\[
A = \begin{pmatrix} 2 & -1 & 0 & 1 \\ 3 & 4 & -7 & 10 \\ 6 & -3 & 4 & 2 \end{pmatrix}.
\]

(a) For which n is a row of A a vector in Rn?


(b) What is the 2nd column of A?

(c) Let aij be the entry of A in the ith row and the j th column. What is a23 − a31?

For each of the pairs of vectors or matrices in Exercises 5 – 9, decide whether addition of the members
of the pair is possible; and, if addition is possible, perform the addition.

5. x = (2, 1) and y = (3, −1).

6. x = (1, 2, 2) and y = (−2, 1, 4).

7. x = (1, 2, 3) and y = (−2, 1).


8. A = \begin{pmatrix} 1 & 3 \\ 0 & 4 \end{pmatrix} and B = \begin{pmatrix} 2 & 1 \\ 1 & -2 \end{pmatrix}.

9. A = \begin{pmatrix} 2 & 1 & 0 \\ 4 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} and B = \begin{pmatrix} 2 & 1 \\ 1 & -2 \end{pmatrix}.

In Exercises 10 – 11, let A = \begin{pmatrix} 2 & 1 \\ -1 & 4 \end{pmatrix} and B = \begin{pmatrix} 0 & 2 \\ 3 & -1 \end{pmatrix} and compute the given expression.

10. 4A + B.

11. 2A − 3B.

1.2 MATLAB

We shall use MATLAB to compute addition and scalar multiplication of vectors in two and three
dimensions. This will serve the purpose of introducing some basic MATLAB commands.

Entering Vectors and Vector Operations

Begin a MATLAB session. We now discuss how to enter a vector into MATLAB. The syntax is
straightforward; to enter the row vector x = (1, 2, 1) type

x = [1 2 1]

and MATLAB responds with

x =
1 2 1

Next we show how easy it is to perform addition and scalar multiplication in MATLAB. Enter
the row vector y = (2, −1, 1) by typing

y = [2 -1 1]

and MATLAB responds with

y =
2 -1 1
Note: MATLAB has several useful line editing features. We point out two here:
(a) Horizontal arrow keys (→, ←) move the cursor one space without deleting a character.
(b) Vertical arrow keys (↑, ↓) recall previous and next command lines.

To add the vectors x and y, type

x + y

and MATLAB responds with

ans =
3 1 2

This vector is easily checked to be the sum of the vectors x and y. Similarly, to perform a scalar
multiplication, type

2*x

which yields

ans =
2 4 2

MATLAB subtracts the vector y from the vector x in the natural way. Type

x - y

to obtain

ans =
-1 3 0

We mention two points concerning the operations that we have just performed in MATLAB.

(a) When entering a vector or a number, MATLAB automatically echoes what has been entered.
This echoing can be suppressed by appending a semicolon to the line. For example, type

z = [-1 2 3];

and MATLAB responds with a new line awaiting a new command. To see the contents of the
vector z just type z and MATLAB responds with

z =
-1 2 3

(b) MATLAB stores in a new vector the information obtained by algebraic manipulation. Type

a = 2*x - 3*y + 4*z;

Now type a to find

a =
-8 15 11

We see that MATLAB has created a new row vector a with the correct number of entries.

Note: In order to use the result of a calculation later in a MATLAB session, we need to name
the result of that calculation. To recall the calculation 2*x - 3*y + 4*z, we needed to name that
calculation, which we did by typing a = 2*x - 3*y + 4*z. Then we were able to recall the result
just by typing a.

We have seen that we enter a row n-vector into MATLAB by surrounding a list of n numbers
separated by spaces with square brackets. For example, to enter the 5-vector w = (1, 3, 5, 7, 9) just
type

w = [1 3 5 7 9]

Note that the addition of two vectors is only defined when the vectors have the same number of
entries. Trying to add the 3-vector x with the 5-vector w by typing x + w in MATLAB yields the
warning:

??? Error using ==> +
Matrix dimensions must agree.

In MATLAB new rows are indicated by typing ;. For example, to enter the column vector
 
\[
z = \begin{pmatrix} -1 \\ 2 \\ 3 \end{pmatrix},
\]
just type:

z = [-1; 2; 3]

and MATLAB responds with

z =
-1
2
3

Note that MATLAB will not add a row vector and a column vector. Try typing x + z.

Individual entries of a vector can also be addressed. For instance, to display the first component
of z type z(1).

Entering Matrices

Matrices are entered into MATLAB row by row with rows separated either by semicolons or by line
returns. To enter the 2 × 3 matrix
\[
A = \begin{pmatrix} 2 & 3 & 1 \\ 1 & 4 & 7 \end{pmatrix},
\]
just type

A = [2 3 1; 1 4 7]

MATLAB has very sophisticated methods for addressing the entries of a matrix. You can directly
address individual entries, individual rows, and individual columns. To display the entry in the 1st
row, 3rd column of A, type A(1,3). To display the 2nd column of A, type A(:,2); and to display
the 1st row of A, type A(1,:). For example, to add the two rows of A and store them in the vector
x, just type

x = A(1,:) + A(2,:)

MATLAB has many operations involving matrices — these will be introduced later, as needed.

Computer Exercises

1. Enter the 3 × 4 matrix
\[
A = \begin{pmatrix} 1 & 2 & 5 & 7 \\ -1 & 2 & 1 & -2 \\ 4 & 6 & 8 & 0 \end{pmatrix}.
\]
As usual, let aij denote the entry of A in the ith row and j th column. Use MATLAB to compute
the following:

(a) a13 + a32 .

(b) Three times the 3rd column of A.


(c) Twice the 2nd row of A minus the 3rd row.

(d) The sum of all of the columns of A.

2. Verify that MATLAB adds vectors only if they are of the same type, by typing

(a) x = [1 2], y = [2; 3] and x + y.


(b) x = [1 2], y = [2 3 1] and x + y.

In Exercises 3 – 4, let x = (1.2, 1.4, −2.45) and y = (−2.6, 1.1, 0.65) and use MATLAB to compute
the given expression.

3. 3.27x − 7.4y.

4. 1.65x + 2.46y.

In Exercises 5 – 6, let
\[
A = \begin{pmatrix} 1.2 & 2.3 & -0.5 \\ 0.7 & -1.4 & 2.3 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} -2.9 & 1.23 & 1.6 \\ -2.2 & 1.67 & 0 \end{pmatrix}
\]
and use MATLAB to compute the given expression.

5. −4.2A + 3.1B.

6. 2.67A − 1.1B.

1.3 Special Kinds of Matrices

There are many matrices that have special forms and hence have special names — which we now
list.

• A square matrix is a matrix with the same number of rows and columns; that is, a square
matrix is an n × n matrix.
• A diagonal matrix is a square matrix whose only nonzero entries are along the main diagonal;
that is, aij = 0 if i ≠ j. The following is a 3 × 3 diagonal matrix
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.
\]
There is a shorthand in MATLAB for entering diagonal matrices. To enter this 3 × 3 matrix,
type diag([1 2 3]).
• The identity matrix is the diagonal matrix all of whose diagonal entries equal 1. The n × n
identity matrix is denoted by In . This identity matrix is entered in MATLAB by typing
eye(n).
• A zero matrix is a matrix all of whose entries are 0. A zero matrix is denoted by 0. This
notation is ambiguous since there is a zero m × n matrix for every m and n. Nevertheless,
this ambiguity rarely causes any difficulty. In MATLAB, to define an m × n matrix A whose
entries all equal 0, just type A = zeros(m,n). To define an n × n zero matrix B, type B =
zeros(n).
• The transpose of an m × n matrix A is the n × m matrix obtained from A by interchanging
rows and columns. Thus the transpose of the 4 × 2 matrix
\[
\begin{pmatrix} 2 & 1 \\ -1 & 2 \\ 3 & -4 \\ 5 & 7 \end{pmatrix}
\]
is the 2 × 4 matrix
\[
\begin{pmatrix} 2 & -1 & 3 & 5 \\ 1 & 2 & -4 & 7 \end{pmatrix}.
\]
Suppose that you enter this 4 × 2 matrix into MATLAB by typing
Suppose that you enter this 4 × 2 matrix into MATLAB by typing

A = [2 1; -1 2; 3 -4; 5 7]

The transpose of a matrix A is denoted by At. To compute the transpose of A in MATLAB,
just type A'.

• A symmetric matrix is a square matrix whose entries are symmetric about the main diagonal;
that is aij = aji. Note that a symmetric matrix is a square matrix A for which At = A.

• An upper triangular matrix is a square matrix all of whose entries below the main diagonal
are 0; that is, aij = 0 if i > j. A strictly upper triangular matrix is an upper triangular matrix
whose diagonal entries are also equal to 0. Similar definitions hold for lower triangular and
strictly lower triangular matrices. The following four 3 × 3 matrices are examples of upper
triangular, strictly upper triangular, lower triangular, and strictly lower triangular matrices:
       
\[
\begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 4 \\ 0 & 0 & 6 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 2 & 3 \\ 0 & 0 & 4 \\ 0 & 0 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 7 & 0 & 0 \\ 5 & 2 & 0 \\ -4 & 1 & -3 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 0 & 0 \\ 5 & 0 & 0 \\ 10 & 1 & 0 \end{pmatrix}.
\]

• A square matrix A is block diagonal if
\[
A = \begin{pmatrix} B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & B_k \end{pmatrix}
\]
where each Bj is itself a square matrix. An example of a 5 × 5 block diagonal matrix with
one 2 × 2 block and one 3 × 3 block is:
\[
\begin{pmatrix} 2 & 3 & 0 & 0 & 0 \\ 4 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 3 & 2 & 4 \\ 0 & 0 & 1 & 1 & 5 \end{pmatrix}.
\]
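
To make these definitions concrete, here is a short MATLAB sketch that builds several of the matrices above; the commands diag, eye, zeros and the transpose operator ' are the ones described in this section, and isequal is a standard MATLAB command that tests whether two arrays are identical.

D = diag([1 2 3])      % the 3 x 3 diagonal matrix displayed above
I = eye(4)             % the 4 x 4 identity matrix I4
Z = zeros(2,3)         % a 2 x 3 zero matrix
S = [1 7; 7 2];
isequal(S, S')         % returns 1 (true): S equals its transpose, so S is symmetric
B = [2 3 0 0 0; 4 1 0 0 0; 0 0 1 2 3; 0 0 3 2 4; 0 0 1 1 5]   % the 5 x 5 block diagonal example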

Hand Exercises

In Exercises 1 – 5 decide whether or not the given matrix is symmetric.


1. \begin{pmatrix} 2 & 1 \\ 1 & 5 \end{pmatrix}.

2. \begin{pmatrix} 1 & 1 \\ 0 & -5 \end{pmatrix}.

3. (3).
 
4. \begin{pmatrix} 3 & 4 \\ 4 & 3 \\ 0 & 1 \end{pmatrix}.
 
5. \begin{pmatrix} 3 & 4 & -1 \\ 4 & 3 & 1 \\ -1 & 1 & 10 \end{pmatrix}.

In Exercises 6 – 10 decide which of the given matrices are upper triangular and which are strictly
upper triangular.
6. \begin{pmatrix} 2 & 0 \\ -1 & -2 \end{pmatrix}.

7. \begin{pmatrix} 0 & 4 \\ 0 & 0 \end{pmatrix}.

8. (2).
 
9. \begin{pmatrix} 3 & 2 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}.
 
10. \begin{pmatrix} 0 & 2 & -4 \\ 0 & 7 & -2 \\ 0 & 0 & 0 \end{pmatrix}.

A general 2 × 2 diagonal matrix has the form \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}. Thus the two unknown real numbers a
and b are needed to specify each 2 × 2 diagonal matrix. In Exercises 11 – 16, how many unknown
real numbers are needed to specify each of the given matrices:

11. An upper triangular 2 × 2 matrix?

12. A symmetric 2 × 2 matrix?

13. An m × n matrix?

14. A diagonal n × n matrix?

15. An upper triangular n × n matrix? Hint: Recall the summation formula:

\[
1 + 2 + \cdots + k = \frac{k(k+1)}{2}.
\]
16. A symmetric n × n matrix?

In each of Exercises 17 – 19 determine whether the statement is True or False.

17. Every symmetric, upper triangular matrix is diagonal.

18. Every diagonal matrix is a multiple of the identity matrix.

19. Every block diagonal matrix is symmetric.

Computer Exercises

20. Use MATLAB to compute At when
\[
A = \begin{pmatrix} 1 & 2 & 4 & 7 \\ 2 & 1 & 5 & 6 \\ 4 & 6 & 2 & 1 \end{pmatrix} \tag{1.3.1}
\]
Use MATLAB to verify that (At)t = A by setting B=A', C=B', and checking that C = A.

21. Use MATLAB to compute At when A = (3) is a 1 × 1 matrix.

1.4 The Geometry of Vector Operations

In this section we discuss the geometry of addition, scalar multiplication, and dot product of vectors.
We also use MATLAB graphics to visualize these operations.

Geometry of Addition

MATLAB has an excellent graphics language that we shall use at various times to illustrate concepts
in both two and three dimensions. In order to make the connections between ideas and graphics
more transparent, we will sometimes use previously developed MATLAB programs. We begin with
such an example — the illustration of the parallelogram law for vector addition.

Suppose that x and y are two planar vectors. Think of these vectors as line segments from the
origin to the points x and y in R2. We use a program written by T.A. Bryan to visualize x + y. In
MATLAB type:

x = [1 2];
y = [-2 3];
addvec(x,y)

The vector x is displayed in blue, the vector y in green, and the vector x + y in red. Note that x + y
is just the diagonal of the parallelogram spanned by x and y. A black and white version of this
figure is given in Figure 1.1.
Note: all MATLAB commands are case sensitive — upper and lower case must be correct.

Figure 1.1: Addition of two planar vectors.

The parallelogram law (the diagonal of the parallelogram spanned by x and y is x + y) is equally
valid in three dimensions. Use MATLAB to verify this statement by typing:

x = [1 0 2];
y = [-1 4 1];
addvec3(x,y)

The parallelogram spanned by x and y in R3 is shown in cyan; the diagonal x + y is shown in blue.
See Figure 1.2. To test your geometric intuition, make several choices of vectors x and y. Note that
one vertex of the parallelogram is always the origin.

Figure 1.2: Addition of two vectors in three dimensions.

Geometry of Scalar Multiplication

In all dimensions scalar multiplication just scales the length of the vector. To discuss this point we
need to define the length of a vector. View an n-vector x = (x1 , . . ., xn) as a line segment from the
origin to the point x. Using the Pythagorean theorem, it can be shown that the length or norm of

this line segment is:
\[
||x|| = \sqrt{x_1^2 + \cdots + x_n^2}.
\]

MATLAB has the command norm for finding the length of a vector. Test this by entering the
3-vector

x = [1 4 2];

Then type

norm(x)

MATLAB responds with:

ans =
4.5826

which is indeed approximately \sqrt{1 + 4^2 + 2^2} = \sqrt{21}.

Now suppose r ∈ R and x ∈ Rn. A calculation shows that

||rx|| = |r|||x||. (1.4.1)

See Exercise 17. Note also that if r is positive, then the direction of rx is the same as that of x; while
if r is negative, then the direction of rx is opposite to the direction of x. The lengths of the vectors
3x and −3x are each three times the length of x — but these vectors point in opposite directions.
Scalar multiplication by the scalar 0 produces the 0 vector, the vector whose entries are all zero.
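
As a quick numerical check of (1.4.1) in MATLAB (a sketch, using the vector x = (1, 4, 2) entered above), type

norm(3*x)
3*norm(x)
norm(-3*x)

The first two commands return the same number, illustrating that the length of 3x is 3 times the length of x; the third returns that same number again, since −3x and 3x have equal lengths.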

Dot Product and Angles

The dot product of two n-vectors x = (x1, . . . , xn) and y = (y1 , . . ., yn ) is an important operation
on vectors. It is defined by:
x · y = x1 y1 + · · · + xnyn . (1.4.2)

Note that x · x is just ||x||2, the length of x squared.

MATLAB also has a command for computing dot products of n-vectors. Type

x = [1 4 2];
y = [2 3 -1];
dot(x,y)

MATLAB responds with the dot product of x and y, namely,

ans =
12

One of the most important facts concerning dot products is the one that states

x·y = 0 if and only if x and y are perpendicular. (1.4.3)

Indeed, dot product also gives a way of numerically determining the angle between n-vectors. (Note:
By convention, the angle between two vectors is the angle whose measure is between 0◦ and 180◦.)

Theorem 1.4.1. Let θ be the angle between two nonzero n-vectors x and y. Then
\[
\cos\theta = \frac{x \cdot y}{||x||\,||y||}. \tag{1.4.4}
\]

It follows that cos θ = 0 if and only if x · y = 0. Thus (1.4.3) is valid.

Proof: Theorem 1.4.1 is just a restatement of the law of cosines. Recall that the law of cosines
states that
c2 = a2 + b2 − 2ab cos θ,

where a, b, c are the lengths of the sides of a triangle and θ is the interior angle opposite the side of
length c. In vector notation we can form a triangle two of whose sides are given by x and y in Rn.
The third side is just x − y as x = y + (x − y), as in Figure 1.3.

Figure 1.3: Triangle formed by vectors x and y with interior angle θ.

It follows from the law of cosines that

||x − y||2 = ||x||2 + ||y||2 − 2||x||||y|| cosθ.

We claim that
||x − y||2 = ||x||2 + ||y||2 − 2x · y.

Assuming that the claim is valid, it follows that

x · y = ||x||||y|| cos θ,

which proves the theorem. Finally, compute
\[
\begin{aligned}
||x - y||^2 &= (x_1 - y_1)^2 + \cdots + (x_n - y_n)^2 \\
&= (x_1^2 - 2x_1y_1 + y_1^2) + \cdots + (x_n^2 - 2x_ny_n + y_n^2) \\
&= (x_1^2 + \cdots + x_n^2) - 2(x_1y_1 + \cdots + x_ny_n) + (y_1^2 + \cdots + y_n^2) \\
&= ||x||^2 - 2x \cdot y + ||y||^2
\end{aligned}
\]
to verify the claim.

Theorem 1.4.1 gives a numerically efficient method for computing the angle between vectors x
and y. In MATLAB this computation proceeds by typing

theta = acos(dot(x,y)/(norm(x)*norm(y)))

where acos is the inverse cosine of a number. For example, using the 3-vectors x = (1, 4, 2) and
y = (2, 3, −1) entered previously, MATLAB responds with

theta =
0.7956

Remember that this answer is in radians. To convert this answer to degrees, just multiply by 360
and divide by 2π:

360*theta / (2*pi)

to obtain the answer of 45.5847◦.

Area of Parallelograms

Let P be a parallelogram whose sides are the vectors v and w as in Figure 1.4. Let |P | denote the
area of P . As an application of dot products and (1.4.4), we calculate |P |. We claim that

|P |2 = ||v||2||w||2 − (v · w)2 . (1.4.5)

We verify (1.4.5) as follows. Note that the area of P is the same as the area of the rectangle R also
pictured in Figure 1.4. The side lengths of R are: ||v|| and ||w|| sin θ where θ is the angle between v
and w. A computation using (1.4.4) shows that

\[
\begin{aligned}
|R|^2 &= ||v||^2||w||^2 \sin^2\theta \\
&= ||v||^2||w||^2(1 - \cos^2\theta) \\
&= ||v||^2||w||^2\left(1 - \left(\frac{v \cdot w}{||v||\,||w||}\right)^2\right) \\
&= ||v||^2||w||^2 - (v \cdot w)^2,
\end{aligned}
\]
which establishes (1.4.5).

Figure 1.4: Parallelogram P beside rectangle R with same area.
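
Formula (1.4.5) translates directly into the norm and dot commands introduced earlier. The following sketch computes the area of the parallelogram spanned by two vectors chosen purely for illustration:

v = [1 2 0];
w = [3 0 4];
areaP = sqrt(norm(v)^2*norm(w)^2 - dot(v,w)^2)

MATLAB responds with approximately 10.7703, the area of this particular parallelogram.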

Hand Exercises

In Exercises 1 – 4 compute the lengths of the given vectors.

1. x = (3, 0).

2. x = (2, −1).

3. x = (−1, 1, 1).

4. x = (−1, 0, 2, −1, 3).

In Exercises 5 – 8 determine whether the given pair of vectors is perpendicular.

5. x = (1, 3) and y = (3, −1).

6. x = (2, −1) and y = (−2, 1).

7. x = (1, 1, 3, 5) and y = (1, −4, 3, 0).

8. x = (2, 1, 4, 5) and y = (1, −4, 3, −2).

9. Find a real number a so that the vectors

x = (1, 3, 2) and y = (2, a, −6)

are perpendicular.

10. Find the lengths of the vectors u = (2, 1, −2) and v = (0, 1, −1), and the angle between them.

In Exercises 11 – 16 compute the dot product x · y for the given pair of vectors and the cosine of
the angle between them.

11. x = (2, 0) and y = (2, 1).

12. x = (2, −1) and y = (1, 2).

13. x = (−1, 1, 4) and y = (0, 1, 3).

14. x = (−10, 1, 0) and y = (0, 1, 20).

15. x = (2, −1, 1, 3, 0) and y = (4, 0, 2, 7, 5).

16. x = (5, −1, 4, 1, 0, 0) and y = (−3, 0, 0, 1, 10, −5).

17. Using the definition of length, verify that formula (1.4.1) is valid.

Computer Exercises

18. Use addvec and addvec3 to add vectors in R2 and R3 . More precisely, enter pairs of 2-vectors
x and y of your choosing into MATLAB, use addvec to compute x+y, and note the parallelogram
formed by 0, x, y, x + y. Similarly, enter pairs of 3-vectors and use addvec3.

19. Determine the vector of length 1 that points in the same direction as the vector

x = (2, 13.5, −6.7, 5.23).

20. Determine the vector of length 1 that points in the same direction as the vector

y = (2.1, −3.5, 1.5, 1.3, 5.2).

In Exercises 21– 23 find the angle in degrees between the given pair of vectors.

21. x = (2, 1, −3, 4) and y = (1, 1, −5, 7).

22. x = (2.43, 10.2, −5.27, π) and y = (−2.2, 0.33, 4, −1.7).

23. x = (1, −2, 2, 1, 2.1) and y = (−3.44, 1.2, 1.5, −2, −3.5).

In Exercises 24 – 25 let P be the parallelogram generated by the given vectors v and w in R3.
Compute the area of that parallelogram.

24. v = (1, 5, 7) and w = (−2, 4, 13).

25. v = (2, −1, 1) and w = (−1, 4, 3).

Chapter 2

Solving Linear Equations

A linear equation in n unknowns, x1, x2, . . . , xn , is an equation of the form

a1 x1 + a2 x2 + · · · + an xn = b

where a1, a2, . . . , an and b are given numbers. The numbers a1, a2, . . . , an are called
the coefficients of the equation. In particular, the equation

ax = b,

is a linear equation in one unknown;

ax + by = c,

is a linear equation in two unknowns (if a and b are real numbers and not both 0, the
graph of the equation is a straight line); and

ax + by + cz = d,

is a linear equation in three unknowns (if a, b and c are real numbers and not all 0,
then the graph is a plane in 3-space).

Our main interest in this chapter is in solving systems of linear equations. This work
provides the basis for a general study of vectors and matrices. The algorithms that enable
us to find solutions are themselves based on certain kinds of matrix manipulations. In these
algorithms, matrices serve as a shorthand for calculation, rather than as a basis for a theory.
We will see later that these matrix manipulations do lead to a rich theory of how to solve
systems of linear equations. But our first step is just to see how these equations are actually
solved.

We begin with a discussion in Section 2.1 of how to write systems of linear equations
in terms of matrices. We also show by example how complicated writing down the answer

to such systems can be. In Section 2.2, we recall that solution sets to systems of linear
equations in two and three variables are points, lines or planes.

The best known and probably the most efficient method for solving systems of linear
equations (especially with a moderate to large number of unknowns) is Gaussian elimination.
The idea behind this method, which is introduced in Section 2.3, is to manipulate matrices
by elementary row operations to reduced echelon form. It is then possible just to look at
the reduced echelon form matrix and to read off the solutions to the linear system, if any.
The process of reading off the solutions is formalized in Section 2.4; see Theorem 2.4.6. Our
discussion of solving linear equations is presented with equations whose coefficients are real
numbers — though most of our examples have just integer coefficients. The methods work
just as well with complex numbers, and this generalization is discussed in Section 2.5.

Throughout this chapter, we alternately discuss the theory and show how calculations
that are tedious when done by hand can easily be performed by computer using MATLAB.
The chapter ends with a proof of the uniqueness of row echelon form (a topic of theoretical
importance) in Section 2.6. This section is included mainly for completeness and need not
be covered on a first reading.

2.1 Systems of Linear Equations and Matrices

It is a simple exercise to solve the system of two equations

 x +  y = 7
−x + 3y = 1        (2.1.1)

to find that x = 5 and y = 2. One way to solve system (2.1.1) is to add the two equations,
obtaining
4y = 8;

hence y = 2. Substituting y = 2 into the 1st equation in (2.1.1) yields x = 5.

This system of equations can be solved in a more algorithmic fashion by solving the 1st
equation in (2.1.1) for x as
x = 7 − y,

and substituting this answer into the 2nd equation in (2.1.1), to obtain

−(7 − y) + 3y = 1.

This equation simplifies to:


4y = 8.

Now proceed as before.

Solving Larger Systems by Substitution

In contrast to solving the simple system of two equations, it is less clear how to solve a
complicated system of five equations such as:

 5x1 − 4x2 + 3x3 − 6x4 + 2x5 =   4
 2x1 +  x2 −  x3 −  x4 +  x5 =   6
  x1 + 2x2 +  x3 +  x4 + 3x5 =  19        (2.1.2)
−2x1 −  x2 −  x3 +  x4 −  x5 = −12
  x1 − 6x2 +  x3 +  x4 + 4x5 =   4.

The algorithmic method used to solve (2.1.1) can be expanded to produce a method, called
substitution, for solving larger systems. We describe the substitution method as it applies
to (2.1.2). Solve the 1st equation in (2.1.2) for x1 , obtaining

\[
x_1 = \frac{4}{5} + \frac{4}{5}x_2 - \frac{3}{5}x_3 + \frac{6}{5}x_4 - \frac{2}{5}x_5. \tag{2.1.3}
\]
Then substitute the right hand side of (2.1.3) for x1 in the remaining four equations in
(2.1.2) to obtain a new system of four equations in the four variables x2,x3 ,x4,x5. This
procedure eliminates the variable x1 . Now proceed inductively — solve the 1st equation in
the new system for x2 and substitute this expression into the remaining three equations to
obtain a system of three equations in three unknowns. This step eliminates the variable
x2 . Continue by substitution to eliminate the variables x3 and x4, and arrive at a simple
equation in x5 — which can be solved. Once x5 is known, then x4, x3 , x2 , and x1 can be
found in turn.
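
For instance, substituting the right hand side of (2.1.3) into the 2nd equation of (2.1.2) gives

2(4/5 + (4/5)x2 − (3/5)x3 + (6/5)x4 − (2/5)x5) + x2 − x3 − x4 + x5 = 6,

which simplifies to

(13/5)x2 − (11/5)x3 + (7/5)x4 + (1/5)x5 = 22/5.

Carrying out the same substitution in the 3rd, 4th and 5th equations eliminates x1 from all of them.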

Two Questions

• Is it realistic to expect to complete the substitution procedure without making a
mistake in arithmetic?

• Will this procedure work — or will some unforeseen difficulty arise?

Almost surely, attempts to solve (2.1.2) by hand, using the substitution procedure, will
lead to arithmetic errors. However, computers and software have developed to the point
where solving a system such as (2.1.2) is routine. In this text, we use the software package
MATLAB to illustrate just how easy it has become to solve equations such as (2.1.2).

The answer to the second question requires knowledge of the theory of linear algebra.
In fact, no difficulties will develop when trying to solve the particular system (2.1.2) using
the substitution algorithm. We discuss why later.

Solving Equations by MATLAB

We begin by discussing the information that is needed by MATLAB to solve (2.1.2). The
computer needs to know that there are five equations in five unknowns — but it does not
need to keep track of the unknowns (x1 , x2, x3, x4, x5) by name. Indeed, the computer just
needs to know the matrix of coefficients in (2.1.2)
 
\[
\begin{pmatrix} 5 & -4 & 3 & -6 & 2 \\ 2 & 1 & -1 & -1 & 1 \\ 1 & 2 & 1 & 1 & 3 \\ -2 & -1 & -1 & 1 & -1 \\ 1 & -6 & 1 & 1 & 4 \end{pmatrix} \tag{2.1.4*}
\]

and the vector on the right hand side of (2.1.2)


 
\[
\begin{pmatrix} 4 \\ 6 \\ 19 \\ -12 \\ 4 \end{pmatrix}. \tag{2.1.5*}
\]

We now describe how we enter this information into MATLAB. To reduce the drudgery
and to allow us to focus on ideas, the entries in equations having a ∗ after their label (such
as (2.1.4*)) have been entered in the laode toolbox. This information can be accessed as
follows. After starting your MATLAB session, type

e2_1_4

followed by a carriage return. This instruction tells MATLAB to load equation (2.1.4*) of
Chapter 2. The matrix of coefficients is now available in MATLAB; note that this matrix is
stored in the 5 × 5 array A. What should appear is:

A =
5 -4 3 -6 2
2 1 -1 -1 1
1 2 1 1 3
-2 -1 -1 1 -1
1 -6 1 1 4

Indeed, comparing this result with (2.1.4*), we see that A contains precisely the same
information.

Since the label (2.1.5*) is followed by a ‘∗’, we can enter the vector in (2.1.5*) into
MATLAB by typing

e2_1_5

Note that the right hand side of (2.1.2) is stored in the vector b. MATLAB should have
responded with

b =
4
6
19
-12
4

Now MATLAB has all the information it needs to solve the system of equations given in
(2.1.2). To have MATLAB solve this system, type

x = A\b

to obtain

x =
5.0000
2.0000
3.0000
4.0000
1.0000

This answer is interpreted as follows: the five values of the unknowns x1, x2 , x3 , x4, x5
are stored in the vector x; that is,

x1 = 5, x2 = 2, x3 = 3, x4 = 4, x5 = 1. (2.1.6)

The reader may verify that (2.1.6) is indeed a solution of (2.1.2) by substituting the values
in (2.1.6) into the equations in (2.1.2).
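
One quick way to carry out this check directly in MATLAB (a sketch; it uses the matrix-vector product A*x, which is discussed systematically in Chapter 3) is to type

A*x

and compare the result with the vector b; equivalently, A*x - b should be (numerically) the zero vector.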

Changing Entries in MATLAB

MATLAB also permits access to single components of x. For instance, type

x(5)

and the 5th entry of x is displayed,

ans =
1.0000

We see that the component x(i) of x corresponds to the component xi of the vector x
where i = 1, 2, 3, 4, 5. Similarly, we can access the entries of the coefficient matrix A. For
instance, by typing

A(3,4)

MATLAB responds with

ans =
1

It is also possible to change an individual entry in either a vector or a matrix. For
example, if we enter

A(3,4) = -2

we obtain a new matrix A which when displayed is:

A =
5 -4 3 -6 2
2 1 -1 -1 1
1 2 1 -2 3
-2 -1 -1 1 -1
1 -6 1 1 4

Thus the command A(3,4) = -2 changes the entry in the 3rd row, 4th column of A from 1
to −2. In other words, we have now entered into MATLAB the information that is needed
to solve the system of equations

 5x1 − 4x2 + 3x3 − 6x4 + 2x5 =   4
 2x1 +  x2 −  x3 −  x4 +  x5 =   6
  x1 + 2x2 +  x3 − 2x4 + 3x5 =  19
−2x1 −  x2 −  x3 +  x4 −  x5 = −12
  x1 − 6x2 +  x3 +  x4 + 4x5 =   4.

As expected, this change in the coefficient matrix results in a change in the solution of
system (2.1.2), as well. Typing

x = A\b

now leads to the solution

x =
1.9455
3.0036
3.0000
1.7309
3.8364

that is displayed to an accuracy of four decimal places.

In the next step, change A as follows:

A(2,3) = 1

The new system of equations is:

 5x1 − 4x2 + 3x3 − 6x4 + 2x5 =   4
 2x1 +  x2 +  x3 −  x4 +  x5 =   6
  x1 + 2x2 +  x3 − 2x4 + 3x5 =  19        (2.1.7)
−2x1 −  x2 −  x3 +  x4 −  x5 = −12
  x1 − 6x2 +  x3 +  x4 + 4x5 =   4.
The command

x = A\b

now leads to the message

Warning: Matrix is singular to working precision.

x =
Inf
Inf
Inf
Inf
Inf

Obviously, something is wrong; MATLAB cannot find a solution to this system of equations!
Assuming that MATLAB is working correctly, we have shed light on one of our previous
questions: the method of substitution described by (2.1.3) need not always lead to a solution,
even though the method does work for system (2.1.2). Why? As we will see, this is one
of the questions that is answered by the theory of linear algebra. In the case of (2.1.7), it
is fairly easy to see what the difficulty is: the second and fourth equations have the form
y = 6 and −y = −12, respectively.

Warning: The MATLAB command

x = A\b

may give an error message similar to the previous one. When this happens, one must
approach the answer with caution.

Hand Exercises

In Exercises 1 – 3 find solutions to the given system of linear equations.

1.
2x − y = 0
3x = 6
2.
3x − 4y = 2
2y + z = 1
3z = 9
3.
−2x + y = 9
3x + 3y = −9
4. Write the coefficient matrices for each of the systems of linear equations given in Exercises 1 – 3.

5. Neither of the following systems of three equations in three unknowns has a unique solution
— but for different reasons. Solve these systems and explain why these systems cannot be solved
uniquely.

(a)   x −  y      =  4         (b)  2x − 4y + 3z =  4
      x + 3y − 2z = −6              3x − 5y + 3z =  5
     4x + 2y − 3z =  1                   2y − 3z = −4

6. Last year Dick was twice as old as Jane. Four years ago the sum of Dick’s age and Jane’s age
was twice Jane’s age now. How old are Dick and Jane? Hint: Rewrite the two statements as linear
equations in D — Dick’s age now — and J — Jane’s age now. Then solve the system of linear
equations.

7. (a) Find a quadratic polynomial p(x) = ax2 + bx + c satisfying p(0) = 1, p(1) = 5, and
p(−1) = −5.
(b) Prove that for every triple of real numbers L, M , and N , there is a quadratic polynomial
satisfying p(0) = L, p(1) = M , and p(−1) = N .
(c) Let x1 , x2, x3 be three unequal real numbers and let A1 , A2, A3 be three real numbers. Show
that finding a quadratic polynomial q(x) that satisfies q(xi ) = Ai is equivalent to solving a
system of three linear equations.

Computer Exercises

8. Using MATLAB type the commands e2_1_8 and e2_1_9 to load the matrices:


 
\[
A = \begin{pmatrix}
-5.6 & 0.4 & -9.8 & 8.6 & 4.0 & -3.4 \\
-9.1 & 6.6 & -2.3 & 6.9 & 8.2 & 2.7 \\
3.6 & -9.3 & -8.7 & 0.5 & 5.2 & 5.1 \\
3.6 & -8.9 & -1.7 & -8.2 & -4.8 & 9.8 \\
8.7 & 0.6 & 3.7 & 3.1 & -9.1 & -2.7 \\
-2.3 & 3.4 & 1.8 & -1.7 & 4.7 & -5.1
\end{pmatrix} \tag{2.1.8*}
\]
and the vector
\[
b = \begin{pmatrix} 9.7 \\ 4.5 \\ 5.1 \\ 3.0 \\ -8.5 \\ 2.6 \end{pmatrix} \tag{2.1.9*}
\]
Solve the corresponding system of linear equations.

9. Matrices are entered in MATLAB as follows. To enter the 2 × 3 matrix A, type A = [-1 1 2;
4 1 2]. Enter this matrix into MATLAB; the displayed matrix should be

A =
-1 1 2
4 1 2

Now change the entry in the 2nd row, 1st column to −5.

10. Column vectors with n entries are viewed by MATLAB as n × 1 matrices. Enter the vector b
= [1; 2; -4]. Then change the 3rd entry in b to 13.

11. This problem illustrates some of the different ways that MATLAB displays numbers using the
format long, the format short and the format rational commands.

Use MATLAB to solve the following system of equations


2x1 − 4.5x2 + 3.1x3 = 4.2
x1 + x2 + x3 = −5.1
x1 − 6.2x2 + x3 = 1.3 .

You may change the format of your answer in MATLAB. For example, to print your result with an
accuracy of 15 digits type format long and redisplay the answer. Similarly, to print your result as
fractions type format rational and redisplay your answer.

12. Enter the following matrix and vector into MATLAB

A = [ 1 0 -1 ; 2 5 3 ; 5 -1 0];
b = [ 1; 1; -2];

and solve the corresponding system of linear equations by typing

x = A\b

Your answer should be

x =
-0.2000
1.0000
-1.2000

Find an integer for the entry in the 2nd row, 2nd column of A so that the solution

x = A\b

is not defined. Hint: The answer is an integer between −4 and 4.

13. The MATLAB command rand(m,n) defines matrices with random entries between 0 and 1. For
example, the command A = rand(5,5) generates a random 5 × 5 matrix, whereas the command b
= rand(5,1) generates a column vector with 5 random entries. Use these commands to construct
several systems of linear equations and then solve them.

14. Suppose that the four substances S1 , S2 , S3 , S4 contain the following percentages of vitamins
A, B, C and F by weight

Vitamin S1 S2 S3 S4
A 25% 19% 20% 3%
B 2% 14% 2% 14%
C 8% 4% 1% 0%
F 25% 31% 25% 16%

Mix the substances S1 , S2 , S3 and S4 so that the resulting mixture contains precisely 3.85 grams of
vitamin A, 2.30 grams of vitamin B, 0.80 grams of vitamin C, and 5.95 grams of vitamin F. How
many grams of each substance have to be contained in the mixture?

Discuss what happens if we require that the resulting mixture contains 2.00 grams of vitamin B
instead of 2.30 grams.

2.2 The Geometry of Low-Dimensional Solutions

In this section we discuss how to use MATLAB graphics to solve systems of linear equations in two
and three unknowns. We begin with two dimensions.

Linear Equations in Two Dimensions

The set of all solutions to the equation


2x − y = 6 (2.2.1)
is a straight line in the xy plane; this line has slope 2 and y-intercept equal to −6. We can use
MATLAB to plot the solutions to this equation — though some understanding of the way MATLAB
works is needed.

The plot command in MATLAB plots a sequence of points in the plane, as follows. Let X and
Y be n vectors. Then

plot(X,Y)

will plot the points (X(1), Y (1)), (X(2), Y (2)), . . . , (X(n), Y (n)) in the xy-plane.

To plot points on the line (2.2.1) we need to enter the x-coordinates of the points we wish to
plot. If we want to plot a hundred points, we would be facing a tedious task. MATLAB has a
command to simplify this task. Typing

x = linspace(-5,5,100);

produces a vector x with 100 entries with the 1st entry equal to −5, the last entry equal to 5, and
the remaining 98 entries equally spaced between −5 and 5. MATLAB has another command that
allows us to create a vector of points x. In this command we specify the distance between points
rather than the number of points. That command is:

x = -5:0.1:5;

Producing x by either command is acceptable.

Typing

y = 2*x - 6;

produces a vector whose entries correspond to the y-coordinates of points on the line (2.2.1). Then
typing

plot(x,y)

produces the desired plot. It is useful to label the axes on this figure, which is accomplished by
typing

xlabel('x')
ylabel('y')

We can now use MATLAB to solve the equation (2.1.1) graphically. Recall that (2.1.1) is:

x+ y =7
−x + 3y = 1
A solution to this system of equations is a point that lies on both lines in the system. Suppose that
we search for a solution to this system that has an x-coordinate between −3 and 7. Then type the
commands

x = linspace(-3,7,100);
y = 7 - x;
plot(x,y)
xlabel('x')
ylabel('y')
hold on
y = (1 + x)/3;
plot(x,y)
axis('equal')
grid

The MATLAB command hold on tells MATLAB to keep the present figure and to add the
information that follows to that figure. The command axis('equal') instructs MATLAB to make unit
distances on the x and y axes equal. The last MATLAB command superimposes grid lines. See
Figure 2.1. From this figure you can see that the solution to this system is (x, y) = (5, 2), which we
already knew.
Figure 2.1: Graph of equations in (2.1.1)

There are several principles that follow from this exercise.

• Solutions to a single linear equation in two variables form a straight line.
• Solutions to two linear equations in two unknowns lie at the intersection of two straight lines
in the plane.

It follows that the solution to two linear equations in two variables is a single point if the lines are
not parallel. If these lines are parallel and unequal, then there are no solutions, as there are no
points of intersection. If the lines are parallel and equal, i.e., coincident, then there are infinitely
many solutions, namely the set of points on the (one) line. The latter two cases are illustrated below

 x + 2y =  2          x + 2y = 2
−2x − 4y = −8         2x + 4y = 4

Figure 2.2: Parallel and coincident lines
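
These two cases can be checked with the same plot commands used above. The following sketch displays the first (parallel) pair of lines; plotting the second pair in the same way shows two coincident lines.

x = linspace(-3,3,100);
y1 = (2 - x)/2;          % solutions of x + 2y = 2
y2 = (2*x - 8)/(-4);     % solutions of -2x - 4y = -8
plot(x,y1)
hold on
plot(x,y2)
grid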

Linear Equations in Three Dimensions

We begin by observing that the set of all solutions to a linear equation in three variables forms a
plane. More precisely, the solutions to the equation

ax + by + cz = d (2.2.2)

form a plane that is perpendicular to the vector (a, b, c) — assuming of course that the vector (a, b, c)
is nonzero.

This fact is most easily proved using the dot product. Recall from Chapter 1 (1.4.2) that the
dot product is defined by
X · Y = x1y1 + x2y2 + x3y3 ,
where X = (x1, x2, x3) and Y = (y1 , y2, y3). We recall from Chapter 1 (1.4.3) the following important
fact concerning dot products:
X ·Y = 0
if and only if the vectors X and Y are perpendicular.

Suppose that N = (a, b, c) ≠ 0. Consider the plane that is perpendicular to the normal vector
N and that contains the point X0 . If the point X lies in that plane, then X − X0 is perpendicular
to N ; that is,
(X − X0 ) · N = 0. (2.2.3)

If we use the notation
X = (x, y, z) and X0 = (x0 , y0, z0 ),

then (2.2.3) becomes


a(x − x0) + b(y − y0 ) + c(z − z0 ) = 0.

Setting
d = ax0 + by0 + cz0

puts equation (2.2.3) into the form (2.2.2). In this way we see that the set of solutions to a single
linear equation in three variables forms a plane. See Figure 2.3.

Figure 2.3: The plane containing X0 and perpendicular to N.
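
For example, with the (arbitrarily chosen) normal vector N = (1, 2, 2) and point X0 = (1, 0, 1), the plane is

(x − 1) + 2(y − 0) + 2(z − 1) = 0,    that is,    x + 2y + 2z = 3.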

We now use MATLAB to visualize the planes that are solutions to linear equations. Plotting an
equation in three dimensions in MATLAB follows a structure similar to the planar plots. Suppose
that we wish to plot the solutions to the equation

−2x + 3y + z = 2. (2.2.4)

We can rewrite (2.2.4) as


z = 2x − 3y + 2.

It is this function that we actually graph by typing the commands

[x,y] = meshgrid(-5:0.5:5);
z = 2*x - 3*y + 2;
surf(x,y,z)

The first command tells MATLAB to create a square grid in the xy-plane. Grid points are equally
spaced between −5 and 5 at intervals of 0.5 on both the x and y axes. The second command tells
MATLAB to compute the z value of the solution to (2.2.4) at each grid point. The third command
tells MATLAB to graph the surface containing the points (x, y, z). See Figure 2.4.

Figure 2.4: Graph of (2.2.4).

We can now see that solutions to a system of two linear equations in three unknowns consist
of points that lie simultaneously on two planes. As long as the normal vectors to these planes are

not parallel, the intersection of the two planes will be a line in three dimensions. Indeed, consider
the equations

−2x + 3y + z = 2
2x − 3y + z = 0.

We can graph the solution using MATLAB , as follows. We continue from the previous graph by
typing

hold on
z = -2*x + 3*y;
surf(x,y,z)

The result, which illustrates that the intersection of two planes in R3 is generally a line, is shown in
Figure 2.5.

Figure 2.5: Line of intersection of two planes.

We can now see geometrically that the solution to three simultaneous linear equations in three
unknowns will generally be a point — since generally three planes in three space intersect in a point.

To visualize this intersection, as shown in Figure 2.6, we extend the previous system of equations to

−2x + 3y + z = 2
2x − 3y + z = 0
−3x + 0.2y + z = 1.

Continuing in MATLAB type

hold on
z = 3*x - 0.2*y + 1;
surf(x,y,z)

Figure 2.6: Point of intersection of three planes.

Unfortunately, visualizing the point of intersection of these planes geometrically does not really
help to get an accurate numerical value of the coordinates of this intersection point. However, we
can use MATLAB to solve this system accurately. Denote the 3 × 3 matrix of coefficients by A,
the vector of coefficients on the right hand side by b, and the solution by x. Solve the system in
MATLAB by typing

A = [ -2 3 1; 2 -3 1; -3 0.2 1];
b = [2; 0; 1];
x = A\b

The point of intersection of the three planes is at

x =
0.0233
0.3488
1.0000

Three planes in three dimensional space need not intersect in a single point. For example, if
two of the planes are parallel they need not intersect at all. The normal vectors must point in

independent directions to guarantee that the intersection is a point. Understanding the notion of
independence (it is more complicated than just not being parallel) is part of the subject of linear
algebra. MATLAB returns “Inf”, which we have seen previously, when these normal vectors are
(approximately) dependent. For example, consider Exercise 6.

Plotting Nonlinear Functions in MATLAB

Suppose that we want to plot the graph of a nonlinear function of a single variable, such as

y = x2 − 2x + 3 (2.2.5)

on the interval [−2, 5] using MATLAB. There is a difficulty: How do we enter the term x2? For
example, suppose that we type

x = linspace(-2,5);
y = x*x - 2*x + 3;

Then MATLAB responds with

??? Error using ==> *
Inner matrix dimensions must agree.

The problem is that in MATLAB the variable x is a vector of 100 equally spaced points x(1), x(2),
..., x(100). What we really need is a vector consisting of entries x(1)*x(1), x(2)*x(2), ...,
x(100)*x(100). MATLAB has the facility to perform this operation automatically and the syntax
for the operation is .* rather than *. So typing

x = linspace(-2,5);
y = x.*x - 2*x + 3;
plot(x,y)

produces the graph of (2.2.5) in Figure 2.7. In a similar fashion, MATLAB has the ‘dot’ operations
of ./, .\, and .^, as well as .*.
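
For instance, the graph above could equally well be produced with the .^ operation, a small variation on the commands just given:

x = linspace(-2,5);
y = x.^2 - 2*x + 3;      % the same vector as x.*x - 2*x + 3
plot(x,y)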

Hand Exercises

1. Find the equation for the plane perpendicular to the vector (2, 3, 1) and containing the point
(−1, −2, 3).

2. Determine three systems of two linear equations in two unknowns so that the first system has a
unique solution, the second system has an infinite number of solutions, and the third system has no
solutions.

Figure 2.7: Graph of y = x2 − 2x + 3.

3. Write the equation of the plane through the origin containing the vectors (1, 0, 1) and (2, −1, 2).

4. Find a system of two linear equations in three unknowns whose solution set is the line consisting
of scalar multiples of the vector (1, 2, 1).

5. (a) Find a vector u normal to the plane 2x + 2y + z = 3.


(b) Find a vector v normal to the plane x + y + 2z = 4.
(c) Find the cosine of the angle between the vectors u and v. Use MATLAB to find the angle in
degrees.

6. Determine graphically the geometry of the set of solutions to the system of equations in the three
unknowns x, y, z:
x + 3z = 1
3x − z = 1
z = 2
by sketching the plane of solutions for each equation individually. Describe in words why there are
no solutions to this system. (Use MATLAB graphics to verify your sketch. Note that you should
enter the last equation as z = 2 - 0*x - 0*y and the first two equations with 0*y terms. Try
different views — but include view([0 1 0]) as one view.)

Computer Exercises

7. Use MATLAB to solve graphically the planar system of linear equations

x + 4y = −4
4x + 3y = 4

to an accuracy of two decimal points.

Hint: The MATLAB command zoom on allows us to view the plot in a window whose axes are
one-half those of original. Each time you click with the mouse on a point, the axes’ limits are halved

and centered at the designated point. Coupling zoom on with grid on allows you to determine
approximate numerical values for the intersection point.

8. Use MATLAB to solve graphically the planar system of linear equations

4.23x + 0.023y = −1.1
1.65x − 2.81y = 1.63

to an accuracy of two decimal points.

9. Use MATLAB to find an approximate graphical solution to the three dimensional system of linear
equations
3x − 4y + 2z = −11
2x + 2y + z = 7
−x + y − 5z = 7.
Then use MATLAB to find an exact solution.

10. Use MATLAB to determine graphically the geometry of the set of solutions to the system of
equations:
x + 3y + 4z = 5
2x + y + z = 1
−4x + 3y + 5z = 7.
Attempt to use MATLAB to find an exact solution to this system and discuss the implications of
your calculations.

Hint: After setting up the graphics display in MATLAB, you can use the command view([0,1,0])
to get a better view of the solution point.

11. Use MATLAB to graph the function y = 2 − x sin(x2 − 1) on the interval [−2, 3]. How many
relative maxima does this function have on this interval?

2.3 Gaussian Elimination

A general system of m linear equations in n unknowns has the form

a11x1 + a12x2 + ··· + a1nxn = b1
a21x1 + a22x2 + ··· + a2nxn = b2
   ⋮                                     (2.3.1)
am1x1 + am2x2 + ··· + amnxn = bm.

The entries aij and bi are constants. Our task is to find a method for solving (2.3.1) for the variables
x1, . . . , xn.

Easily Solved Equations

Some systems are easily solved. The system of three equations (m = 3) in three unknowns (n = 3)

x1 + 2x2 + 3x3 = 10
     x2 − (1/5)x3 = 7/5        (2.3.2)
                x3 = 3

is one example. The 3rd equation states that x3 = 3. Substituting this value into the 2nd equation
allows us to solve the 2nd equation for x2 = 2. Finally, substituting x2 = 2 and x3 = 3 into the
1st equation allows us to solve for x1 = −3. The process that we have just described is called back
substitution.

Next, consider the system of two equations (m = 2) in three unknowns (n = 3):

x1 + 2x2 + 3x3 = 10
                x3 = 3.        (2.3.3)

The 2nd equation in (2.3.3) states that x3 = 3. Substituting this value into the 1st equation leads
to the equation
x1 = 1 − 2x2.

We have shown that every solution to (2.3.3) has the form (x1, x2, x3) = (1 − 2x2, x2, 3) and that
every vector (1 − 2x2, x2, 3) is a solution of (2.3.3). Thus, there is an infinite number of solutions to
(2.3.3), and these solutions can be parameterized by one number x2.

Equations Having No Solutions

Note that the system of equations

x1 − x2 = 1
x1 − x2 = 2

has no solutions: subtracting the first equation from the second would give 0 = 1.

Definition 2.3.1. A linear system of equations is inconsistent if the system has no solutions and
consistent if the system does have solutions.

As discussed in the previous section, (2.1.7) is an example of a linear system that MATLAB
cannot solve. In fact, that system is inconsistent — inspect the 2nd and 4th equations in (2.1.7).

Gaussian elimination is an algorithm for finding all solutions to a system of linear equations by
reducing the given system to ones like (2.3.2) and (2.3.3), that are easily solved by back substitution.
Consequently, Gaussian elimination can also be used to determine whether a system is consistent or
inconsistent.

Elementary Equation Operations

There are three ways to change a system of equations without changing the set of solutions; Gaussian
elimination is based on this observation. The three elementary operations are:

1. Swap two equations.

2. Multiply a single equation by a nonzero number.

3. Add a scalar multiple of one equation to another.

We begin with an example:

x1 + 2x2 + 3x3 = 10
x1 + 2x2 + x3 = 4 (2.3.4)
2x1 + 9x2 + 5x3 = 27 .

Gaussian elimination works by eliminating variables from the equations in a fashion similar to the
substitution method in the previous section. To begin, eliminate the variable x1 from all but the 1st
equation, as follows. Subtract the 1st equation from the 2nd , and subtract twice the 1st equation
from the 3rd , obtaining:
x1 + 2x2 + 3x3 = 10
−2x3 = −6 (2.3.5)
5x2 − x3 = 7 .
Next, swap the 2nd and 3rd equations, so that the coefficient of x2 in the new 2nd equation is nonzero.
This yields
x1 + 2x2 + 3x3 = 10
5x2 − x3 = 7 (2.3.6)
−2x3 = −6 .
Now, divide the 2nd equation by 5 and the 3rd equation by −2 to obtain a system of equations
identical to our first example (2.3.2), which we solved by back substitution.

Augmented Matrices

The process of performing Gaussian elimination when the number of equations is greater than two or
three is painful. The computer, however, can help with the manipulations. We begin by introducing
the augmented matrix. The augmented matrix associated with (2.3.1) has m rows and n + 1 columns
and is written as:
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{pmatrix} \tag{2.3.7}
\]
The augmented matrix contains all of the information that is needed to solve system (2.3.1).

Elementary Row Operations

The elementary operations used in Gaussian elimination can be interpreted as row operations on
the augmented matrix, as follows:

1. Swap two rows.


2. Multiply a single row by a nonzero number.
3. Add a scalar multiple of one row to another.

We claim that by using these elementary row operations intelligently, we can always solve a consistent
linear system — indeed, we can determine when a linear system is consistent or inconsistent. The
idea is to perform elementary row operations in such a way that the new augmented matrix has zero
entries below the diagonal.

We describe this process inductively. Begin with the 1st column. We assume for now that some
entry in this column is nonzero. If a11 = 0, then swap two rows so that the number a11 is nonzero.
Then divide the 1st row by a11 so that the leading entry in that row is 1. Now subtract ai1 times
the 1st row from the ith row for each row i from 2 to m. The end result is that the 1st column has
a 1 in the 1st row and a 0 in every row below the 1st. The result is
 
    ( 1  ∗  · · ·  ∗ )
    ( 0  ∗  · · ·  ∗ )
    ( ..  ..        .. )
    ( 0  ∗  · · ·  ∗ )

Next we consider the 2nd column. We assume that some entry in that column below the 1st
row is nonzero. So, if necessary, we can swap two rows below the 1st row so that the entry a22 is
nonzero. Then we divide the 2nd row by a22 so that its leading nonzero entry is 1. Then we subtract
appropriate multiples of the 2nd row from each row below the 2nd so that all the entries in the 2nd
column below the 2nd row are 0. The result is
 
    ( 1  ∗  · · ·  ∗ )
    ( 0  1  · · ·  ∗ )
    ( ..  ..        .. )
    ( 0  0  · · ·  ∗ )

Then we continue with the 3rd column. That’s the idea. However, does this process always work
and what happens if all of the entries in a column are zero? Before answering these questions we do
experimentation with MATLAB.

Row Operations in MATLAB

In MATLAB the ith row of a matrix A is specified by A(i,:). Thus to replace the 5th row of a
matrix A by twice itself, we need only type:

A(5,:) = 2*A(5,:)

In general, we can replace the ith row of the matrix A by c times itself by typing

A(i,:) = c*A(i,:)

Similarly, we can divide the ith row of the matrix A by the nonzero number c by typing

A(i,:) = A(i,:)/c

The third elementary row operation is performed similarly. Suppose we want to add c times the
ith row to the j th row, then we type

A(j,:) = A(j,:) + c*A(i,:)

For example, subtracting 3 times the 7th row from the 4th row of the matrix A is accomplished by
typing:

A(4,:) = A(4,:) - 3*A(7,:)

The first elementary row operation, swapping two rows, requires a different kind of MATLAB
command. In MATLAB, the ith and j th rows of the matrix A are permuted by the command

A([i j],:) = A([j i],:)

So, to swap the 1st and 3rd rows of the matrix A, we type

A([1 3],:) = A([3 1],:)

Examples of Row Reduction in MATLAB

Let us see how the row operations can be used in MATLAB. As an example, we consider the
augmented matrix
 
    ( 1  3   0  −1    −8 )
    ( 2  6  −4   4     4 )
    ( 1  0  −1  −9   −35 )          (2.3.8*)
    ( 0  1   0   3    10 )

We enter this information into MATLAB by typing

e2_3_8

which produces the result

A =
1 3 0 -1 -8
2 6 -4 4 4
1 0 -1 -9 -35
0 1 0 3 10

We now perform Gaussian elimination on A, and then solve the resulting system by back substi-
tution. Gaussian elimination uses elementary row operations to set the entries that are in the lower
left part of A to zero. These entries are indicated by numbers in the following matrix:

* * * * *
2 * * * *
1 0 * * *
0 1 0 * *

Gaussian elimination works inductively. Since the first entry in the matrix A is equal to 1, the
first step in Gaussian elimination is to set to zero all entries in the 1st column below the 1st row.
We begin by eliminating the 2 that is the first entry in the 2nd row of A. We replace the 2nd row by
the 2nd row minus twice the 1st row. To accomplish this elementary row operation, we type

A(2,:) = A(2,:) - 2*A(1,:)

and the result is

A =
1 3 0 -1 -8
0 0 -4 6 20
1 0 -1 -9 -35
0 1 0 3 10

In the next step, we eliminate the 1 from the entry in the 3rd row, 1st column of A. We do this by
typing

A(3,:) = A(3,:) - A(1,:)

which yields

A =
1 3 0 -1 -8
0 0 -4 6 20
0 -3 -1 -8 -27
0 1 0 3 10

Using elementary row operations, we have now set the entries in the 1st column below the 1st
row to 0. Next, we alter the 2nd column. We begin by swapping the 2nd and 4th rows so that the
leading nonzero entry in the 2nd row is 1. To accomplish this swap, we type

A([2 4],:) = A([4 2],:)

and obtain

A =
1 3 0 -1 -8
0 1 0 3 10
0 -3 -1 -8 -27
0 0 -4 6 20

The next elementary row operation is the command

A(3,:) = A(3,:) + 3*A(2,:)

which leads to

A =
1 3 0 -1 -8
0 1 0 3 10
0 0 -1 1 3
0 0 -4 6 20

Now we have set all entries in the 2nd column below the 2nd row to 0.

Next, we set the first nonzero entry in the 3rd row to 1 by multiplying the 3rd row by −1,
obtaining

A =
1 3 0 -1 -8
0 1 0 3 10
0 0 1 -1 -3
0 0 -4 6 20

Since the leading nonzero entry in the 3rd row is 1, we next eliminate the nonzero entry in the
3rd column, 4th row. This is accomplished by the following MATLAB command:

A(4,:) = A(4,:) + 4*A(3,:)

Finally, divide the 4th row by 2 to obtain:

A =
1 3 0 -1 -8
0 1 0 3 10
0 0 1 -1 -3
0 0 0 1 4

By using elementary row operations, we have arrived at the system

    x1 + 3x2      −  x4 = −8
              x2 + 3x4 = 10
              x3 −  x4 = −3          (2.3.9)
                    x4 =  4 ,

that can now be solved by back substitution. We obtain

x4 = 4, x3 = 1, x2 = −2, x1 = 2. (2.3.10)
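
The back substitution itself is easily mirrored in MATLAB by evaluating the equations of (2.3.9)
from the bottom up; the following lines are just a transcription of that computation:

x4 = 4;
x3 = -3 + x4;            % from the equation x3 - x4 = -3
x2 = 10 - 3*x4;          % from the equation x2 + 3x4 = 10
x1 = -8 - 3*x2 + x4;     % from the equation x1 + 3x2 - x4 = -8
[x1 x2 x3 x4]            % displays   2  -2   1   4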

We return to the original set of equations corresponding to (2.3.8*)

     x1 + 3x2       −  x4 = −8
    2x1 + 6x2 − 4x3 + 4x4 =  4
     x1       −  x3 − 9x4 = −35          (2.3.11*)
          x2       + 3x4 = 10 .

Load the corresponding linear system into MATLAB by typing

e2_3_11

The information in (2.3.11*) is contained in the coefficient matrix C and the right hand side b. A
direct solution is found by typing

x = C\b

which yields the same answer as in (2.3.10), namely,

x =
2.0000
-2.0000
1.0000
4.0000

Introduction to Echelon Form

Next, we discuss how Gaussian elimination works in an example in which the number of rows and
the number of columns in the coefficient matrix are unequal. We consider the augmented matrix
 
    (  1   0  −2   3    4   0   1 )
    (  0   1   2   4    0  −2   0 )
    (  2  −1  −4   0   −2   8  −4 )          (2.3.12*)
    ( −3   0   6  −8  −12   2  −2 )

This information is entered into MATLAB by typing

e2_3_12

Again, the augmented matrix is denoted by A.

We begin by eliminating the 2 in the entry in the 3rd row, 1st column. To accomplish the
corresponding elementary row operation, we type

A(3,:) = A(3,:) - 2*A(1,:)

resulting in

A =
1 0 -2 3 4 0 1
0 1 2 4 0 -2 0
0 -1 0 -6 -10 8 -6
-3 0 6 -8 -12 2 -2

We proceed with

A(4,:) = A(4,:) + 3*A(1,:)

to create two more zeros in the 4th row. Finally, we eliminate the -1 in the 3rd row, 2nd column by

A(3,:) = A(3,:) + A(2,:)

to arrive at

A =
1 0 -2 3 4 0 1
0 1 2 4 0 -2 0
0 0 2 -2 -10 6 -6
0 0 0 1 0 2 1

Next we set the leading nonzero entry in the 3rd row to 1 by dividing the 3rd row by 2. That is, we
type

A(3,:) = A(3,:)/2

to obtain

A =
1 0 -2 3 4 0 1
0 1 2 4 0 -2 0
0 0 1 -1 -5 3 -3
0 0 0 1 0 2 1

We say that the matrix A is in (row) echelon form since the first nonzero entry in each row is a 1,
each entry in a column below a leading 1 is 0, and the leading 1 moves to the right as you go down
the matrix. In row echelon form, the entries where leading 1’s occur are called pivots.

If we compare the structure of this matrix to the ones we have obtained previously, then we see
that here we have two columns too many. Indeed, we may solve these equations by back substitution
for any choice of the variables x5 and x6.

The idea behind back substitution is to solve the last equation for the variable corresponding to
the first nonzero coefficient. In this case, we use the 4th equation to solve for x4 in terms of x5 and
x6, and then we substitute for x4 in the first three equations. This process can also be accomplished
by elementary row operations. Indeed, eliminating the variable x4 from the first three equations is
the same as using row operations to set the first three entries in the 4th column to 0. We can do
this by typing

A(3,:) = A(3,:) + A(4,:);


A(2,:) = A(2,:) - 4*A(4,:);
A(1,:) = A(1,:) - 3*A(4,:)

Remember: By typing semicolons after the first two rows, we have told MATLAB not to print the
intermediate results. Since we have not typed a semicolon after the 3rd row, MATLAB outputs

A =
1 0 -2 0 4 -6 -2
0 1 2 0 0 -10 -4
0 0 1 0 -5 5 -2
0 0 0 1 0 2 1

We proceed with back substitution by eliminating the nonzero entries in the first two rows of the
3rd column. To do this, type

A(2,:) = A(2,:) - 2*A(3,:);
A(1,:) = A(1,:) + 2*A(3,:)

which yields

A =
1 0 0 0 -6 4 -6
0 1 0 0 10 -20 0
0 0 1 0 -5 5 -2
0 0 0 1 0 2 1

The augmented matrix is now in reduced echelon form and the corresponding system of equations
has the form
    x1           −  6x5 +  4x6 = −6
         x2      + 10x5 − 20x6 =  0
              x3 −  5x5 +  5x6 = −2          (2.3.13)
                   x4   +  2x6 =  1 ,
A matrix is in reduced echelon form if it is in echelon form and if every entry in a column containing
a pivot, other than the pivot itself, is 0.

Reduced echelon form allows us to solve this system of equations directly in terms of the variables
x5 and x6:

    ( x1 )   ( −6 + 6x5 − 4x6 )
    ( x2 )   (  −10x5 + 20x6  )
    ( x3 ) = ( −2 + 5x5 − 5x6 )          (2.3.14)
    ( x4 )   (    1 − 2x6     )
    ( x5 )   (       x5       )
    ( x6 )   (       x6       )
It is important to note that every consistent system of linear equations corresponding to an aug-
mented matrix in reduced echelon form can be solved as in (2.3.14) — and this is one reason for
emphasizing reduced echelon form. We will discuss the reduction to reduced echelon form in more
detail in the next section.
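
One advantage of the parameterized form (2.3.14) is that it is easy to check. Reload the original
augmented matrix (2.3.12*), choose any values for the parameters x5 and x6, and verify that the
resulting vector satisfies all four of the original equations:

e2_3_12                          % reload the original augmented matrix (2.3.12*) into A
x5 = 0.7;  x6 = -1.3;            % any choice of the two parameters will do
x = [-6 + 6*x5 - 4*x6; -10*x5 + 20*x6; -2 + 5*x5 - 5*x6; 1 - 2*x6; x5; x6];
A(:,1:6)*x - A(:,7)              % returns the zero vector (up to roundoff)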

Hand Exercises

In Exercises 1 – 3 determine whether the given matrix is in reduced echelon form.


 
1.  ( 1  −1  0   1 )
    ( 0   1  0  −6 )
    ( 0   0  1   0 ) .

2.  ( 1  0  −2  0 )
    ( 0  1   4  0 )
    ( 0  0   0  1 ) .

3.  ( 0  1  0  3 )
    ( 0  0  2  1 )
    ( 0  0  0  0 ) .

In Exercises 4 – 6 we list the reduced echelon form of an augmented matrix of a system of linear
equations. Which columns in these augmented matrices contain pivots? Describe all solutions to
these systems of equations in the form of (2.3.14).
 
4.  ( 1  4  0  0 )
    ( 0  0  1  5 )
    ( 0  0  0  0 ) .

5.  ( 1  2  0  0  0 )
    ( 0  0  1  1  0 )
    ( 0  0  0  0  1 ) .

6.  ( 1  −6  0  0  1 )
    ( 0   0  1  0  9 )
    ( 0   0  0  0  0 ) .

7. (a) Consider the 2 × 2 matrix

        ( a  b )
        ( c  1 )          (2.3.15)

where a, b, c ∈ R and a ≠ 0. Show that (2.3.15) is row equivalent to the matrix

        ( 1  b/a        )
        ( 0  (a − bc)/a ) .

(b) Show that (2.3.15) is row equivalent to the identity matrix if and only if a ≠ bc.

8. Use row reduction and back substitution to solve the following system of two equations in three
unknowns:
x1 − x2 + x3 = 1
2x1 + x2 − x3 = −1
Is (1, 2, 2) a solution to this system? If not, is there a solution for which x3 = 2?

In Exercises 9 – 10 determine the augmented matrix and all solutions for each system of linear
equations

x−y+z = 1
9. 4x + y + z = 5 .
2x + 3y − z = 2

2x − y + z + w = 1
10. .
x + 2y − z + w = 7

In Exercises 11 – 14 consider the augmented matrices representing systems of linear equations, and
decide

(a) if there are zero, one or infinitely many solutions, and
(b) if solutions are not unique, how many variables can be assigned arbitrary values.
 
11. ( 1  0  0  3 )
    ( 0  2  1  1 )
    ( 0  0  0  0 ) .

12. ( 1  2  0  0  3 )
    ( 0  1  1  0  1 )
    ( 0  0  0  0  2 ) .

13. ( 1  0  2  1 )
    ( 0  5  0  2 )
    ( 0  0  4  3 ) .

14. ( 1  0  2  0   3 )
    ( 2  3  6  1  16 )
    ( 0  3  2  1  10 )
    ( 0  0  0  0   0 ) .

A system of m equations in n unknowns is linear if it has the form (2.3.1); any other system of
equations is called nonlinear. In Exercises 15 – 19 decide whether each of the given systems of
equations is linear or nonlinear.

15.
3x1 − 2x2 + 14x3 − 7x4 = 35
2x1 + 5x2 − 3x3 + 12x4 = −1
16.
3x1 + πx2 = 0
2x1 − ex2 = 1
17.
3x1 x2 − x2 = 10
2x1 − x2² = −5
18.
3x1 − x2 = cos(12)
2x1 − x2 = −5
19.
3x1 − sin(x2) = 12
2x1 − x3 = −5

Computer Exercises

In Exercises 20 – 22 use elementary row operations and MATLAB to put each of the given matrices
into row echelon form. Suppose that the matrix is the augmented matrix for a system of linear
equations. Is the system consistent or inconsistent?

20. ( 2  1  1 )
    ( 4  2  3 ) .

21. ( 3  −4  0  2 )
    ( 0   2  3  1 )
    ( 3   1  4  5 ) .

22. ( −2  1   9  1 )
    (  3  3  −4  2 )
    (  1  4   5  5 ) .

Observation: In standard format MATLAB displays all nonzero real numbers with four decimal
places while it displays zero as 0. An unfortunate consequence of this display is that when a matrix
has both zero and noninteger entries, the columns will not align — which is a nuisance. You can
work with rational numbers rather than decimal numbers by typing format rational. Then the
columns will align.

23. Load the following 6 × 8 matrix A into MATLAB by typing e2_3_16.

    A = ( 0  0   0    1    3    5  0   9 )
        ( 0  3   6   −6   −6  −12  0   1 )
        ( 0  2   4   −5   −7   14  0   1 )
        ( 0  1   2    1   14   21  0  −1 )
        ( 0  0   0    2    4    9  0   7 )
        ( 0  5  10  −11  −13    2  0   2 )

Use MATLAB to transform this matrix to row echelon form.

24. Use row reduction and back substitution to solve the following system of linear equations:

2x1 + 3x2 − 4x3 + x4 = 2


3x1 − x2 − x3 + 2x4 = 4
x1 − 7x2 + 5x3 − x4 = 6

25. Comment: To understand the point of this exercise you must begin by typing the MATLAB
command format short e. This command will set a format in which you can see the difficulties
that sometimes arise in numerical computations.

Consider the following two 3 × 3 matrices:

    A = (  1  3  4 )              B = ( 3   1  4 )
        (  2  1  1 )     and          ( 1   2  1 ) .
        ( −4  3  5 )                  ( 3  −4  5 )

Note that matrix B is obtained from matrix A by interchanging the first two columns.

(a) Use MATLAB to put A into row echelon form using the transformations

1. Subtract 2 times the 1st row from the 2nd.
2. Add 4 times the 1st row to the 3rd .
3. Divide the 2nd row by −5.
4. Subtract 15 times the 2nd row from the 3rd .

(b) Put B by hand into row echelon form using the transformations

1. Divide the 1st row by 3.


2. Subtract the 1st row from the 2nd .
3. Subtract 3 times the 1st row from the 3rd .
4. Multiply the 2nd row by 3/5.
5. Add 5 times the 2nd row to the 3rd .

(c) Use MATLAB to put B into row echelon form using the same transformations as in part (b).

(d) Discuss the outcome of the three transformations. Is there a difference in the results? Would
you expect to see a difference? Could the difference be crucial when solving a system of linear
equations?

26. Find a cubic polynomial

    p(x) = ax3 + bx2 + cx + d

so that p(1) = 2, p(2) = 3, p′(−1) = −1, and p′(3) = 1.

2.4 Reduction to Echelon Form

In this section, we formalize our previous numerical experiments. We define more precisely the
notions of echelon form and reduced echelon form matrices, and we prove that every matrix can be
put into reduced echelon form using a sequence of elementary row operations. Consequently, we will
have developed an algorithm for determining whether a system of linear equations is consistent or
inconsistent, and for determining all solutions to a consistent system.

Definition 2.4.1. A matrix E is in (row) echelon form if two conditions hold.

(a) The first nonzero entry in each row of E is equal to 1. This leading entry 1 is called a pivot.

(b) A pivot in the (i + 1)st row of E occurs in a column to the right of the column where the pivot
in the ith row occurs.

Here are three examples of matrices that are in echelon form. The pivot in each row (which is
always equal to 1) is preceded by a ∗.
 
    ( ∗1   0  −1   0  −6   4  −6 )
    (  0  ∗1   4   0   0  −2   0 )
    (  0   0   0  ∗1  −5   5  −2 )
    (  0   0   0   0   0  ∗1   0 )

    ( ∗1   0  −1   0  −6 )
    (  0  ∗1   0   3   0 )
    (  0   0   0  ∗1  −5 )
    (  0   0   0   0   0 )

    (  0  ∗1  −1  14  −6 )
    (  0   0   0  ∗1  15 )
    (  0   0   0   0   0 )
    (  0   0   0   0   0 )

Here are three examples of matrices that are not in echelon form.

    ( 0   0   1  15 )       ( 1  −1  14  −6 )       ( 1  −1  14  −6 )
    ( 1  −1  14  −6 )  and  ( 0   0   3  15 )  and  ( 0   0   0   0 )
    ( 0   0   0   0 )       ( 0   0   0   0 )       ( 0   0   1  15 )

Definition 2.4.2. Two m × n matrices are row equivalent if one can be transformed to the other
by a sequence of elementary row operations.

Let A = (aij ) be a matrix with m rows and n columns. We want to show that we can perform
row operations on A so that the transformed matrix is in echelon form; that is, A is row equivalent
to a matrix in echelon form. If A = 0, then we are finished. So we assume that some entry in A is
nonzero and that the first column containing a nonzero entry is the kth column. By swapping
rows we can assume that a1k is nonzero. Next, divide the 1st row by a1k , thus setting a1k = 1. Now,
using MATLAB notation, perform the row operations

A(i,:) = A(i,:) - A(i,k)*A(1,:)

for each i ≥ 2. This sequence of row operations leads to a matrix whose first nonzero column has a
1 in the 1st row and a zero in each row below the 1st row.

Now we look for the next column that has a nonzero entry below the 1st row and call that
column ℓ. By construction ℓ > k. We can swap rows so that the entry in the 2nd row, ℓth column
is nonzero. Then we divide the 2nd row by this nonzero element, so that the pivot in the 2nd row
is 1. Again we perform elementary row operations so that all entries below the 2nd row in the ℓth
column are set to 0. Now proceed inductively until we run out of nonzero rows.

This argument proves:

Proposition 2.4.3. Every matrix is row equivalent to a matrix in echelon form.

More importantly, the previous argument provides an algorithm for transforming matrices into
echelon form.
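
In fact, the argument translates almost line for line into MATLAB. The following function is our
own sketch, not a built-in command; save it in a file echelon.m. It reduces a matrix to echelon
form exactly as described: locate the next column with a nonzero entry on or below the current row,
swap that row into place, scale so the pivot is 1, and clear the entries below it.

function E = echelon(A)
% ECHELON  Reduce the matrix A to (row) echelon form by elementary row operations.
E = A;
[m,n] = size(E);
row = 1;                                         % the row we are currently working on
for col = 1:n
    piv = find(E(row:m,col) ~= 0, 1) + row - 1;  % first nonzero entry at or below 'row'
    if isempty(piv), continue, end               % nothing to eliminate in this column
    E([row piv],:) = E([piv row],:);             % swap the pivot row into position
    E(row,:) = E(row,:)/E(row,col);              % scale the row so its pivot equals 1
    for i = row+1:m
        E(i,:) = E(i,:) - E(i,col)*E(row,:);     % zero out the entries below the pivot
    end
    row = row + 1;
    if row > m, return, end                      % no rows left to work on
end

Typing echelon(A) for the matrix A in (2.3.8*) produces an echelon form of A — not necessarily
the same one obtained by hand earlier, since different row swaps may be chosen along the way.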

Reduction to Reduced Echelon Form

Definition 2.4.4. A matrix E is in reduced echelon form if

(a) E is in echelon form, and

(b) in every column of E having a pivot, every entry in that column other than the pivot is 0.

We can now prove

Theorem 2.4.5. Every matrix is row equivalent to a matrix in reduced echelon form.

Proof: Let A be a matrix. Proposition 2.4.3 states that we can transform A by elementary row
operations to a matrix E in echelon form. Next we transform E into reduced echelon form by some
additional elementary row operations, as follows. Choose the pivot in the last nonzero row of E.
Call that row ℓ, and let k be the column where the pivot occurs. By adding multiples of the ℓth row
to the rows above, we can transform each entry in the kth column above the pivot to 0. Note that
none of these row operations alters the matrix before the kth column. (Also note that this process
is identical to the process of back substitution.)

Again we proceed inductively by choosing the pivot in the (ℓ − 1)st row, which is 1, and zeroing
out all entries above that pivot using elementary row operations.

Reduced Echelon Form in MATLAB

Preprogrammed into MATLAB is a routine to row reduce any matrix to reduced echelon form. The
command is rref. For example, recall the 4 × 7 matrix A in (2.3.12*) by typing e2_3_12. Put A
into reduced row echelon form by typing rref(A) and obtaining

ans =
1 0 0 0 -6 4 -6
0 1 0 0 10 -20 0
0 0 1 0 -5 5 -2
0 0 0 1 0 2 1

Compare the result with the system of equations (2.3.13).

Solutions to Systems of Linear Equations

Originally, we introduced elementary row operations as operations that do not change solutions to
the linear system. More precisely, we discussed how solutions to the original system are still solutions
to the transformed system and how no new solutions are introduced by elementary row operations.
This argument is most easily seen by observing that

all elementary row operations are invertible

— they can be undone.

For example, swapping two rows is undone by just swapping these rows again. Similarly, multi-
plying a row by a nonzero number c is undone by just dividing that same row by c. Finally, adding
c times the j th row to the ith row is undone by subtracting c times the j th row from the ith row.

Thus, we can make several observations about solutions to linear systems. Let E be an aug-
mented matrix corresponding to a system of linear equations having n variables. Since an augmented
matrix is formed from the matrix of coefficients by adding a column, we see that the augmented
matrix has n + 1 columns.

Theorem 2.4.6. Suppose that E is an m × (n + 1) augmented matrix that is in reduced echelon
form. Let ℓ be the number of nonzero rows in E.

(a) The system of linear equations corresponding to E is inconsistent if and only if the ℓth row in
E has a pivot in the (n + 1)st column.

(b) If the linear system corresponding to E is consistent, then the set of all solutions is parame-
terized by n − ℓ parameters.

Proof: Suppose that the last nonzero row in E has its pivot in the (n + 1)st column. Then the
corresponding equation is:
0x1 + 0x2 + · · · + 0xn = 1,

which has no solutions. Thus the system is inconsistent.

Conversely, suppose that the last nonzero row has its pivot before the last column. Without loss
of generality, we can renumber the columns — that is, we can renumber the variables xj — so that
the pivot in the ith row occurs in the ith column, where 1 ≤ i ≤ ℓ. Then the associated system of
linear equations has the form:

    x1 + a1,ℓ+1 xℓ+1 + · · · + a1,n xn = b1
    x2 + a2,ℓ+1 xℓ+1 + · · · + a2,n xn = b2
              ...
    xℓ + aℓ,ℓ+1 xℓ+1 + · · · + aℓ,n xn = bℓ .

This system can be rewritten in the form:

    x1 = b1 − a1,ℓ+1 xℓ+1 − · · · − a1,n xn
    x2 = b2 − a2,ℓ+1 xℓ+1 − · · · − a2,n xn
              ...                                        (2.4.1)
    xℓ = bℓ − aℓ,ℓ+1 xℓ+1 − · · · − aℓ,n xn .

Thus, each choice of the n − ℓ numbers xℓ+1 , . . . , xn uniquely determines values of x1 , . . . , xℓ so
that x1 , . . . , xn is a solution to this system. In particular, the system is consistent, so (a) is proved;
and the set of all solutions is parameterized by n − ℓ numbers, so (b) is proved.

Two Examples Illustrating Theorem 2.4.6

The reduced echelon form matrix

    E = ( 1  5  0  0 )
        ( 0  0  1  0 )
        ( 0  0  0  1 )

is the augmented matrix of an inconsistent system of three equations in three unknowns.

The reduced echelon form matrix

    E = ( 1  5  0  2 )
        ( 0  0  1  5 )
        ( 0  0  0  0 )

is the augmented matrix of a consistent system of three equations in three unknowns x1, x2, x3. For
this matrix n = 3 and ℓ = 2. It follows from Theorem 2.4.6 that the solutions to this system are
specified by one parameter. Indeed, the solutions are

x1 = 2 − 5x2
x3 = 5

and are specified by the one parameter x2 .

Consequences of Theorem 2.4.6

It follows from Theorem 2.4.6 that linear systems of equations with fewer equations than unknowns
and with zeros on the right hand side always have nonzero solutions. More precisely:

Corollary 2.4.7. Let A be an m × n matrix where m < n. Then the system of linear equations
whose augmented matrix is (A|0) has a nonzero solution.

Proof: Perform elementary row operations on the augmented matrix (A|0) to arrive at the reduced
echelon form matrix (E|0). Since the zero vector is a solution, the associated system of equations is
consistent. Now the number of nonzero rows ℓ in (E|0) is less than or equal to the number of rows
m in E. By assumption m < n and hence ℓ < n. It follows from Theorem 2.4.6 that solutions to the
linear system are parametrized by n − ℓ ≥ 1 parameters and that there are nonzero solutions.

Recall that two m × n matrices are row equivalent if one can be transformed to the other by
elementary row operations.

Corollary 2.4.8. Let A be an n × n square matrix and let b be in Rn . Then A is row equivalent to
the identity matrix In if and only if the system of linear equations whose augmented matrix is (A|b)
has a unique solution.

Proof: Suppose that A is row equivalent to In . Then, by using the same sequence of elementary
row operations, it follows that the n × (n + 1) augmented matrix (A|b) is row equivalent to (In |c)

for some vector c ∈ Rn. The system of linear equations that corresponds to (In |c) is:

    x1 = c1
       ...
    xn = cn ,

which transparently has the unique solution x = (c1 , . . . , cn). Since elementary row operations do
not change the solutions of the equations, the original augmented system (A|b) also has a unique
solution.

Conversely, suppose that the system of linear equations associated to (A|b) has a unique so-
lution. Suppose that (A|b) is row equivalent to a reduced echelon form matrix E. Suppose that
the last nonzero row in E is the ℓth row. Since the system has a solution, it is consistent. Hence
Theorem 2.4.6(b) implies that the solutions to the system corresponding to E are parameterized by
n − ℓ parameters. If ℓ < n, then the solution is not unique. So ℓ = n.

Next observe that since the system of linear equations is consistent, it follows from Theo-
rem 2.4.6(a) that the pivot in the nth row must occur in a column before the (n + 1)st. It follows
that the reduced echelon matrix E = (In |c) for some c ∈ Rn. Since (A|b) is row equivalent to (In |c),
it follows, by using the same sequence of elementary row operations, that A is row equivalent to
In .

Uniqueness of Reduced Echelon Form and Rank

Abstractly, our discussion of reduced echelon form has one point remaining to be proved. We know
that every matrix A can be transformed by elementary row operations to reduced echelon form.
Suppose, however, that we use two different sequences of elementary row operations to transform A
to two reduced echelon form matrices E1 and E2. Can E1 and E2 be different? The answer is: No.

Theorem 2.4.9. For each matrix A, there is precisely one reduced echelon form matrix E that is
row equivalent to A.

The proof of Theorem 2.4.9 is given in Section 2.6. Since every matrix is row equivalent to a
unique matrix in reduced echelon form, we can define the rank of a matrix as follows.

Definition 2.4.10. Let A be an m × n matrix that is row equivalent to a reduced echelon form
matrix E. Then the rank of A, denoted rank(A), is the number of nonzero rows in E.

We make three remarks concerning the rank of a matrix.

• An echelon form matrix is always row equivalent to a reduced echelon form matrix with the
same number of nonzero rows. Thus, to compute the rank of a matrix, we need only perform
elementary row operations until the matrix is in echelon form.
• The rank of any matrix is easily computed in MATLAB. Enter a matrix A and type rank(A).

• The number ℓ in the statement of Theorem 2.4.6 is just the rank of E.

In particular, if the augmented matrix corresponding to a consistent system of linear equations
in n unknowns has rank ℓ, then the solutions to this system are parametrized by n − ℓ
parameters.
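
For example, for the consistent reduced echelon form matrix E displayed earlier in this section (the
3 × 4 example whose solutions were specified by the single parameter x2), MATLAB confirms this
count:

E = [1 5 0 2; 0 0 1 5; 0 0 0 0];   % the consistent example from the previous subsection
n = size(E,2) - 1;                  % number of unknowns (the last column is the right hand side)
n - rank(E)                         % returns 1: the solutions depend on one parameter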

Hand Exercises

In Exercises 1 – 2 row reduce the given matrix to reduced echelon form and determine the rank of
A.
 
1. A = ( 1  2  1   6 )
       ( 3  6  1  14 )
       ( 1  2  2   8 )

2. B = ( 1  −2  3 )
       ( 3  −6  9 )
       ( 1  −8  2 )

3. The augmented matrix of a consistent system of five equations in seven unknowns has rank equal
to three. How many parameters are needed to specify all solutions?

4. The augmented matrix of a consistent system of nine equations in twelve unknowns has rank
equal to five. How many parameters are needed to specify all solutions?

Computer Exercises

In Exercises 5 – 8, use rref on the given augmented matrices to determine whether the associ-
ated system of linear equations is consistent or inconsistent. If the equations are consistent, then
determine how many parameters are needed to enumerate all solutions.

5.  A = (  2    1   3   −2   4  1 )
        (  5   12  −1    3   5  1 )
        ( −4  −21  11  −12   2  1 )          (2.4.2*)
        ( 23   59  −8   17  21  4 )

6.  B = ( 2  4  6  −2   1 )
        ( 0  0  4   1  −1 )          (2.4.3*)
        ( 2  4  0   1   2 )

7.  C = ( 2   3  −1   4 )
        ( 8  11  −7   8 )          (2.4.4*)
        ( 2   2  −4  −3 )

8.  D = ( 2.3    4.66    −1.2     2.11     −2      )
        ( 0      0        1.33    0         1.44   )
        ( 4.6    9.32    −7.986   4.22    −10.048  )
        ( 1.84   3.728   −5.216   1.688    −6.208  )

In Exercises 9 – 11 compute the rank of the given matrix.


9.  (  1  −2 )
    ( −3   6 ) .

10. (  2   1  0   1 )
    ( −1   3  2   4 )
    (  5  −1  2  −2 ) .

11. (  3   1   0 )
    ( −1   2   4 )
    (  2   3   4 )
    (  4  −1  −4 ) .

2.5 Linear Equations with Special Coefficients

In this chapter we have shown how to use elementary row operations to solve systems of linear
equations. We have assumed that each linear equation in the system has the form

aj1x1 + · · · + ajnxn = bj ,

where the ajis and the bj s are real numbers. For simplicity, in our examples we have only chosen
equations with integer coefficients — such as:

2x1 − 3x2 + 15x3 = −1.

Systems with Nonrational Coefficients

In fact, a more general choice of coefficients for a system of two equations might have been

    √2 x1 +  2π  x2 = 22.4
     3 x1 + 36.2 x2 = e .          (2.5.1)

Suppose that we solve (2.5.1) by elementary row operations. In matrix form we have the aug-
mented matrix

    ( √2   2π     22.4 )
    (  3   36.2   e    ) .


Proceed with the following elementary row operations. Divide the 1st row by √2 to obtain

    ( 1   π√2    11.2√2 )
    ( 3   36.2   e      ) .

Next, subtract 3 times the 1st row from the 2nd row to obtain:

    ( 1   π√2            11.2√2      )
    ( 0   36.2 − 3π√2    e − 33.6√2  ) .

Then divide the 2nd row by 36.2 − 3π√2, obtaining:

    ( 1   π√2   11.2√2                       )
    ( 0   1     (e − 33.6√2)/(36.2 − 3π√2)   ) .

Finally, multiply the 2nd row by π√2 and subtract it from the 1st row to obtain:

    ( 1   0   11.2√2 − π√2 (e − 33.6√2)/(36.2 − 3π√2) )
    ( 0   1   (e − 33.6√2)/(36.2 − 3π√2)              ) .

So

    x1 = 11.2√2 − π√2 (e − 33.6√2)/(36.2 − 3π√2)
                                                          (2.5.2)
    x2 = (e − 33.6√2)/(36.2 − 3π√2)

which is both hideous to look at and quite uninformative. It is, however, correct.

Both x1 and x2 are real numbers — they had to be because all of the manipulations involved
addition, subtraction, multiplication, and division of real numbers — which yield real numbers.

If we want to use MATLAB to perform these calculations, we have to convert √2, π, and e to
their decimal equivalents — at least up to a certain decimal place accuracy. This introduces errors
— which for the moment we assume are small.

To enter A and b in MATLAB , type

A = [sqrt(2) 2*pi; 3 36.2];


b = [22.4; exp(1)];

Now type A to obtain:

A =
1.4142 6.2832
3.0000 36.2000

As its default display, MATLAB displays real numbers to four decimal place accuracy. Similarly,
type b to obtain

b =
22.4000
2.7183

Next use MATLAB to solve this system by typing:

A\b

to obtain

ans =
24.5417
-1.9588

The reader may check that this answer agrees with the answer in (2.5.2) to MATLAB output
accuracy by typing

x1 = 11.2*sqrt(2)-pi*sqrt(2)*(exp(1)-33.6*sqrt(2))/(36.2-3*pi*sqrt(2))
x2 = (exp(1)-33.6*sqrt(2))/(36.2-3*pi*sqrt(2))

to obtain

x1 =
24.5417

and

x2 =
-1.9588

More Accuracy

MATLAB can display numbers in machine precision (15 digits) rather than the standard four decimal
place accuracy. To change to this display, type

format long

Now solve the system of equations (2.5.1) again by typing

A\b

and obtaining

ans =
24.54169560069650
-1.95875151860858

Systems with Integers and Rational Numbers

Now suppose that all of the coefficients in a system of linear equations are integers. When we add,
subtract or multiply integers — we get integers. In general, however, when we divide an integer by
an integer we get a rational number rather than an integer. Indeed, since elementary row operations
involve only the operations of addition, subtraction, multiplication and division, we see that if we
perform elementary row operations on a matrix with integer entries, we will end up with a matrix
with rational numbers as entries.

MATLAB can display calculations using rational numbers rather than decimal numbers. To
display calculations using only rational numbers, type

format rational

For example, let


 
    A = ( 2  2   1  0 )
        ( 1  3  −5  1 )
        ( 4  2   1  3 )          (2.5.3*)
        ( 2  1  −1  4 )

and let

    b = (  1 )
        (  1 )
        ( −5 )          (2.5.4*)
        (  2 )
Enter A and b into MATLAB by typing

e2_5_3
e2_5_4

Solve the system by typing

A\b

to obtain

ans =

-357/41
309/41
137/41
156/41

To display the answer in standard decimal form, type

format
A\b

obtaining

ans =
-8.7073
7.5366
3.3415
3.8049

The same logic shows that if we begin with a system of equations whose coefficients are rational
numbers, we will obtain an answer consisting of rational numbers — since adding, subtracting,
multiplying and dividing rational numbers yields rational numbers. More precisely:

Theorem 2.5.1. Let A be an n × n matrix that is row equivalent to In, and let b be an n vector.
Suppose that all entries of A and b are rational numbers. Then there is a unique solution to the system
corresponding to the augmented matrix (A|b) and this solution has rational numbers as entries.

Proof: Since A is row equivalent to In , Corollary 2.4.8 states that this linear system has a unique
solution x. As we have just discussed, solutions are found using elementary row operations — hence
the entries of x are rational numbers.

Complex Numbers

In the previous parts of this section, we have discussed why solutions to linear systems whose
coefficients are rational numbers must themselves have entries that are rational numbers. We now
discuss solving linear equations whose coefficients are more general than real numbers; that is, whose
coefficients are complex numbers.

First recall that addition, subtraction, multiplication and division of complex numbers yields
complex numbers. Suppose that

a = α + iβ
b = γ + iδ


where α, β, γ, δ are real numbers and i = √−1. Then

    a + b = (α + γ) + i(β + δ)
    a − b = (α − γ) + i(β − δ)
    ab    = (αγ − βδ) + i(αδ + βγ)
    a/b   = (αγ + βδ)/(γ² + δ²) + i (βγ − αδ)/(γ² + δ²)

MATLAB has been programmed to do arithmetic with complex numbers using exactly the same
instructions as it uses to do arithmetic with real and rational numbers. For example, we can solve
the system of linear equations

(4 − i)x1 + 2x2 = 3−i


2x1 + (4 − 3i)x2 = 2+i

in MATLAB by typing

A = [4-i 2; 2 4-3i];
b = [3-i; 2+i];
A\b

The solution to this system of equations is:

ans =
0.8457 - 0.1632i
-0.1098 + 0.2493i
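
The arithmetic rules listed above can be spot-checked in the same way; for instance, the formula for
the quotient a/b agrees with MATLAB's complex division for any choice of α, β, γ and δ (the values
below are arbitrary):

alpha = 1.5; beta = -2; gamma = 0.3; delta = 4;     % any real numbers will do
a = alpha + beta*i;  b = gamma + delta*i;
a/b
(alpha*gamma+beta*delta)/(gamma^2+delta^2) + i*(beta*gamma-alpha*delta)/(gamma^2+delta^2)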

Note: Care must be given when entering complex numbers into arrays in MATLAB. For example, if you
type

b = [3 -i; 2 +i]

then MATLAB will respond with the 2 × 2 matrix

b =
3.0000 0 - 1.0000i
2.0000 0 + 1.0000i

Typing either b = [3-i; 2+i] or b = [3 - i; 2 + i] will yield the desired 2 × 1 column vector.

All of the theorems concerning the existence and uniqueness of row echelon form — and
for solving systems of linear equations — work when the coefficients of the linear system
are complex numbers as opposed to real numbers. In particular:
Theorem 2.5.2. If the coefficients of a system of n linear equations in n unknowns are
complex numbers and if the coefficient matrix is row equivalent to In , then there is a unique
solution to this system whose entries are complex numbers.

Complex Conjugation

Let a = α + iβ be a complex number. Then the complex conjugate of a is defined to be

    \overline{a} = α − iβ.

Let a = α + iβ and c = γ + iδ be complex numbers. Then we claim that

    \overline{a + c} = \overline{a} + \overline{c}
                                                          (2.5.5)
    \overline{ac} = \overline{a} \overline{c}

To verify these statements, calculate

    \overline{a + c} = \overline{(α + γ) + i(β + δ)} = (α + γ) − i(β + δ) = (α − iβ) + (γ − iδ) = \overline{a} + \overline{c}

and

    \overline{ac} = \overline{(αγ − βδ) + i(αδ + βγ)} = (αγ − βδ) − i(αδ + βγ) = (α − iβ)(γ − iδ) = \overline{a} \overline{c}.
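
In MATLAB the complex conjugate is computed by the command conj, so the identities (2.5.5) can
be spot-checked numerically for any particular complex numbers:

a = 2 - 3i;  c = -1 + 4i;          % any complex numbers
conj(a+c) - (conj(a) + conj(c))    % returns 0
conj(a*c) - conj(a)*conj(c)        % returns 0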

Hand Exercises

1. Solve the system of equations

x1 − ix2 = 1
ix1 + 3x2 = −1

Check your answer using MATLAB.

Solve the systems of linear equations given in Exercises 2 – 3 and verify that the answers are rational
numbers.
x1 + x2 − 2x3 = 1
2. x1 + x2 + x3 = 2
x1 − 7x2 + x3 = 3

x1 − x2 = 1
3.
x1 + 3x2 = −1

Computer Exercises

In Exercises 4 – 6 use MATLAB to solve the given system of linear equations to four significant
decimal places.

4.
    0.1x1 + √5 x2 − 2x3 = 1
    −√3 x1 + πx2 − 2.6x3 = 14.3
    x1 − 7x2 + (π/2) x3 = 2 .

5.
    (4 − i)x1 + (2 + 3i)x2 = −i
    ix1 − 4x2 = 2.2 .

6.
    (2 + i)x1 + (√2 − 3i)x2 − 10.66x3 = 4.23
    14x1 − 5ix2 + (10.2 − i)x3 = 3 − 1.6i
    −4.276x1 + 2x2 − (4 − 2i)x3 = √2 i .

Hint: When entering √2 i in MATLAB you must type sqrt(2)*i, even though when you enter 2i,
you can just type 2i.

2.6 *Uniqueness of Reduced Echelon Form

In this section we prove Theorem 2.4.9, which states that every matrix is row equivalent to precisely
one reduced echelon form matrix.

Proof of Theorem 2.4.9: Suppose that E and F are two m × n reduced echelon matrices that are
row equivalent to A. Since elementary row operations are invertible, the two matrices E and F are
row equivalent. Thus, the systems of linear equations associated to the m × (n + 1) matrices (E|0)
and (F |0) must have exactly the same set of solutions. It is the fact that the solution sets of the
linear equations associated to (E|0) and (F |0) are identical that allows us to prove that E = F .

Begin by renumbering the variables x1 , . . . , xn so that the equations associated to (E|0) have
the form:

    x1 = −a1,ℓ+1 xℓ+1 − · · · − a1,n xn
    x2 = −a2,ℓ+1 xℓ+1 − · · · − a2,n xn
              ...                                        (2.6.1)
    xℓ = −aℓ,ℓ+1 xℓ+1 − · · · − aℓ,n xn .

In this form, pivots of E occur in the columns 1, . . . , ℓ. We begin by showing that the matrix F
also has pivots in columns 1, . . . , ℓ. Moreover, there is a unique solution to these equations for every
choice of numbers xℓ+1 , . . . , xn .

Suppose that the pivots of F do not occur in columns 1, . . . , ℓ. Then there is a row in F whose
first nonzero entry occurs in a column k > ℓ. This row corresponds to an equation

    xk = ck+1 xk+1 + · · · + cn xn .

Now, consider solutions that satisfy

    xℓ+1 = · · · = xk−1 = 0   and   xk+1 = · · · = xn = 0.

In the equations associated to the matrix (E|0), there is a unique solution associated with every
number xk ; while in the equations associated to the matrix (F |0), xk must be zero to be a solution.
This argument contradicts the fact that the (E|0) equations and the (F |0) equations have the same
solutions. So the pivots of F must also occur in columns 1, . . . , ℓ, and the equations associated to F
must have the form:

    x1 = −â1,ℓ+1 xℓ+1 − · · · − â1,n xn
    x2 = −â2,ℓ+1 xℓ+1 − · · · − â2,n xn
              ...                                        (2.6.2)
    xℓ = −âℓ,ℓ+1 xℓ+1 − · · · − âℓ,n xn

where the âi,j are scalars.

To complete this proof, we show that ai,j = âi,j . These equalities are verified as follows. There
is just one solution to each system (2.6.1) and (2.6.2) of the form

    xℓ+1 = 1,  xℓ+2 = · · · = xn = 0.

These solutions are

    (−a1,ℓ+1 , . . . , −aℓ,ℓ+1 , 1, 0, · · · , 0)

for (2.6.1) and

    (−â1,ℓ+1 , . . . , −âℓ,ℓ+1 , 1, 0, · · · , 0)

for (2.6.2). It follows that aj,ℓ+1 = âj,ℓ+1 for j = 1, . . . , ℓ. Complete this proof by repeating this
argument. Just inspect solutions of the form

    xℓ+1 = 0,  xℓ+2 = 1,  xℓ+3 = · · · = xn = 0

through

    xℓ+1 = · · · = xn−1 = 0,  xn = 1.

Chapter 3

Matrices and Linearity

In this chapter we take the first step in abstracting vectors and matrices to mathematical
objects that are more than just arrays of numbers. We begin the discussion in Section 3.1 by
introducing the multiplication of a matrix times a vector. Matrix multiplication simplifies
the way in which we write systems of linear equations and is the way by which we view
matrices as mappings. This latter point is discussed in Section 3.2.

The mappings that are produced by matrix multiplication are special and are called
linear mappings. Some properties of linear maps are discussed in Section 3.3. One conse-
quence of linearity is the principle of superposition that enables solutions to systems of linear
equations to be built out of simpler solutions. This principle is discussed in Section 3.4.

In Section 3.5 we introduce multiplication of two matrices and discuss properties of this
multiplication in Section 3.6. Matrix multiplication is defined in terms of composition of
linear mappings which leads to an explicit formula for matrix multiplication. This dual role
of multiplication of two matrices — first by formula and second as composition — enables
us to solve linear equations in a conceptual way as well as in an algorithmic way. The
conceptual way of solving linear equations is through the use of matrix inverses (or inverse
mappings) which is described in Section 3.7. In this section we also present important
properties of matrix inversion and a method of computation of matrix inverses. There is a
simple formula for computing inverses of 2×2 matrices based on determinants. The chapter
ends with a discussion of determinants of 2 × 2 matrices in Section 3.8.

3.1 Matrix Multiplication of Vectors

In Chapter 2 we discussed how matrices appear when solving systems of m linear equations
in n unknowns. Given the system

    a11 x1 + a12 x2 + · · · + a1n xn = b1
    a21 x1 + a22 x2 + · · · + a2n xn = b2
              ..                                         (3.1.1)
    am1 x1 + am2 x2 + · · · + amn xn = bm ,

we saw that all relevant information is contained in the m × n matrix of coefficients

    A = ( a11  a12  · · ·  a1n )
        ( a21  a22  · · ·  a2n )
        (  ..   ..          .. )
        ( am1  am2  · · ·  amn )

and the m vector

    b = ( b1 )
        ( .. )
        ( bm )

Matrices Times Vectors

We motivate multiplication of a matrix times a vector just as a notational advance that


simplifies the presentation of the linear systems. It is, however, much more than that. This
concept of multiplication allows us to think of matrices as mappings and these mappings
tell us much about the structure of solutions to linear systems. But first we discuss the
notational advantage.

Multiplying an m × n matrix A times an n vector x produces an m vector, as follows:


    
    Ax = ( a11  · · ·  a1n ) ( x1 )   ( a11 x1 + · · · + a1n xn )
         (  ..          .. ) ( .. ) = (           ..            )          (3.1.2)
         ( am1  · · ·  amn ) ( xn )   ( am1 x1 + · · · + amn xn )

For example, when m = 2 and n = 3, then the product is a 2-vector

    ( a11  a12  a13 ) ( x1 )   ( a11 x1 + a12 x2 + a13 x3 )
    ( a21  a22  a23 ) ( x2 ) = ( a21 x1 + a22 x2 + a23 x3 ) .          (3.1.3)
                      ( x3 )

As a specific example, compute

    ( 2  3  −1 ) (  2 )   ( 2 · 2 + 3 · (−3) + (−1) · 4 )   ( −9 )
    ( 4  1   5 ) ( −3 ) = ( 4 · 2 + 1 · (−3) +   5 · 4  ) = (  25 ) .
                 (  4 )

Using (3.1.2) we have a compact notation for writing systems of linear equations. For
example, using a special instance of (3.1.3),

    ( 2  3  −1 ) ( x1 )   ( 2x1 + 3x2 −  x3 )
    ( 4  1   5 ) ( x2 ) = ( 4x1 +  x2 + 5x3 ) .
                 ( x3 )

In this notation we can write the system of two linear equations in three unknowns

    2x1 + 3x2 −  x3 =  2
    4x1 +  x2 + 5x3 = −1

as the matrix equation

    ( 2  3  −1 ) ( x1 )   (  2 )
    ( 4  1   5 ) ( x2 ) = ( −1 ) .
                 ( x3 )

Indeed, the general system of linear equations (3.1.1) can be written in matrix form
using matrix multiplication as
Ax = b

where A is the m × n matrix of coefficients, x is the n vector of unknowns, and b is the m


vector of constants on the right hand side of (3.1.1).

Matrices Times Vectors in MATLAB

We have already seen how to define matrices and vectors in MATLAB. Now we show how
to multiply a matrix times a vector using MATLAB.

Load the matrix A

    A = (  5  −4   3  −6   2 )
        (  2  −4  −2  −1   1 )
        (  1   2   1  −5   3 )          (3.1.4*)
        ( −2  −1  −2   1  −1 )
        (  1  −6   1   1   4 )

and the vector x

    x = ( −1 )
        (  2 )
        (  1 )          (3.1.5*)
        ( −1 )
        (  3 )
into MATLAB by typing

e3_1_4
e3_1_5

The multiplication Ax can be performed by typing

b = A*x

and the result should be

b =
2
-8
18
-6
-1

We may verify this result by solving the system of linear equations Ax = b. Indeed if we
type

A\b

then we get the vector x back as the answer.

Hand Exercises

1. Let

    A = (  2  1 )     and x = (3, −2) .
        ( −1  4 )

Compute Ax.

2. Let

    B = ( 3  4  1 )     and y = (2, 5, −2) .
        ( 1  2  3 )
Compute By.

In Exercises 3 – 6 decide whether or not the matrix vector product Ax can be computed; if it can,
compute the product.
3. A = ( 1   2 )     and x = (2, 2).
       ( 0  −5 )

4. A = ( 1   2 )     and x = (2, 2, 4).
       ( 0  −5 )

5. A = ( 1  2  4 )   and x = (−1, 1, 3).

6. A = (5)           and x = (1, 0).

7. Let

    A = ( a11  a12  · · ·  a1n )             ( x1 )
        ( a21  a22  · · ·  a2n )     and x = ( x2 )
        (  ..   ..          .. )             ( .. )
        ( am1  am2  · · ·  amn )             ( xn )

Denote the columns of the matrix A by

    A1 = ( a11 )      A2 = ( a12 )                 An = ( a1n )
         ( a21 )  ,        ( a22 )  ,   · · · ,         ( a2n )
         (  .. )           (  .. )                      (  .. )
         ( am1 )           ( am2 )                      ( amn )

Show that the matrix vector product Ax can be written as

    Ax = x1 A1 + x2 A2 + · · · + xn An ,

where xj Aj denotes scalar multiplication (see Chapter 1).

8. Let

    C = ( 1   1 )     and b = (1, 1) .
        ( 2  −1 )

Find a 2-vector z such that Cz = b.

9. Write the system of linear equations

2x1 + 3x2 − 2x3 = 4


6x1 − 5x3 = 1

in the matrix form Ax = b.

10. Find all solutions to

    ( 1  3  −1   4 ) ( x1 )   ( 14 )
    ( 2  1   5   7 ) ( x2 ) = ( 17 ) .
    ( 3  4   4  11 ) ( x3 )   ( 31 )
                     ( x4 )

11. Let A be a 2 × 2 matrix. Find A so that

    A ( 1 ) = (  3 )     and     A ( 0 ) = ( 1 ) .
      ( 0 )   ( −5 )               ( 1 )   ( 4 )

12. Let A be a 2 × 2 matrix. Find A so that

    A ( 1 ) = (  2 )     and     A (  1 ) = ( 4 ) .
      ( 1 )   ( −1 )               ( −1 )   ( 3 )

13. Is there an upper triangular 2 × 2 matrix A such that

A (1, 0) = (1, 2)? (3.1.6)

Is there a symmetric 2 × 2 matrix A satisfying (3.1.6)?

Computer Exercises

In Exercises 14 – 15 use MATLAB to compute b = Ax for the given A and x.

14.
    A = ( −0.2  −1.8   3.9  −6    −1.6 )           ( −2.6 )
        (  6.3   8     3     2.5   5.1 )           (  2.4 )
        ( −0.8  −9.9   9.7   4.7   5.9 )   and x = (  4.6 )          (3.1.7*)
        ( −0.9  −4.1   1.1  −2.5   8.4 )           ( −6.1 )
        ( −1    −9    −2    −9.8   6.9 )           (  8.1 )

15.
    A = (  14  −22  −26   −2  −77  100  −90 )           (  2.7 )
        (  26   25  −15  −63   33   92   14 )           (  6.1 )
        ( −53   40   19   40  −27  −88   40 )           ( −8.3 )
        (  10  −21   13   97  −72  −28   92 )   and x = (  8.9 )          (3.1.8*)
        (  86  −17   43   61   13   10   50 )           (  8.3 )
        ( −33   31    2   41   65  −48   48 )           (  2   )
        (  31   68   55   −3   35   19  −14 )           ( −4.9 )
16. Let

    A = (  2   4  −1 )             ( 2 )
        (  1   3   2 )     and b = ( 1 ) .          (3.1.9*)
        ( −1  −2   5 )             ( 4 )

Find a 3-vector x such that Ax = b.

17. Let

    A = (  1.3   −4.15  −1.2  )             (  1.12 )
        (  1.6   −1.2    2.4  )     and b = ( −2.1  ) .          (3.1.10*)
        ( −2.5    2.35   5.09 )             (  4.36 )

Find a 3-vector x such that Ax = b.

18. Let A be a 3 × 3 matrix. Find A so that

    A (  2 )   (  1 )        A (  1 )   ( −1 )        A ( 0 )   ( 5 )
      ( −1 ) = (  1 ) ,        ( −1 ) = ( −2 ) ,        ( 2 ) = ( 1 ) .
      (  1 )   ( −1 )          (  0 )   (  1 )          ( 4 )   ( 1 )

Hint: Rewrite these three conditions as a system of linear equations in the nine entries of A. Then
solve this system using MATLAB. (Then pray that there is an easier way.)

3.2 Matrix Mappings

Having illustrated the notational advantage of using matrices and matrix multiplication, we now
begin to discuss why there is also a conceptual advantage to matrix multiplication, a conceptual
advantage that will help us to understand how systems of linear equations and linear differential
equations may be solved.

Matrix multiplication allows us to view m × n matrices as mappings from Rn to Rm . Let A be
an m × n matrix and let x be an n vector. Then

    x ↦ Ax

defines a mapping from Rn to Rm .

The simplest example of a matrix mapping is given by 1 × 1 matrices. Matrix mappings defined
from R → R are
    x ↦ ax
where a is a real number. Note that the graph of this function is just a straight line through the
origin (with slope a). From this example we see that matrix mappings are very special mappings

indeed. In higher dimensions, matrix mappings provide a richer set of mappings; we explore here
planar mappings — mappings of the plane into itself — using MATLAB graphics and the program
map.

The simplest planar matrix mappings are the dilatations. Let A = cI2 where c > 0 is a scalar.
When c < 1 vectors are contracted by a factor of c and and these mappings are examples of
contractions. When c > 1 vectors are stretched or expanded by a factor of c and these dilatations
are examples of expansions. We now explore some more complicated planar matrix mappings.

The next planar motions that we study are those given by the matrices

    A = ( λ  0 )
        ( 0  µ ) .

Here the matrix mapping is given by (x, y) ↦ (λx, µy); that is, a mapping that independently
stretches and/or contracts the x and y coordinates. Even these simple looking mappings can move
objects in the plane in a somewhat complicated fashion.

The Program Map

We can use MATLAB to explore planar matrix mappings in an efficient way using the program map.
In MATLAB type the command

map

and a menu appears labeled MAP Setup. The 2 × 2 matrix

    ( 0  −1 )
    ( 1   0 )

has been pre-entered. Click on the Proceed button. A window entitled MAP Display appears. Click
on Icons and click on an icon — say Dog. Then click in the MAP Display window and a blue ‘Dog’
will appear in that window. Now click on the Map button and a new version of the Dog will appear
in yellow — but the yellow Dog is rotated about the origin counterclockwise by 90◦ from the blue
dog. Indeed, this matrix A just rotates the plane counterclockwise by 90◦ . To verify this statement
just click on Map again and see that the yellow dog rotates 90◦ counterclockwise into the magenta
dog. Of course, the magenta dog is just rotated 180◦ from the original blue dog. Clicking on Map
again produces a fourth dog — this one in cyan. Finally one more click on the map button will
rotate the cyan dog into a red dog that exactly covers the original blue dog.

Choose another icon from the Icons menu; a blue version of this icon appears in the MAP Display
window. Now click on Map to see that your chosen icon is just rotated counterclockwise by 90◦.

Other matrices will produce different motions of the plane. You may either type the entries of
a matrix in the Map Setup window and click on the Proceed button or recall one of the pre-assigned
matrices listed in the menu obtained by clicking on Gallery. For example, clicking on the Contracting

rotation button enters the matrix

    ( 0.3  −0.8 )
    ( 0.8   0.3 )
This matrix rotates the plane through an angle of approximately 69.4◦ counterclockwise and con-
tracts the plane by a factor of approximately 0.85. Now click on Dog in the Icons menu to bring
up the blue dog again. Repeated clicking on map rotates and contracts the dog so that dogs in a
cycling set of colors slowly converge towards the origin in a spiral of dogs.

Rotations

Rotating the plane counterclockwise through an angle θ is a motion given by a matrix mapping. We
show that the matrix that performs this rotation is:
' (
cos θ − sin θ
Rθ = . (3.2.1)
sin θ cos θ
To verify that Rθ rotates the plane counterclockwise through angle θ, let vϕ be the unit vector whose
angle from the horizontal is ϕ; that is, vϕ = (cos ϕ, sin ϕ). We can write every vector in R2 as rvϕ
for some number r ≥ 0. Using the trigonometric identities for the cosine and sine of the sum of two
angles, we have:
' (
cos θ − sin θ
Rθ (rvϕ ) = (r cos ϕ, r sin ϕ)
sin θ cos θ
= (r cos θ cos ϕ − r sin θ sin ϕ, r sin θ cos ϕ + r cos θ sin ϕ)
= r (cos(θ + ϕ), sin(θ + ϕ))
= rvϕ+θ .

This calculation shows that Rθ rotates every vector in the plane counterclockwise through angle θ.
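
This fact is easy to confirm numerically. The following sketch builds Rθ from (3.2.1) for one choice
of θ and checks that multiplication by Rθ preserves length and advances the angle by θ (for large θ
the computed angle difference may differ by a multiple of 2π):

theta = pi/3;                                  % any angle, in radians
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];
v = [2; 1];
w = R*v;                                       % v rotated counterclockwise by theta
norm(w) - norm(v)                              % returns 0: lengths are preserved
atan2(w(2),w(1)) - atan2(v(2),v(1))            % returns theta (up to roundoff)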

It follows from (3.2.1) that R180◦ = −I2 . So rotating a vector in the plane by 180◦ is the same
as reflecting the vector through the origin. It also follows that the movement associated with the
linear map x ↦ −cx where c > 0 may be thought of as a dilatation (x ↦ cx) followed by rotation
through 180◦ (x ↦ −x).

We claim that combining dilatations with general rotations produces spirals. Consider the matrix

    S = ( c cos θ   −c sin θ )
        ( c sin θ    c cos θ ) = cRθ

where c < 1. Then a calculation similar to the previous one shows that

    S(rvϕ ) = c(rvϕ+θ ).

So S rotates vectors in the plane while contracting them by the factor c. Thus, multiplying a vector
repeatedly by S spirals that vector into the origin. The example that we just considered while using
map is

    ( 0.3  −0.8 )   ( 0.85 cos(69.4◦)   −0.85 sin(69.4◦) )
    ( 0.8   0.3 ) ≈ ( 0.85 sin(69.4◦)    0.85 cos(69.4◦) ) ,

which has the general form of S.

A Notation for Matrix Mappings

We reinforce the idea that matrices are mappings by introducing a notation for the mapping asso-
ciated with an m × n matrix A. Define

LA : Rn → Rm

by
LA (x) = Ax,

for every x ∈ Rn.

There are two special matrices: the m × n zero matrix O all of whose entries are 0 and the n × n
identity matrix In whose diagonal entries are 1 and whose off diagonal entries are 0. For instance,
    I3 = ( 1  0  0 )
         ( 0  1  0 )
         ( 0  0  1 ) .

The mappings associated with these special matrices are also special. Let x be an n vector.
Then
Ox = 0, (3.2.2)

where the 0 on the right hand side of (3.2.2) is the m vector all of whose entries are 0. The mapping
LO is the zero mapping — the mapping that maps every vector x to 0.

Similarly,
In x = x

for every vector x. It follows that


LIn (x) = x

is the identity mapping, since it maps every element to itself. It is for this reason that the matrix
In is called the n × n identity matrix .

Hand Exercises

In Exercises 1 – 3 find a nonzero vector that is mapped to the origin by the given matrix.
1. A = ( 0   1 )
       ( 0  −2 ) .

2. B = (  1   2 )
       ( −2  −4 ) .

3. C = (  3  −1 )
       ( −6   2 ) .

4. What 2 × 2 matrix rotates the plane about the origin counterclockwise by 30◦ ?

5. What 2 × 2 matrix rotates the plane clockwise by 45◦ ?

6. What 2 × 2 matrix rotates the plane clockwise by 90◦ while dilating it by a factor of 2?

7. Find a 2 × 2 matrix that reflects vectors in the (x, y) plane across the x axis.

8. Find a 2 × 2 matrix that reflects vectors in the (x, y) plane across the y axis.

9. Find a 2 × 2 matrix that reflects vectors in the (x, y) plane across the line x = y.

10. The matrix

    A = ( 1  K )
        ( 0  1 )

is a shear. Describe the action of A on the plane for different values of K.

11. Determine a rotation matrix that maps the vectors (3, 4) and (1, −2) onto the vectors (−4, 3)
and (2, 1) respectively.

12. Find a 2 × 3 matrix P that projects three dimensional xyz space onto the xy plane. Hint: Such
a matrix will satisfy

    P ( 0 ) = (0, 0)     and     P ( x ) = (x, y) .
      ( 0 )                        ( y )
      ( z )                        ( 0 )
13. Show that every matrix of the form

    ( a  −b )
    ( b   a )

corresponds to rotating the plane through the angle θ followed by a dilatation cI2 where

    c = √(a² + b²),   cos θ = a/c,   sin θ = b/c .
14. Using Exercise 13 observe that the matrix

    (  3  4 )
    ( −4  3 )

rotates the plane counterclockwise through an angle θ and then dilates the plane by a factor of c.
Find θ and c. Use map to verify your results.

Computer Exercises

In Exercises 15 – 17 use map to find vectors that are stretched and/or contracted to a multiple of
themselves by the given linear mapping. Hint: Choose a vector in the MAP Display window and
apply Map several times.
15. A = ( 2    0   )
        ( 1.5  0.5 ) .

16. B = (  1.2  −1.5 )
        ( −0.4   1.2 ) .

17. C = ( 2  −1.25 )
        ( 0  −0.5  ) .

In Exercises 18 – 20 use Exercise 13 and map to verify that the given matrices rotate the plane
through an angle θ followed by a dilatation cI2 . Find θ and c in each case.
18. A = ( 1  −2 )
        ( 2   1 ) .

19. B = ( −2.4  −0.2 )
        (  0.2  −2.4 ) .

20. C = (  2.67  1.3  )
        ( −1.3   2.67 ) .

In Exercises 21 – 25 use map to help describe the planar motions of the associated linear mappings
for the given 2 × 2 matrix.
21. A = (  √3/2   1/2  )
        ( −1/2    √3/2 ) .

22. B = ( 1/2  −1/2 )
        ( 1/2   1/2 ) .

23. C = ( 0  1 )
        ( 1  0 ) .

24. D = ( 1  0 )
        ( 0  0 ) .

25. E = ( 1/2  1/2 )
        ( 1/2  1/2 ) .

26. The matrix

    A = (  0  −1 )
        ( −1   0 )

reflects the xy-plane across the diagonal line y = −x while the matrix

    B = ( −1   0 )
        (  0  −1 )

rotates the plane through an angle of 180◦. Using the program map verify that both matrices map
the vector (1, 1) to its negative (−1, −1). Now perform two experiments. First using the icon menu
in map place a dog icon at about the point (1, 1) and move that dog by matrix A. Then replace the
dog in its original position near (1, 1) and move that dog using matrix B. Describe the difference in
the result.

3.3 Linearity

We begin by recalling the vector operations of addition and scalar multiplication. Given two n
vectors, vector addition is defined by
     
    ( x1 )   ( y1 )   ( x1 + y1 )
    ( .. ) + ( .. ) = (    ..   )
    ( xn )   ( yn )   ( xn + yn )

Multiplication of a scalar times a vector is defined by

    c ( x1 )   ( cx1 )
      ( .. ) = (  ..  )
      ( xn )   ( cxn )

Using (3.1.2) we can check that matrix multiplication satisfies

A(x + y) = Ax + Ay (3.3.1)
A(cx) = c(Ax). (3.3.2)

Using MATLAB we can also verify that the identities (3.3.1) and (3.3.2) are valid for some particular
choices of x, y, c and A. For example, let
   
    A = ( 2  3  4  1 ) ,    x = ( 1 ) ,    y = (  1 ) ,    and c = 5.          (3.3.3*)
        ( 1  1  2  3 )          ( 5 )          ( −1 )
                                ( 4 )          ( −1 )
                                ( 3 )          (  4 )

Typing e3 3 3 enters this information into MATLAB. Now type

z1 = A*(x+y)
z2 = A*x + A*y

and compare z1 and z2. The fact that they are both equal to

(35, 33)

verifies (3.3.1) in this case. Similarly, type

w1 = A*(c*x)
w2 = c*(A*x)

and compare w1 and w2 to verify (3.3.2).

The central idea in linear algebra is the notion of linearity.

Definition 3.3.1. A mapping L : Rn → Rm is linear if

(a) L(x + y) = L(x) + L(y) for all x, y ∈ Rn .
(b) L(cx) = cL(x) for all x ∈ Rn and all scalars c ∈ R.

To better understand the meaning of Definition 3.3.1(a,b), we verify these conditions for the
mapping L : R2 → R2 defined by

L(x) = (x1 + 3x2, 2x1 − x2), (3.3.4)

where x = (x1 , x2) ∈ R2 . To verify Definition 3.3.1(a), let y = (y1 , y2) ∈ R2. Then

L(x + y) = L(x1 + y1 , x2 + y2)


= ((x1 + y1 ) + 3(x2 + y2 ), 2(x1 + y1 ) − (x2 + y2 ))
= (x1 + y1 + 3x2 + 3y2 , 2x1 + 2y1 − x2 − y2 ).

On the other hand,

L(x) + L(y) = (x1 + 3x2, 2x1 − x2 ) + (y1 + 3y2, 2y1 − y2 )


= (x1 + 3x2 + y1 + 3y2, 2x1 − x2 + 2y1 − y2 ).

Hence

    L(x + y) = L(x) + L(y)

for every pair of vectors x and y in R2 .

Similarly, to verify Definition 3.3.1(b), let c ∈ R be a scalar and compute

L(cx) = L(cx1 , cx2) = ((cx1 ) + 3(cx2), 2(cx1) − (cx2)).

Then compute
cL(x) = c(x1 + 3x2, 2x1 − x2) = (c(x1 + 3x2), c(2x1 − x2 )),
from which it follows that
L(cx) = cL(x)
for every vector x ∈ R2 and every scalar c ∈ R. Thus L is a linear mapping.
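
It is also reassuring to spot-check linearity numerically. In MATLAB the mapping (3.3.4) can be
written as an anonymous function and tested on particular vectors; any choices of x, y and c should
produce zero vectors below:

L = @(x) [x(1) + 3*x(2); 2*x(1) - x(2)];   % the mapping (3.3.4)
x = [1; 4];  y = [-2; 5];  c = 7;          % any test vectors and scalar
L(x+y) - (L(x) + L(y))                     % returns the zero vector
L(c*x) - c*L(x)                            % returns the zero vector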

In fact, the mapping (3.3.4) is a matrix mapping and could have been written in the form
    L(x) = ( 1   3 ) x .
           ( 2  −1 )

Hence the linearity of L could have been checked using identities (3.3.1) and (3.3.2). Indeed, matrix
mappings are always linear mappings, as we now discuss.

Matrix Mappings are Linear Mappings

Let A be an m × n matrix and recall that the matrix mapping LA : Rn → Rm is defined by


LA (x) = Ax. We may rewrite (3.3.1) and (3.3.2) using this notation as

LA (x + y) = LA (x) + LA (y)
LA (cx) = cLA (x).

Thus all matrix mappings are linear mappings. We will show that all linear mappings are matrix
mappings (see Theorem 3.3.5). But first we discuss linearity in the simplest context of mappings
from R → R.

Linear and Nonlinear Mappings of R → R

Note that 1 × 1 matrices are just scalars A = (a). It follows from (3.3.1) and (3.3.2) that we have
shown that the matrix mappings LA (x) = ax are all linear, though this point could have been verified
directly. Before showing that these are all the linear mappings of R → R, we focus on examples of
functions of R → R that are not linear.

Examples of Mappings that are Not Linear

• f(x) = x2. Calculate


f(x + y) = (x + y)2 = x2 + 2xy + y2

while
f(x) + f(y) = x2 + y2 .

The two expressions are not equal and f(x) = x2 is not linear.

• f(x) = ex . Calculate
f(x + y) = ex+y = ex ey

while
f(x) + f(y) = ex + ey .

The two expressions are not equal and f(x) = ex is not linear.

• f(x) = sin x. Recall that

f(x + y) = sin(x + y) = sin x cos y + cos x sin y

while
f(x) + f(y) = sin x + sin y.

The two expressions are not equal and f(x) = sin x is not linear.
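Any one of these failures is easy to confirm numerically. A minimal MATLAB sketch for f(x) = x2, with the arbitrary choices x = 2 and y = 3:

f = @(t) t.^2;   % the squaring function
x = 2; y = 3;    % arbitrary choices
f(x + y)         % returns 25
f(x) + f(y)      % returns 13, so f(x+y) and f(x)+f(y) differ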

Linear Functions of One Variable

Suppose we take the opposite approach and ask what functions of R → R are linear. Observe that
if L : R → R is linear, then
L(x) = L(x · 1).

Since we are looking at the special case of linear mappings on R, we note that x is a real number as
well as a vector. Thus we can use Definition 3.3.1(b) to observe that

L(x · 1) = xL(1).

So if we let a = L(1), then we see that
L(x) = ax.
Thus linear mappings of R into R are very special mappings indeed; they are all scalar multiples of
the identity mapping.

All Linear Mappings are Matrix Mappings

We end this section by proving that every linear mapping is given by matrix multiplication. But
first we state and prove two lemmas. There is a standard set of vectors that is used over and over
again in linear algebra, which we now define.

Definition 3.3.2. Let j be an integer between 1 and n. The n-vector ej is the vector that has a 1
in the j th entry and zeros in all other entries.

Lemma 3.3.3. Let L1 : Rn → Rm and L2 : Rn → Rm be linear mappings. Suppose that L1 (ej ) =


L2 (ej ) for every j = 1, . . ., n. Then L1 = L2 .

Proof: Let x = (x1 , . . ., xn) be a vector in Rn . Then

x = x1e1 + · · · + xnen .

Linearity of L1 and L2 implies that

L1(x) = x1L1 (e1 ) + · · · + xnL1 (en )


= x1L2 (e1 ) + · · · + xnL2 (en )
= L2 (x).

Since L1 (x) = L2(x) for all x ∈ Rn , it follows that L1 = L2 .

Lemma 3.3.4. Let A be an m × n matrix. Then Aej is the j th column of A.

Proof: Recall the definition of matrix multiplication given in (3.1.2). In that formula, just set xi equal to zero for all i ≠ j and set xj = 1.

Theorem 3.3.5. Let L : Rn → Rm be a linear mapping. Then there exists an m × n matrix A such
that L = LA .

Proof: There are two steps to the proof: determine the matrix A and verify that LA = L.

Let A be the matrix whose j th column is L(ej ). By Lemma 3.3.4 L(ej ) = Aej ; that is, L(ej ) =
LA (ej ). Lemma 3.3.3 implies that L = LA .

Theorem 3.3.5 provides a simple way of showing that

L(0) = 0

for any linear map L. Indeed, L(0) = LA (0) = A0 = 0 for some matrix A. (This fact can also be
proved directly from the definition of linear mapping.)

Using Theorem 3.3.5 to Find Matrices Associated to Linear Maps

The proof of Theorem 3.3.5 shows that the j th column of the matrix A associated to a linear mapping
L is L(ej ) viewed as a column vector. As an example, let L : R2 → R2 be rotation clockwise through
90◦ . Geometrically, it is easy to see that

L(e1 ) = L ((1, 0)) = (0, −1) and L(e2 ) = L ((0, 1)) = (1, 0) .

Since we know that rotations are linear maps, it follows that the matrix A associated to the linear map L is
A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.
Additional examples of linear mappings whose associated matrices can be found using Theorem 3.3.5
are given in Exercises 10 – 13.
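This column-by-column construction is also easy to carry out in MATLAB. The sketch below is only an illustration; the anonymous function Lmap is our own stand-in for the clockwise rotation and is not part of MATLAB:

Lmap = @(v) [v(2); -v(1)];   % clockwise rotation through 90 degrees
e1 = [1; 0]; e2 = [0; 1];
A = [Lmap(e1) Lmap(e2)]      % columns are L(e1) and L(e2); A = [0 1; -1 0]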

Hand Exercises

1. Compute ax + by for each of the following:

(a) a = 2, b = −3, x = (2, 4) and y = (3, −1).

(b) a = 10, b = −2, x = (1, 0, −1) and y = (2, −4, 3).

(c) a = 5, b = −1, x = (4, 2, −1, 1) and y = (−1, 3, 5, 7).

2. Let x = (4, 7) and y = (2, −1). Write the vector αx + βy as a vector in coordinates.

3. Let x = (1, 2), y = (1, −3), and z = (−2, −1). Show that you can write

z = αx + βy

for some α, β ∈ R.

Hint: Set up a system of two linear equations in the unknowns α and β, and then solve this linear
system.

4. Can the vector z = (2, 3, −1) be written as

z = αx + βy

where x = (2, 3, 0) and y = (1, −1, 1)?

5. Let x = (3, −2), y = (2, 3), and z = (1, 4). For which real numbers α, β, γ does

αx + βy + γz = (1, −2)?

In Exercises 6 – 9 determine whether the given transformation is linear.

6. T : R3 → R2 defined by T (x1, x2, x3) = (x1 + 2x2 − x3 , x1 − 4x3).

7. T : R2 → R2 defined by T (x1, x2) = (x1 + x1x2, 2x2).

8. T : R2 → R2 defined by T (x1, x2) = (x1 + x2, x1 − x2 − 1).

9. T : R2 → R3 defined by T (x1, x2) = (1, x1 + x2, 2x2)

10. Find the 2 × 3 matrix A that satisfies

Ae1 = (2, 3) , Ae2 = (1, −1) , and Ae3 = (0, 1) .

11. The cross product of two 3-vectors x = (x1, x2, x3) and y = (y1 , y2, y3 ) is the 3-vector

x × y = (x2 y3 − x3 y2 , −(x1y3 − x3y1 ), x1y2 − x2y1 ).

Let K = (2, 1, −1). Show that the mapping L : R3 → R3 defined by

L(x) = x × K

is a linear mapping. Find the 3 × 3 matrix A such that

L(x) = Ax,

that is, L = LA .

12. Argue geometrically that rotation of the plane counterclockwise through an angle of 45◦ is a
linear mapping. Find a 2 × 2 matrix A such that LA rotates the plane counterclockwise by 45◦ .

13. Let σ permute coordinates cyclically in R3 ; that is,

σ(x1, x2, x3) = (x2, x3, x1).

Find a 3 × 3 matrix A such that σ = LA .

14. Let L be a linear map. Using the definition of linearity, prove that L(0) = 0.

15. Let L1 : Rn → Rm and L2 : Rn → Rm be linear mappings. Prove that L : Rn → Rm defined by

L(x) = L1(x) + L2 (x)

is also a linear mapping. Theorem 3.3.5 states that there are matrices A, A1 and A2 such that

L = LA and Lj = LAj

for j = 1, 2. What is the relationship between the matrices A, A1 and A2 ?

Computer Exercises

16. Let
A = \begin{pmatrix} 0.5 & 0 \\ 0 & 2 \end{pmatrix}.
Use map to verify that the linear mapping LA halves the x-component of a point while it doubles
the y-component.

17. Let
A = \begin{pmatrix} 0 & 0.5 \\ -0.5 & 0 \end{pmatrix}.
Use map to determine how the mapping LA acts on 2-vectors. Describe this action in words.

In Exercises 18 – 19 use MATLAB to verify (3.3.1) and (3.3.2).

18. A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & -2 \\ 4 & 0 & 1 \end{pmatrix}, \quad x = \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}, \quad y = \begin{pmatrix} 0 \\ -5 \\ 10 \end{pmatrix}, \quad c = 21.   (3.3.5*)

19. A = \begin{pmatrix} 4 & 0 & -3 & 2 & 4 \\ 2 & 8 & -4 & -1 & 3 \\ -1 & 2 & 1 & 10 & -2 \\ 4 & 4 & -2 & 1 & 2 \\ -2 & 3 & 1 & 1 & -1 \end{pmatrix}, \quad x = \begin{pmatrix} 1 \\ 3 \\ -2 \\ 3 \\ -1 \end{pmatrix}, \quad y = \begin{pmatrix} 2 \\ 0 \\ 13 \\ -2 \\ 1 \end{pmatrix}, \quad c = -13.   (3.3.6*)

3.4 The Principle of Superposition

The principle of superposition is just a restatement of the fact that matrix mappings are linear.
Nevertheless, this restatement is helpful when trying to understand the structure of solutions to
systems of linear equations.

Homogeneous Equations

A system of linear equations is homogeneous if it has the form

Ax = 0, (3.4.1)

where A is an m×n matrix and x ∈ Rn. Note that homogeneous systems are consistent since 0 ∈ Rn
is always a solution, that is, A(0) = 0.

The principle of superposition makes two assertions:

• Suppose that y and z in Rn are solutions to (3.4.1) (that is, suppose that Ay = 0 and Az = 0);
then y + z is a solution to (3.4.1).

• Suppose that c is a scalar; then cy is a solution to (3.4.1).

The principle of superposition is proved using the linearity of matrix multiplication. Calculate

A(y + z) = Ay + Az = 0 + 0 = 0

to verify that y + z is a solution, and calculate

A(cy) = c(Ay) = c · 0 = 0

to verify that cy is a solution.

We see that solutions to homogeneous systems of linear equations always satisfy the general
property of superposition: sums of solutions are solutions and scalar multiples of solutions are
solutions.

We illustrate this principle by explicitly solving the system of equations
\begin{pmatrix} 1 & 2 & -1 & 1 \\ 2 & 5 & -4 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.

Use row reduction to show that the matrix
\begin{pmatrix} 1 & 2 & -1 & 1 \\ 2 & 5 & -4 & -1 \end{pmatrix}
is row equivalent to
\begin{pmatrix} 1 & 0 & 3 & 7 \\ 0 & 1 & -2 & -3 \end{pmatrix},
which is in reduced echelon form. Recall, using the methods of Section 2.3, that every solution to
this linear system has the form
\begin{pmatrix} -3x_3 - 7x_4 \\ 2x_3 + 3x_4 \\ x_3 \\ x_4 \end{pmatrix} = x_3 \begin{pmatrix} -3 \\ 2 \\ 1 \\ 0 \end{pmatrix} + x_4 \begin{pmatrix} -7 \\ 3 \\ 0 \\ 1 \end{pmatrix}.

Superposition is verified again by observing that the form of the solutions is preserved under vector
addition and scalar multiplication. For instance, suppose that
\alpha_1 \begin{pmatrix} -3 \\ 2 \\ 1 \\ 0 \end{pmatrix} + \alpha_2 \begin{pmatrix} -7 \\ 3 \\ 0 \\ 1 \end{pmatrix} \quad and \quad \beta_1 \begin{pmatrix} -3 \\ 2 \\ 1 \\ 0 \end{pmatrix} + \beta_2 \begin{pmatrix} -7 \\ 3 \\ 0 \\ 1 \end{pmatrix}

are two solutions. Then the sum has the form
\gamma_1 \begin{pmatrix} -3 \\ 2 \\ 1 \\ 0 \end{pmatrix} + \gamma_2 \begin{pmatrix} -7 \\ 3 \\ 0 \\ 1 \end{pmatrix}

where γj = αj + βj .

We have actually proved more than superposition. We have shown in this example that every
solution is a superposition of just two solutions
\begin{pmatrix} -3 \\ 2 \\ 1 \\ 0 \end{pmatrix} \quad and \quad \begin{pmatrix} -7 \\ 3 \\ 0 \\ 1 \end{pmatrix}.
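This assertion is easy to confirm in MATLAB. In the following sketch the two basic solutions are entered by hand, and an arbitrary superposition of them (here with coefficients 2 and −3) is multiplied by the coefficient matrix:

A  = [1 2 -1 1; 2 5 -4 -1];
v1 = [-3; 2; 1; 0];   % the solution with x3 = 1, x4 = 0
v2 = [-7; 3; 0; 1];   % the solution with x3 = 0, x4 = 1
w  = 2*v1 - 3*v2;     % an arbitrary superposition
A*w                   % prints the zero vector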

Inhomogeneous Equations

The linear system of m equations in n unknowns is written as

Ax = b

where A is an m × n matrix, x ∈ Rn, and b ∈ Rm . This system is inhomogeneous when the vector b
is nonzero. Note that if y, z ∈ Rn are solutions to the inhomogeneous equation (that is, Ay = b and
Az = b), then y − z is a solution to the homogeneous equation. That is,

A(y − z) = Ay − Az = b − b = 0.

For example, let
A = \begin{pmatrix} 1 & 2 & 0 \\ -2 & 0 & 1 \end{pmatrix} \quad and \quad b = (3, -1).
Then
y = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \quad and \quad z = \begin{pmatrix} 3 \\ 0 \\ 5 \end{pmatrix}
are both solutions to the linear system Ax = b. It follows that
y - z = \begin{pmatrix} -2 \\ 1 \\ -4 \end{pmatrix}

is a solution to the homogeneous system Ax = 0, which can be checked by direct calculation.

Thus we can completely solve the inhomogeneous equation by finding one solution to the inho-
mogeneous equation and then adding to that solution every solution of the homogeneous equation.
More precisely, suppose that we know all of the solutions w to the homogeneous equation Ax = 0
and one solution y to the inhomogeneous equation Ax = b. Then y + w is another solution to the
inhomogeneous equation and every solution to the inhomogeneous equation has this form.

An Example of an Inhomogeneous Equation

Suppose that we want to find all solutions of Ax = b where
A = \begin{pmatrix} 3 & 2 & 1 \\ 0 & 1 & -2 \\ 3 & 3 & -1 \end{pmatrix} \quad and \quad b = \begin{pmatrix} -2 \\ 4 \\ 2 \end{pmatrix}.

Suppose that you are told that y = (−5, 6, 1)t is a solution of the inhomogeneous equation. (This
fact can be verified by a short calculation — just multiply Ay and see that the result equals b.) Next
find all solutions to the homogeneous equation Ax = 0 by putting A into reduced echelon form. The resulting reduced echelon form matrix is
\begin{pmatrix} 1 & 0 & \frac{5}{3} \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{pmatrix}.
Hence we see that the solutions of the homogeneous equation Ax = 0 are
\begin{pmatrix} -\frac{5}{3}s \\ 2s \\ s \end{pmatrix} = s \begin{pmatrix} -\frac{5}{3} \\ 2 \\ 1 \end{pmatrix}.

Combining these results, we conclude that all the solutions of Ax = b are given by
\begin{pmatrix} -5 \\ 6 \\ 1 \end{pmatrix} + s \begin{pmatrix} -\frac{5}{3} \\ 2 \\ 1 \end{pmatrix}.
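The following MATLAB lines check this answer; they are only a sketch, and the value s = 4 is an arbitrary choice:

A = [3 2 1; 0 1 -2; 3 3 -1];
b = [-2; 4; 2];
y = [-5; 6; 1];            % the particular solution
A*y - b                    % prints the zero vector
s = 4;                     % an arbitrary value of s
x = y + s*[-5/3; 2; 1];
A*x - b                    % again prints the zero vector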

Hand Exercises

1. Consider the homogeneous linear equation

x+y+z =0

(a) Write all solutions to this equation as a general superposition of a pair of vectors v1 and v2 .

(b) Write all solutions as a general superposition of a second pair of vectors w1 and w2.

2. Write all solutions to the homogeneous system of linear equations

x1 + 2x2 + x4 − x5 = 0
x3 − 2x4 + x5 = 0

as the general superposition of three vectors.

3. (a) Find all solutions to the homogeneous equation Ax = 0 where
A = \begin{pmatrix} 2 & 3 & 1 \\ 1 & 1 & 4 \end{pmatrix}.

(b) Find a single solution to the inhomogeneous equation

Ax = (6, 6) . (3.4.2)

(c) Use your answers in (a) and (b) to find all solutions to (3.4.2).

3.5 Composition and Multiplication of Matrices

The composition of two matrix mappings leads to another matrix mapping from which the concept
of multiplication of two matrices follows. Matrix multiplication can be introduced by formula, but
then the idea is unmotivated and one is left to wonder why matrix multiplication is defined in such
a seemingly awkward way.

We begin with the example of 2 × 2 matrices. Suppose that
A = \begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix} \quad and \quad B = \begin{pmatrix} 0 & 3 \\ -1 & 4 \end{pmatrix}.

We have seen that the mappings


x ↦ Ax and x ↦ Bx

map 2-vectors to 2-vectors. So we can ask what happens when we compose these mappings. In
symbols, we compute
LA ◦LB (x) = LA (LB (x)) = A(Bx).

In coordinates, let x = (x_1, x_2) and compute
A(Bx) = A \begin{pmatrix} 3x_2 \\ -x_1 + 4x_2 \end{pmatrix} = \begin{pmatrix} -x_1 + 10x_2 \\ x_1 - x_2 \end{pmatrix}.
It follows that we can rewrite A(Bx) using multiplication of a matrix times a vector as
A(Bx) = \begin{pmatrix} -1 & 10 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.

In particular, LA◦LB is again a linear mapping, namely LC, where
C = \begin{pmatrix} -1 & 10 \\ 1 & -1 \end{pmatrix}.

With this computation in mind, we define the product
AB = \begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 0 & 3 \\ -1 & 4 \end{pmatrix} = \begin{pmatrix} -1 & 10 \\ 1 & -1 \end{pmatrix}.

Using the same approach we can derive a formula for matrix multiplication of 2 × 2 matrices.
Suppose
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \quad and \quad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}.

Then
\begin{align*}
A(Bx) &= A \begin{pmatrix} b_{11}x_1 + b_{12}x_2 \\ b_{21}x_1 + b_{22}x_2 \end{pmatrix} \\
&= \begin{pmatrix} a_{11}(b_{11}x_1 + b_{12}x_2) + a_{12}(b_{21}x_1 + b_{22}x_2) \\ a_{21}(b_{11}x_1 + b_{12}x_2) + a_{22}(b_{21}x_1 + b_{22}x_2) \end{pmatrix} \\
&= \begin{pmatrix} (a_{11}b_{11} + a_{12}b_{21})x_1 + (a_{11}b_{12} + a_{12}b_{22})x_2 \\ (a_{21}b_{11} + a_{22}b_{21})x_1 + (a_{21}b_{12} + a_{22}b_{22})x_2 \end{pmatrix} \\
&= \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.
\end{align*}

Hence, for 2 × 2 matrices, we see that composition of matrix mappings defines matrix multiplication as:
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{pmatrix}.   (3.5.1)

Formula (3.5.1) may seem a bit formidable, but it does have structure. Suppose A and B are
2 × 2 matrices, then the entry of
C = AB
in the ith row, jth column may be written as
a_{i1}b_{1j} + a_{i2}b_{2j} = \sum_{k=1}^{2} a_{ik}b_{kj}.

We shall see that an analog of this formula is available for matrix multiplications of all sizes. But
to derive this formula, it is easier to develop matrix multiplication abstractly.

Lemma 3.5.1. Let L1 : Rn → Rm and L2 : Rp → Rn be linear mappings. Then L = L1 ◦L2 : Rp →


Rm is a linear mapping.

Proof: Compute

L(x + y) = L1◦L2 (x + y)
= L1(L2 (x) + L2 (y))
= L1(L2 (x)) + L1 (L2 (y))
= L1◦L2 (x) + L1 ◦L2 (y)
= L(x) + L(y).

Similarly, compute L1 ◦L2 (cx) = cL1 ◦L2 (x).

We apply Lemma 3.5.1 in the following way. Let A be an m × n matrix and let B be an
n × p matrix. Then LA : Rn → Rm and LB : Rp → Rn are linear mappings, and the mapping
L = LA ◦LB : Rp → Rm is defined and linear. Theorem 3.3.5 implies that there is an m × p matrix
C such that L = LC . Abstractly, we define the matrix product AB to be C.

Note that the matrix product AB is defined only when the number of columns of A is
equal to the number of rows of B.

Calculating the Product of Two Matrices

Next we discuss how to calculate the product of matrices; this discussion generalizes our discussion
of the product of 2 × 2 matrices. Lemma 3.3.4 tells how to compute C = AB. The j th column of
the matrix product is just
Cej = A(Bej ),
where Bj ≡ Bej is the j th column of the matrix B. Therefore,

C = (AB1 | · · · |ABp ). (3.5.2)

Indeed, the (i, j)th entry of C is the ith entry of ABj, that is, the ith entry of
A \begin{pmatrix} b_{1j} \\ \vdots \\ b_{nj} \end{pmatrix} = \begin{pmatrix} a_{11}b_{1j} + \cdots + a_{1n}b_{nj} \\ \vdots \\ a_{m1}b_{1j} + \cdots + a_{mn}b_{nj} \end{pmatrix}.

It follows that the entry c_{ij} of C in the ith row and jth column is
c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj}.   (3.5.3)

We can interpret (3.5.3) in the following way. To calculate cij : multiply the entries of the ith row
of A with the corresponding entries in the j th column of B and add the results. This interpretation
reinforces the idea that for the matrix product AB to be defined, the number of columns in A must
equal the number of rows in B.

For example, we now perform the following multiplication:
\begin{align*}
\begin{pmatrix} 2 & 3 & 1 \\ 3 & -1 & 2 \end{pmatrix} \begin{pmatrix} 1 & -2 \\ 3 & 1 \\ -1 & 4 \end{pmatrix}
&= \begin{pmatrix} 2 \cdot 1 + 3 \cdot 3 + 1 \cdot (-1) & 2 \cdot (-2) + 3 \cdot 1 + 1 \cdot 4 \\ 3 \cdot 1 + (-1) \cdot 3 + 2 \cdot (-1) & 3 \cdot (-2) + (-1) \cdot 1 + 2 \cdot 4 \end{pmatrix} \\
&= \begin{pmatrix} 10 & 3 \\ -2 & 1 \end{pmatrix}.
\end{align*}
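Formula (3.5.3) can also be coded directly as three nested loops. The sketch below is purely illustrative, since in practice one simply types A*B, and it recomputes the product just performed:

A = [2 3 1; 3 -1 2];
B = [1 -2; 3 1; -1 4];
[m, n] = size(A);
p = size(B, 2);
C = zeros(m, p);
for i = 1:m
   for j = 1:p
      for k = 1:n
         C(i,j) = C(i,j) + A(i,k)*B(k,j);   % formula (3.5.3)
      end
   end
end
C          % equals [10 3; -2 1]
C - A*B    % the zero matrix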

Some Special Matrix Products

Let A be an m × n matrix. Then

OA = O
AO = O
AIn = A
Im A = A

The first two equalities are easily checked using (3.5.3). It is not significantly more difficult to verify
the last two equalities using (3.5.3), but we shall verify these equalities using the language of linear
mappings, as follows:
LAIn (x) = LA ◦LIn (x) = LA (x),
since LIn (x) = x is the identity map. Therefore AIn = A. A similar proof verifies that Im A = A.
Although the verification of these equalities using the notions of linear mappings may appear to be
a case of overkill, the next section contains results where these notions truly simplify the discussion.

Hand Exercises

In Exercises 1 – 4 determine whether or not the matrix products AB or BA can be computed for
each given pair of matrices A and B. If the product is possible, perform the computation.
1. A = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} and B = \begin{pmatrix} -2 & 0 \\ 3 & -1 \end{pmatrix}.

2. A = \begin{pmatrix} 0 & -2 \\ 4 & 10 \end{pmatrix} and B = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 3 & -1 \end{pmatrix}.

3. A = \begin{pmatrix} 8 & 0 & 2 & 3 \\ -3 & 0 & -10 & 3 \end{pmatrix} and B = \begin{pmatrix} 0 & 2 & 5 \\ -1 & 3 & -1 \\ 0 & 1 & -5 \end{pmatrix}.

4. A = \begin{pmatrix} 8 & -1 \\ -3 & 12 \\ 5 & -4 \end{pmatrix} and B = \begin{pmatrix} 2 & 8 & 0 & -3 \\ 1 & 4 & 0 & 1 \\ -5 & 6 & 7 & -20 \end{pmatrix}

In Exercises 5 – 8 compute the given matrix product.


5. \begin{pmatrix} 2 & 3 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} -1 & 1 \\ -3 & 2 \end{pmatrix}.

6. \begin{pmatrix} 1 & 2 & 3 \\ -2 & 3 & -1 \end{pmatrix} \begin{pmatrix} 2 & 3 \\ -2 & 5 \\ 1 & -1 \end{pmatrix}.

7. \begin{pmatrix} 2 & 3 \\ -2 & 5 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ -2 & 3 & -1 \end{pmatrix}.

8. \begin{pmatrix} 2 & -1 & 3 \\ 1 & 0 & 5 \\ 1 & 5 & -1 \end{pmatrix} \begin{pmatrix} 1 & 7 \\ -2 & -1 \\ -5 & 3 \end{pmatrix}.

9. Determine all the 2 × 2 matrices B such that AB = BA where A is the matrix
A = \begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix}.

10. Let
A = \begin{pmatrix} 2 & 5 \\ 1 & 4 \end{pmatrix} \quad and \quad B = \begin{pmatrix} a & 3 \\ b & 2 \end{pmatrix}.
For which values of a and b does AB = BA?

11. Let
A = \begin{pmatrix} 1 & 0 & -3 \\ -2 & 1 & 1 \\ 0 & 1 & -5 \end{pmatrix},
and let At be the transpose of the matrix A, as defined in Section 1.3. Compute AAt.

Computer Exercises

In Exercises 12 – 14 decide for the given pair of matrices A and B whether or not the products AB
or BA are defined and compute the products when possible.

12. A = \begin{pmatrix} 2 & 2 & -2 \\ -4 & 4 & 0 \end{pmatrix} and B = \begin{pmatrix} 3 & -2 & 0 \\ 0 & -1 & 4 \\ -2 & -3 & 5 \end{pmatrix}   (3.5.4*)

13. A = \begin{pmatrix} -4 & 1 & 0 & 5 & -1 \\ 5 & -1 & -2 & -4 & -2 \\ 1 & 5 & -4 & 1 & 5 \end{pmatrix} and B = \begin{pmatrix} 1 & 3 & -4 & 3 & -2 & 1 \\ 0 & 3 & 2 & 3 & -1 & 4 \\ 5 & 4 & 4 & 5 & -1 & 0 \\ -4 & -3 & 2 & 4 & 1 & 4 \end{pmatrix}   (3.5.5*)

14. A = \begin{pmatrix} -2 & -2 & 4 & 5 \\ 0 & -3 & -4 & 3 \\ 1 & -3 & 1 & 1 \\ 0 & 1 & 0 & 4 \end{pmatrix} and B = \begin{pmatrix} 2 & 3 & -4 & 5 \\ 4 & -3 & 0 & -2 \\ -3 & -4 & -4 & -3 \\ -2 & -2 & 3 & -1 \end{pmatrix}   (3.5.6*)

3.6 Properties of Matrix Multiplication

In this section we discuss the facts that matrix multiplication is associative (but not commutative)
and that certain distributive properties hold. We also discuss how matrix multiplication is performed
in MATLAB .

Matrix Multiplication is Associative

Theorem 3.6.1. Matrix multiplication is associative. That is, let A be an m × n matrix, let B be
a n × p matrix, and let C be a p × q matrix. Then

(AB)C = A(BC).

Proof: Begin by observing that composition of mappings is always associative. In symbols, let
f : Rn → Rm , g : Rp → Rn , and h : Rq → Rp . Then
f ◦(g◦h)(x) = f[(g◦h)(x)]
= f[g(h(x))]
= (f ◦g)(h(x))
= [(f ◦g)◦h](x).
It follows that
f ◦(g◦h) = (f ◦g)◦h.

We can apply this result to linear mappings. Thus


LA ◦(LB ◦LC ) = (LA ◦LB )◦LC .
Since
LA(BC) = LA ◦LBC = LA ◦(LB ◦LC )
and
L(AB)C = LAB ◦LC = (LA ◦LB )◦LC ,
it follows that
LA(BC) = L(AB)C ,
and
A(BC) = (AB)C.

It is worth convincing yourself that Theorem 3.6.1 has content by verifying by hand that matrix
multiplication of 2 × 2 matrices is associative.

Matrix Multiplication is Not Commutative

Although matrix multiplication is associative, it is not commutative. This statement is trivially true
when the matrix AB is defined while the matrix BA is not. Suppose, for example, that A is a 2 × 3
matrix and that B is a 3 × 4 matrix. Then AB is a 2 × 4 matrix, while the multiplication BA makes
no sense whatsoever.

More importantly, suppose that A and B are both n × n square matrices. Then AB = BA is
generally not valid. For example, let
A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad and \quad B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.
Then
AB = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad and \quad BA = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
So AB ≠ BA. In certain cases it does happen that AB = BA. For example, when B = In ,
AIn = A = In A.
But these cases are rare.

Additional Properties of Matrix Multiplication

Recall that if A = (aij ) and B = (bij ) are both m × n matrices, then A + B is the m × n matrix
(aij + bij ). We now enumerate several properties of matrix multiplication.

• Let A and B be m × n matrices and let C be an n × p matrix. Then

(A + B)C = AC + BC.

Similarly, if D is a q × m matrix, then

D(A + B) = DA + DB.

So matrix multiplication distributes across matrix addition.

• If α and β are scalars, then


(α + β)A = αA + βA.

So addition distributes with scalar multiplication.

• Scalar multiplication and matrix multiplication satisfy:

(αA)C = α(AC).

Matrix Multiplication and Transposes

Let A be an m × n matrix and let B be an n × p matrix, so that the matrix product AB is defined
and AB is an m × p matrix. Note that At is an n × m matrix and that B t is a p × n matrix, so that
in general the product AtB t is not defined. However, the product B t At is defined and is an p × m
matrix, as is the matrix (AB)t . We claim that

(AB)t = B t At . (3.6.1)

We verify this claim by direct computation. The (i, k)th entry in (AB)^t is the (k, i)th entry in AB. That entry is:
\sum_{j=1}^{n} a_{kj}b_{ji}.
The (i, k)th entry in B^tA^t is:
\sum_{j=1}^{n} b^t_{ij}a^t_{jk},
where a^t_{jk} is the (j, k)th entry in A^t and b^t_{ij} is the (i, j)th entry in B^t. It follows from the definition of transpose that the (i, k)th entry in B^tA^t is:
\sum_{j=1}^{n} b_{ji}a_{kj} = \sum_{j=1}^{n} a_{kj}b_{ji},

which verifies the claim.
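Identity (3.6.1) can also be checked numerically. A minimal sketch, using real random matrices of arbitrary compatible sizes (for real matrices the MATLAB operator ' is the transpose):

A = rand(3,4); B = rand(4,2);
(A*B)' - B'*A'       % the zero matrix, up to roundoff error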

Matrix Multiplication in MATLAB

Let us now explain how matrix multiplication works in MATLAB. We load the matrices
 
−5 2 0  
 −1 1 −4  2 −2 −2 5 5
   
A=  and B =  4 −5 1 −1 2  (3.6.2*)
 −4 4 2 
3 2 3 −3 3
−1 3 −1

by typing

e3_6_2

Now the command C = A*B asks MATLAB to compute the matrix C as the product of A and B.
We obtain

C =
-2 0 12 -27 -21
-10 -11 -9 6 -15
14 -8 18 -30 -6
7 -15 2 -5 -2

Let us confirm this result by another computation. As we have seen above the 4th column of C should
be given by the product of A with the 4th column of B. Indeed, if we perform this computation and
type

A*B(:,4)

the result is

ans =
-27
6
-30
-5

which is precisely the 4th column of C.

MATLAB also recognizes when a matrix multiplication of two matrices is not defined. For
example, the product of the 3 × 5 matrix B with the 4 × 3 matrix A is not defined, and if we type
B*A then we obtain the error message

??? Error using ==> *


Inner matrix dimensions must agree.

We remark that the size of a matrix A can be seen using the MATLAB command size. For example,
the command size(A) leads to

ans =
4 3

reflecting the fact that A is a matrix with four rows and three columns.

Hand Exercises

1. Let A be an m × n matrix. Show that the matrices AAt and At A are symmetric.

2. Let
A = \begin{pmatrix} 1 & 2 \\ -1 & -1 \end{pmatrix} \quad and \quad B = \begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix}.
Compute AB and B t At . Verify that (AB)t = B t At for these matrices A and B.

3. Let
A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
Compute B = I + A + \frac{1}{2}A^2 and C = I + tA + \frac{1}{2}(tA)^2.

4. Let
I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad and \quad J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.

(a) Show that J 2 = −I.

(b) Evaluate (aI + bJ)(cI + dJ) in terms of I and J.

5. Recall that a square matrix C is upper triangular if cij = 0 when i > j. Show that the matrix
product of two upper triangular n × n matrices is also upper triangular.

Computer Exercises

In Exercises 6 – 8 use MATLAB to verify that (A + B)C = AC + BC for the given matrices.
6. A = \begin{pmatrix} 0 & 2 \\ 2 & 1 \end{pmatrix}, B = \begin{pmatrix} -2 & 1 \\ 3 & 0 \end{pmatrix} and C = \begin{pmatrix} 2 & -1 \\ 1 & 5 \end{pmatrix}

7. A = \begin{pmatrix} 12 & -2 \\ 3 & 1 \end{pmatrix}, B = \begin{pmatrix} 8 & -20 \\ 3 & 10 \end{pmatrix} and C = \begin{pmatrix} 10 & 2 & 4 \\ 2 & 13 & -4 \end{pmatrix}

8. A = \begin{pmatrix} 6 & 1 \\ 3 & 20 \\ -5 & 3 \end{pmatrix}, B = \begin{pmatrix} 2 & -10 \\ 5 & 0 \\ 3 & 1 \end{pmatrix} and C = \begin{pmatrix} -2 & 10 \\ 12 & 10 \end{pmatrix}

9. Use the rand(3,3) command in MATLAB to choose five pairs of 3 × 3 matrices A and B at
random. Compute AB and BA using MATLAB to see that in general these matrix products are
unequal.

10. Experimentally, find two symmetric 2 × 2 matrices A and B for which the matrix product AB
is not symmetric.

3.7 Solving Linear Systems and Inverses

When we solve the simple equation


ax = b,

we do so by dividing by a to obtain
x = \frac{1}{a} b.
This division works as long as a ≠ 0.

Writing systems of linear equations as

Ax = b

suggests that solutions should have the form
x = \frac{1}{A} b,
and the MATLAB command for solving linear systems

x=A\b

suggests that there is some merit to this analogy.

The following is a better analogy. Multiplication by a has the inverse operation: division by a;
multiplying a number x by a and then multiplying the result by a−1 = 1/a leaves the number x
unchanged (as long as a ≠ 0). In this sense we should write the solution to ax = b as

x = a−1b.

For systems of equations Ax = b we wish to write solutions as

x = A−1 b.

In this section we consider the questions: What does A−1 mean and when does A−1 exist? (Even in one dimension, we have seen that the inverse does not always exist, since 0−1 = 1/0 is undefined.)

Invertibility

We begin by giving a precise definition of invertibility for square matrices.

Definition 3.7.1. The n × n matrix A is invertible if there is an n × n matrix B such that

AB = In and BA = In.

The matrix B is called an inverse of A. If A is not invertible, then A is noninvertible or singular.

Geometrically, we can see that some matrices are invertible. For example, the matrix
R_{90} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
rotates the plane counterclockwise through 90◦ and is invertible. The inverse matrix of R90 is the matrix that rotates the plane clockwise through 90◦. That matrix is:
R_{-90} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.

This statement can be checked algebraically by verifying that R90R−90 = I2 and that R−90R90 = I2.

Similarly,
B = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix}
is an inverse of
A = \begin{pmatrix} -1 & 3 \\ 2 & -5 \end{pmatrix},
as matrix multiplication shows that AB = I2 and BA = I2 . In fact, there is an elementary formula
for finding inverses of 2 × 2 matrices (when they exist); see (3.8.1) in Section 3.8.

On the other hand, not all matrices are invertible. For example, the zero matrix is noninvertible,
since 0B = 0 for any matrix B.

Lemma 3.7.2. If an n × n matrix A is invertible, then its inverse is unique and is denoted by A−1.

Proof: Let B and C be n × n matrices that are inverses of A. Then

BA = In and AC = In .

We use the associativity of matrix multiplication to prove that B = C. Compute

B = BIn = B(AC) = (BA)C = In C = C.

We now show how to compute inverses for products of invertible matrices.

Proposition 3.7.3. Let A and B be two invertible n × n matrices. Then AB is also invertible and

(AB)−1 = B −1 A−1 .

Proof: Use associativity of matrix multiplication to compute

(AB)(B −1 A−1 ) = A(BB −1 )A−1 = AIn A−1 = AA−1 = In .

Similarly,
(B −1 A−1 )(AB) = B −1 (A−1A)B = B −1 B = In.

Therefore AB is invertible with the desired inverse.
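Proposition 3.7.3 can be illustrated numerically as well. In the sketch below the two random 3 × 3 matrices are, with probability one, invertible, and the difference should be zero up to roundoff error:

A = rand(3); B = rand(3);
inv(A*B) - inv(B)*inv(A)   % the zero matrix, up to roundoff error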

Proposition 3.7.4. Suppose that A is an invertible n × n matrix. Then At is invertible and

(At )−1 = (A−1 )t .

Proof: We must show that (A−1 )t is the inverse of At . Identity (3.6.1) implies that

(A−1 )tAt = (AA−1 )t = (In )t = In ,

and
At (A−1)t = (A−1A)t = (In )t = In .

Therefore, (A−1 )t is the inverse of At , as claimed.

Invertibility and Unique Solutions

Next we discuss the implications of invertibility for the solution of the inhomogeneous linear system:

Ax = b, (3.7.1)

where A is an n × n matrix and b ∈ Rn .

Proposition 3.7.5. Let A be an invertible n × n matrix and let b be in Rn. Then the system of
linear equations (3.7.1) has a unique solution.

Proof: We can solve the linear system (3.7.1) by setting

x = A−1 b. (3.7.2)

This solution is easily verified by calculating

Ax = A(A−1 b) = (AA−1 )b = In b = b.

Next, suppose that x is a solution to (3.7.1). Then

x = In x = (A−1 A)x = A−1 (Ax) = A−1 b.

So A−1b is the only possible solution.

Corollary 3.7.6. An invertible matrix is row equivalent to In .

Proof: Let A be an invertible n × n matrix. Proposition 3.7.5 states that the system of linear
equations Ax = b has a unique solution. Chapter 2, Corollary 2.4.8 states that A is row equivalent
to In .

The converse of Corollary 3.7.6 is also valid.


Proposition 3.7.7. An n × n matrix A that is row equivalent to In is invertible.

Proof: Form the n×2n matrix M = (A|In ). Since A is row equivalent to In , there is a sequence of
elementary row operations so that M is row equivalent to (In |B). Eliminating all columns from the
right half of M except the j th column yields the matrix (A|ej ). The same sequence of elementary
row operations states that the matrix (A|ej ) is row equivalent to (In|Bj ) where Bj is the j th column
of B. It follows that Bj is the solution to the system of linear equations Ax = ej and that the
matrix product
AB = (AB1 | · · · |ABn ) = (e1 | · · · |en) = In .
So AB = In .

We claim that BA = In and hence that A is invertible. To verify this claim form the n × 2n
matrix N = (In |A). Using the same sequence of elementary row operations again shows that N is
row equivalent to (B|In ). By construction the matrix B is row equivalent to In . Therefore, there is
a unique solution to the system of linear equations Bx = ej . Now eliminating all columns except
the j th from the right hand side of the matrix (B|In ) shows that the solution to the system of linear
equations Bx = ej is just Aj , where Aj is the j th column of A. It follows that

BA = (BA1 | · · · |BAn ) = (e1 | · · · |en ) = In .

Hence BA = In .
Theorem 3.7.8. Let A be an n × n matrix. Then the following are equivalent:

(a) A is invertible.
(b) The equation Ax = b has a unique solution for each b ∈ Rn .
(c) The only solution to Ax = 0 is x = 0.
(d) A is row equivalent to In .

Proof: (a) ⇒ (b) This implication is just Proposition 3.7.5.

(b) ⇒ (c) This implication is straightforward — just take b = 0 in (3.7.1).

(c) ⇒ (d) This implication is just a restatement of Chapter 2, Corollary 2.4.8.

(d) ⇒ (a). This implication is just Proposition 3.7.7.

A Method for Computing Inverse Matrices

The proof of Proposition 3.7.7 gives a constructive method for finding the inverse of any invertible
square matrix.

Theorem 3.7.9. Let A be an n × n matrix that is row equivalent to In and let M be the n × 2n
augmented matrix
M = (A|In ). (3.7.3)
Then the matrix M is row equivalent to (In |A−1).

An Example

Compute the inverse of the matrix
A = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{pmatrix}.
Begin by forming the 3 × 6 matrix
M = \begin{pmatrix} 1 & 2 & 0 & 1 & 0 & 0 \\ 0 & 1 & 3 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{pmatrix}.

To put M in row echelon form by row reduction, first subtract 3 times the 3rd row from the 2nd row, obtaining
\begin{pmatrix} 1 & 2 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & -3 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{pmatrix}.
Second, subtract 2 times the 2nd row from the 1st row, obtaining
\begin{pmatrix} 1 & 0 & 0 & 1 & -2 & 6 \\ 0 & 1 & 0 & 0 & 1 & -3 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{pmatrix}.
Theorem 3.7.9 implies that
A^{-1} = \begin{pmatrix} 1 & -2 & 6 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{pmatrix},
which can be verified by matrix multiplication.

Computing the Inverse Using MATLAB

There are two ways that we can compute inverses using MATLAB . Either we can perform the row
reduction of (3.7.3) directly or we can use the MATLAB the command inv. We illustrate both of
these methods. First type e3 7 4 to recall the matrix
 
1 2 4
 
A= 3 1 1 . (3.7.4*)
2 0 −1

To perform the row reduction of (3.7.3) we need to form the matrix M . The MATLAB command
for generating an n × n identity matrix is eye(n). Therefore, typing

M = [A eye(3)]

in MATLAB yields the result

M =
1 2 4 1 0 0
3 1 1 0 1 0
2 0 -1 0 0 1

Now row reduce M to reduced echelon form as follows. Type

M(3,:) = M(3,:) - 2*M(1,:)


M(2,:) = M(2,:) - 3*M(1,:)

obtaining

M =
1 2 4 1 0 0
0 -5 -11 -3 1 0
0 -4 -9 -2 0 1

Next type

M(2,:) = M(2,:)/M(2,2)
M(3,:) = M(3,:) + 4*M(2,:)
M(1,:) = M(1,:) - 2*M(2,:)

to obtain

M =
1.0000 0 -0.4000 -0.2000 0.4000 0
0 1.0000 2.2000 0.6000 -0.2000 0
0 0 -0.2000 0.4000 -0.8000 1.0000

Finally, type

M(3,:) = M(3,:)/M(3,3)
M(2,:) = M(2,:) - M(2,3)*M(3,:)
M(1,:) = M(1,:) - M(1,3)*M(3,:)

to obtain

M =
1.0000 0 0 -1.0000 2.0000 -2.0000
0 1.0000 0 5.0000 -9.0000 11.0000
0 0 1.0000 -2.0000 4.0000 -5.0000

Thus C = A−1 is obtained by extracting the last three columns of M by typing

C = M(:,[4 5 6])

which yields

C =
-1.0000 2.0000 -2.0000
5.0000 -9.0000 11.0000
-2.0000 4.0000 -5.0000

You may check that C is the inverse of A by typing A*C and C*A.

In fact, this entire scheme for computing the inverse of a matrix has been preprogrammed into
MATLAB . Just type

inv(A)

to obtain

ans =
-1.0000 2.0000 -2.0000
5.0000 -9.0000 11.0000
-2.0000 4.0000 -5.0000

We illustrate again this simple method for computing the inverse of a matrix A. For example,
reload the matrix in (3.1.4*) by typing e3_1_4 and obtaining:

A =
5 -4 3 -6 2
2 -4 -2 -1 1
1 2 1 -5 3
-2 -1 -2 1 -1
1 -6 1 1 4

The command B = inv(A) stores the inverse of the matrix A in the matrix B, and we obtain the
result

B =
-0.0712 0.2856 -0.0862 -0.4813 -0.0915
-0.1169 0.0585 0.0690 -0.2324 -0.0660
0.1462 -0.3231 -0.0862 0.0405 0.0825
-0.1289 0.0645 -0.1034 -0.2819 0.0555
-0.1619 0.0810 0.1724 -0.1679 0.1394

This computation also illustrates the fact that even when the matrix A has integer entries, the
inverse of A usually has noninteger entries.

Let b = (2, −8, 18, −6, −1). Then we may use the inverse B = A−1 to compute the solution of
Ax = b. Indeed if we type

b = [2;-8;18;-6;-1];
x = B*b

then we obtain

x =
-1.0000
2.0000
1.0000
-1.0000
3.0000

as desired (see (3.1.5*)). With this computation we have confirmed the analytical results of the
previous subsections.

Hand Exercises

1. Verify by matrix multiplication that the following matrices are inverses of each other:
   
1 0 2 −1 0 2
   
 0 −1 2  and  2 −1 −2  .
1 0 1 1 0 −1

2. Let α ≠ 0 be a real number and let A be an invertible matrix. Show that the inverse of the matrix αA is given by \frac{1}{\alpha} A^{-1}.

3. Let A = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix} be a 2 × 2 diagonal matrix. For which values of a and b is A invertible?

4. Let A, B, C be general n × n matrices. Simplify the expression A−1 (BA−1 )−1 (CB −1 )−1 .

In Exercises 5 – 6 use row reduction to find the inverse of the given matrix.

 
5. \begin{pmatrix} 1 & 4 & 5 \\ 0 & 1 & -1 \\ -2 & 0 & -8 \end{pmatrix}.

6. \begin{pmatrix} 1 & -1 & -1 \\ 0 & 2 & 0 \\ 2 & 0 & -1 \end{pmatrix}.

7. Let A be an n × n matrix that satisfies

A3 + a2 A2 + a1 A + In = 0,

where A2 = AA and A3 = AA2 . Show that A is invertible.


Hint: Let B = −(A2 + a2 A + a1In ) and verify that AB = BA = In .

8. Let A be an n × n matrix that satisfies

Am + am−1 Am−1 + · · · + a1A + In = 0.

Show that A is invertible.

9. For which values of a, b, c is the matrix
A = \begin{pmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{pmatrix}

invertible? Find A−1 when it exists.

Computer Exercises

In Exercises 10 – 11 use row reduction to find the inverse of the given matrix and confirm your
results using the command inv.

10. A = \begin{pmatrix} 2 & 1 & 3 \\ 1 & 2 & 3 \\ 5 & 1 & 0 \end{pmatrix}.   (3.7.5*)

11. B = \begin{pmatrix} 0 & 5 & 1 & 3 \\ 1 & 5 & 3 & -1 \\ 2 & 1 & 0 & -4 \\ 1 & 7 & 2 & 3 \end{pmatrix}.   (3.7.6*)

12. Try to compute the inverse of the matrix
C = \begin{pmatrix} 1 & 0 & 3 \\ -1 & 2 & -2 \\ 0 & 2 & 1 \end{pmatrix}   (3.7.7*)
in MATLAB using the command inv. What happens — can you explain the outcome?

Now compute the inverse of the matrix
\begin{pmatrix} 1 & \epsilon & 3 \\ -1 & 2 & -2 \\ 0 & 2 & 1 \end{pmatrix}
for some nonzero numbers \epsilon of your choice. What can be observed in the inverse if \epsilon is very small? What happens when \epsilon tends to zero?

3.8 Determinants of 2 × 2 Matrices

There is a simple way for determining whether a 2 × 2 matrix A is invertible and there is a simple
formula for finding A−1 . First, we present the formula. Let
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},
and suppose that ad − bc ≠ 0. Then
A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.   (3.8.1)

This is most easily verified by directly applying the formula for matrix multiplication. So A is
invertible when ad − bc ≠ 0. We shall prove below that ad − bc must be nonzero when A is invertible.

From this discussion it is clear that the number ad − bc must be an important quantity for 2 × 2
matrices. So we define:

Definition 3.8.1. The determinant of the 2 × 2 matrix A is

det(A) = ad − bc. (3.8.2)
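Both (3.8.2) and (3.8.1) are short enough to code directly. In the following MATLAB sketch the names det2 and inv2 are our own and are used only for illustration:

det2 = @(A) A(1,1)*A(2,2) - A(1,2)*A(2,1);                % formula (3.8.2)
inv2 = @(A) [A(2,2), -A(1,2); -A(2,1), A(1,1)]/det2(A);   % formula (3.8.1)
A = [-1 3; 2 -5];
det2(A)       % returns -1
inv2(A)       % returns [5 3; 2 1], the inverse exhibited in Section 3.7
A*inv2(A)     % the 2 x 2 identity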

Proposition 3.8.2. As a function on 2 × 2 matrices, the determinant satisfies the following prop-
erties.

(a) The determinant of an upper triangular matrix is the product of the diagonal elements.

(b) The determinants of a matrix and its transpose are equal.

(c) det(AB) = det(A) det(B).

Proof: Both (a) and (b) are easily verified by direct calculation. Property (c) is also verified by
direct calculation — but of a more extensive sort. Note that
\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} = \begin{pmatrix} a\alpha + b\gamma & a\beta + b\delta \\ c\alpha + d\gamma & c\beta + d\delta \end{pmatrix}.

Therefore,

det(AB) = (aα + bγ)(cβ + dδ) − (aβ + bδ)(cα + dγ)


= (acαβ + bcβγ + adαδ + bdγδ) − (acαβ + bcαδ + adβγ + bdγδ)
= bc(βγ − αδ) + ad(αδ − βγ)
= (ad − bc)(αδ − βγ)
= det(A) det(B),

as asserted.

Corollary 3.8.3. A 2 × 2 matrix A is invertible if and only if det(A) ≠ 0.

Proof: If A is invertible, then AA−1 = I2. Proposition 3.8.2 implies that

det(A) det(A−1 ) = det(I2 ) = 1.

Therefore, det(A) ≠ 0. Conversely, if det(A) ≠ 0, then (3.8.1) implies that A is invertible.

Determinants and Area

Suppose that v and w are two vectors in R2 that point in different directions. Then, the set of points

z = αv + βw where 0 ≤ α, β ≤ 1

is a parallelogram, that we denote by P . We denote the area of P by |P |. For example, the unit
square S, whose corners are (0, 0), (1, 0), (0, 1), and (1, 1), is the parallelogram generated by the
unit vectors e1 and e2 .

Next let A be a 2 × 2 matrix and let

A(P ) = {Az : z ∈ P }.

It follows from linearity (since Az = αAv + βAw) that A(P ) is the parallelogram generated by Av
and Aw.

Proposition 3.8.4. Let A be a 2 × 2 matrix and let S be the unit square. Then

|A(S)| = | det A|. (3.8.3)

Proof: Note that A(S) is the parallelogram generated by u1 = Ae1 and u2 = Ae2, and u1 and u2 are the columns of A. It follows that
(\det A)^2 = \det(A^t)\det(A) = \det(A^tA) = \det \begin{pmatrix} u_1^t u_1 & u_1^t u_2 \\ u_2^t u_1 & u_2^t u_2 \end{pmatrix}.
Hence
(\det A)^2 = \det \begin{pmatrix} \|u_1\|^2 & u_1 \cdot u_2 \\ u_1 \cdot u_2 & \|u_2\|^2 \end{pmatrix} = \|u_1\|^2 \|u_2\|^2 - (u_1 \cdot u_2)^2.

Recall that (1.4.5) of Chapter 1 states that

|P |2 = ||v||2||w||2 − (v · w)2 .

where P is the parallelogram generated by v and w. Therefore, (det A)2 = |A(S)|2 and (3.8.3) is
verified.

Theorem 3.8.5. Let P be a parallelogram in R2 and let A be a 2 × 2 matrix. Then

|A(P )| = | det A||P |. (3.8.4)

Proof: First note that (3.8.3) a special case of (3.8.4), since |S| = 1. Next, let P be the parallel-
ogram generated by the (column) vectors v and w, and let B = (v|w). Then P = B(S). It follows
from (3.8.3) that |P | = | det B|. Moreover,

|A(P )| = |(AB)(S)|
= | det(AB)|
= | det A|| det B|
= | det A||P |,

as desired.

Hand Exercises

1. Find the inverse of the matrix \begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix}.

2. Find the inverse of the shear matrix \begin{pmatrix} 1 & K \\ 0 & 1 \end{pmatrix}.

3. Show that the 2 × 2 matrix A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} is row equivalent to I2 if and only if ad − bc ≠ 0.
Hint: Prove this result separately in the two cases a ≠ 0 and a = 0.

4. Let A be a 2×2 matrix having integer entries. Find a condition on the entries of A that guarantees
that A−1 has integer entries.

5. Let A be a 2 × 2 matrix and assume that det(A) ≠ 0. Then use the explicit form for A−1 given in (3.8.1) to verify that
\det(A^{-1}) = \frac{1}{\det(A)}.
6. Sketch the triangle whose vertices are 0, p = (3, 0)t, and q = (0, 2)t; and find the area of this
triangle. Let
M = \begin{pmatrix} -4 & -3 \\ 5 & -2 \end{pmatrix}.
Sketch the triangle whose vertices are 0, M p, and M q; and find the area of this triangle.

7. Cramer’s rule provides a method based on determinants for finding the unique solution to the
linear equation Ax = b when A is an invertible matrix. More precisely, let A be an invertible 2 × 2
matrix and let b ∈ R2 be a column vector. Let Bj be the 2 × 2 matrix obtained from A by replacing
the j th column of A by the vector b. Let x = (x1, x2)t be the unique solution to Ax = b. Then
Cramer’s rule states that
x_j = \frac{\det(B_j)}{\det(A)}.   (3.8.5)
Prove Cramer’s rule. Hint: Write the general system of two equations in two unknowns as

a11x1 + a12x2 = b1
a21x1 + a22x2 = b2 .

Subtract a11 times the second equation from a21 times the first equation to eliminate x1 ; then solve
for x2, and verify (3.8.5). Use a similar calculation to solve for x1.

In Exercises 8 – 9 use Cramer’s rule (3.8.5) to solve the given system of linear equations.

8. Solve for x:
2x + 3y = 2
3x − 5y = 1

9. Solve for y:
4x − 3y = −1
x + 2y = 7

Computer Exercises

10. Use MATLAB to choose five 2 × 2 matrices at random and compute their inverses. Do you get
the impression that ‘typically’ 2 × 2 matrices are invertible? Try to find a reason for this fact using
the determinant of 2 × 2 matrices.

In Exercises 11 – 14 use the unit square icon in the program map to test Proposition 3.8.4, as follows.
Enter the given matrix A into map and map the unit square icon. Compute det(A) by estimating
the area of A(S) — given that S has unit area. For each matrix, use this numerical experiment to
decide whether or not the matrix is invertible.
11. A = \begin{pmatrix} 0 & -2 \\ 2 & 0 \end{pmatrix}.

12. A = \begin{pmatrix} -0.5 & -0.5 \\ 0.7 & 0.7 \end{pmatrix}.

13. A = \begin{pmatrix} -1 & -0.5 \\ -2 & -1 \end{pmatrix}.

14. A = \begin{pmatrix} 0.7071 & 0.7071 \\ -0.7071 & 0.7071 \end{pmatrix}.

Chapter 4

Determinants and Eigenvalues

In Section 3.8 we introduced determinants for 2 × 2 matrices A. There we showed that


the determinant of A is nonzero if and only if A is invertible. In Section 4.1 we generalize
the concept of determinants to n × n matrices. An alternative, more intuitive treatment of
determinants is given in Section 4.2.

If A is an n × n matrix, then, as we noted in Section 3.2, A can be viewed as a


transformation that maps an n-component vector to an n-component vector; that is, A
can be thought of as a function from Rn into Rn . A number λ is an eigenvalue of A
if there exists a nonzero vector v in Rn such that Av = λ v. In Section 4.3 we use
determinants to show that every n × n matrix has exactly n eigenvalues. A treatment of
eigenvalues and eigenvectors is given in Section 4.4.

Certain details concerning determinants are deferred to Appendix 4.6.

4.1 Determinants

There are several equivalent ways to introduce determinants — none of which are easily
motivated. We prefer to define determinants through the properties they satisfy rather than
by formula. These properties actually enable us to compute determinants of n × n matrices
where n > 3, which further justifies the approach. Later on, we will give an inductive
formula (4.1.9) for computing the determinant.

Definition 4.1.1. A determinant of a square n × n matrix A is a real number that satisfies


the following three properties:

(a) If A = (aij ) is lower triangular, then the determinant of A is the product of the

diagonal entries; that is,
det(A) = a11 · · · · · ann .

(b) det(At) = det(A).

(c) Let B be an n × n matrix. Then

det(AB) = det(A) det(B). (4.1.1)

Theorem 4.1.2. There exists a unique determinant function satisfying the three properties
of Definition 4.1.1.

We will show that it is possible to compute the determinant of any n × n matrix using
Definition 4.1.1. Here we present a few examples:

Lemma 4.1.3. Let A be an n × n matrix.

(a) Let c ∈ R be a scalar. Then det(cA) = cn det(A).

(b) If all of the entries in either a row or a column of A are zero, then det(A) = 0.

Proof: (a) Note that Definition 4.1.1(a) implies that det(cIn) = cn . It follows from (4.1.1)
that
det(cA) = det(cIn A) = det(cIn) det(A) = cn det(A).

(b) Definition 4.1.1(b) implies that it suffices to prove this assertion when one row of
A is zero. Suppose that the ith row of A is zero. Let J be an n × n diagonal matrix with
a 1 in every diagonal entry except the ith diagonal entry which is 0. A matrix calculation
shows that JA = A. It follows from Definition 4.1.1(a) that det(J) = 0 and from (4.1.1)
that det(A) = 0.

Determinants of 2 × 2 Matrices

Before discussing how to compute determinants, we discuss the special case of 2×2 matrices.
Recall from (3.8.2) of Section 3.8 that when
' (
a b
A=
c d

we defined
det(A) = ad − bc. (4.1.2)

We check that (4.1.2) satisfies the three properties in Definition 4.1.1. Observe that when
A is lower triangular, then b = 0 and det(A) = ad. So (a) is satisfied. It is straightforward
to verify (b). We already verified (c) in Chapter 3, Proposition 3.8.2.

It is less obvious perhaps — but true nonetheless — that the three properties of det(A)
actually force the determinant of 2 × 2 matrices to be given by formula (4.1.2). We begin
by showing that Definition 4.1.1 implies that
\det \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = -1.   (4.1.3)

We verify this by observing that
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.   (4.1.4)

Hence property (c), (a) and (b) imply that
\det \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = 1 \cdot 1 \cdot (-1) \cdot 1 = -1.

It is helpful to interpret the matrices in (4.1.4) as elementary row operations. Then (4.1.4)
states that swapping two rows in a 2 × 2 matrix is the same as performing the following
row operations in order:

• add the 2nd row to the 1st row;

• multiply the 2nd row by −1;

• add the 1st row to the 2nd row; and

• subtract the 2nd row from the 1st row.

Suppose that d ≠ 0. Then
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1 & \frac{b}{d} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \frac{ad - bc}{d} & 0 \\ c & d \end{pmatrix}.
It follows from properties (c), (b) and (a) that
\det(A) = \frac{ad - bc}{d} \, d = ad - bc,
as claimed.

Now suppose that d = 0 and note that
A = \begin{pmatrix} a & b \\ c & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} c & 0 \\ a & b \end{pmatrix}.

Using (4.1.3) we see that
\det(A) = -\det \begin{pmatrix} c & 0 \\ a & b \end{pmatrix} = -bc,

as desired.

We have verified that the only possible determinant function for 2 × 2 matrices is the
determinant function defined by (4.1.2).

Row Operations are Invertible Matrices

Proposition 4.1.4. Let A and B be m × n matrices where B is obtained from A by a


single elementary row operation. Then there exists an invertible m × m matrix R such that
B = RA.

Proof: First consider multiplying the j th row of A by the nonzero constant c. Let R be
the diagonal matrix whose j th entry on the diagonal is c and whose other diagonal entries
are 1. Then the matrix RA is just the matrix obtained from A by multiplying the j th row
of A by c. Note that R is invertible when c ≠ 0 and that R−1 is the diagonal matrix whose jth entry is 1/c and whose other diagonal entries are 1. For example,
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 2a_{31} & 2a_{32} & 2a_{33} \end{pmatrix},

multiplies the 3rd row by 2.

Next we show that the elementary row operation that swaps two rows may also be
thought of as matrix multiplication. Let R = (rkl ) be the matrix that deviates from the
identity matrix by changing the following four entries:

rii = 0
rjj = 0
rij = 1
rji = 1

A calculation shows that RA is the matrix obtained from A by swapping the ith and j th
rows. For example,
\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} a_{31} & a_{32} & a_{33} \\ a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \end{pmatrix},

which swaps the 1st and 3rd rows. Another calculation shows that R2 = In and hence that
R is invertible since R−1 = R.

Finally, we claim that adding c times the ith row of A to the j th row of A can be viewed
as matrix multiplication. Let Ekℓ be the matrix all of whose entries are 0 except for the entry in the kth row and ℓth column, which is 1. Then R = In + cEij has the property that RA is the matrix obtained by adding c times the jth row of A to the ith row. We can verify by multiplication that R is invertible and that R−1 = In − cEij. More precisely,
(I_n + cE_{ij})(I_n - cE_{ij}) = I_n + cE_{ij} - cE_{ij} - c^2 E_{ij}^2 = I_n,
since E_{ij}^2 = O for i ≠ j. For example,

  
(I_3 + 5E_{12})A = \begin{pmatrix} 1 & 5 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} a_{11} + 5a_{21} & a_{12} + 5a_{22} & a_{13} + 5a_{23} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix},

adds 5 times the 2nd row to the 1st row.

Determinants of Elementary Row Matrices

Lemma 4.1.5. (a) The determinant of a swap matrix is −1.

(b) The determinant of the matrix that adds a multiple of one row to another is 1.

(c) The determinant of the matrix that multiplies one row by c is c.

Proof: The matrix that swaps the ith row with the j th row is the matrix whose nonzero
elements are akk = 1 where k ≠ i, j and aij = 1 = aji. Using a similar argument as in
(4.1.3) we see that the determinants of these matrices are equal to −1.

The matrix that adds a multiple of one row to another is triangular (either upper or
lower) and has 1’s on the diagonal. Thus property (a) in Definition 4.1.1 implies that the
determinants of these matrices are equal to 1.

Finally, the matrix that multiplies the ith row by c ≠ 0 is a diagonal matrix all of whose
diagonal entries are 1 except for aii = c. Again property (a) implies that the determinant
of this matrix is c ≠ 0.

Computation of Determinants

We now show how to compute the determinant of any n × n matrix A using elementary row
operations and Definition 4.1.1. It follows from Proposition 4.1.4 that every elementary row
operation on A may be performed by premultiplying A by an elementary row matrix.

For each matrix A there is a unique reduced echelon form matrix E and a sequence of
elementary row matrices R1 . . . Rs such that

E = Rs · · · R1 A. (4.1.5)

It follows from Definition 4.1.1(c) that we can compute the determinant of A once we know
the determinants of reduced echelon form matrices and the determinants of elementary row
matrices. In particular

det(A) = det(E)/(det(R1) · · · det(Rs)). (4.1.6)

It is easy to compute the determinant of any matrix in reduced echelon form using
Definition 4.1.1(a) since all reduced echelon form n × n matrices are upper triangular.
Lemma 4.1.5 tells us how to compute the determinants of elementary row matrices. This
discussion proves:

Proposition 4.1.6. If a determinant function exists for n × n matrices, then it is unique.

We still need to show that determinant functions exist when n > 2. More precisely,
we know that the reduced echelon form matrix E is uniquely defined from A (Chapter 2,
Theorem 2.4.9), but there is more than one way to perform elementary row operations on
A to get to E. Thus, we can write A in the form (4.1.6) in many different ways, and these
different decompositions might lead to different values for det A. (They don’t.)

An Example of Determinants by Row Reduction

As a practical matter we row reduce a square matrix A by premultiplying A by an elementary


row matrix Rj. Thus
\det(A) = \frac{1}{\det(R_j)} \det(R_j A).   (4.1.7)
We use this approach to compute the determinant of the 4 × 4 matrix
A = \begin{pmatrix} 0 & 2 & 10 & -2 \\ 1 & 2 & 4 & 0 \\ 1 & 6 & 1 & -2 \\ 2 & 1 & 1 & 0 \end{pmatrix}.

The idea is to use (4.1.7) to keep track of the determinant while row reducing A to upper
triangular form. For instance, swapping rows changes the sign of the determinant; so
\det(A) = -\det \begin{pmatrix} 1 & 2 & 4 & 0 \\ 0 & 2 & 10 & -2 \\ 1 & 6 & 1 & -2 \\ 2 & 1 & 1 & 0 \end{pmatrix}.

Adding multiples of one row to another leaves the determinant unchanged; so
\det(A) = -\det \begin{pmatrix} 1 & 2 & 4 & 0 \\ 0 & 2 & 10 & -2 \\ 0 & 4 & -3 & -2 \\ 0 & -3 & -7 & 0 \end{pmatrix}.

Multiplying a row by a scalar c corresponds to an elementary row matrix whose determinant


is c. To make sure that we do not change the value of det(A), we have to divide the
determinant by c as we multiply a row of A by c. So as we divide the second row of the
matrix by 2, we multiply the whole result by 2, obtaining
\det(A) = -2\det \begin{pmatrix} 1 & 2 & 4 & 0 \\ 0 & 1 & 5 & -1 \\ 0 & 4 & -3 & -2 \\ 0 & -3 & -7 & 0 \end{pmatrix}.

We continue row reduction by zeroing out the last two entries in the 2nd column, obtaining
\det(A) = -2\det \begin{pmatrix} 1 & 2 & 4 & 0 \\ 0 & 1 & 5 & -1 \\ 0 & 0 & -23 & 2 \\ 0 & 0 & 8 & -3 \end{pmatrix} = 46 \det \begin{pmatrix} 1 & 2 & 4 & 0 \\ 0 & 1 & 5 & -1 \\ 0 & 0 & 1 & -\frac{2}{23} \\ 0 & 0 & 8 & -3 \end{pmatrix}.
Thus
\det(A) = 46 \det \begin{pmatrix} 1 & 2 & 4 & 0 \\ 0 & 1 & 5 & -1 \\ 0 & 0 & 1 & -\frac{2}{23} \\ 0 & 0 & 0 & -\frac{53}{23} \end{pmatrix} = -106.

Determinants and Inverses

We end this subsection with an important observation about the determinant function.
This observation generalizes to dimension n Corollary 3.8.3 of Chapter 3.

Theorem 4.1.7. An n × n matrix A is invertible if and only if det(A) ≠ 0. Moreover, if A−1 exists, then
\det A^{-1} = \frac{1}{\det A}.   (4.1.8)

Proof: If A is invertible, then

det(A) det(A−1 ) = det(AA−1 ) = det(In ) = 1.

Thus det(A) ≠ 0 and (4.1.8) is valid. In particular, the determinants of elementary row ma-
trices are nonzero, since they are all invertible. (This point was proved by direct calculation
in Lemma 4.1.5.)

If A is singular, then A is row equivalent to a non-identity reduced echelon form matrix


E whose determinant is zero (since E is upper triangular and its last diagonal entry is zero).
So it follows from (4.1.5) that

0 = det(E) = det(R1) · · · det(Rs ) det(A)

Since det(Rj ) ≠ 0, it follows that det(A) = 0.

Corollary 4.1.8. If the rows of an n × n matrix A are linearly dependent (for example, if
one row of A is a scalar multiple of another row of A), then det(A) = 0.

An Inductive Formula for Determinants

In this subsection we present an inductive formula for the determinant — that is, we assume
that the determinant is known for square (n − 1) × (n − 1) matrices and use this formula
to define the determinant for n × n matrices. This inductive formula is called expansion by
cofactors.

Let A = (aij ) be an n × n matrix. Let Aij be the (n − 1) × (n − 1) matrix formed from


A by deleting the ith row and the j th column. The matrices (−1)i+j Aij are called cofactor
matrices of A.

Inductively we define the determinant of an n × n matrix A by:
\det(A) = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det(A_{1j}) = a_{11}\det(A_{11}) - a_{12}\det(A_{12}) + \cdots + (-1)^{n+1} a_{1n}\det(A_{1n}).   (4.1.9)

In Appendix 4.6 we show that the determinant function defined by (4.1.9) satisfies all
properties of a determinant function. Formula (4.1.9) is also called expansion by cofactors
along the 1st row , since the a1j are taken from the 1st row of A. Since det(A) = det(At ), it

follows that if (4.1.9) is valid as an inductive definition of determinant, then expansion by
cofactors along the 1st column is also valid. That is,

det(A) = a11 det(A11) − a21 det(A21) + · · · + (−1)n+1 an1 det(An1 ). (4.1.10)

We now explore some of the consequences of this definition, beginning with determinants
of small matrices. For example, Definition 4.1.1(a) implies that the determinant of a 1 × 1
matrix is just
det(a) = a.

Therefore, using (4.1.9), the determinant of a 2 × 2 matrix is:
\det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_{11}\det(a_{22}) - a_{12}\det(a_{21}) = a_{11}a_{22} - a_{12}a_{21},

which is just the formula for determinants of 2 × 2 matrices given in (4.1.2).

Similarly, we can now find a formula for the determinant of 3 × 3 matrices A as follows:
\det(A) = a_{11}\det \begin{pmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{pmatrix} - a_{12}\det \begin{pmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{pmatrix} + a_{13}\det \begin{pmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}
= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}.   (4.1.11)

As an example, compute
\det \begin{pmatrix} 2 & 1 & 4 \\ 1 & -1 & 3 \\ 5 & 6 & -2 \end{pmatrix}
using formula (4.1.11) as
2(-1)(-2) + 1 \cdot 3 \cdot 5 + 4 \cdot 6 \cdot 1 - 4(-1)5 - 3 \cdot 6 \cdot 2 - (-2)1 \cdot 1 = 4 + 15 + 24 + 20 - 36 + 2 = 29.
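The inductive formula (4.1.9) can also be turned into a short recursive MATLAB function. The sketch below, saved in a file cofdet.m, is meant only to mirror the definition; the name cofdet is ours, and for matrices of any realistic size one would use row reduction or the built-in command det instead, since cofactor expansion requires on the order of n! operations:

function d = cofdet(A)
% COFDET  Determinant via cofactor expansion along the first row, as in (4.1.9).
n = size(A,1);
if n == 1
   d = A(1,1);
else
   d = 0;
   for j = 1:n
      A1j = A(2:n, [1:j-1, j+1:n]);            % delete the 1st row and jth column
      d = d + (-1)^(1+j)*A(1,j)*cofdet(A1j);
   end
end

For the 3 × 3 matrix above, cofdet([2 1 4; 1 -1 3; 5 6 -2]) returns 29, agreeing with formula (4.1.11).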

There is a visual mnemonic for remembering how to compute the six terms in formula
(4.1.11) for the determinant of 3 × 3 matrices. Write the matrix as a 3 × 5 array by
repeating the first two columns, as shown in bold face in Figure 4.1: Then add the product
of terms connected by solid lines sloping down and to the right and subtract the products of
terms connected by dashed lines sloping up and to the right. Warning: this nice crisscross
algorithm for computing determinants of 3×3 matrices does not generalize to n×n matrices.

When computing determinants of n × n matrices with n > 3, it is usually more efficient


to compute the determinant using row reduction rather than by using formula (4.1.9). In
the appendix to this chapter, Section 4.6, we verify that formula (4.1.9) actually satisfies
the three properties of a determinant, thus completing the proof of Theorem 4.1.2.

a11 a12 a13 | a11 a12
a21 a22 a23 | a21 a22
a31 a32 a33 | a31 a32

Figure 4.1: Mnemonic for computation of determinants of 3 × 3 matrices.

An interesting and useful formula for reducing the effort in computing determinants is given by the following result.

Lemma 4.1.9. Let A be an n × n matrix of the form
A = \begin{pmatrix} B & 0 \\ C & D \end{pmatrix},
where B is a k × k matrix and D is an (n − k) × (n − k) matrix. Then

det(A) = det(B) det(D).

Proof: We prove this result using (4.1.9) coupled with induction. Assume that this lemma
is valid for all (n − 1) × (n − 1) matrices of the appropriate form. Now use (4.1.9) to compute

det(A) = a11 det(A11) − a12 det(A12) + · · · ± a1n det(A1n )


= b11 det(A11) − b12 det(A12) + · · · ± b1k det(A1k ).

Note that the cofactor matrices A1j are obtained from A by deleting the 1st row and the
j th column. These matrices all have the form
A_{1j} = \begin{pmatrix} B_{1j} & 0 \\ C_j & D \end{pmatrix},
where Cj is obtained from C by deleting the j th column. By induction on k

det(A1j ) = det(B1j ) det(D).

It follows that

det(A) = (b11 det(B11) − b12 det(B12 ) + · · · ± b1k det(B1k )) det(D)


= det(B) det(D),

as desired.

Determinants in MATLAB

The determinant function has been preprogrammed in MATLAB and is quite easy to use.
For example, typing e8_1_11 will load the matrix
A = \begin{pmatrix} 1 & 2 & 3 & 0 \\ 2 & 1 & 4 & 1 \\ -2 & -1 & 0 & 1 \\ -1 & 0 & -2 & 3 \end{pmatrix}.

To compute the determinant of A just type det(A) and obtain the answer

ans =
-46

Alternatively, we can use row reduction techniques in MATLAB to compute the deter-
minant of A — just to test the theory that we have developed. Note that to compute the
determinant we do not need to row reduce to reduced echelon form — we need only reduce
to an upper triangular matrix. This can always be done by successively adding multiples
of one row to another — an operation that does not change the determinant. For example,
to clear the entries in the 1st column below the 1st row, type

A(2,:) = A(2,:) - 2*A(1,:);


A(3,:) = A(3,:) + 2*A(1,:);
A(4,:) = A(4,:) + A(1,:)

obtaining

A =
1 2 3 0
0 -3 -2 1
0 3 6 1
0 2 1 3

To clear the 2nd column below the 2nd row type

A(3,:) = A(3,:) + A(2,:);A(4,:) = A(4,:) - A(4,2)*A(2,:)/A(2,2)

obtaining

A =
1.0000 2.0000 3.0000 0
0 -3.0000 -2.0000 1.0000
0 0 4.0000 2.0000
0 0 -0.3333 3.6667

Finally, to clear the entry (4, 3) type

A(4,:) = A(4,:) -A(4,3)*A(3,:)/A(3,3)

to obtain

A =
1.0000 2.0000 3.0000 0
0 -3.0000 -2.0000 1.0000
0 0 4.0000 2.0000
0 0 0 3.8333

To evaluate the determinant of A, which is now an upper triangular matrix, type

A(1,1)*A(2,2)*A(3,3)*A(4,4)

obtaining

ans =
-46

as expected.
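The row reduction above can be packaged into a short MATLAB function. The sketch below is our own (the function name and the handling of zero pivots are our additions, not part of the text): it reduces A to upper triangular form by adding multiples of one row to another, swapping rows when a pivot is zero, and then multiplies the diagonal entries.

function d = detByElimination(A)
% Reduce A to upper triangular form.  Adding a multiple of one row to
% another leaves the determinant unchanged; a row swap flips its sign.
n = size(A,1);
s = 1;
for j = 1:n-1
    if A(j,j) == 0                          % look for a nonzero pivot below
        k = find(A(j+1:n,j) ~= 0, 1) + j;
        if isempty(k), d = 0; return, end   % the whole column is zero
        A([j k],:) = A([k j],:);            % row swap: the sign flips
        s = -s;
    end
    for i = j+1:n
        A(i,:) = A(i,:) - (A(i,j)/A(j,j))*A(j,:);
    end
end
d = s*prod(diag(A));                        % product of the diagonal entries

Applied to the matrix A above it returns -46, in agreement with det(A).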

Hand Exercises

In Exercises 1 – 3 compute the determinants of the given matrix.


 
−2 1 0
 
1. A =  4 5 0 .
1 0 2
 
1 0 2 3
 −1 −2 3 2 
 
2. B =  .
 4 −2 0 3 
1 2 0 −3

 
2 1 −1 0 0
 
 1 −2 3 0 0 
 
3. C = 
 −3 2 −2 0 0 .
 
 1 1 −1 2 4 
0 2 3 −1 −3
 
−2 −3 2
 
4. Find det(A−1 ) where A =  4 1 3 .
−1 1 1
5. Two n × n matrices A and B are similar if there exists an n × n matrix P such that
B = P −1AP . Show that the determinants of similar n × n matrices are equal.

In Exercises 6 – 8 use row reduction to compute the determinant of the given matrix.
 
−1 −2 1
 
6. A =  3 1 3 .
−1 1 1
 
1 0 1 0
 0 1 0 −1 
 
7. B =  .
 1 0 −1 0 
0 1 0 1
 
1 2 0 1
 0 2 1 0 
 
8. C =  .
 −2 −3 3 −1 
1 0 5 2
9. Let    
2 −1 0 2 0 0
   
A= 0 3 0  and B =  0 −1 0  .
1 5 3 0 0 3

(a) For what values of λ is det(λA − B) = 0?


(b) Is there a vector x for which Ax = Bx?

In Exercises 10 – 11 verify that the given matrix has determinant −1.

         [ 1  0  0 ]
10. A =  [ 0  0  1 ] .
         [ 0  1  0 ]

         [ 0  0  1 ]
11. B =  [ 0  1  0 ] .
         [ 1  0  0 ]
 
3 2 −4
 
12. Compute the cofactor matrices A13, A22, A21 when A =  0 1 5 .
0 0 6

 
0 2 −4 5
 −1 7 −2 10 
 
13. Compute the cofactor matrices B11, B23, B43 when B =  .
 0 0 0 −1 
3 4 2 −10

14. Find values of λ where the determinant of the matrix


 
λ−1 0 −1
 
Aλ =  0 λ−1 1 
−1 1 λ

vanishes.

15. Suppose that two n × p matrices A and B are row equivalent. Show that there is an invertible
n × n matrix P such that B = P A.

16. Let A be an invertible n × n matrix and let b ∈ Rn be a column vector. Let Bj be the n × n
matrix obtained from A by replacing the j th column of A by the vector b. Let x = (x1, . . . , xn)t be
the unique solution to Ax = b. Then Cramer’s rule states that

det(Bj )
xj = . (4.1.12)
det(A)

Prove Cramer’s rule. Hint: Let Aj be the j th column of A so that Aj = Aej . Show that

Bj = A(e1 | · · · |ej−1|x|ej+1| · · · |en ).

Using this product, compute the determinant of Bj and verify (4.1.12).

4.2 Determinants, An Alternative Treatment

Associated with each n×n matrix A is a number called its determinant. We will give an inductive
development of this concept, beginning with the determinant of a 2 × 2 matrix. Then we’ll express
a 3 × 3 determinant as a sum of 2 × 2 determinants, a 4 × 4 determinant as a sum of 3 × 3
determinants, and so on.

Consider the system of two linear equations in two unknowns

ax + by = α
cx + dy = β

We eliminate the y unknown by multiplying the first equation by d, the second equation by
−b, and adding. This gives
(ad − bc)x = dα − bβ.

This equation has the solution x = (dα − bβ)/(ad − bc), provided ad − bc ≠ 0.
Similarly, we can solve the system for the y unknown by multiplying the first equation by −c,
the second equation by a, and adding. This gives

(ad − bc)y = aβ − cα,

which has the solution y = (aβ − cα)/(ad − bc), again provided ad − bc ≠ 0.
The matrix of coefficients of the system is

        [ a  b ]
    A = [      ] .
        [ c  d ]

The number ad − bc is called the determinant of A. The determinant of A is denoted by det A and by

    | a  b |
    | c  d | .

The determinant has a geometric interpretation in this 2 × 2 case.

The graph of the first equation ax + by = α is a straight line with slope −a/b, provided b ≠ 0.
The graph of the second equation is a straight line with slope −c/d, provided d ≠ 0. (If b = 0 or
d = 0, then the corresponding line is vertical.) Assume that b, d ≠ 0. If

    −a/b ≠ −c/d,

then the lines have different slopes and the system of equations has a unique solution. However,
−a/b ≠ −c/d is equivalent to ad − bc ≠ 0.

Thus, det A ≠ 0 implies that the system has a unique solution.

On the other hand, if ad − bc = 0, then −a/b = −c/d (assuming b, d ≠ 0), and the two lines
have the same slope. In this case, the lines are either parallel (the system has no solutions), or the
lines coincide (the system has infinitely many solutions).

In general, an n × n matrix A is said to be nonsingular if det A ≠ 0; A is singular if
det A = 0.

Look again at the solutions


    x = (dα − bβ)/(ad − bc) ,    y = (aβ − cα)/(ad − bc) ,    ad − bc ≠ 0.

The two numerators also have the form of a determinant of a 2 × 2 matrix. In particular, these
solutions can be written as

        | α  b |            | a  α |
        | β  d |            | c  β |
    x = ------- ,       y = ------- .
        | a  b |            | a  b |
        | c  d |            | c  d |
This representation of the solutions of a system of two equations in two unknowns is the n = 2
version of a general result known as Cramer’s rule.

Example 1. Given the system of equations

5x − 2y = 8
3x + 4y = 10

Verify that the determinant of the matrix of coefficients is nonzero and solve the system using
Cramer’s rule.
SOLUTION  The matrix of coefficients is

        [ 5  −2 ]
    A = [       ]
        [ 3   4 ]

and det A = 26. According to Cramer’s rule,

        | 8  −2 |                        | 5   8 |
        | 10  4 |   52                   | 3  10 |   26
    x = -------- = -- = 2 ,          y = -------- = -- = 1 .
          det A    26                      det A    26

The solution set is x = 2, y = 1.
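Readers who want to check such computations in MATLAB can use the following small verification of Example 1 (the variable names are ours):

A = [5 -2; 3 4];  b = [8; 10];
x = det([b A(:,2)])/det(A)    % replace the 1st column of A by b: returns 2
y = det([A(:,1) b])/det(A)    % replace the 2nd column of A by b: returns 1
A\b                           % MATLAB's own solver gives the same solution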

Now we’ll go to 3 × 3 matrices.

The determinant of a 3 × 3 matrix

If
        [ a1  a2  a3 ]
    A = [ b1  b2  b3 ] ,
        [ c1  c2  c3 ]
then

            | a1  a2  a3 |
    det A = | b1  b2  b3 | = a1 b2 c3 − a1 b3 c2 + a2 b3 c1 − a2 b1 c3 + a3 b1 c2 − a3 b2 c1 .
            | c1  c2  c3 |

The problem with this definition is that it is hard to remember. Fortunately the expression on
the right can be written conveniently in terms of 2 × 2 determinants as follows:

    det A = a1 b2 c3 − a1 b3 c2 + a2 b3 c1 − a2 b1 c3 + a3 b1 c2 − a3 b2 c1

          = a1 (b2 c3 − b3 c2) − a2 (b1 c3 − b3 c1) + a3 (b1 c2 − b2 c1)

               | b2  b3 |        | b1  b3 |        | b1  b2 |
          = a1 |        |  − a2  |        |  + a3  |        |
               | c2  c3 |        | c1  c3 |        | c1  c2 |
This representation of a 3 × 3 determinant is called the expansion of the determinant across the
first row. Notice that the coefficients are the entries a1 , a2, a3 of the first row, that they occur
alternately with + and − signs, and that each is multiplied by a 2 × 2 determinant. You can
remember the determinant that goes with each entry ai as follows: in the original matrix, mentally
cross out the row and column containing ai and take the determinant of the 2 × 2 matrix that
remains.
   
                    [ 3  −2  −4 ]            [ 7   6   5 ]
Example 2. Let A =  [ 2   5  −1 ]  and  B =  [ 1   2   1 ] .  Calculate det A and det B.
                    [ 0   6   1 ]            [ 3  −2   1 ]
SOLUTION
0 0 0 0 0 0
0 5 −1 0 0 2 −1 0 0 2 5 0
0 0 0 0 0 0
det A = 3 0 0 − (−2) 0 0 + (−4) 0 0
0 6 1 0 0 0 1 0 0 0 6 0
= 3[(5)(1) − (−1)(6)] + 2[(2)(1) − (−1)(0)] − 4[(2)(6) − (5)(0)]
= 3(11) + 2(2) − 4(12) = −11

0 0 0 0 0 0
0 2 1 0 0 1 1 0 0 1 2 0
0 0 0 0 0 0
det B = 70 0− 60 0+50 0
0 −2 1 0 0 3 1 0 0 3 −2 0
= 7[(2)(1) − (1)(−2)] − 6[(1)(1) − (1)(3)] + 5[(1)(−2) − (2)(3)]
= 7(4) − 6(−2) + 5(−8) = 0. !

There are other ways to group the terms in the definition. For example

det A = a1 b2c3 − a1b3 c2 + a2 b3c1 − a2 b1c3 + a3b1 c2 − a3 b2c1

= −a2 (b1 c3 − b3 c1) + b2 (a1c3 − a3c1 ) − c2(a1 c3 − a3 c1)


0 0 0 0 0 0
0 b b 0 0 a a 0 0 a a 0
0 1 3 0 0 1 3 0 0 1 3 0
= −a2 0 0 + b2 0 0 + −c2 0 0
0 c1 c3 0 0 c1 c3 0 0 b1 b3 0
This is called the expansion of the determinant down the second column.

In general, depending on how you group the terms in the definition, you can expand across any
row or down any column. The signs of the coefficients in the expansion across a row or down a
column are alternately +, −, starting with a + in the (1,1)-position. The pattern of signs is:
 
+ − +
 
 − + − 
+ − +

   
3 −2 −4 7 0 5
   
Example 3. Let A =  2 5 −1  and C =  1 0 1 .
0 6 1 3 −2 1

1. Calculate det A by expanding down the first column.


0 0 0 0 0 0
0 5 −1 0 0 −2 −4 0 0 −2 −4 0
0 0 0 0 0 0
det A = 3 0 0− 20 0+00 0
0 6 1 0 0 6 1 0 0 5 −1 0
= 3[(5)(1) − (−1)(6)] − 2[(−2)(1) − (−4)(6)] + 0
= 3(11) − 2(22) + 0 = −11

2. Calculate det A by expanding across the third row.


0 0 0 0 0 0
0 −2 −4 0 0 3 −4 0 0 3 −2 0
0 0 0 0 0 0
det A = 0 0 0− 60 0 + (1) 0 0
0 5 −1 0 0 2 −1 0 0 2 5 0
= 0 − 6[(3)(−1) − (−4)(2)] + [(3)(5) − (−2)(2)]
= −6(5) + (19) = −11

3. Calculate det C by expanding down the second column.


             | 1  1 |        | 7  5 |           | 7  5 |
   det C = −0|      |  +  0  |      |  − (−2)   |      |
             | 3  1 |        | 3  1 |           | 1  1 |
         = 0 + 0 + 2(2) = 4

Notice the advantage of expanding across a row or down a column that contains one or more
zeros. !

Now consider the system of three equations in three unknowns

    a11 x1 + a12 x2 + a13 x3 = b1
    a21 x1 + a22 x2 + a23 x3 = b2
    a31 x1 + a32 x2 + a33 x3 = b3

Writing this system in vector-matrix form, we have

    [ a11  a12  a13 ] [ x1 ]   [ b1 ]
    [ a21  a22  a23 ] [ x2 ] = [ b2 ]
    [ a31  a32  a33 ] [ x3 ]   [ b3 ]

It can be shown that if det A ≠ 0, then the system has a unique solution which is given by

    x1 = det A1 / det A ,    x2 = det A2 / det A ,    x3 = det A3 / det A
where

         [ b1  a12  a13 ]          [ a11  b1  a13 ]              [ a11  a12  b1 ]
    A1 = [ b2  a22  a23 ] ,   A2 = [ a21  b2  a23 ] ,   and A3 = [ a21  a22  b2 ] .
         [ b3  a32  a33 ]          [ a31  b3  a33 ]              [ a31  a32  b3 ]

This is Cramer’s rule in the 3 × 3 case.

If det A = 0, then the system either has infinitely many solutions or no solutions.

Example 4. Given the system of equations

2x + y − z = 3
x+y+z = 1 .
x − 2y − 3z = 4

Verify that the determinant of the matrix of coefficients is nonzero and find the value of y using
Cramer’s rule.
 
SOLUTION  The matrix of coefficients is

        [ 2   1  −1 ]
    A = [ 1   1   1 ]   and   det A = 5.
        [ 1  −2  −3 ]

According to Cramer’s rule,

        | 2  3  −1 |
        | 1  1   1 |
        | 1  4  −3 |     −5
    y = ------------  =  --  = −1.
           det A          5

The determinant of a 4 × 4 matrix

Following the pattern suggested by the calculation of a 3 × 3 determinant, we’ll express a 4 × 4


determinant as the sum of four 3 × 3 determinants. For example, the expansion of
            | a1  a2  a3  a4 |
            | b1  b2  b3  b4 |
    det A = | c1  c2  c3  c4 |
            | d1  d2  d3  d4 |

across the first row is

               | b2  b3  b4 |        | b1  b3  b4 |        | b1  b2  b4 |        | b1  b2  b3 |
    det A = a1 | c2  c3  c4 |  − a2  | c1  c3  c4 |  + a3  | c1  c2  c4 |  − a4  | c1  c2  c3 |
               | d2  d3  d4 |        | d1  d3  d4 |        | d1  d2  d4 |        | d1  d2  d3 |

As in the 3 × 3 case, you can calculate a 4 × 4 determinant by expanding across any row or
down any column. The matrix of signs associated with a 4 × 4 determinant is
 
+ − + −
 − + − + 
 
 
 + − + − 
− + − +

Cramer’s Rule

Here is the general version of Cramer’s rule: Given the system of n equations in n unknowns
a11x1 + a12x2 + a13x3 + · · · + a1nxn = b1
a21x1 + a22x2 + a23x3 + · · · + a2nxn = b2
a31x1 + a32x2 + a33x3 + · · · + a3nxn = b3 (1)
.........................................................
an1x1 + an2x2 + an3x3 + · · · + annxn = bn

If det A ≠ 0, then the system has a unique solution x1, x2, . . . , xn given by

    x1 = det A1 / det A ,    x2 = det A2 / det A ,    . . . ,    xn = det An / det A

where det Ai is the determinant obtained by replacing the ith column of det A by the column
(b1, b2, . . . , bn)t ,  i = 1, 2, . . . , n.

If det A = 0, then the system either has no solution or infinitely many solutions. In the special
case of a homogeneous system, det A = 0 implies that the system has infinitely many nontrivial
solutions.
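The general rule translates directly into a short MATLAB function. The sketch below is only an illustration of the formula (the function name is ours, not part of the text); for serious computations the backslash solver A\b is far more efficient than forming n determinants.

function x = cramer(A, b)
% Solve Ax = b by Cramer's rule (illustration only).
n = size(A,1);
d = det(A);
if d == 0
    error('det(A) = 0: Cramer''s rule does not apply')
end
x = zeros(n,1);
for j = 1:n
    Aj = A;
    Aj(:,j) = b;        % replace the j-th column of A by b
    x(j) = det(Aj)/d;
end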

Properties of Determinants

It should now be clear how to calculate an n × n determinant for any n. However, for n > 3 the
calculations, while simple in theory, tend to be long, tedious, and involved. Although the determinant
is a complicated mathematical function (its domain is the set of square matrices, its range is the set
of real numbers), it does have certain properties that can be used to simplify calculations.

Before listing the properties of determinants, we give the determinants of some special types of
matrices.

1. If an n × n matrix A has a row of zeros, or a column of zeros, then det A = 0; an n × n


matrix with a row of zeros or a column of zeros is singular. (Simply expand across the row or
column of zeros.)
2. An n × n matrix is said to be upper triangular if all of its entries below the main diagonal are
zero. (Recall that the entries a11, a22 , a33, . . . , ann form the main diagonal of an n × n
matrix A.) A 4 × 4 upper triangular matrix has the form
 
        [ a11  a12  a13  a14 ]
    T = [  0   a22  a23  a24 ]
        [  0    0   a33  a34 ]
        [  0    0    0   a44 ]

Note that this upper triangular form is closely related to the row-echelon form that was so
important in solving systems of linear equations and in finding the inverse of an n × n matrix.
Calculating det T by expanding down the first column, we get
                | a22  a23  a24 |
    det T = a11 |  0   a33  a34 |  =  a11 a22 | a33  a34 |  =  a11 a22 a33 a44 .
                |  0    0   a44 |             |  0   a44 |

In general, we can see that the determinant of an upper triangular matrix is simply the product
of its entries on the main diagonal.
3. An n × n matrix L is lower triangular if all its entries above the main diagonal are zero.
Just like an upper triangular matrix, the determinant of a lower triangular matrix is the
product of the entries on the main diagonal.
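Both statements are easy to check in MATLAB; the triangular matrix below is chosen only for illustration.

T = [2 5 -1 3; 0 -4 2 7; 0 0 3 1; 0 0 0 5];   % upper triangular
det(T)              % returns 2*(-4)*3*5 = -120
prod(diag(T))       % the product of the diagonal entries, the same value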

Properties: We’ll list the properties for general n and illustrate with 2 × 2 determinants. Let
A be an n × n matrix

1. If the matrix B is obtained from A by interchanging two rows (or two columns), then
   det B = − det A.

       | c  d |                       | a  b |
       |      | = bc − ad  =  −       |      | .
       | a  b |                       | c  d |
Note: An immediate consequence of this property is the fact that if A has two identical rows
(or columns), then det A = 0 — interchange the two identical rows, then det A = − det A
which implies det A = 0.
2. If the matrix B is obtained from A by multiplying a row (or column) by a nonzero number
k, then det B = k det A.
       | ka  kb |                                      | a  b |
       |        | = kad − kbc = k(ad − bc)  =  k       |      | .
       | c   d  |                                      | c  d |

3. If the matrix B is obtained from A by multiplying a row (or column) by a number k and
adding the result to another row (or column), then det B = det A.
       | a       b      |                                          | a  b |
       |                | = a(kb + d) − b(ka + c) = ad − bc  =     |      | .
       | ka + c  kb + d |                                          | c  d |

Of course these are the operations we used to row reduce a matrix to row-echelon form. We’ll
use them here to row reduce a determinant to upper triangular form. The difference between row
reducing a matrix and row reducing a determinant is that with a determinant we have to keep track
of the sign if we interchange rows and we have to account for the factor k if we multiply a row by
k.

Example 5. Calculate the determinant


    | 1   2  0   1 |
    | 0   2  1   0 |
    | −2 −3  3  −1 |
    | 1   0  5   2 |

SOLUTION

    | 1   2  0   1 |
    | 0   2  1   0 |
    | −2 −3  3  −1 |
    | 1   0  5   2 |

        (add 2R1 to R3 and −R1 to R4; the determinant is unchanged)

        | 1   2  0  1 |
        | 0   2  1  0 |
    =   | 0   1  3  1 |
        | 0  −2  5  1 |

        (interchange R2 and R3; the sign changes)

          | 1   2  0  1 |
          | 0   1  3  1 |
    = −   | 0   2  1  0 |
          | 0  −2  5  1 |

        (add −2R2 to R3 and 2R2 to R4)

          | 1  2   0   1 |
          | 0  1   3   1 |
    = −   | 0  0  −5  −2 |
          | 0  0  11   3 |

        (multiply R3 by −1/5 and compensate with the factor −5, so the sign in front becomes 5)

          | 1  2  0   1  |
          | 0  1  3   1  |
    = 5   | 0  0  1  2/5 |
          | 0  0 11   3  |

        (add −11R3 to R4)

          | 1  2  0   1  |
          | 0  1  3   1  |
    = 5   | 0  0  1  2/5 |  =  5 · (1)(1)(1)(−7/5)  =  −7 .
          | 0  0  0 −7/5 |

Note, we could have stopped at

          | 1  2   0   1 |
          | 0  1   3   1 |
    = −   | 0  0  −5  −2 |
          | 0  0  11   3 |

and calculated

          | 1  2   0   1 |
          | 0  1   3   1 |                  | −5  −2 |
    = −   | 0  0  −5  −2 |  =  −(1)(1) ·    |        |  =  −(−15 + 22)  =  −7 .
          | 0  0  11   3 |                  | 11   3 |

Inverse, determinant and rank

We have seen that a system of equations Ax = b where A is an n × n matrix has a unique


solution if and only if A has an inverse. We have also seen that the system has a unique solution
if and only if det A ≠ 0; that is, if and only if A is nonsingular. It now follows that A has
an inverse if and only if det A ≠ 0. Thus, the determinant provides a test as to whether an n × n
matrix has an inverse.

There is also the connection with rank: an n × n matrix has an inverse if and only if its rank is
n.

Putting all this together we have the following equivalent statements:

1. The system of equations Ax = b, A an n × n matrix, has a unique solution.

2. A has an inverse.

3. det A ≠ 0.

4. A has rank n.

Exercises 4.2

Use the determinant to decide whether the matrix has an inverse. If it exists, find it and verify
your answer by calculating AA−1.
' (
2 0
1.
3 1
' (
0 1
2.
−2 4
' (
1 2
3.
3 4
' (
1 2
4.
−2 −4
 
1 0 2
 
5.  2 −1 3 
4 1 8
 
1 2 −4
 
6.  −1 −1 5 
2 7 −3
 
1 3 −4
 
7.  1 5 −1 
3 13 −6
 
1 2 3
 
8.  0 1 2 
4 5 3
 
1 1 −1 2
 0 2 0 −1 
 
9.  
 −1 2 2 −2 
0 −1 0 1
 
1 1 0 0
 0 1 1 0 
 
10.  
 1 0 1 0 
0 0 1 1

11. Each of the matrices in Problems 1 - 10 has integer entries. In some cases the inverse matrix
also had integer entries, in other cases it didn’t. Suppose A is an n × n matrix with integer
entries. Make a conjecture as to when A−1, if it exists, will also have integer entries.

Solve the system of equations by finding the inverse of the matrix of coefficients.
x + 2y = 2
12.
3x + 5y = 4

x + 3y = 5
13.
2x + y = 10

2x + 4y = 2
14.
3x + 8y = 1

2x + y = 4
15.
4x + 3y = 3

x − 3y = 2
16. y + z = −2
2x − y + 4z = 1

x + 2y − z = 2
17. x + y + 2z = 0
x−y−z = 1

−x + y = 5
18. −x + z = −2
6x − 2y − 3z = 1

Evaluate the determinant in two ways, using the indicated row and column expansions
0 0
0 0 3 2 00
0
0 0
19. 0 1 5 7 0; across the 2nd row, down the 1st column.
0 0
0 −2 −6 −1 0
0 0
0 1 2 −3 00
0
0 0
20. 0 2 5 −8 0; across the 3rd row, down the 2nd column.
0 0
0 3 8 −13 0
0 0
0 5 −1 2 0
0 0
0 0
21. 0 3 0 6 0; across the 2nd row, down the 3rd column.
0 0
0 −4 3 1 0
0 0
0 2 −3 0 00
0
0 0
22. 0 5 −2 0 0; across the 3rd row, down the 3rd column.
0 0
0 2 0 −1 0
0 0
0 1 0 3 00
0
0 0
23. 0 2 −2 1 0; across the 1st row, down the 2nd column.
0 0
0 4 0 −3 0

Evaluate the determinant using the row or column that minimizes the amount of computation.

0 0
0 1 3 0 0
0 0
0 0
24. 0 2 5 −2 0
0 0
0 3 4 0 0
0 0
0 2 −5 1 0
0 0
0 0
25. 0 0 3 0 0
0 0
0 3 4 −2 0
0 0
0 1 3 0 00
0
0 0
26. 0 2 5 −2 0
0 0
0 3 4 0 0
0 0
0 1 3 −4 00
0
0 0
27. 0 2 0 −2 0
0 0
0 0 0 3 0
0 0
0 1 −2 3 0 0
0 0
0 4 0 5 0 00
0
28. 0 0
0 7 −3 2 2 0
0 0
0 −3 0 4 0 0
0 0
0 2 −1 3 4 00
0
0 1 0 5 2 00
0
29. 0 0
0 −2
0 0 0 2 00
0 −2 0 −1 4 0
0 0
                                     | x + 1    x   |
30. Find the values of x such that   |              |  =  3.
                                     |   3    x − 2 |

                                     |  x      0      2    |
31. Find the values of x such that   |  2x   x − 1    4    |  =  0.
                                     | −x    x − 1  x + 1  |

Determine whether Cramer’s rule applies. If it does, solve for the indicated unknown
2x − y + 3z = 1
32. y + 2z = −3 ; x =?
x+z =0

−4x + y = 3
33. 2x + 2y + z = −2 ; y =?
3x + 4z = 2

3x + z = −2
34. x + 2y − z = 0 ; z =?
x − 4y + 3z = 1

−2x − y = 3
35. x + 3y − z = 0 ; z =?
5y − 2z = 3

2x + y + 3z = 2
36. 3x − 2y + 4z = 2 ; y =?
x + 4y − 2z = 1

2x + 7y + 3z = 7
37. x + 2y + z = 2 ; x =?
x + 5y + 2z = 5

3x + 6y − z = 3
38. x − 2y + 3z = 2 ; z =?
4x − 2y + 5z = 5
39. Determine the values of λ for which the system

(1 − λ)x + 6y = 0
5x + (2 − λ)y = 0

has nontrivial solutions. Find the solutions for each value of λ.

40. Determine the values of λ for which the system

(λ + 4)x + 4y + 2z = 0
4x + (5 − λ)y + 2z = 0
2x + 2y + (2 − λ)z = 0

has nontrivial solutions. Find the solutions for each value of λ.

4.3 Eigenvalues

In this section we discuss how to find eigenvalues for an n×n matrix A. A number λ is an eigenvalue
of A if there exists a nonzero eigenvector v such that

Av = λv. (4.3.1)

The vector v is called an eigenvector corresponding to the eigenvalue λ. It follows that the matrix
A − λIn is singular since
(A − λIn )v = 0.

Theorem 4.1.7 implies that


det(A − λIn ) = 0.

With these observations in mind, we can make the following definition.

Definition 4.3.1. Let A be an n × n matrix. The characteristic polynomial of A is:

pA (λ) = det(A − λIn ).

In Theorem 4.3.3 we show that pA (λ) is indeed a polynomial of degree n in λ. Note here that
the roots of pA are the eigenvalues of A. As we discussed, the real eigenvalues of A are roots
of the characteristic polynomial. Conversely, if λ is a real root of pA , then Theorem 4.1.7 states
that the matrix A − λIn is singular and therefore that there exists a nonzero vector v such that
(4.3.1) is satisfied. Similarly, by using this extended algebraic definition of eigenvalues we allow
the possibility of complex eigenvalues. The complex analog of Theorem 4.1.7 shows that if λ is a
complex eigenvalue, then there exists a nonzero complex n-vector v such that (4.3.1) is satisfied.

Example 4.3.2. Let A be an n × n lower triangular matrix. Then the diagonal entries are the
eigenvalues of A. We verify this statement as follows.
 
               [ a11 − λ                   0     ]
    A − λIn =  [            . . .                ] ,
               [   (∗)                 ann − λ   ]

Since the determinant of a triangular matrix is the product of the diagonal entries, it follows that

pA (λ) = (a11 − λ) · · · (ann − λ), (4.3.2)

and hence that the diagonal entries of A are roots of the characteristic polynomial. A similar
argument works if A is upper triangular.

It follows from (4.3.2) that the characteristic polynomial of a triangular matrix is a polynomial
of degree n and that
pA (λ) = (−1)n λn + bn−1λn−1 + · · · + b0. (4.3.3)

for some real constants b0 , . . ., bn−1. In fact, this statement is true in general.

Theorem 4.3.3. Let A be an n×n matrix. Then pA is a polynomial of degree n of the form (4.3.3).

Proof: Let C be an n × n matrix whose entries have the form cij + dij λ. Then det(C) is a
polynomial in λ of degree at most n. We verify this statement by induction. It is easily verified
when n = 1, since then C = (c + dλ) for some real numbers c and d. Then det(C) = c + dλ which
is a polynomial of degree at most one. (It may have degree zero, if d = 0.) So assume that this
statement is true for (n − 1) × (n − 1) matrices. Recall from (4.1.9) that

det(C) = (c11 + d11λ) det(C11) + · · · + (−1)n+1 (c1n + d1nλ) det(C1n).

By induction each of the determinants C1j is a polynomial of degree at most n − 1. It follows


that multiplication by c1j + d1j λ yields a polynomial of degree at most n in λ. Since the sum of
polynomials of degree at most n is a polynomial of degree at most n, we have verified our assertion.

Since A−λIn is a matrix whose entries have the desired form, it follows that pA (λ) is a polynomial
of degree at most n in λ. To complete the proof of this theorem we need to show that the coefficient
of λn is (−1)n . Again, we verify this statement by induction. This statement is easily verified for
1 × 1 matrices — we assume that it is true for (n − 1) × (n − 1) matrices. Again use (4.1.9) to
compute

det(A − λIn ) = (a11 − λ) det(B11 ) − a12 det(B12 ) + · · · + (−1)n+1 a1n det(B1n ).

where B1j are the cofactor matrices of A − λIn . Using our previous observation all of the terms
det(B1j ) are polynomials of degree at most n − 1. Thus, in this expansion, the only term that can
contribute a term of degree n is:
−λ det(B11 ).
Note that the cofactor matrix B11 is the (n − 1) × (n − 1) matrix

B11 = A11 − λIn−1 ,

where A11 is the first cofactor matrix of the matrix A. By induction, det(B11 ) is a polynomial of
degree n−1 with leading term (−1)n−1 λn−1. Multiplying this polynomial by −λ yields a polynomial
of degree n with the correct leading term.

General Properties of Eigenvalues

The fundamental theorem of algebra states that every polynomial of degree n has exactly n roots
(counting multiplicity). For example, the quadratic formula shows that every quadratic polynomial
has exactly two roots. In general, the proof of the fundamental theorem is not easy and is certainly
beyond the limits of this course. Indeed, the difficulty in proving the fundamental theorem of algebra
is in proving that a polynomial p(λ) of degree n > 0 has one (complex) root. Suppose that λ0 is a
root of p(λ); that is, suppose that p(λ0 ) = 0. Then it is easy to show that

p(λ) = (λ − λ0 )q(λ) (4.3.4)

for some polynomial q of degree n − 1. So once we know that p has a root, then we can argue by
induction to prove that p has n roots.

Recall that a polynomial need not have any real roots. For example, the polynomial p(λ) = λ2 +1
has no real roots, since p(λ) > 0 for all real λ. This polynomial does have two complex roots

±i = ±√−1.

However, a polynomial with real coefficients has either real roots or complex roots that come in
complex conjugate pairs. To verify this statement, we need to show that if λ0 is a complex root of
p(λ), then so is its conjugate λ̄0. We claim that p(λ̄) is the complex conjugate of p(λ). To verify
this point, suppose that

    p(λ) = cn λn + cn−1 λn−1 + · · · + c0 ,

where each cj ∈ R. Then

    p(λ̄) = cn λ̄n + cn−1 λ̄n−1 + · · · + c0 ,

and since the coefficients cj are real and complex conjugation commutes with sums and products,
this is exactly the complex conjugate of cn λn + cn−1 λn−1 + · · · + c0 = p(λ). If λ0 is a root of p(λ), then

    p(λ̄0) = complex conjugate of p(λ0) = complex conjugate of 0 = 0.

Hence λ̄0 is also a root of p.

It follows that
Theorem 4.3.4. Every (real) n×n matrix A has exactly n eigenvalues λ1 , . . . , λn. These eigenvalues
are either real or complex conjugate pairs. Moreover,

(a) pA (λ) = (λ1 − λ) · · · (λn − λ),


(b) det(A) = λ1 · · · λn .

Proof: Since the characteristic polynomial pA is a polynomial of degree n with real coefficients,
the first part of the theorem follows from the preceding discussion. In particular, it follows from
(4.3.4) that
pA (λ) = c(λ1 − λ) · · · (λn − λ),
for some constant c. Formula (4.3.3) implies that c = 1 — which proves (a). Since pA(λ) =
det(A − λIn ), it follows that pA (0) = det(A). Thus (a) implies that pA (0) = λ1 · · · λn , thus proving
(b).

The eigenvalues of a matrix do not have to be different. For example, consider the extreme case
of a strictly triangular matrix A. Example 4.3.2 shows that all of the eigenvalues of A are zero.

We now discuss certain properties of eigenvalues.


Corollary 4.3.5. Let A be an n × n matrix. Then A is invertible if and only if zero is not an
eigenvalue of A.

Proof: The proof follows from Theorem 4.1.7 and Theorem 4.3.4(b).
Lemma 4.3.6. Let A be a singular n × n matrix. Then the null space of A is the span of all
eigenvectors whose associated eigenvalue is zero.

Proof: An eigenvector v of A has eigenvalue zero if and only if

Av = 0.

This statement is valid if and only if v is in the null space of A.

Theorem 4.3.7. Let A be an invertible n × n matrix with eigenvalues λ1, · · · , λn. Then the
eigenvalues of A−1 are 1/λ1, · · · , 1/λn.

Proof: We claim that


    pA (λ) = (−1)n det(A) λn pA−1 (1/λ).

It then follows that 1/λ is an eigenvalue of A−1 for each eigenvalue λ of A. This makes sense, since
the eigenvalues of A are nonzero.

Compute:

    (−1)n det(A) λn pA−1 (1/λ) = (−λ)n det(A) det(A−1 − (1/λ)In)
                               = det(−λA) det(A−1 − (1/λ)In)
                               = det(−λA (A−1 − (1/λ)In))
                               = det(A − λIn)
                               = pA (λ),

which verifies the claim.
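Theorem 4.3.7 can also be illustrated numerically in MATLAB; the matrix below is an arbitrary invertible example of our own choosing.

A = [2 1 0; 1 3 1; 0 1 4];
sort(eig(inv(A)))      % eigenvalues of the inverse
sort(1./eig(A))        % reciprocals of the eigenvalues of A; the lists agree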

Theorem 4.3.8. Let A and B be similar n × n matrices. Then

pA = pB ,

and hence the eigenvalues of A and B are identical.

Proof: Since B and A are similar, there exists an invertible n × n matrix S such that B = S −1 AS.
It follows that

det(B − λIn ) = det(S −1 AS − λIn ) = det(S −1 (A − λIn )S) = det(A − λIn ),

which verifies that pA = pB .
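A quick numerical illustration of Theorem 4.3.8 in MATLAB (the matrices are chosen arbitrarily):

A = [1 2; 3 4];
S = [1 1; 0 2];          % any invertible matrix will do
B = inv(S)*A*S;          % B is similar to A
poly(A)                  % coefficients of the characteristic polynomial of A
poly(B)                  % the same coefficients, up to roundoff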

Recall that the trace of an n × n matrix A is the sum of the diagonal entries of A; that is

tr(A) = a11 + · · · + ann.

We state without proof the following theorem:

Theorem 4.3.9. Let A be an n × n matrix with eigenvalues λ1 , . . . , λn. Then

tr(A) = λ1 + · · · + λn .

It follows from Theorem 4.3.8 that the traces of similar matrices are equal.

MATLAB Calculations

The commands for computing characteristic polynomials and eigenvalues of square matrices are
straightforward in MATLAB . In particular, for an n×n matrix A, the MATLAB command poly(A)
returns the coefficients of (−1)n pA (λ).

For example, reload the 4×4 matrix A of (4.1) by typing e8 1 11. The characteristic polynomial
of A is found by typing

poly(A)

to obtain

ans =
1.0000 -5.0000 15.0000 -10.0000 -46.0000

Thus the characteristic polynomial of A is:

pA (λ) = λ4 − 5λ3 + 15λ2 − 10λ − 46.

The eigenvalues of A are found by typing eig(A) and obtaining

ans =
-1.2224
1.6605 + 3.1958i
1.6605 - 3.1958i
2.9014

Thus A has two real eigenvalues and one complex conjugate pair of eigenvalues. Note that MAT-
LAB has preprogrammed not only the algorithm for finding the characteristic polynomial, but also
numerical routines for finding the roots of the characteristic polynomial.

The trace of A is found by typing trace(A) and obtaining

ans =
5

Using the MATLAB command sum we can verify the statement of Theorem 4.3.9. Indeed sum(v)
computes the sum of the components of the vector v and typing

sum(eig(A))

we obtain the answer 5.0000, as expected.
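Theorem 4.3.4(b) can be checked for the same matrix as well; the product of the eigenvalues may be displayed with a negligible imaginary part left over from the complex conjugate pair.

prod(eig(A))       % product of the eigenvalues
det(A)             % returns -46, in agreement up to roundoff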

Hand Exercises

In Exercises 1 – 2 determine the characteristic polynomial and the eigenvalues of the given matrices.

 
−9 −2 −10
 
1. A =  3 2 3 .
8 2 9
 
2 1 −5 2
 1 2 13 2 
 
2. B =  .
 0 0 3 −1 
0 0 1 1

3. Find the eigenvectors of  


3 1 −1
 
A =  −1 1 1 
2 2 0
corresponding to the eigenvalue λ = 2.

4. Consider the matrix  


−1 1 1
 
A =  1 −1 1 .
1 1 −1

(a) Verify that the characteristic polynomial of A is pA (λ) = (1 − λ)(λ + 2)2.

(b) Show that (1, 1, 1) is an eigenvector of A corresponding to λ = 1.

(c) Show that (1, 1, 1) is orthogonal to every eigenvector of A corresponding to the eigenvalue
λ = −2.
' (
8 5
5. Consider the matrix A = .
−10 −7

(a) Find the eigenvalues and eigenvectors of A.

(b) Express the vector (a, b) as a linear combination of the vectors found in (a).

6. Find the characteristic polynomial and the eigenvalues of


 
−1 2 2
 
A= 2 2 2 .
−3 −6 −6

Find eigenvectors corresponding to each of the three eigenvalues.

7. Let A be an n × n matrix. Suppose that

A2 + A + In = 0.

Prove that A is invertible.

In Exercises 8 – 9 decide whether the given statements are true or false. If the statements are false,
give a counterexample; if the statements are true, give a proof.

8. If the eigenvalues of a 2 × 2 matrix are equal to 1, then the four entries of that matrix are each
less than 500.

9. The trace of the product of two n × n matrices is the product of the traces.

10. When n is odd show that every real n × n matrix has a real eigenvalue.

Computer Exercises

In Exercises 11 – 12, use MATLAB to compute (a) the eigenvalues, traces, and characteristic poly-
nomials of the given matrix. (b) Use the results from part (a) to confirm Theorems 4.3.7 and
4.3.9.

11.  
−12 −19 −3 14 0
 
 −12 10 14 −19 8 
 
A=
 4 −2 1 7 −3 .

 
 −9 17 −12 −5 −8 
−12 −1 7 13 −12

12.  
−12 −5 13 −6 −5 12
 
 7 14 6 1 8 18 
 
 −8 14 13 9 2 1 
B=

.

 2 4 6 −8 −2 15 
 
 −14 0 −6 14 8 −13 
8 16 −8 3 5 19

13. Use MATLAB to compute the characteristic polynomial of the following matrix:
 
4 −6 7
 
A= 2 0 5 
−10 2 5

Denote this polynomial by pA (λ) = −(λ3 + p2 λ2 + p1 λ + p0). Then compute the matrix

B = −(A3 + p2A2 + p1A + p0 I).

What do you observe? In symbols B = pA (A). Compute the matrix B for examples of other square
matrices A and determine whether or not your observation was an accident.

4.4 Eigenvalues and Eigenvectors

Here is an alternative discussion of eigenvalues and eigenvectors.

Let A be an n × n matrix. Then A can be viewed as a transformation that maps an


n-component vector to an n component vector; that is, A can be thought of as a function from
Rn into Rn . Here’s an example.

Example 1. Let A be the 3 × 3 matrix


 
        [ 2   2  3 ]
    A = [ 1   2  1 ] .
        [ 2  −2  1 ]

Then A maps the vector ( 2, −1, 3 ) to ( 11, 3, 9 ):

    [ 2   2  3 ] [  2 ]   [ 11 ]
    [ 1   2  1 ] [ −1 ] = [  3 ] ;
    [ 2  −2  1 ] [  3 ]   [  9 ]

A maps the vector ( −1, 3, 2 ) to the vector ( 10, 7, −6 ):

    [ 2   2  3 ] [ −1 ]   [ 10 ]
    [ 1   2  1 ] [  3 ] = [  7 ] ;
    [ 2  −2  1 ] [  2 ]   [ −6 ]

and, in general,

    [ 2   2  3 ] [ a ]   [ 2a + 2b + 3c ]
    [ 1   2  1 ] [ b ] = [  a + 2b + c  ] .
    [ 2  −2  1 ] [ c ]   [ 2a − 2b + c  ]

Now consider the vector v = ( 8, 5, 2 ). It has a special property relative to A:

    [ 2   2  3 ] [ 8 ]   [ 32 ]     [ 8 ]
    [ 1   2  1 ] [ 5 ] = [ 20 ] = 4 [ 5 ] ;
    [ 2  −2  1 ] [ 2 ]   [  8 ]     [ 2 ]

A maps v = ( 8, 5, 2 ) to a multiple of itself; Av = 4v.

You can also verify that

    [ 2   2  3 ] [  2 ]     [  2 ]         [ 2   2  3 ] [  1 ]        [  1 ]
    [ 1   2  1 ] [  3 ] = 2 [  3 ]   and   [ 1   2  1 ] [  0 ] = (−1) [  0 ] .
    [ 2  −2  1 ] [ −2 ]     [ −2 ]         [ 2  −2  1 ] [ −1 ]        [ −1 ]

This is the property that we will study in this section.

Definition 4.4.1. Let A be an n × n matrix. A number λ is an eigenvalue of A if there


exists a nonzero vector v in Rn such that

Av = λ v.

The vector v is called an eigenvector corresponding to λ.

In Example 1, the numbers 4, 2, −1 are eigenvalues of A and v = ( 8, 5, 2 ), u =
( 2, 3, −2 ), w = ( 1, 0, −1 ) are corresponding eigenvectors.

Eigenvalues (vectors) are also called characteristic values (vectors) or proper values (vectors).

Calculating the Eigenvalues of a Matrix

Let A be an n × n matrix and suppose that λ is an eigenvalue of A with corresponding


eigenvector v. Then
Av = λ v implies Av − λ v = 0.

The latter equation can be written


(A − λIn )v = 0.

This equation says that the homogeneous system of linear equations

(A − λIn )x = 0

has the nontrivial solution v. As we saw in Section 5.6, a homogeneous system of n linear equations
in n unknowns has a nontrivial solution if and only if the determinant of the matrix of coefficients
is zero. Thus, λ is an eigenvalue of A if and only if

det(A − λIn ) = 0.

The equation det(A − λIn ) = 0 is called the characteristic equation of the matrix A. The roots of
the characteristic equation are the eigenvalues of A.

We’ll start with a special case which we’ll illustrate with a 3 × 3 matrix.

Example 2. Let A be an upper triangular matrix:


 
        [ a1  a2  a3 ]
    A = [  0  b2  b3 ] .
        [  0   0  c3 ]

Then the eigenvalues of A are the entries on the main diagonal.

                     | a1 − λ    a2       a3    |
    det(A − λI3 ) =  |   0     b2 − λ     b3    |  =  (a1 − λ)(b2 − λ)(c3 − λ).
                     |   0       0      c3 − λ  |

Thus, the eigenvalues of A are: λ1 = a1 , λ2 = b2 , λ3 = c3 .

The general result is this: If A is either an upper triangular matrix or a lower triangular matrix,
then the eigenvalues of A are the entries on the main diagonal of A.

Now we’ll look at arbitrary matrices; that is, no special form.

Example 3. Find the eigenvalues of


        [ 1  −3 ]
    A = [       ] .
        [ −2  2 ]

SOLUTION  We need to find the numbers λ that satisfy the equation det(A − λI2 ) = 0:

              [ 1  −3 ]      [ 1  0 ]    [ 1 − λ   −3    ]
    A − λI2 = [        ]  − λ [      ]  = [               ]
              [ −2  2 ]      [ 0  1 ]    [  −2    2 − λ  ]

and

    | 1 − λ   −3    |
    |               | = (1 − λ)(2 − λ) − 6 = λ2 − 3λ − 4 = (λ + 1)(λ − 4).
    |  −2    2 − λ  |
Therefore the eigenvalues of A are λ1 = −1 and λ2 = 4.

Example 4. Find the eigenvalues of


 
1 −3 1
 
A =  −1 1 1 .
3 −3 −1

SOLUTION We need to find the numbers λ that satisfy the equation det(A − λI3 ) = 0:
     
1 −3 1 1 0 0 1−λ −3 1
     
A − λI3 =  −1 1 1  − λ 0 1 0 = −1 1 − λ 1 
3 −3 −1 0 0 1 3 −3 −1 − λ

and 0 0
0 1−λ −3 1 0
0 0
0 0
det A = 0 −1 1−λ 1 0.
0 0
0 3 −3 −1 − λ 0

Expanding across the first row (remember, you can go across any row or down any column), we have
0 0
0 1−λ −3 1 00
0
0 0
0 −1 1 − λ 1 0 = (1 − λ) [(1 − λ)(−1 − λ) + 3] + 3 [1 + λ − 3] + [3 − 3(1 − λ)]
0 0
0 3 −3 −1 − λ 0
= −λ3 + λ2 + 4λ − 4 = −(λ + 2)(λ − 1)(λ − 2).

Therefore the eigenvalues of A are λ1 = −2, λ2 = 1, λ3 = 2.

Note that the characteristic equation of our 2 × 2 matrix is a polynomial equation of degree 2,
a quadratic equation; the characteristic equation of our 3 × 3 matrix is a polynomial equation of
degree 3; a cubic equation. This is true in general. That is, the characteristic equation of an n × n
matrix A is 0 0
0 a11 − λ a12 · · · a1n 00
0
0 0
0 a21 a22 − λ · · · a2n 0
0 .. .. .. 00 = p(λ) = 0,
0
0 . . ··· . 0
0 0
0 an1 a2n · · · ann − λ 0
where p is a polynomial of degree n in λ. The polynomial p is called the characteristic polynomial
of A.

Recall the following facts about a polynomial p of degree n with real coefficients.

1. p has exactly n roots, counting multiplicities.

2. p may have complex roots, but if a + bi is a root, then its conjugate a − bi is also a root;
the complex roots of p occur in conjugate pairs, counting multiplicities.

3. p can be factored into a product of linear and quadratic factors — the linear factors cor-
responding to the real roots of p and the quadratic factors corresponding to the complex
roots.

Example 5. Calculate the eigenvalues of


' (
1 −1
A= .
4 1

SOLUTION 0 0
0 1−λ −1 0
0 0
0 0 = (1 − λ)2 + 4 = λ2 − 2λ + 5.
0 4 1−λ 0

The roots of λ2 − 2λ + 5 = 0 are λ1 = 1 + 2i, λ2 = 1 − 2i; A has complex eigenvalues.

Example 6. Calculate the eigenvalues of


 
1 −3 3
 
A =  3 −5 3 .
6 −6 4

SOLUTION 0 0
0 1−λ −3 3 0
0 0
0 0
0 3 −5 − λ 3 0 = −λ3 + 12λ + 16 = −(λ + 2)2 (λ − 4).
0 0
0 6 −6 4−λ 0

The eigenvalues of A are λ1 = λ2 = −2, λ3 = 4; −2 is an eigenvalue of multiplicity 2.

Calculating the Eigenvectors of a Matrix

We note first that if λ is an eigenvalue of an n × n matrix A and v is a corresponding


eigenvector, then any nonzero multiple of v is also an eigenvector of A corresponding to λ: if
Av = λ v and r is any nonzero number, then

A(rv) = rAv = r(λ v) = λ(r v).

Thus, each eigenvalue has infinitely many corresponding eigenvectors.

Let A be an n × n matrix. If λ is an eigenvalue of A with corresponding eigenvector v,


then v is a solution of the homogeneous system of equations

(A − λ In )x = 0. (1)

Therefore, to find an eigenvector of A corresponding to the eigenvalue λ, we need to find a


nontrivial solution of (1). This takes us back to the solution methods of Sections 2.3 and 2.4.

Example 7. Find the eigenvalues and corresponding eigenvectors of
' (
2 4
.
1 −1

SOLUTION The first step is to find the eigenvalues.


0 0
0 2−λ 4 00
0
det(A − λI2 ) = 0 0 = λ2 − λ − 6 = (λ − 3)(λ + 2).
0 1 −1 − λ 0

The eigenvalues of A are λ1 = 3, λ2 = −2.

Next we find an eigenvector for each eigenvalue. Equation (1) for this problem is
    [ 2 − λ     4    ]        [ 2 − λ     4    ] [ x1 ]   [ 0 ]
    [                ] x  =   [                ] [    ] = [   ] .        (2)
    [   1    −1 − λ  ]        [   1    −1 − λ  ] [ x2 ]   [ 0 ]

We have to deal with each eigenvalue separately. We set λ = 3 in (2) to get


' (' (
−1 4 x1
= 0.
1 −4 x2

The augmented matrix for this system of equations is


' (
−1 4 0
1 −4 0

which row reduces to ' (


1 −4 0
.
0 0 0
The solution set is x1 = 4a, x2 = a, a any real number.

We get an eigenvector by choosing a value for a. Since an eigenvector is, by definition, a nonzero
vector, we must choose an a #= 0. Any such a will do; we’ll let a = 1. Then, an eigenvector
corresponding to the eigenvalue λ1 = 3 is v1 = (4, 1). Here is a verification:
' (' ( ' ( ' (
2 4 4 12 4
= =3 .
1 −1 1 3 1

Now we set λ = −2 in (2) to get


' (' (
4 4 x1
= 0.
1 1 x2

The augmented matrix for this system of equations is


' (
4 4 0
1 1 0

which row reduces to ' (


1 1 0
.
0 0 0

The solution set is x1 = −a, x2 = a, a any real number. Again, to get an eigenvector corresponding
to λ = −2, we can choose any nonzero number for a; we’ll choose a = −1. Then, an eigenvector
corresponding to the eigenvalue λ2 = −2 is v2 = (1, −1). We leave it to you to verify that
Av2 = −2v2 .

NOTE: It is important to understand that in finding eigenvectors we can assign any nonzero value
to the parameter in the solution set of the system of equations (A−λI)x = 0. Typically we’ll choose
values which will avoid fractions in the eigenvector and, just because it reads better, we like to have
the first component of an eigenvector be non-negative. Such choices are certainly not required.
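MATLAB computes eigenvalues and eigenvectors in a single call. As a cross-check of Example 7 (MATLAB normalizes each eigenvector to length 1, so the columns of V are scalar multiples of (4, 1) and (1, −1), possibly in a different order):

A = [2 4; 1 -1];
[V, D] = eig(A)      % columns of V are eigenvectors, diag(D) the eigenvalues
A*V - V*D            % numerically the zero matrix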

Example 8. Find the eigenvalues and corresponding eigenvectors of


 
2 2 2
 
 −1 2 1 .
1 −2 −1

SOLUTION First we find the eigenvalues.


0 0
0 2−λ 2 2 0
0 0
0 0
det(A − λI3 ) = 0 −1 2 − λ 1 0 = λ3 − 3λ2 + 2λ = λ(λ − 1)(λ − 2).
0 0
0 1 −2 −1 − λ 0

The eigenvalues of A are λ1 = 0, λ2 = 1, λ3 = 2.

Next we find an eigenvector for each eigenvalue. Equation (1) for this problem is
    
2−λ 2 2 2−λ 2 2 x1
    
 −1 2 − λ 1 x =  −1 2 − λ 1   x2  = 0. (3)
1 −2 −1 − λ 1 −2 −1 − λ x3

We have to deal with each eigenvalue separately. We set λ = 0 in (3) to get


  
2 2 2 x1
  
 −1 2 1   x2  = 0.
1 −2 −1 x3

The augmented matrix for this system of equations is


 
2 2 2 0
 
 −1 2 1 0 
1 −2 −1 0

which row reduces to  


1 1 1 0
 
 0 1 2/3 0  .
0 0 0 0
The solution set is x1 = − 13 a, x2 = − 23 a, x3 = a, a any real number.

We get an eigenvector by choosing a value for a. We’ll choose a = −3 (this avoids having
fractions as components, a convenience). Thus, an eigenvector corresponding to the eigenvalue
λ1 = 0 is v1 = ( 1, 2, −3 ). Here is a verification:
      
2 2 2 1 0 1
      
 −1 2 1  2  =  0  = 0 2 .
1 −2 −1 −3 0 −3

Next we set λ = 1 in (3) to get


  
1 2 2 x1
  
 −1 1 1   x2  = 0.
1 −2 −2 x3
The augmented matrix for this system of equations is
 
1 2 2 0
 
 −1 1 1 0 
1 −2 −2 0
which row reduces to  
1 2 2 0
 
 0 1 1 0 .
0 0 0 0
The solution set is x1 = 0, x2 = −a, x3 = a, a any real number. If we let a = −1, we get the
eigenvector v2 = ( 0, 1, −1 ). You can verify that Av2 = v2 .

Finally, we set λ = 2 in (3) to get


  
    [  0   2   2 ] [ x1 ]
    [ −1   0   1 ] [ x2 ] = 0.
    [  1  −2  −3 ] [ x3 ]
The augmented matrix for this system of equations is
 
0 2 2 0
 
 −1 0 1 0 
1 −2 −3 0
which row reduces to  
1 0 −1 0
 
 0 1 1 0 .
0 0 0 0
The solution set is x1 = a, x2 = −a, x3 = a, a any real number. If we let a = 1, we get the
eigenvector v3 = ( 1, −1, 1 ). You can verify that Av3 = 2v3 .

We have been working mainly with real numbers but the need to consider complex numbers
arises here — the characteristic polynomial of an n × n matrix A may have complex roots. Of
course, since the characteristic polynomial has real coefficients, complex roots occur in conjugate
pairs. Also, we have to expect that an eigenvector corresponding to a complex eigenvalue will be
complex; that is, have complex number components. Here is the main theorem in this regard.

Theorem 4.4.2. Let A be a (real) n × n matrix. If the complex number λ = a + bi is an
eigenvalue of A with corresponding (complex) eigenvector u + iv, then λ = a − bi, the conjugate
of a + bi, is also an eigenvalue of A and u − iv is a corresponding eigenvector.

The proof is a simple application of complex arithmetic:

A(u + iv) = (a + bi)(u + iv)


A(u + iv) = (a + bi)(u + iv) (the overline denotes complex conjugate)
A (u + iv) = (a + bi) (u + iv)
A(u − iv) = (a − bi)(u − iv)

Example 9. As we saw in Example 5, the matrix


' (
1 −1
A= .
4 1

has the complex eigenvalues λ1 = 1 + 2i, λ2 = 1 − 2i. We’ll find the eigenvectors. The nice thing
about a pair of complex eigenvalues is that we only have to calculate one eigenvector. Equation (1)
for this problem is
' ( ' (' (
1−λ −1 1−λ −1 x1
x= = 0.
4 1−λ 4 1−λ x2

Substituting λ1 = 1 + 2i in this equation gives


' ( ' (' ( ' (
−2i −1 −2i −1 x1 0
x= = .
4 −2i 4 −2i x2 0

The augmented matrix for this system of equations is


' (
−2i −1 0
4 −2i 0

which row reduces to ' (


1 − 12 i 0
.
0 0 0

The solution set is x1 = 12 i a, x2 = a, a any number; in this case, either real or complex. If we
set x2 = −2i, we get the eigenvector v1 = (1, −2i) = (1, 0) + i (0, −2):
' (' ( ' ( ' (
1 −1 1 1 + 2i 1
= = (1 + 2i) .
4 1 −2i 4 − 2i −2i

Now, by Theorem 4.4.2, an eigenvector corresponding to λ2 = 1 − 2i is v2 = (1, 0) − i (0, −2).

Eigenvalues of multiplicity greater than 1 can cause difficulties which we will investigate later
in the text. These difficulties carry over into other fields, such as differential equations. In the next
two examples we’ll illustrate what can happen when an eigenvalue has multiplicity 2.

Example 10. Find the eigenvalues and eigenvectors of
 
1 −3 3
 
A =  3 −5 3 .
6 −6 4

SOLUTION First we find the eigenvalues:


                     | 1 − λ    −3       3    |
    det(A − λI3 ) =  |   3    −5 − λ     3    |  =  16 + 12λ − λ3 = −(λ − 4)(λ + 2)2 .
                     |   6      −6     4 − λ  |

The eigenvalues of A are λ1 = 4, λ2 = λ3 = −2; −2 is an eigenvalue of multiplicity 2.

You can verify that v1 = ( 1, 1, 2 ) is an eigenvector corresponding to λ1 = 4.

Now we investigate what happens with the eigenvalue −2:


    
3 −3 3 x1 0
    
(A − (−2)I3 )x =  3 −3 3   x2  =  0  . (a)
6 −6 6 x3 0
The augmented matrix for this system of equation is
 
3 −3 3 0
 
 3 −3 3 0 
6 −6 6 0
which row reduces to  
1 −1 1 0
 
 0 0 0 0 .
0 0 0 0
The solution set of the corresponding system of equations is x1 = a − b, x2 = a, x3 = b, with a, b any real numbers.
We can assign any values we want to a and b, except a = b = 0 (an eigenvector is a nonzero
vector). Setting a = 1, b = 0 gives the eigenvector v2 = ( 1, 1, 0 ); setting a = 0, b = −1 gives
the eigenvector v3 = ( 1, 0, −1 ).

Note that our two choices of a and b produced two linearly independent eigenvectors (the
vectors are not multiples of each other). The fact that the solution set of the system of equations
(a) had two independent parameters guarantees that we can obtain two independent eigenvectors.

Example 11. Find the eigenvalues and eigenvectors of


 
5 6 2
 
A =  0 −1 −8 .
1 0 −2

SOLUTION First we find the eigenvalues:


0 0
0 5−λ 6 2 0
0 0
0 0
det(A − λI3 ) = 0 0 −1 − λ −8 0 = −36 + 15λ + 2λ2 − λ3 = −(λ + 4)(λ − 3)2.
0 0
0 1 0 −2 − λ 0

The eigenvalues of A are λ1 = −4, λ2 = λ3 = 3; 3 is an eigenvalue of multiplicity 2.

You can verify that v1 = ( 6, −8, −3 ) is an eigenvector corresponding to λ1 = −4.

We now find the eigenvectors for 3:


    
2 6 2 x1 0
    
(A − 3I3)x =  0 −4 −8   x2  =  0  . (b)
1 0 −5 x3 0
The augmented matrix for this system of equation is
 
2 6 2 0
 
 0 −4 −8 0 
1 0 −5 0
which row reduces to  
1 3 1 0
 
 0 1 2 0 .
0 0 0 0
The solution set of the corresponding system of equations is x1 = 5a, x2 = −2a, x3 = a, a any
real number. Setting a = 1 gives the eigenvector v2 = ( 5, −2, 1 )

In this case, the eigenvalue of multiplicity two yielded only one (independent) eigenvector.

In general, if the matrix A has an eigenvalue λ of multiplicity k, then λ may have only one
(independent) eigenvector, it may have two independent eigenvectors, it may have three independent
eigenvectors, and so on, up to k independent eigenvectors. It can be shown that
λ cannot have more than k linearly independent eigenvectors.

Eigenvalues, Determinant, Inverse, Rank

There is a relationship between the eigenvalues of an n × n matrix A, the determinant of A, the


existence of A−1, and the rank of A.

Theorem 4.4.3. Let A be an n × n matrix with eigenvalues λ1, λ2 , . . . , λn . (Note: the λ’s
here are not necessarily distinct, one or more of the eigenvalues may have multiplicity greater than
1, and they are not necessarily real.) Then

det A = λ1 · λ2 · λ3 · · · · λn .

That is, det A is the product of the eigenvalues of A.

Proof: The eigenvalues of A are the roots of the characteristic polynomial

det(A − λI) = p(λ).

Writing p(λ) in factored form, we have

det(A − λI) = p(λ) = (λ1 − λ)(λ2 − λ)(λ3 − λ) · · · (λn − λ).

Setting λ = 0, we get
det A = λ1 · λ2 · λ3 · · · · λn .

At the end of Section 5.6 we listed equivalences between the determinant, existence of an inverse
and rank. With Theorem 4.4.3, we can add eigenvalues to the list of equivalences.

Let A be an n × n matrix. The following are equivalent:

1. The system of equations Ax = b has a unique solution.

2. A has an inverse.

3. det A #= 0.

4. A has rank n.

5. 0 is not an eigenvalue of A

Exercises 4.4

Determine the eigenvalues and the eigenvectors.

' (
2 −1
1. A = .
0 3
' (
3 1
2. A = .
2 0
' (
1 2
3. A = .
3 2
' (
1 4
4. A = .
2 3
' (
6 5
5. A = .
−5 −4
' (
−1 1
6. A = .
4 2
' (
3 −1
7. A = .
1 1
' (
1 −1
8. A = .
1 1
' (
2 −1
9. A = .
1 2
 
3 2 −2
 
10. A =  −3 −1 3 . Hint: 1 is an eigenvalue.
1 2 0

 
15 7 −7
 
11. A =  −1 1 1 . Hint: 2 is an eigenvalue.
13 7 −5
 
2 −2 1
 
12. A =  1 −1 1 . Hint: 1 is an eigenvalue.
−3 2 −2
 
2 1 1
 
13. A= 1 2 1 . Hint: 1 is an eigenvalue.
−2 −2 −1
 
1 −3 1
 
14. A= 0 1 0 . Hint: 1 is an eigenvalue.
3 0 −1
 
0 1 1
 
15. A =  −1 1 1 .
−1 1 1
 
1 −4 −1
 
16. A= 3 2 3 . Hint: 2 is an eigenvalue.
1 1 3
 
−1 1 2
 
17. A =  −1 1 1 .
−2 1 3
 
2 2 1
 
18. A =  1 3 1 . Hint: 5 is an eigenvalue.
1 2 2
 
1 −2 2
 
19. A =  −2 1 2 . Hint: 3 is an eigenvalue.
−2 0 3
 
4 1 −1
 
20. A =  2 5 −2 . Hint: 3 is an eigenvalue.
1 1 2
 
4 2 −2 2
 1 3 1 −1 
 
21. A= . Hint: 2 is an eigenvalue of multiplicity 2.
 0 0 2 0 
1 1 −3 5
 
3 5 −5 5
 3 1 3 −3 
 
22. A= . Hint: 2 and −2 are eigenvalues.
 −2 2 0 2 
0 4 −6 8

23. Prove that if λ is an eigenvalue of A, then for every positive integer k, λk is an eigenvalue
of Ak .

24. Suppose that λ is a nonzero eigenvalue of A with corresponding eigenvector v. Prove that
if A has an inverse, then 1/λ is an eigenvalue of A−1 with corresponding vector v.

4.5 *Markov Chains


Markov chains provide an interesting and useful application of matrices and linear algebra. In
this section we introduce Markov chains via some of the theory and two examples. The theory can
be understood and applied to examples using just the background in linear algebra that we have
developed in this chapter.

An Example of Cats

Consider the four room apartment pictured in Figure 4.2. One way passages between the rooms are
indicated by arrows. For example, it is possible to go from room 1 directly to any other room, but
when in room 3 it is possible to go only to room 4.

    1   2
    3   4
Figure 4.2: Schematic design of apartment passages.

Suppose that there is a cat in the apartment and that at each hour the cat is asked to move from
the room that it is in to another. True to form, however, the cat chooses with equal probability to
stay in the room for another hour or to move through one of the allowed passages. Suppose that
we let pij be the probability that the cat will move from room i to room j; in particular, pii is the
probability that the cat will stay in room i. For example, when the cat is in room 1, it has four
choices — it can stay in room 1 or move to any of the other rooms. Assuming that each of these
choices is made with equal probability, we see that
    p11 = 1/4 ,   p12 = 1/4 ,   p13 = 1/4 ,   p14 = 1/4 .

It is now straightforward to verify that
    p21 = 1/2     p22 = 1/2     p23 = 0       p24 = 0
    p31 = 0       p32 = 0       p33 = 1/2     p34 = 1/2
    p41 = 0       p42 = 1/3     p43 = 1/3     p44 = 1/3 .

Putting these probabilities together yields the transition matrix:


 1 1 1 1 
4 4 4 4
 1 1
0 0 
 
P = 2 2
1 
 0 0 1
2 2

0 1
3
1
3
1
3

This transition matrix has the properties that all entries are nonnegative and that the entries in
each row sum to 1.
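If the accompanying script is not available, the transition matrix can also be typed into MATLAB directly; the two checks below confirm the properties just mentioned.

P = [1/4 1/4 1/4 1/4;
     1/2 1/2  0   0 ;
      0   0  1/2 1/2;
      0  1/3 1/3 1/3];
sum(P,2)      % each row sums to 1
min(P(:))     % all entries are nonnegative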

Three Basic Questions

Using the transition matrix P , we discuss the answers to three questions:

(A) What is the probability that a cat starting in room i will be in room j after exactly k steps?
We call the movement that occurs after each hour a step.
(B) Suppose that we put 100 cats in the apartment with some initial distribution of cats in each
room. What will the distribution of cats look like after a large number of steps?
(C) Suppose that a cat is initially in room i and takes a large number of steps. For how many of
those steps will the cat be expected to be in room j?

A Discussion of Question (A)

We begin to answer Question (A) by determining the probability that the cat moves from room 1
to room 4 in two steps. We denote this probability by p14(2) and compute

    p14(2) = p11 p14 + p12 p24 + p13 p34 + p14 p44 ;                            (4.5.1)

that is, the probability is the sum of the probabilities that the cat will move from room 1 to each
room i and then from room i to room 4. In this case the answer is:

    p14(2) = (1/4)(1/4) + (1/4)(0) + (1/4)(1/2) + (1/4)(1/3) = 13/48 ≈ 0.27 .

It follows from (4.5.1) and the definition of matrix multiplication that p14(2) is just the (1, 4)th entry in
the matrix P 2. An induction argument shows that the probability of the cat moving from room i to
room j in k steps is precisely the (i, j)th entry in the matrix P k — which answers Question (A). In
particular, we can answer the question: What is the probability that the cat will move from room 4
to room 3 in four steps? Using MATLAB the answer is given by typing e4 10 1 to recall the matrix
P and then typing

P4 = P^4;
P4(4,3)

obtaining

ans =
0.2728

A Discussion of Question (B)

We answer Question (B) in two parts: first we compute a formula for determining the number of
cats that are expected to be in room i after k steps, and second we explore that formula numerically
for large k. We begin by supposing that 100 cats are distributed in the rooms according to the initial
vector V0 = (v1 , v2, v3, v4)t ; that is, the number of cats initially in room i is vi . Next, we denote the
number of cats that are expected to be in room i after k steps by vi(k). For example, we determine
how many cats we expect to be in room 2 after one step. That number is:

    v2(1) = p12 v1 + p22 v2 + p32 v3 + p42 v4 ;                                 (4.5.2)

that is, v2(1) is the sum of the proportion of cats in each room i that are expected to migrate to
room 2 in one step. In this case, the answer is:

    (1/4) v1 + (1/2) v2 + (1/3) v4 .

It now follows from (4.5.2), the definition of the transpose of a matrix, and the definition of matrix
multiplication that v2(1) is the 2nd entry in the vector P t V0 . Indeed, it follows by induction that vi(k)
is the ith entry in the vector (P t )k V0 which answers the first part of Question (B).

We may rephrase the second part of Question (B) as follows. Let

Vk = (v1k , v2k , v3k , v4k )t = (P t)k V0.

Question (B) actually asks: What will the vector Vk look like for large k. To answer that question
we need some results about matrices like the matrix P in (4.5). But first we explore the answer to
this question numerically using MATLAB.

Suppose, for example, that the initial vector is


 
         [  2 ]
    V0 = [ 43 ]  .
         [ 21 ]
         [ 34 ]

Typing e4 10 1 and e4 10 4 enters the matrix P and the initial vector V0 into MATLAB. To compute
V20, the distribution of cats after 20 steps, type

Q=P’
V20 = Q^(20)*V0

and obtain

V20 =
18.1818
27.2727
27.2727
27.2727

Thus, after rounding to the nearest integer, we expect 27 cats to be in each of rooms 2,3 and 4 and
18 cats to be in room 1 after 20 steps. In fact, the vector V20 has a remarkable feature. Compute
Q*V20 in MATLAB and see that V20 = P tV20; that is, V20 is, to within four digit numerical precision,
an eigenvector of P t with eigenvalue equal to 1. This computation was not a numerical accident,
as we now describe. Indeed, compute V20 for several initial distributions V0 of cats and see that the
answer will always be the same — up to four digit accuracy.
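The limiting distribution can also be computed directly, without iterating, as the eigenvector of P t with eigenvalue 1; the short sketch below (variable names ours) rescales that eigenvector so that its entries sum to 100 cats and reproduces the vector V20 found above.

[W, D] = eig(P');
[dmin, j] = min(abs(diag(D) - 1));   % locate the eigenvalue closest to 1
v = W(:,j);
V = 100*v/sum(v)                     % approximately (18.18, 27.27, 27.27, 27.27)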

A Discussion of Question (C)

Suppose there is just one cat in the apartment; and we ask how many times that cat is expected to
visit room 3 in 100 steps. Suppose the cat starts in room 1; then the initial distribution of cats is
one cat in room 1 and zero cats in any of the other rooms. So V0 = e1 . In our discussion of Question
(B) we saw that the 3rd entry in (P t)k V0 gives the probability ck that the cat will be in room 3 after
k steps.

In the extreme, suppose that the probability that the cat will be in room 3 is 1 for each step k.
Then the fraction of the time that the cat is in room 3 is

(1 + 1 + · · · + 1)/100 = 1.

In general, the fraction of the time f that the cat will be in room 3 during a span of 100 steps is
    f = (1/100) (c1 + c2 + · · · + c100).

Since ck is the 3rd entry of (P t)k V0 , we see that

    f = (1/100) (P t V0 + (P t)2 V0 + · · · + (P t)100 V0 ).                    (4.5.3)

So, to answer Question (C), we need a way to sum the expression for f in (4.5.3), at least
approximately. This is not an easy task — though the answer itself is easy to explain. Let V be the
eigenvector of P t with eigenvalue 1 such that the sum of the entries in V is 1. The answer is: f is
approximately equal to V . See Theorem 4.5.4 for a more precise statement.

In our previous calculations the vector V20 was seen to be (approximately) an eigenvector of P t
with eigenvalue 1. Moreover the sum of the entries in V20 is precisely 100. Therefore, we normalize
V20 to get V by setting
1
V = V20.
100
So, the fraction of time that the cat spends in room 3 is f ≈ 0.2727. Indeed, we expect the cat to
spend approximately 27% of its time in rooms 2,3,4 and about 18% of its time in room 1.
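The time average in (4.5.3) can be formed directly in MATLAB; the loop below (variable names ours) starts one cat in room 1 and averages its distribution over 100 steps, giving a vector close to V.

V0 = [1; 0; 0; 0];        % one cat, initially in room 1
Q = P';
f = zeros(4,1);
W = V0;
for k = 1:100
    W = Q*W;              % (P^t)^k V0
    f = f + W;
end
f = f/100                 % approximately (0.18, 0.27, 0.27, 0.27)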

Markov Matrices

We now abstract the salient properties of our cat example. A Markov chain is a system with a finite
number of states labeled 1,. . . ,n along with probabilities pij of moving from site i to site j in a single
step. The Markov assumption is that these probabilities depend only on the site that you are in and
not on how you got there. In our example, we assumed that the probability of the cat moving from
say room 2 to room 4 did not depend on how the cat got to room 2 in the first place.

We make a second assumption: there is a k such that it is possible to move from any site i to
any site j in exactly k steps. This assumption is not valid for general Markov chains, though it
is valid for the cat example, since it is possible to move from any room to any other room in that
example in exactly three steps. (It takes a minimum of three steps to get from room 3 to room 1
in the cat example.) To simplify our discussion we include this assumption in our definition of a
Markov chain.

Definition 4.5.1. Markov matrices are square matrices P such that

(a) all entries in P are nonnegative,

(b) the entries in each row of P sum to 1, and

(c) there is a positive integer k such that all of the entries in P k are positive.

It is straightforward to verify that parts (a) and (b) in the definition of Markov matrices are
satisfied by the transition matrix

P = [ p_{11}  · · ·  p_{1n}
        ...    ...     ...
      p_{n1}  · · ·  p_{nn} ]

of a Markov chain. To verify part (c) requires further discussion.

Proposition 4.5.2. Let P be a transition matrix for a Markov chain.

(a) The probability of moving from site i to site j in exactly k steps is the (i, j)th entry in the
matrix P k .

(b) The expected number of individuals at site i after exactly k steps is the ith entry in the vector
Vk ≡ (P t)k V0 .

(c) P is a Markov matrix.

Proof: Only minor changes in our discussion of the cat example are needed to prove parts (a) and (b)
of the proposition.

(c) The assumption that it is possible to move from each site i to each site j in exactly k steps
means that the (i, j)th entry of P k is positive. For that k, all of the entries of P k are positive. In
the cat example, all entries of P 3 are positive.
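In MATLAB these properties can be checked directly for the cat matrix P; the following lines are a sketch (not from the text), assuming P is in the workspace.

% Check properties (a)-(c) of Definition 4.5.1 for P, using k = 3 in (c).
all(P(:) >= 0)                    % (a) all entries are nonnegative
all(abs(sum(P,2) - 1) < 1e-12)    % (b) every row sums to 1
all(all(P^3 > 0))                 % (c) all entries of P^3 are positive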

Proposition 4.5.2 gives the answer to Question (A) and the first part of Question (B) for general
Markov chains.
Let v_i^{(0)} ≥ 0 be the number of individuals initially at site i, and let V0 = (v_1^{(0)}, . . . , v_n^{(0)})^t. The
total number of individuals in the initial population is:

#(V0) = v_1^{(0)} + · · · + v_n^{(0)}.

Theorem 4.5.3. Let P be a Markov matrix. Then

(a) #(Vk ) = #(V0); that is, the number of individuals after k time steps is the same as the initial
number.
(b) V = lim_{k→∞} Vk exists and #(V ) = #(V0).

(c) V is an eigenvector of P t with eigenvalue equal to 1.

Proof: (a) By induction it is sufficient to show that #(V1 ) = #(V0). We do this by calculating
from V1 = P^t V0 that

#(V1) = v_1^{(1)} + · · · + v_n^{(1)}
      = (p_{11} v_1^{(0)} + · · · + p_{n1} v_n^{(0)}) + · · · + (p_{1n} v_1^{(0)} + · · · + p_{nn} v_n^{(0)})
      = (p_{11} + · · · + p_{1n}) v_1^{(0)} + · · · + (p_{n1} + · · · + p_{nn}) v_n^{(0)}
      = v_1^{(0)} + · · · + v_n^{(0)}

since the entries in each row of P sum to 1. Thus #(V1) = #(V0), as claimed.

(b) The hard part of this theorem is proving that the limiting vector V exists; we give a proof of
this fact in Chapter 8, Theorem 8.5.4. Once V exists it follows directly from (a) that #(V ) = #(V0).

(c) Just calculate that

P^t V = P^t ( lim_{k→∞} Vk ) = P^t ( lim_{k→∞} (P^t)^k V0 ) = lim_{k→∞} (P^t)^{k+1} V0 = lim_{k→∞} (P^t)^k V0 = V,

which proves (c).

Theorem 4.5.3(b) gives the answer to the second part of Question (B) for general Markov chains.
Next we discuss Question (C).

Theorem 4.5.4. Let P be a Markov matrix. Let V be the eigenvector of P t with eigenvalue 1 and
#(V ) = 1. Then after a large number of steps N the expected number of times an individual will
visit site i is N vi where vi is the ith entry in V .

Sketch of proof: In our discussion of Question (C) for the cat example, we explained why the
fraction fN of time that an individual will visit site j when starting initially at site i is the j th entry
in the sum
1
fN = (P t + (P t)2 + · · · + (P t)N )ei .
N
See (4.5.3). The proof of this theorem involves being able to calculate the limit of fN as N → ∞.
There are two main ideas. First, the limit of the matrix (P t)N exists as N approaches infinity —

call that limit Q. Moreover, Q is a matrix all of whose columns equal V . Second, for large N , the
sum
P^t + (P^t)^2 + · · · + (P^t)^N ≈ Q + Q + · · · + Q = NQ,

so that the limit of the fN is Qei = V .

The verification of these statements is beyond the scope of this text. For those interested, the
idea of the proof of the second part is roughly the following. Fix k large enough so that (P^t)^k is
close to Q. Then when N is large, much larger than k, the first k terms contribute almost nothing to
the average f_N, while each of the remaining N − k terms is close to Q.

Theorem 4.5.4 gives the answer to Question (C) for a general Markov chain. It follows from
Theorem 4.5.4 that for Markov chains the amount of time that an individual spends in room i is
independent of the individual’s initial room — at least after a large number of steps.

A complete proof of this theorem relies on a result known as the ergodic theorem. Roughly
speaking, the ergodic theorem relates space averages with time averages. To see how this point is
relevant, note that Question (B) deals with the issue of how a large number of individuals will be
distributed in space after a large number of steps, while Question (C) deals with the issue of how
the path of a single individual will be distributed in time after a large number of steps.

An Example of Umbrellas

This example focuses on the utility of answering Question (C) and reinforces the fact that results in
Theorem 4.5.3 have the second interpretation given in Theorem 4.5.4.

Consider the problem of a man with four umbrellas. If it is raining in the morning when the
man is about to leave for his office, then the man takes an umbrella from home to office, assuming
that he has an umbrella at home. If it is raining in the afternoon, then the man takes an umbrella
from office to home, assuming that he has an umbrella in his office. Suppose that the probability
that it will rain in the morning is p = 0.2 and the probability that it will rain in the afternoon is
q = 0.3, and these probabilities are independent. What percentage of days will the man get wet
going from home to office; that is, what percentage of the days will the man be at home on a rainy
morning with all of his umbrellas at the office?

There are five states in the system depending on the number of umbrellas that are at home. Let
si where 0 ≤ i ≤ 4 be the state with i umbrellas at home and 4 − i umbrellas at work. For example,
s2 is the state of having two umbrellas at home and two at the office. Let P be the 5 × 5 transition
matrix of state changes from morning to afternoon and Q be the 5 × 5 transition matrix of state
changes from afternoon to morning. For example, the probability p23 of moving from site s2 to site
s3 is 0, since it is not possible to have more umbrellas at home after going to work in the morning.
The probability q23 = q, since the number of umbrellas at home will increase by one only if it is
raining in the afternoon. The transition probabilities between all states are given in the following

transition matrices:

P = [  1      0      0      0      0
       p     1−p     0      0      0
       0      p     1−p     0      0
       0      0      p     1−p     0
       0      0      0      p     1−p ]

Q = [ 1−q     q      0      0      0
       0     1−q     q      0      0
       0      0     1−q     q      0
       0      0      0     1−q     q
       0      0      0      0      1  ]

Specifically,

P = [  1      0      0      0      0
      0.2    0.8     0      0      0
       0     0.2    0.8     0      0
       0      0     0.2    0.8     0
       0      0      0     0.2    0.8 ]

Q = [ 0.7    0.3     0      0      0
       0     0.7    0.3     0      0
       0      0     0.7    0.3     0
       0      0      0     0.7    0.3
       0      0      0      0      1  ]
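In MATLAB these matrices can be entered for general rain probabilities p and q; the following lines are a sketch (not from the text) of one way to do so.

% Build the morning and afternoon transition matrices from p and q.
p = 0.2;  q = 0.3;
P = diag([1, (1-p)*ones(1,4)]) + diag(p*ones(1,4), -1);
Q = diag([(1-q)*ones(1,4), 1]) + diag(q*ones(1,4), 1);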

The transition matrix M for moving from state si on one morning to state sj the next morning
is just M = P Q. We can compute this matrix using MATLAB by typing

e4_10_6
M = P*Q

obtaining

M =
0.7000 0.3000 0 0 0
0.1400 0.6200 0.2400 0 0
0 0.1400 0.6200 0.2400 0
0 0 0.1400 0.6200 0.2400
0 0 0 0.1400 0.8600

It is easy to check using MATLAB that all entries in the matrix M^4 are nonzero. So M is a Markov
matrix and we can use Theorem 4.5.4 to find the limiting distribution of states. Start with some
initial condition like V0 = (0, 0, 1, 0, 0)^t corresponding to the state in which two umbrellas are at
home and two at the office. Then compute the vectors Vk = (M^t)^k V0 until arriving at an eigenvector
of M^t with eigenvalue 1. For example, V70 is computed by typing V70 = M'^(70)*V0 and obtaining

V70 =
0.0419
0.0898
0.1537
0.2633
0.4512

We interpret V ≈ V70 in the following way. Since v1 is approximately .042, it follows that for
approximately 4.2% of all steps the umbrellas are in state s0 . That is, approximately 4.2% of all

days there are no umbrellas at home. The probability that it will rain in the morning on one of those
days is 0.2. Therefore, the probability of being at home in the morning when it is raining without
any umbrellas is approximately 0.008.
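Alternatively, the limiting distribution can be computed directly as an eigenvector; the following sketch (not from the text) assumes M has already been computed as above.

% Find the eigenvector of M' with eigenvalue 1, normalize its entries to sum
% to 1, and estimate the probability of a rainy morning with no umbrellas.
[S,D] = eig(M');
[mindiff,j] = min(abs(diag(D) - 1));   % locate the eigenvalue closest to 1
V = S(:,j)/sum(S(:,j));                % entries of V now sum to 1
pwet = 0.2*V(1)                        % approximately 0.008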

Hand Exercises

1. Let P be a Markov matrix and let w = (1, . . . , 1)t. Show that the vector w is an eigenvector of
P with eigenvalue 1.

In Exercises 2 – 4 which of the matrices are Markov matrices, and why?


2. P = [ 0.8  0.2; 0.2  0.8 ].

3. Q = [ 0.8  0.2; 0  1 ].

4. R = [ 0.8  0.2; −0.2  1.2 ].

5. The state diagram of a Markov chain is given in Figure 4.3. Assume that each arrow leaving a
state has equal probability of being chosen. Find the transition matrix for this chain.


Figure 4.3: State diagram of a Markov chain.

6. Suppose that P and Q are each n × n matrices whose rows sum to 1. Show that P Q is also an
n × n matrix whose rows sum to 1.

Computer Exercises

7. Suppose the apartment in Figure 4.2 is populated by dogs rather than cats. Suppose that dogs
will actually move when told; that is, at each step a dog will move from the room that he occupies
to another room.

(a) Calculate the transition matrix PDOG for this Markov chain and verify that PDOG is a Markov
matrix.
(b) Find the probability that a dog starting in room 2 will end up in room 3 after 5 steps.
(c) Find the probability that a dog starting in room 3 will end up in room 1 after 4 steps. Explain
why your answer is correct without using MATLAB.
(d) Suppose that the initial population consists of 100 dogs. After a large number of steps, what
will be the distribution of the dogs in the four rooms?

8. A truck rental company has locations in three cities A, B and C. Statistically, the company knows
that the trucks rented at one location will be returned in one week to the three locations in the
following proportions.

Rental Location   Returned to A   Returned to B   Returned to C
      A                75%             10%             15%
      B                 5%             85%             10%
      C                20%             20%             60%

Suppose that the company has 250 trucks. How should the company distribute the trucks so that
the number of trucks available at each location remains approximately constant from one week to
the next?

9. Let

P = [ 0.10  0.20  0.30  0.15  0.25
      0.05  0.35  0.10  0.40  0.10
      0     0     0.35  0.55  0.10
      0.25  0.25  0.25  0.25  0
      0.33  0.32  0     0     0.35 ]

be the transition matrix of a Markov chain.

(a) What is the probability that an individual at site 2 will move to site 5 in three steps?
(b) What is the probability that an individual at site 4 will move to site 1 in seven steps?
(c) Suppose that 100 individuals are initially uniformly distributed at the five sites. How will the
individuals be distributed after four steps?
(d) Find an eigenvector of P t with eigenvalue 1.

10. Suppose that the probability that it will rain in the morning is p = 0.3 and the probability that
it will rain in the afternoon is q = 0.25. In the man with umbrellas example, what is the probability
that the man will be at home with no umbrellas while it is raining?

11. Suppose that the original man in the text with umbrellas has only three umbrellas instead of
four. What is the probability that on a given day he will get wet going to work?

4.6 Appendix: Existence of Determinants

The purpose of this appendix is to verify the inductive definition of determinant (4.1.9). We have
already shown that if a determinant function exists, then it is unique. We also know that the
determinant function exists for 1 × 1 matrices. So we assume by induction that the determinant
function exists for (n−1)×(n−1) matrices and prove that the inductive definition gives a determinant
function for n × n matrices.

Recall that Aij is the cofactor matrix obtained from A by deleting the ith row and j th column
— so Aij is an (n − 1) × (n − 1) matrix. The inductive definition is:

D(A) = a11 det(A11 ) − a12 det(A12 ) + · · · + (−1)n+1 a1n det(A1n ).

We use the notation D(A) to remind us that we have not yet verified that this definition satisfies
properties (a)-(c) of Definition 4.1.1. In this appendix we verify these properties after assuming that
the inductive definition satisfies properties (a)-(c) for (n − 1) × (n − 1) matrices. For emphasis, we
use the notation det to indicate the determinant of square matrices of size less than n.

Property (a) is easily verified for D(A) since if A is lower triangular, then

D(A) = a11 det(A11 ) = a11a22 · · · ann

by induction.

Before verifying that D satisfies properties (b) and (c) of a determinant, we prove:

Lemma 4.6.1. Let E be an elementary row matrix and let B be any n × n matrix. Then

D(EB) = D(E)D(B) (4.6.1)

Proof: We verify (4.6.1) for each of the three types of elementary row operations.

(I) Suppose that E multiplies the ith row by a nonzero scalar c. If i > 1, then the cofactor matrix
(EA)1j is obtained from the cofactor matrix A1j by multiplying the (i − 1)st row by c. By induction,
det(EA)1j = c det(A1j ) and D(EA) = cD(A). On the other hand, D(E) = det(E11) = c. So (4.6.1)
is verified in this instance. If i = 1, then the 1st row of EA is (ca11, . . . , ca1n) from which it is easy
to verify (4.6.1).

(II) Next suppose that E adds a multiple c of the ith row to the j th row. We note that D(E) = 1.
When j > 1 then D(E) = det(E11) = 1 by induction. When j = 1 then D(E) = det(E11) ±
c det(E1i) = det(In−1) ± c det(E1i). But E1i is strictly upper triangular and det(E1i ) = 0. Thus
D(E) = 1.

If i > 1 and j > 1, then the result D(EA) = D(A) = D(E)D(A) follows by induction.

If i = 1, then

D(EB) = b11 det((EB)11 ) + · · · + (−1)n+1 b1n det((EB)1n )


= D(B) + cD(C)

where the 1st and ith row of C are equal.

If j = 1, then

D(EB) = (b_{11} + c b_{i1}) det(B_{11}) + · · · + (−1)^{n+1} (b_{1n} + c b_{in}) det(B_{1n})
      = ( b_{11} det(B_{11}) + · · · + (−1)^{n+1} b_{1n} det(B_{1n}) )
        + c ( b_{i1} det(B_{11}) + · · · + (−1)^{n+1} b_{in} det(B_{1n}) )
      = D(B) + cD(C)

where the 1st and ith row of C are equal.

The hardest part of this proof is a calculation that shows that if the 1st and ith rows of C are
equal, then D(C) = 0. By induction, we can swap the ith row with the 2nd . Hence we need only
verify this fact when i = 2.

(III) E is the matrix that swaps two rows.

As we saw earlier (4.1.4), E is the product of four matrices of types (I) and (II). It follows that
D(E) = −1 and D(EA) = −D(A) = D(E)D(A).

We now verify that when the 1st and 2nd rows of an n × n matrix C are equal, then D(C) = 0.
This is a tedious calculation that requires some facility with indexes and summations. Rather than
do this proof for general n, we present the proof for n = 4. This case contains all of the ideas of the
general proof.

We begin with the definition of D(C):

D(C) = c_{11} det[ c_{22} c_{23} c_{24}; c_{32} c_{33} c_{34}; c_{42} c_{43} c_{44} ]
     − c_{12} det[ c_{21} c_{23} c_{24}; c_{31} c_{33} c_{34}; c_{41} c_{43} c_{44} ]
     + c_{13} det[ c_{21} c_{22} c_{24}; c_{31} c_{32} c_{34}; c_{41} c_{42} c_{44} ]
     − c_{14} det[ c_{21} c_{22} c_{23}; c_{31} c_{32} c_{33}; c_{41} c_{42} c_{43} ].

Next we expand each of the four 3 × 3 matrices along their 1st rows, obtaining

D(C) = c_{11} ( c_{22} det[ c_{33} c_{34}; c_{43} c_{44} ] − c_{23} det[ c_{32} c_{34}; c_{42} c_{44} ] + c_{24} det[ c_{32} c_{33}; c_{42} c_{43} ] )
     − c_{12} ( c_{21} det[ c_{33} c_{34}; c_{43} c_{44} ] − c_{23} det[ c_{31} c_{34}; c_{41} c_{44} ] + c_{24} det[ c_{31} c_{33}; c_{41} c_{43} ] )
     + c_{13} ( c_{21} det[ c_{32} c_{34}; c_{42} c_{44} ] − c_{22} det[ c_{31} c_{34}; c_{41} c_{44} ] + c_{24} det[ c_{31} c_{32}; c_{41} c_{42} ] )
     − c_{14} ( c_{21} det[ c_{32} c_{33}; c_{42} c_{43} ] − c_{22} det[ c_{31} c_{33}; c_{41} c_{43} ] + c_{23} det[ c_{31} c_{32}; c_{41} c_{42} ] )

Combining the 2 × 2 determinants leads to:

D(C) = (c_{11}c_{22} − c_{12}c_{21}) det[ c_{33} c_{34}; c_{43} c_{44} ] + (c_{11}c_{24} − c_{14}c_{21}) det[ c_{32} c_{33}; c_{42} c_{43} ]
     + (c_{12}c_{23} − c_{13}c_{22}) det[ c_{31} c_{34}; c_{41} c_{44} ] + (c_{13}c_{21} − c_{11}c_{23}) det[ c_{32} c_{34}; c_{42} c_{44} ]
     + (c_{13}c_{24} − c_{14}c_{23}) det[ c_{31} c_{32}; c_{41} c_{42} ] + (c_{14}c_{22} − c_{12}c_{24}) det[ c_{31} c_{33}; c_{41} c_{43} ]

Supposing that

c_{21} = c_{11},  c_{22} = c_{12},  c_{23} = c_{13},  c_{24} = c_{14},

it is now easy to check that D(C) = 0.

We now return to verifying that D(A) satisfies properties (b) and (c) of being a determinant.
We begin by showing that D(A) = 0 if A has a row that is identically zero. Suppose that the zero
row is the ith row and let E be the matrix that multiplies the ith row of A by c. Then EA = A.
Using (4.6.1) we see that
D(A) = D(EA) = D(E)D(A) = cD(A),
which implies that D(A) = 0 since c is arbitrary.

Next we prove that D(A) = 0 when A is singular. Using row reduction we can write

A = Es · · · E1R

where the Ej are elementary row matrices and R is in reduced echelon form. Since A is singular,
the last row of R is identically zero. Hence D(R) = 0 and (4.6.1) implies that D(A) = 0.

We now verify property (b). Suppose that A is singular; we show that D(At ) = D(A) = 0. Since
the row rank of A equals the column rank of A, it follows that At is singular when A is singular.
Next assume that A is nonsingular. Then A is row equivalent to In and we can write

A = Es · · · E1, (4.6.2)

where the Ej are elementary row matrices. Since

At = E1t · · · Est

and D(E) = D(E t ), property (b) follows.

We now verify property (c): D(AB) = D(A)D(B). Recall that A is singular if and only if
there exists a nonzero vector v such that Av = 0. Now if A is singular, then so is At. Therefore
(AB)t = B t At is also singular. To verify this point, let w be the nonzero vector such that At w = 0.
Then B t At w = 0. Thus AB is singular since (AB)t is singular. Thus D(AB) = 0 = D(A)D(B)
when A is singular. Suppose now that A is nonsingular. It follows that

AB = Es · · · E1B.

Using (4.6.1) we see that

D(AB) = D(Es ) · · · D(E1 )D(B) = D(Es · · · E1)D(B) = D(A)D(B),

as desired. We have now completed the proof that a determinant function exists.

Chapter 5

Vector Spaces

In Chapter 2 we discussed how to solve systems of m linear equations in n unknowns.


We found that solutions of these equations are vectors (x1 , . . ., xn ) ∈ Rn . In Chapter 3
we discussed how the notation of matrices and matrix multiplication drastically simplifies
the presentation of linear systems and how matrix multiplication leads to linear mappings.
We also discussed briefly how linear mappings lead to methods for solving linear systems
— superposition, eigenvectors, inverses. These chapters have provided an introduction to
many of the ideas of linear algebra and now we begin the task of formalizing these ideas.

Sets having the two operations of vector addition and scalar multiplication are called
vector spaces. This concept is introduced in Section 5.1 along with the two primary exam-
ples — the set Rn in which solutions to systems of linear equations sit and the set C 1 of
differentiable functions in which solutions to systems of ordinary differential equations sit.
Solutions to systems of homogeneous linear equations form subspaces of Rn and solutions
of systems of linear differential equations form subspaces of C 1. These issues are discussed
in Sections 5.1 and 5.2.

When we solve a homogeneous system of equations, we write every solution as a superposition
of a finite number of specific solutions. Abstracting this process is one of the
main points of this chapter. Specifically, we show that every vector in many commonly
occurring vector spaces (in particular, the subspaces of solutions) can be written as a linear
combination (superposition) of a few solutions. The minimum number of solutions needed
is called the dimension of that vector space. Sets of vectors that generate all solutions by
superposition and that consist of that minimum number of vectors are called bases. These
ideas are discussed in detail in Sections 5.3–5.5. The proof of the main theorem (Theo-
rem 5.5.3), which gives a computable method for determining when a set is a basis, is given
in Section 5.6. This proof may be omitted on a first reading, but the statement of the
theorem is most important and must be understood.

5.1 Vector Spaces and Subspaces

Vector spaces abstract the arithmetic properties of addition and scalar multiplication of
vectors. In Rn we know how to add vectors and to multiply vectors by scalars. Indeed,
it is straightforward to verify that each of the eight properties listed in Table 5.1 is valid
for vectors in V = Rn . Remarkably, sets that satisfy these eight properties have much in
common with Rn . So we define:

Definition 5.1.1. Let V be a set having the two operations of addition and scalar multi-
plication. Then V is a vector space if the eight properties listed in Table 5.1 hold. The
elements of a vector space are called vectors.

Table 5.1: Properties of Vector Spaces: suppose u, v, w ∈ V and r, s ∈ R.

(A1) Addition is commutative v+w=w+v


(A2) Addition is associative (u + v) + w = u + (v + w)
(A3) Additive identity 0 exists v+0=v
(A4) Additive inverse −v exists v + (−v) = 0
(M1) Multiplication is associative (rs)v = r(sv)
(M2) Multiplicative identity exists 1v = v
(D1) Distributive law for scalars (r + s)v = rv + sv
(D2) Distributive law for vectors r(v + w) = rv + rw

The vector 0 mentioned in (A3) in Table 5.1 is called the zero vector.

When we say that a vector space V has the two operations of addition and scalar
multiplication we mean that the sum of two vectors in V is again a vector in V and the
scalar product of a vector with a number is again a vector in V . These two properties are
called closure under addition and closure under scalar multiplication.

In this discussion we focus on just two types of vector spaces: Rn and function spaces.
The reason that we make this choice is that solutions to linear equations are vectors in Rn
while solutions to linear systems of differential equations are vectors of functions.

An Example of a Function Space

For example, let F denote the set of all functions f : R → R. Note that functions like
f1 (t) = t2 − 2t + 7 and f2 (t) = sin t are in F since they are defined for all real numbers t,
but that functions like g1(t) = 1/t and g2 (t) = tan t are not in F since they are not defined
for all t.

We can add two functions f and g by defining the function f + g to be:
(f + g)(t) = f (t) + g(t).
We can also multiply a function f by a scalar c ∈ R by defining the function cf to be:
(cf )(t) = cf (t).
With these operations of addition and scalar multiplication, F is a vector space; that is, F
satisfies the eight vector space properties in Table 5.1. More precisely:

(A3) Define the zero function O by


O(t) = 0 for all t ∈ R.
For every x in F the function O satisfies:
(x + O)(t) = x(t) + O(t) = x(t) + 0 = x(t).
Therefore, x + O = x and O is the additive identity in F .

(A4) Let x be a function in F and define y(t) = −x(t). Then y is also a function in F , and
(x + y)(t) = x(t) + y(t) = x(t) + (−x(t)) = 0 = O(t).
Thus, x has the additive inverse −x.

After these comments it is straightforward to verify that the remaining six properties in
Table 5.1 are satisfied by functions in F .
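In MATLAB, functions in F can be modeled by function handles, and the operations above are then computed pointwise. The following lines are only a sketch; the particular functions are just an illustration.

% Pointwise addition and scalar multiplication of functions.
f = @(t) t.^2 - 2*t + 7;     % the polynomial f1 above
g = @(t) sin(t);             % the function f2 above
h = @(t) f(t) + g(t);        % the sum f + g
k = @(t) 5*f(t);             % the scalar multiple 5f
h(2), k(2)                   % evaluate the new functions at t = 2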

Sets that are not Vector Spaces

It is worth considering how closure under vector addition and scalar multiplication can fail.
Consider the following three examples.

(i) Let V1 be the set that consists of just the x and y axes in the plane. Since (1, 0) and
(0, 1) are in V1 but
(1, 0) + (0, 1) = (1, 1)
is not in V1, we see that V1 is not closed under vector addition. On the other hand,
V1 is closed under scalar multiplication.

(ii) Let V2 be the set of all vectors (k, ℓ) ∈ R2 where k and ℓ are integers. The set V2 is
closed under addition but not under scalar multiplication since (1/2)(1, 0) = (1/2, 0) is not
in V2.

(iii) Let V3 = [1, 2] be the closed interval in R. The set V3 is neither closed under addition
(1 + 1.5 = 2.5 ∉ V3) nor under scalar multiplication (4 · 1.5 = 6 ∉ V3). Hence the set
V3 is not closed under vector addition and not closed under scalar multiplication.

Subspaces

Definition 5.1.2. Let V be a vector space. A nonempty subset W ⊂ V is a subspace if W


is a vector space using the operations of addition and scalar multiplication defined on V .

Note that in order for a subset W of a vector space V to be a subspace it must be closed
under addition and closed under scalar multiplication. That is, suppose w1, w2 ∈ W and
r ∈ R. Then

(i) w1 + w2 ∈ W , and

(ii) rw1 ∈ W .

The x-axis and the xz-plane are examples of subsets of R3 that are closed under addition
and closed under scalar multiplication. Every vector on the x-axis has the form (a, 0, 0) ∈
R3 . The sum of two vectors (a, 0, 0) and (b, 0, 0) on the x-axis is (a + b, 0, 0) which is also
on the x-axis. The x-axis is also closed under scalar multiplication as r(a, 0, 0) = (ra, 0, 0),
and the x-axis is a subspace of R3 . Similarly, every vector in the xz-plane in R3 has the
form (a1 , 0, a3). As in the case of the x-axis, it is easy to verify that this set of vectors is
closed under addition and scalar multiplication. Thus, the xz-plane is also a subspace of
R3 .

In Theorem 5.1.4 we show that every subset of a vector space that is closed under
addition and scalar multiplication is a subspace. To verify this statement, we need the
following lemma in which some special notation is used. Typically, we use the same notation
0 to denote the real number zero and the zero vector. In the following lemma it is convenient
to distinguish the two different uses of 0, and we write the zero vector in boldface.

Lemma 5.1.3. Let V be a vector space, and let 0 ∈ V be the zero vector. Then

0v = 0 and (−1)v = −v

for every vector in v ∈ V .

Proof: Let v be a vector in V and use (D1) to compute

0v + 0v = (0 + 0)v = 0v.

By (A4) the vector 0v has an additive inverse −0v. Adding −0v to both sides yields

(0v + 0v) + (−0v) = 0v + (−0v) = 0.

Associativity of addition (A2) now implies

0v + (0v + (−0v)) = 0.

A second application of (A4) implies that

0v + 0 = 0

and (A3) implies that 0v = 0.

Next, we show that the additive inverse −v of a vector v is unique. That is, if v + a = 0,
then a = −v.

Before beginning the proof, note that commutativity of addition (A1) together with
(A3) implies that 0 + v = v. Similarly, (A1) and (A4) imply that −v + v = 0.

To prove uniqueness of additive inverses, add −v to both sides of the equation v + a = 0


yielding
−v + (v + a) = −v + 0.

Properties (A2) and (A3) imply

(−v + v) + a = −v.

But
(−v + v) + a = 0 + a = a.

Therefore a = −v, as claimed.

To verify that (−1)v = −v, we show that (−1)v is the additive inverse of v. Using (M1),
(D1), and the fact that 0v = 0, calculate

v + (−1)v = 1v + (−1)v = (1 − 1)v = 0v = 0.

Thus, (−1)v is the additive inverse of v and must equal −v, as claimed.

Theorem 5.1.4. Let W be a subset of the vector space V . If W is closed under addition
and closed under scalar multiplication, then W is a subspace.

Proof: We have to show that W is a vector space using the operations of addition and
scalar multiplication defined on V . That is, we need to verify that the eight properties
listed in Table 5.1 are satisfied. Note that properties (A1), (A2), (M1), (M2), (D1), and
(D2) are valid for vectors in W since they are valid for vectors in V .

It remains to verify (A3) and (A4). Let w ∈ W be any vector. Since W is closed under
scalar multiplication, it follows that 0w and (−1)w are in W . Lemma 5.1.3 states that
0w = 0 and (−1)w = −w; it follows that 0 and −w are in W . Hence, properties (A3) and
(A4) are valid for vectors in W , since they are valid for vectors in V .

Examples of Subspaces of Rn

Example 5.1.5. (a) Let V be a vector space. Then the subsets V and {0} are always
subspaces of V . A subspace W ⊂ V is proper if W ≠ {0} and W ≠ V .

(b) Lines through the origin are subspaces of Rn . Let w ∈ Rn be a nonzero


vector and let W = {rw : r ∈ R}. The set W is closed under addition and scalar
multiplication and is a subspace of Rn by Theorem 5.1.4. The subspace W is just a
line through the origin in Rn , since the vector rw points in the same direction as w
when r > 0 and the exact opposite direction when r < 0.

(c) Planes containing the origin are subspaces of R3. To verify this point, let P be a
plane through the origin and let N be a vector perpendicular to P . Then P consists of
all vectors v ∈ R3 perpendicular to N ; using the dot-product (see Chapter 2, (2.2.3))
we recall that such vectors satisfy the linear equation N · v = 0. By superposition, the
set of all solutions to this equation is closed under addition and scalar multiplication
and is therefore a subspace by Theorem 5.1.4.
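For a concrete plane, take the (hypothetical) normal vector N = (1, 2, 3); the following MATLAB sketch (not from the text) checks closure under addition and scalar multiplication for two vectors in that plane.

% Closure check for the plane N . v = 0 in R^3.
N = [1; 2; 3];
v = [3; 0; -1];  w = [1; 1; -1];    % two vectors satisfying N . v = 0
dot(N, v), dot(N, w)                % both are 0
dot(N, v + w), dot(N, 5*v)          % sums and scalar multiples stay in the plane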

In a sense that will be made precise all subspaces of Rn can be written as the span of a
finite number of vectors generalizing Example 5.1.5(b) or as solutions to a system of linear
equations generalizing Example 5.1.5(c).

Examples of Subspaces of the Function Space F

Let P be the set of all polynomials in F . The sum of two polynomials is a polynomial and
the scalar multiple of a polynomial is a polynomial. Thus, P is closed under addition and
scalar multiplication, and P is a subspace of F .

As a second example of a subspace of F , let C 1 be the set of all continuously differentiable


functions u : R → R. A function u is in C 1 if u and u′ exist and are continuous for all t ∈ R.
Examples of functions in C 1 are:

(i) Every polynomial p(t) = am tm + am−1 tm−1 + · · · + a1 t + a0 is in C 1.

(ii) The function u(t) = eλt is in C 1 for each constant λ ∈ R.

(iii) The trigonometric functions u(t) = sin(λt) and v(t) = cos(λt) are in C 1 for each
constant λ ∈ R.

(iv) u(t) = t7/3 is twice differentiable everywhere and is in C 1.

Equally there are many commonly used functions that are not in C 1. Examples include:

(i) u(t) = 1/(t − 5) is neither defined nor continuous at t = 5.

(ii) u(t) = |t| is not differentiable (at t = 0).

(iii) u(t) = csc(t) is neither defined nor continuous at t = kπ for any integer k.

The subset C 1 ⊂ F is a subspace and hence a vector space. The reason is simple. If x(t)
and y(t) are continuously differentiable, then
d/dt (x + y) = dx/dt + dy/dt.
Hence x + y is differentiable and is in C 1 and C 1 is closed under addition. Similarly, C 1 is
closed under scalar multiplication. Let r ∈ R and let x ∈ C 1. Then
d/dt (rx)(t) = r dx/dt (t).
Hence rx is differentiable and is in C 1.

The Vector Space (C 1)n

Another example of a vector space that combines the features of both Rn and C 1 is (C 1)n .
Vectors u ∈ (C 1)n have the form

u(t) = (u1 (t), . . ., un (t)),

where each coordinate function uj (t) ∈ C 1. Addition and scalar multiplication in (C 1)n are
defined coordinatewise — just like addition and scalar multiplication in Rn . That is, let
u, v be in (C 1)n and let r be in R, then

(u + v)(t) = (u1(t) + v1(t), . . . , un (t) + vn (t))


(ru)(t) = (ru1(t), . . . , run(t)).

The set (C 1)n satisfies the eight properties of vector spaces and is a vector space. Solutions
to systems of n linear ordinary differential equations are vectors in (C 1)n .

Hand Exercises

1. Verify that the set V1 consisting of all scalar multiples of (1, −1, −2) is a subspace of R3 .

2. Let V2 be the set of all 2 × 3 matrices. Verify that V2 is a vector space.

3. Let A = [ 1  1  0; 1  −1  1 ].
Let V3 be the set of vectors x ∈ R3 such that Ax = 0. Verify that V3 is a subspace of R3. Compare
V1 with V3 .

In Exercises 4 – 10 you are given a vector space V and a subset W . For each pair, decide whether
or not W is a subspace of V .

4. V = R3 and W consists of vectors in R3 that have a 0 in their first component.

5. V = R3 and W consists of vectors in R3 that have a 1 in their first component.

6. V = R2 and W consists of vectors in R2 for which the sum of the components is 1.

7. V = R2 and W consists of vectors in R2 for which the sum of the components is 0.


8. V = C 1 and W consists of functions x(t) ∈ C 1 satisfying ∫_{−2}^{4} x(t) dt = 0.

9. V = C 1 and W consists of functions x(t) ∈ C 1 satisfying x(1) = 0.

10. V = C 1 and W consists of functions x(t) ∈ C 1 satisfying x(1) = 1.

In Exercises 11 – 15 which of the sets S are subspaces?

11. S = {(a, b, c) ∈ R3 : a ≥ 0, b ≥ 0, c ≥ 0}.

12. S = {(x1, x2, x3) ∈ R3 : a1x1 + a2x2 + a3 x3 = 0 where a1 , a2, a3 ∈ R are fixed}.

13. S = {(x, y) ∈ R2 : (x, y) is on the line through (1, 1) with slope 1}.

14. S = {x ∈ R2 : Ax = 0} where A is a 3 × 2 matrix.

15. S = {x ∈ R2 : Ax = b} where A is a 3 × 2 matrix and b ∈ R3 is a fixed nonzero vector.

16. Let V be a vector space and let W1 and W2 be subspaces. Show that the intersection W1 ∩ W2
is also a subspace of V .

17. For which scalars a, b, c do the solutions to the equation

ax + by = c

form a subspace of R2 ?

18. For which scalars a, b, c, d do the solutions to the equation

ax + by + cz = d

form a subspace of R3 ?

5.2 Construction of Subspaces

The principle of superposition shows that the set of all solutions to a homogeneous system of linear
equations is closed under addition and scalar multiplication and is a subspace. Indeed, there are
two ways to describe subspaces: first as solutions to linear systems, and second as the span of a set
of vectors. We shall see that solving a homogeneous linear system of equations just means writing
the solution set as the span of a finite set of vectors.

Solutions to Homogeneous Systems Form Subspaces

Definition 5.2.1. Let A be an m × n matrix. The null space of A is the set of solutions to the
homogeneous system of linear equations
Ax = 0. (5.2.1)
Lemma 5.2.2. Let A be an m × n matrix. Then the null space of A is a subspace of Rn .

Proof: Suppose that x and y are solutions to (5.2.1). Then

A(x + y) = Ax + Ay = 0 + 0 = 0;

so x + y is a solution of (5.2.1). Similarly, for r ∈ R

A(rx) = rAx = r0 = 0;

so rx is a solution of (5.2.1). Thus, x + y and rx are in the null space of A, and the null space is
closed under addition and scalar multiplication. So Theorem 5.1.4 implies that the null space is a
subspace of the vector space Rn.

Writing Solution Subspaces as a Span

The way we solve homogeneous systems of equations gives a second method for defining subspaces.
For example, consider the system
Ax = 0,
where

A = [ 2  1  4  0; −1  0  2  1 ].

The matrix A is row equivalent to the reduced echelon form matrix

E = [ 1  0  −2  −1; 0  1  8  2 ].
Therefore x = (x1 , x2, x3, x4) is a solution of Ex = 0 if and only if x1 = 2x3 +x4 and x2 = −8x3 −2x4.
It follows that every solution of Ex = 0 can be written as:

x = x3 (2, −8, 1, 0)^t + x4 (1, −2, 0, 1)^t.

Since row operations do not change the set of solutions, it follows that every solution of Ax = 0
has this form. We have also shown that every solution is generated by two vectors by use of vector
addition and scalar multiplication. We say that this subspace is spanned by the two vectors

(2, −8, 1, 0)^t   and   (1, −2, 0, 1)^t.

For example, a calculation verifies that the vector (1, −10, 2, −3)^t is also a solution of Ax = 0 since

(1, −10, 2, −3)^t = 2 (2, −8, 1, 0)^t − 3 (1, −2, 0, 1)^t.    (5.2.2)

Spans

Let v1 , . . ., vk be a set of vectors in a vector space V . A vector v ∈ V is a linear combination of


v1 , . . . , vk if
v = r1v1 + · · · + rk vk
for some scalars r1, . . . , rk .

Definition 5.2.3. The set of all linear combinations of the vectors v1 , . . . , vk in a vector space V
is the span of v1 , . . . , vk and is denoted by span{v1 , . . ., vk }.

For example, the vector on the left hand side in (5.2.2) is a linear combination of the two vectors
on the right hand side.

The simplest example of a span is Rn itself. Let vj = ej where ej ∈ Rn is the vector with a 1 in
the j th coordinate and 0 in all other coordinates. Then every vector x = (x1 , . . . , xn) ∈ Rn can be
written as
x = x1e1 + · · · + xnen .
It follows that
Rn = span{e1 , . . . , en}.
Similarly, the set span{e1 , e3} ⊂ R3 is just the x1x3 -plane, since vectors in this span are

x1e1 + x3e3 = x1 (1, 0, 0) + x3(0, 0, 1) = (x1 , 0, x3).

Proposition 5.2.4. Let V be a vector space and let w1 , . . ., wk ∈ V . Then W = span{w1, . . . , wk} ⊂
V is a subspace.

Proof: Suppose x, y ∈ W . Then

x = r1 w1 + · · · + rk wk
y = s1 w1 + · · · + sk wk

for some scalars r1, . . . , rk and s1 , . . . , sk . It follows that

x + y = (r1 + s1 )w1 + · · · + (rk + sk )wk

and
rx = (rr1)w1 + · · · + (rrk )wk

are both in span{w1, . . . , wk}. Hence W ⊂ V is closed under addition and scalar multiplication, and
is a subspace by Theorem 5.1.4.

For example, let


v = (2, 1, 0) and w = (1, 1, 1) (5.2.3)
3
be vectors in R . Then linear combinations of the vectors v and w have the form

αv + βw = (2α + β, α + β, β)

for real numbers α and β. Note that every one of these vectors is a solution to the linear equation

x1 − 2x2 + x3 = 0, (5.2.4)

that is, the 1st coordinate minus twice the 2nd coordinate plus the 3rd coordinate equals zero.
Moreover, you may verify that every solution of (5.2.4) is a linear combination of the vectors v and
w in (5.2.3). Thus, the set of solutions to the homogeneous linear equation (5.2.4) is a subspace,
and that subspace can be written as the span of all linear combinations of the vectors v and w.
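This can be checked in MATLAB; the following sketch (not from the text) uses the commands rank and null on the 1 × 3 coefficient matrix of (5.2.4).

% Verify that v and w from (5.2.3) solve (5.2.4) and span its solution set.
v = [2; 1; 0];  w = [1; 1; 1];
A = [1 -2 1];
A*v, A*w              % both products are 0
rank([v w])           % 2, so v and w span a plane
size(null(A), 2)      % the solution subspace of (5.2.4) also has dimension 2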

In this language we see that the process of solving a homogeneous system of linear equations is
just the process of finding a set of vectors that span the subspace of all solutions. Indeed, we can
now restate Theorem 2.4.6 of Chapter 2. Recall that a matrix A has rank ℓ if it is row equivalent
to a matrix in echelon form with ℓ nonzero rows.

Proposition 5.2.5. Let A be an m × n matrix with rank ℓ. Then the null space of A is the span of
n − ℓ vectors.

We have now seen that there are two ways to describe subspaces — as solutions of homogeneous
systems of linear equations and as a span of a set of vectors, the spanning set. Much of linear algebra
is concerned with determining how one goes from one description of a subspace to the other.

Hand Exercises

In Exercises 1 – 4 a single equation in three variables is given. For each equation write the subspace
of solutions in R3 as the span of two vectors in R3 .

1. 4x − 2y + z = 0.

2. x − y + 3z = 0.

3. x + y + z = 0.

4. y = z.

In Exercises 5 – 8 each of the given matrices is in reduced echelon form. Write solutions of the
corresponding homogeneous system of linear equations as a span of vectors.

5. A = [ 1  2  0  1  0
         0  0  1  4  0
         0  0  0  0  1 ].

6. B = [ 1  3  0  5; 0  0  1  2 ].

7. A = [ 1  0  2; 0  1  1 ].

8. B = [ 1  −1  0  5  0  0
         0   0  1  2  0  2
         0   0  0  0  1  2 ].

9. Write a system of two linear equations of the form Ax = 0 where A is a 2 × 4 matrix whose
subspace of solutions in R4 is the span of the two vectors

v1 = (1, −1, 0, 0)^t   and   v2 = (0, 0, 1, −1)^t.
10. Write the matrix A = [ 2  2; −3  0 ] as a linear combination of the matrices

B = [ 1  1; 0  0 ]   and   C = [ 0  0; 1  0 ].

11. Is (2, 20, 0) in the span of w1 = (1, 1, 3) and w2 = (1, 4, 2)? Answer this question by setting
up a system of linear equations and solving that system by row reducing the associated augmented
matrix.

In Exercises 12 – 15 let W ⊂ C 1 be the subspace spanned by the two polynomials x1 (t) = 1 and
x2(t) = t2 . For the given function y(t) decide whether or not y(t) is an element of W . Furthermore,
if y(t) ∈ W , determine whether the set {y(t), x2(t)} is a spanning set for W .

12. y(t) = 1 − t2,

13. y(t) = t4 ,

14. y(t) = sin t,

15. y(t) = 0.5t2

16. Let W ⊂ R4 be the subspace that is spanned by the vectors

w1 = (−1, 2, 1, 5) and w2 = (2, 1, 3, 0).

Find a linear system of two equations such that W = span{w1, w2} is the set of solutions of this
system.

17. Let V be a vector space and let v ∈ V be a nonzero vector. Show that

span{v, v} = span{v}.

18. Let V be a vector space and let v, w ∈ V be vectors. Show that

span{v, w} = span{v, w, v + 3w}.

19. Let W = span{w1, . . . , wk } be a subspace of the vector space V and let wk+1 ∈ W be another
vector. Prove that W = span{w1 , . . ., wk+1}.

20. Let Ax = b be a system of m linear equations in n unknowns, and let r = rank(A) and
s = rank(A|b). Suppose that this system has a unique solution. What can you say about the
relative magnitudes of m, n, r, s?

5.3 Spanning Sets and MATLAB

In this section we discuss:

• how to find a spanning set for the subspace of solutions to a homogeneous system of linear
equations using the MATLAB command null, and
• how to determine when a vector is in the subspace spanned by a set of vectors using the
MATLAB command rref.

Spanning Sets for Homogeneous Linear Equations

In Chapter 2 we saw how to use Gaussian elimination, back substitution, and MATLAB to compute
solutions to a system of linear equations. For systems of homogeneous equations, MATLAB provides
a command to find a spanning set for the subspace of solutions. That command is null. For example,
if we type

A = [2 1 4 0; -1 0 2 1]
B = null(A)

then we obtain

B =
0.4830 0
-0.4140 0.8729
-0.1380 -0.2182
0.7591 0.4364

The two columns of the matrix B span the set of solutions of the equation Ax = 0. In particular,
the vector (2, −8, 1, 0) is a solution to Ax = 0 and is therefore a linear combination of the column
vectors of B. Indeed, type

4.1404*B(:,1)-7.2012*B(:,2)

and observe that this linear combination is the desired one.

Next we describe how to find the coefficients 4.1404 and -7.2012 by showing that these coeffi-
cients themselves are solutions to another system of linear equations.

When is a Vector in a Span?

Let w1 , . . ., wk and v be vectors in Rn . We now describe a method that allows us to decide whether
v is in span{w1, . . . , wk}. To answer this question one has to solve a system of n linear equations in
k unknowns. The unknowns correspond to the coefficients in the linear combination of the vectors
w1, . . . , wk that gives v.

Let us be more precise. The vector v is in span{w1, . . . , wk } if and only if there are constants
r1, . . . , rk such that the equation
r1w1 + · · · + rk wk = v (5.3.1)
is valid. Define the n × k matrix A as the one having w1, . . . , wk as its columns; that is,

A = (w1 | · · · |wk). (5.3.2)

Let r be the k-vector r = (r_1, . . . , r_k)^t.
Then we may rewrite equation (5.3.1) as
Ar = v. (5.3.3)
To summarize:
Lemma 5.3.1. Let w1, . . . , wk and v be vectors in Rn. Then v is in span{w1, . . . , wk} if and only
if the system of linear equations (5.3.3) has a solution, where A is the n × k matrix defined in (5.3.2).

To solve (5.3.3) we row reduce the augmented matrix (A|v). For example, is v = (2, 1) in the
span of w1 = (1, 1) and w2 = (1, −1)? That is, do there exist scalars r1, r2 such that

r1 (1, 1) + r2 (1, −1) = (2, 1)?

As noted, we can rewrite this equation as

[ 1  1; 1  −1 ] (r_1, r_2)^t = (2, 1)^t.

We can solve this equation by row reducing the augmented matrix

[ 1   1  2
  1  −1  1 ]

to obtain

[ 1  0  3/2
  0  1  1/2 ].

So v = (3/2) w1 + (1/2) w2.

Row reduction to reduced echelon form has been preprogrammed in the MATLAB command
rref. Consider the following example. Let

w1 = (2, 0, −1, 4) and w2 = (2, −1, 0, 2) (5.3.4)

and ask the question whether v = (−2, 4, −3, 4) is in span{w1, w2}.

In MATLAB load the matrix A having w1 and w2 as its columns and the vector v by typing e5_3_5:

A = [ 2  2; 0  −1; −1  0; 4  2 ]    and    v = (−2, 4, −3, 4)^t.
We can solve the system of equations using MATLAB. First, form the augmented matrix by typing

aug = [A v]

Then solve the system by typing rref(aug) to obtain

ans =
1 0 3
0 1 -4
0 0 0
0 0 0

It follows that (r1 , r2) = (3, −4) is a solution and v = 3w1 − 4w2.

Now we change the 4th entry in v slightly by typing v(4) = 4.01. There is no solution to the
system of equations

Ar = (−2, 4, −3, 4.01)^t

as we now show. Type

aug = [A v]
rref(aug)

which yields

ans =
1 0 0
0 1 0
0 0 1
0 0 0

This matrix corresponds to an inconsistent system; thus v is no longer in the span of w1 and w2.

Computer Exercises

In Exercises 1 – 3 use the null command in MATLAB to find all the solutions of the linear system
of equations Ax = 0.

1. A = [ −4  0  −4  3; −4  1  −1  1 ]

2. A = [ 1  2; 1  0; 3  −2 ]

3. A = [ 1  1  2; −1  2  −1 ].

4. Use the null command in MATLAB to verify your answers to Exercises 5 and 6.

5. Use row reduction to find the solutions to Ax = 0 where A is given in (1). Does your answer
agree with the MATLAB answer using null? If not, explain why.

In Exercises 6 – 8 let W ⊂ R5 be the subspace spanned by the vectors

w1 = (2, 0, −1, 3, 4), w2 = (1, 0, 0, −1, 2), w3 = (0, 1, 0, 0, −1).

Use MATLAB to decide whether the given vectors are elements of W .

6. v1 = (2, 1, −2, 8, 3).

7. v2 = (−1, 12, 3, −14, −1).

8. v3 = (−1, 12, 3, −14, −14).

5.4 Linear Dependence and Linear Independence

An important question in linear algebra concerns finding spanning sets for subspaces having the
smallest number of vectors. Let w1, . . . , wk be vectors in a vector space V and let W = span{w1, . . . , wk}.
Suppose that W is generated by a subset of these k vectors. Indeed, suppose that the kth vector is
redundant in the sense that W = span{w1 , . . ., wk−1}. Since wk ∈ W , this is possible only if wk is
a linear combination of the k − 1 vectors w1 , . . . , wk−1; that is, only if
wk = r1w1 + · · · + rk−1wk−1. (5.4.1)
Definition 5.4.1. Let w1 , . . . , wk be vectors in the vector space V . The set {w1, . . . , wk } is linearly
dependent if one of the vectors wj can be written as a linear combination of the remaining k − 1
vectors.

Note that when k = 1, the phrase ‘{w1} is linearly dependent’ means that w1 = 0.

If we set rk = −1, then we may rewrite (5.4.1) as


r1 w1 + · · · + rk−1wk−1 + rk wk = 0.
It follows that:
Lemma 5.4.2. The set of vectors {w1, . . . , wk} is linearly dependent if and only if there exist scalars
r1, . . . , rk such that

(a) at least one of the rj is nonzero, and


(b) r1w1 + · · · + rk wk = 0.

For example, the vectors w1 = (2, 4, 7), w2 = (5, 1, −1), and w3 = (1, −7, −15) are linearly
dependent since 2w1 − w2 + w3 = 0.
Definition 5.4.3. A set of k vectors {w1, . . ., wk } is linearly independent if none of the k vectors
can be written as a linear combination of the other k − 1 vectors.

Since linear independence means not linearly dependent, Lemma 5.4.2 can be rewritten as:
Lemma 5.4.4. The set of vectors {w1, . . . , wk } is linearly independent if and only if whenever
r1w1 + · · · + rk wk = 0,
it follows that
r1 = r2 = · · · = rk = 0.

Let ej be the vector in Rn whose j th component is 1 and all of whose other components are 0.
The set of vectors e1 , . . . , en is the simplest example of a set of linearly independent vectors in Rn.
We use Lemma 5.4.4 to verify independence by supposing that
r1e1 + · · · + rnen = 0.
A calculation shows that
0 = r1e1 + · · · + rnen = (r1, . . . , rn).
It follows that each rj equals 0, and the vectors e1 , . . . , en are linearly independent.

Deciding Linear Dependence and Linear Independence

Deciding whether a set of k vectors in Rn is linearly dependent or linearly independent is equivalent


to solving a system of linear equations. Let w1 , . . . , wk be vectors in Rn, and view these vectors as
column vectors. Let
A = (w1 | · · ·|wk ) (5.4.2)
be the n × k matrix whose columns are the vectors wj . Then a vector R = (r_1, . . . , r_k)^t
is a solution to the system of equations AR = 0 precisely when

r1w1 + · · · + rk wk = 0. (5.4.3)

If there is a nonzero solution R to AR = 0, then the vectors {w1, . . . , wk } are linearly dependent; if
the only solution to AR = 0 is R = 0, then the vectors are linearly independent.

The preceding discussion is summarized by:


Lemma 5.4.5. The vectors w1 , . . ., wk in Rn are linearly dependent if the null space of the n × k
matrix A defined in (5.4.2) is nonzero and linearly independent if the null space of A is zero.

A Simple Example of Linear Independence with Two Vectors

The two vectors

w1 = (2, −8, 1, 0)^t   and   w2 = (1, −2, 0, 1)^t
are linearly independent. To see this suppose that r1w1 + r2 w2 = 0. Using the components of w1
and w2 this equality is equivalent to the system of four equations

2r1 + r2 = 0, −8r1 − 2r2 = 0, r1 = 0, and r2 = 0.

In particular, r1 = r2 = 0; hence w1 and w2 are linearly independent.

Using MATLAB to Decide Linear Dependence

Suppose that we want to determine whether or not the vectors

w1 = (1, 2, −1, 3, 5)^t,  w2 = (−1, 1, 4, −2, 0)^t,  w3 = (1, 1, −1, 3, 12)^t,  w4 = (0, 4, 3, 1, −2)^t

are linearly dependent. After typing e5_4_4 in MATLAB, form the 5 × 4 matrix A by typing

A = [w1 w2 w3 w4]

Determine whether there is a nonzero solution to AR = 0 by typing

null(A)

The response from MATLAB is

ans =
-0.7559
-0.3780
0.3780
0.3780

showing that there is a nonzero solution to AR = 0 and the vectors wj are linearly dependent.
Indeed, this solution for R shows that we can solve for w1 in terms of w2 , w3, w4. We can now ask
whether or not w2, w3, w4 are linearly dependent. To answer this question form the matrix

B = [w2 w3 w4]

and type null(B) to obtain

ans =
Empty matrix: 3-by-0

showing that the only solution to BR = 0 is the zero solution R = 0. Thus, w2, w3, w4 are linearly
independent. For these particular vectors, any three of the four are linearly independent.
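That last claim can also be checked with a short loop; the following sketch (not from the text) uses the matrix A = [w1 w2 w3 w4] formed above.

% Check that every three of the four vectors are linearly independent.
for j = 1:4
    cols = setdiff(1:4, j);       % drop the jth vector
    size(null(A(:,cols)), 2)      % 0 means the remaining three are independent
end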

Hand Exercises

1. Let w be a vector in the vector space V . Show that the sets of vectors {w, 0} and {w, −w} are
linearly dependent.

2. For which values of b are the vectors (1, b) and (3, −1) linearly independent?

3. Let
u1 = (1, −1, 1) u2 = (2, 1, −2) u3 = (10, 2, −6).
Is the set {u1 , u2, u3} linearly dependent or linearly independent?

4. For which values of b are the vectors (1, b, 2b) and (2, 1, 4) linearly independent?

5. Show that the polynomials p1 (t) = 2+t, p2 (t) = 1+t2, and p3 (t) = t−t2 are linearly independent
vectors in the vector space C 1.

6. Show that the functions f1 (t) = sin t, f2 (t) = cos t, and f3 (t) = cos(t + π/3) are linearly dependent
vectors in C 1 .

7. Suppose that the three vectors u1 , u2, u3 ∈ Rn are linearly independent. Show that the set

{u1 + u2 , u2 + u3 , u3 + u1 }

is also linearly independent.

Computer Exercises

In Exercises 8 – 10, determine whether the given sets of vectors are linearly independent or linearly
dependent.

8.
v1 = (2, 1, 3, 4) v2 = (−4, 2, 3, 1) v3 = (2, 9, 21, 22)

9.
w1 = (1, 2, 3) w2 = (2, 1, 5) w3 = (−1, 2, −4) w4 = (0, 2, −1)

10.
x1 = (3, 4, 1, 2, 5) x2 = (−1, 0, 3, −2, 1) x3 = (2, 4, −3, 0, 2)

11. Perform the following experiments.

(a) Use MATLAB to choose randomly three column vectors in R3 . The MATLAB commands to
choose these vectors are:

y1 = rand(3,1)
y2 = rand(3,1)
y3 = rand(3,1)

Use the methods of this section to determine whether these vectors are linearly independent
or linearly dependent.
(b) Now perform this exercise five times and record the number of times a linearly independent
set of vectors is chosen and the number of times a linearly dependent set is chosen.
(c) Repeat the experiment in (b) — but this time randomly choose four vectors in R3 to be in
your set.

5.5 Dimension and Bases

The minimum number of vectors that span a vector space has special significance.

Definition 5.5.1. The vector space V has finite dimension if V is the span of a finite number of
vectors. If V has finite dimension, then the smallest number of vectors that span V is called the
dimension of V and is denoted by dim V .

For example, recall that ej is the vector in Rn whose j th component is 1 and all of whose other
components are 0. Let x = (x1 , . . ., xn) be in Rn . Then

x = x1e1 + · · · + xnen . (5.5.1)

Since every vector in Rn is a linear combination of the vectors e1 , . . ., en , it follows that Rn =


span{e1 , . . . , en}. Thus, Rn is finite dimensional. Moreover, the dimension of Rn is at most n, since
Rn is spanned by n vectors. It seems unlikely that Rn could be spanned by fewer than n vectors—
but this point needs to be proved.

An Example of a Vector Space that is Not Finite Dimensional

Next we discuss an example of a vector space that does not have finite dimension. Consider the
subspace P ⊂ C 1 consisting of polynomials of all degrees. We show that P is not the span of a finite
number of vectors and hence that P does not have finite dimension. Let p1 (t), p2(t), . . . , pk (t) be a set
of k polynomials and let d be the maximum degree of these k polynomials. Then every polynomial
in the span of p1 (t), . . . , pk(t) has degree less than or equal to d. In particular, p(t) = td+1 is a
polynomial that is not in the span of p1 (t), . . . , pk (t) and P is not spanned by finitely many vectors.

Bases and The Main Theorem

Definition 5.5.2. Let B = {w1, . . . , wk } be a set of vectors in a vector space W . The subset B is
a basis for W if B is a spanning set for W with the smallest number of elements in a spanning set
for W .

It follows that if {w1, . . . , wk} is a basis for W , then k = dim W . The main theorem about bases
is:
Theorem 5.5.3. A set of vectors B = {w1, . . ., wk } in a vector space W is a basis for W if and
only if the set B is linearly independent and spans W .

Remark: The importance of Theorem 5.5.3 is that we can show that a set of vectors is a basis by
verifying spanning and linear independence. We never have to check directly that the spanning set
has the minimum number of vectors for a spanning set.

For example, we have shown previously that the set of vectors {e1, . . . , en } in Rn is linearly
independent and spans Rn. It follows from Theorem 5.5.3 that this set is a basis, and that the
dimension of Rn is n. In particular, Rn cannot be spanned by fewer than n vectors.

The proof of Theorem 5.5.3 is given in Section 5.6.

Consequences of Theorem 5.5.3

We discuss two applications of Theorem 5.5.3. First, we use this theorem to derive a way of deter-
mining the dimension of the subspace spanned by a finite number of vectors. Second, we show that

the dimension of the subspace of solutions to a homogeneous system of linear equation Ax = 0 is
n − rank(A) where A is an m × n matrix.

Computing the Dimension of a Span

We show that the dimension of a span of vectors can be found using elementary row operations on
a matrix M whose rows are the given vectors.

Lemma 5.5.4. Let w1 , . . ., wk be k row vectors in Rn and let W = span{w1, . . . , wk } ⊂ Rn. Define

M = [ w1; . . . ; wk ]

to be the matrix whose rows are the wj s. Then

dim(W ) = rank(M ). (5.5.2)

Proof: To verify (5.5.2), observe that the span of w1, . . . , wk is unchanged by

(a) swapping wi and wj ,

(b) multiplying wi by a nonzero scalar, and

(c) adding a multiple of wi to wj .

That is, if we perform elementary row operations on M , the vector space spanned by the rows of M
does not change. So we may perform elementary row operations on M until we arrive at the matrix
E in reduced echelon form. Suppose that ℓ = rank(M ); that is, suppose that ℓ is the number of
nonzero rows in E. Then

E = [ v1; . . . ; vℓ; 0; . . . ; 0 ],

where the vj are the nonzero rows in the reduced echelon form matrix.

We claim that the vectors v1 , . . . , vℓ are linearly independent. It then follows from Theorem 5.5.3
that {v1, . . . , vℓ } is a basis for W and that the dimension of W is ℓ. To verify the claim, suppose

a1v1 + · · · + aℓ vℓ = 0. (5.5.3)

We show that ai must equal 0 as follows. In the ith row, the pivot must occur in some column —
say in the j th column. It follows that the j th entry in the vector of the left hand side of (5.5.3) is

0a1 + · · · + 0ai−1 + 1ai + 0ai+1 + · · · + 0aℓ = ai,

since all entries in the j th column of E other than the pivot must be zero, as E is in reduced echelon
form.

For instance, let W = span{w1, w2, w3} in R4 where

w1 = (3, −2, 1, −1), w2 = (1, 5, 10, 12), w3 = (1, −12, −19, −25).

To compute dim W in MATLAB , type e5_5_4 to load the vectors and type

M = [w1; w2; w3]

Row reduction of the matrix M in MATLAB leads to the reduced echelon form matrix

ans =
1.0000 0 1.4706 1.1176
0 1.0000 1.7059 2.1765
0 0 0 0

indicating that the dimension of the subspace W is two, and therefore {w1, w2, w3} is not a basis of
W . Alternatively, we can use the MATLAB command rank(M) to compute the rank of M and the
dimension of the span W .

However, if we change one of the entries in w3 , for instance w3(3)=-18 then indeed the command
rank([w1;w2;w3]) gives the answer three indicating that for this choice of vectors {w1, w2, w3} is
a basis for span{w1, w2, w3}.

Solutions to Homogeneous Systems Revisited

We return to our discussions in Chapter 2 on solving linear equations. Recall that we can write all
solutions to the system of homogeneous equations Ax = 0 in terms of a few parameters, and that the
null space of A is the subspace of solutions (See Definition 5.2.1). More precisely, Proposition 5.2.5
states that the number of parameters needed is n − rank(A) where n is the number of variables in
the homogeneous system. We claim that the dimension of the null space is exactly n − rank(A).

For example, consider the reduced echelon form 3 × 7 matrix

A = \begin{pmatrix} 1 & -4 & 0 & 2 & -3 & 0 & 8 \\ 0 & 0 & 1 & 3 & 2 & 0 & 4 \\ 0 & 0 & 0 & 0 & 0 & 1 & 2 \end{pmatrix}   (5.5.4)
that has rank three. Suppose that the unknowns for this system of equations are x1, . . . , x7. We can
solve the equations associated with A by solving the first equation for x1 , the second equation for
x3, and the third equation for x6, as follows:

x1 = 4x2 − 2x4 + 3x5 − 8x7


x3 = −3x4 − 2x5 − 4x7
x6 = −2x7

Thus, all solutions to this system of equations have the form

\begin{pmatrix} 4x_2 - 2x_4 + 3x_5 - 8x_7 \\ x_2 \\ -3x_4 - 2x_5 - 4x_7 \\ x_4 \\ x_5 \\ -2x_7 \\ x_7 \end{pmatrix}
= x_2\begin{pmatrix} 4 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}
+ x_4\begin{pmatrix} -2 \\ 0 \\ -3 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}
+ x_5\begin{pmatrix} 3 \\ 0 \\ -2 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}
+ x_7\begin{pmatrix} -8 \\ 0 \\ -4 \\ 0 \\ 0 \\ -2 \\ 1 \end{pmatrix}   (5.5.5)
We can rewrite the right hand side of (5.5.5) as a linear combination of four vectors w2 , w4, w5, w7

x2w2 + x4 w4 + x5w5 + x7w7 . (5.5.6)

This calculation shows that the null space of A, which is W = {x ∈ R7 : Ax = 0}, is spanned
by the four vectors w2, w4, w5, w7. Moreover, this same calculation shows that the four vectors are
linearly independent. From the left hand side of (5.5.5) we see that if this linear combination sums
to zero, then x2 = x4 = x5 = x7 = 0. It follows from Theorem 5.5.3 that dim W = 4.
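
This dimension count is easy to confirm numerically. A sketch of the corresponding MATLAB computation, assuming the matrix (5.5.4) is entered by hand (the command null returns a matrix whose columns form a basis for the null space):

A = [1 -4 0 2 -3 0 8;
     0  0 1 3  2 0 4;
     0  0 0 0  0 1 2];
rank(A)              % returns 3
B = null(A);         % columns of B form a basis for the null space of A
size(B,2)            % returns 4 = 7 - rank(A), the dimension of the null space
A*B                  % zero matrix, to within numerical accuracy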
Definition 5.5.5. The nullity of A is the dimension of the null space of A.
Theorem 5.5.6. Let A be an m × n matrix. Then

nullity(A) + rank(A) = n.

Proof: Neither the rank nor the null space of A is changed by elementary row operations. So we
can assume that A is in reduced echelon form. The rank of A is the number of nonzero rows in the
reduced echelon form matrix. Proposition 5.2.5 states that the null space is spanned by p vectors
where p = n − rank(A). We must show that these vectors are linearly independent.

Let j1 , . . ., jp be the columns of A that do not contain pivots. In example (5.5.4) p = 4 and

j1 = 2, j2 = 4, j3 = 5, j4 = 7.

After solving for the variables corresponding to pivots, we find that the spanning set of the null
space consists of p vectors in Rn, which we label as {wj1 , . . ., wjp }. See (5.5.5). Note that the jm th
entry of wjm is 1 while the jm th entry in all of the other p − 1 vectors is 0. Again, see (5.5.5) as
an example that supports this statement. It follows that the set of spanning vectors is a linearly
independent set. That is, suppose that

r1 wj1 + · · · + rp wjp = 0.

From the jm th entry in this equation, it follows that rm = 0; and the vectors are linearly independent.

Theorem 5.5.6 has an interesting and useful interpretation. We have seen in the previous sub-
section that the rank of a matrix A is just the number of linearly independent rows in A. In linear
systems each row of the coefficient matrix corresponds to a linear equation. Thus, the rank of A
may be thought of as the number of independent equations in a system of linear equations. This
theorem just states that the space of solutions loses a dimension for each independent equation.

Hand Exercises

1. Show that U = {u1, u2, u3} where

u1 = (1, 1, 0) u2 = (0, 1, 0) u3 = (−1, 0, 1)

is a basis for R3.

2. Let S = span{v1, v2, v3 } where

v1 = (1, 0, −1, 0) v2 = (0, 1, 1, 1) v3 = (5, 4, −1, 4).

Find the dimension of S and find a basis for S.

3. Find a basis for the null space of

A = \begin{pmatrix} 1 & 0 & -1 & 2 \\ 1 & -1 & 0 & 0 \\ 4 & -5 & 1 & -2 \end{pmatrix}.

What is the dimension of the null space of A?

4. Show that the set V of all 2 × 2 matrices is a vector space. Show that the dimension of V is four
by finding a basis of V with four elements. Show that the space M (m, n) of all m × n matrices is
also a vector space. What is dim M (m, n)?

5. Show that the set Pn of all polynomials of degree less than or equal to n is a subspace of C 1.
What is dim P2 ? What is dim Pn ?

6. Let P3 be the vector space of polynomials of degree at most three in one variable t. Let p(t) =
t3 + a2 t2 + a1 t + a0 where a0 , a1, a2 ∈ R are fixed constants. Show that
\left\{\, p,\ \frac{dp}{dt},\ \frac{d^2p}{dt^2},\ \frac{d^3p}{dt^3} \,\right\}
is a basis for P3.

7. Let u ∈ Rn be a nonzero row vector.

(a) Show that the n × n matrix A = utu is symmetric and that rank(A) = 1. Hint: Begin by
showing that Avt = 0 for every vector v ∈ Rn that is perpendicular to u and that Aut is a
nonzero multiple of ut .
(b) Show that the matrix P = In + utu is invertible. Hint: Show that rank(P ) = n.

5.6 The Proof of the Main Theorem

We begin the proof of Theorem 5.5.3 with two lemmas on linearly independent and spanning sets.

Lemma 5.6.1. Let {w1, . . ., wk } be a set of vectors in a vector space V and let W be the subspace
spanned by these vectors. Then there is a linearly independent subset of {w1, . . . , wk} that also spans
W.

Proof: If {w1, . . . , wk } is linearly independent, then the lemma is proved. If not, then the set
{w1, . . . , wk} is linearly dependent. If this set is linearly dependent, then at least one of the vectors
is a linear combination of the others. By renumbering if necessary, we can assume that wk is a linear
combination of w1, . . . , wk−1; that is,

wk = a1w1 + · · · + ak−1wk−1.

Now suppose that w ∈ W . Then


w = b1w1 + · · · + bk wk .
It follows that
w = (b1 + bk a1)w1 + · · · + (bk−1 + bk ak−1)wk−1,
and that W = span{w1, . . ., wk−1}. If the vectors w1, . . . , wk−1 are linearly independent, then the
proof of the lemma is complete. If not, continue inductively until a linearly independent subset of
the wj that also spans W is found.

The key step in proving that linear independence together with spanning implies that a set of
vectors is a basis is contained in the next lemma.

Lemma 5.6.2. Let W be an m-dimensional vector space and let k > m be an integer. Then any
set of k vectors in W is linearly dependent.

Proof: Since the dimension of W is m we know that this vector space can be written as
W = span{v1 , . . ., vm }. Moreover, Lemma 5.6.1 implies that the vectors v1 , . . . , vm are linearly
independent. Suppose that {w1, . . . , wk } is another set of vectors where k > m. We have to show
that the vectors w1, . . . , wk are linearly dependent; that is, we must show that there exist scalars
r1, . . . , rk not all of which are zero that satisfy

r1w1 + · · · + rk wk = 0. (5.6.1)

We find these scalars by solving a system of linear equations, as we now show.

The fact that W is spanned by the vectors vj implies that

w_1 = a_{11} v_1 + · · · + a_{m1} v_m
w_2 = a_{12} v_1 + · · · + a_{m2} v_m
    ⋮
w_k = a_{1k} v_1 + · · · + a_{mk} v_m .

It follows that r1 w1 + · · · + rk wk equals

r1(a11v1 + · · · + am1 vm ) +
r2(a12v1 + · · · + am2 vm ) + · · ·+
rk (a1k v1 + · · · + amk vm )

Rearranging terms leads to the expression:

(a_{11} r_1 + · · · + a_{1k} r_k )v_1 + (a_{21} r_1 + · · · + a_{2k} r_k )v_2 + · · · + (a_{m1} r_1 + · · · + a_{mk} r_k )v_m .   (5.6.2)
Thus, (5.6.1) is valid if and only if (5.6.2) sums to zero. Since the set {v1, . . . , vm } is linearly
independent, (5.6.2) can equal zero if and only if

a_{11} r_1 + · · · + a_{1k} r_k = 0
a_{21} r_1 + · · · + a_{2k} r_k = 0
    ⋮
a_{m1} r_1 + · · · + a_{mk} r_k = 0.

Since m < k, Chapter 2, Theorem 2.4.6 implies that this system of homogeneous linear equations
always has a nonzero solution r = (r1 , . . . , rk) — from which it follows that the wi are linearly
dependent.

Corollary 5.6.3. Let V be a vector space of dimension n and let {u1, . . . , uk} be a linearly inde-
pendent set of vectors in V . Then k ≤ n.

Proof: If k > n then Lemma 5.6.2 implies that {u1, . . . , uk} is linearly dependent. Since we have
assumed that this set is linearly independent, it follows that k ≤ n.

Proof of Theorem 5.5.3: Suppose that B = {w1, . . . , wk} is a basis for W . By definition,
B spans W and k = dim W . We must show that B is linearly independent. Suppose B is linearly
dependent, then Lemma 5.6.1 implies that there is a proper subset of B that spans W (and is linearly
independent). This contradicts the fact that as a basis B has the smallest number of elements of
any spanning set for W .

Suppose that B = {w1 , . . ., wk } both spans W and is linearly independent. Linear independence
and Corollary 5.6.3 imply that k ≤ dim W . Since, by definition, any spanning set of W has at least
dim W vectors, it follows that k ≥ dim W . Thus, k = dim W and B is a basis.

Extending Linearly Independent Sets to Bases

Lemma 5.6.1 leads to one approach to finding bases. Suppose that the subspace W is spanned by a
finite set of vectors {w1, . . . , wk}. Then, we can throw out vectors one by one until we arrive at a
linearly independent subset of the wj . This subset is a basis for W .

We now discuss a second approach to finding a basis for a nonzero subspace W of a finite
dimensional vector space V .

Lemma 5.6.4. Let {u1, . . . , uk} be a linearly independent set of vectors in a vector space V and
assume that
uk+1 ∉ span{u1, . . . , uk }.
Then {u1, . . . , uk+1} is also a linearly independent set.

Proof: Let r1, . . . , rk+1 be scalars such that

r1u1 + · · · + rk+1uk+1 = 0. (5.6.3)

To prove independence, we need to show that all rj = 0. Suppose rk+1 ≠ 0. Then we can solve
(5.6.3) for

u_{k+1} = -\frac{1}{r_{k+1}}(r_1 u_1 + · · · + r_k u_k),

which implies that uk+1 ∈ span{u1, . . . , uk}. This contradicts the choice of uk+1. So rk+1 = 0 and

r1u1 + · · · + rk uk = 0.

Since {u1 , . . ., uk } is linearly independent, it follows that r1 = · · · = rk = 0.

The second method for constructing a basis is:

• Choose a nonzero vector w1 in W .


• If W is not spanned by w1, then choose a vector w2 that is not on the line spanned by w1.
• If W ≠ span{w1, w2}, then choose a vector w3 ∉ span{w1, w2}.
• If W ≠ span{w1, w2, w3}, then choose a vector w4 ∉ span{w1, w2, w3}.
• Continue until a spanning set for W is found. This set is a basis for W .

We now justify this approach to finding bases for subspaces. Suppose that W is a subspace of
a finite dimensional vector space V . For example, suppose that W ⊂ Rn. Then our approach to
finding a basis of W is as follows. Choose a nonzero vector w1 ∈ W . If W = span{w1}, then we
are done. If not, choose a vector w2 ∈ W – span{w1}. It follows from Lemma 5.6.4 that {w1, w2} is
linearly independent. If W = span{w1, w2}, then Theorem 5.5.3 implies that {w1 , w2} is a basis for
W , dim W = 2, and we are done. If not, choose w3 ∈ W – span{w1, w2} and {w1, w2, w3} is linearly
independent. The finite dimension of V implies that continuing inductively must lead to a spanning
set of linearly independent vectors for W — which by Theorem 5.5.3 is a basis. This discussion proves:
Corollary 5.6.5. Every linearly independent subset of a finite dimensional vector space V can be
extended to a basis of V .

Further consequences of Theorem 5.5.3

We summarize here several important facts about dimensions.


Corollary 5.6.6. Let W be a subspace of a finite dimensional vector space V .

(a) Suppose that W is a proper subspace. Then dim W < dim V .


(b) Suppose that dim W = dim V . Then W = V .

Proof: (a) Let dim W = k and let {w1, . . . , wk} be a basis for W . Since W is a proper subspace
of V , there is a vector w ∈ V – W . It follows from Lemma 5.6.4 that {w1, . . . , wk, w} is a linearly
independent set. Therefore, Corollary 5.6.3 implies that k + 1 ≤ dim V ; hence dim W = k < dim V .

(b) Let {w1 , . . ., wk } be a basis for W . Theorem 5.5.3 implies that this set is linearly independent.
If {w1 , . . ., wk } does not span V , then it can be extended to a basis as above. But then dim V >
dim W , which is a contradiction.

Corollary 5.6.7. Let B = {w1, . . . , wn} be a set of n vectors in an n-dimensional vector space V .
Then the following are equivalent:

(a) B is a spanning set of V ,


(b) B is a basis for V , and
(c) B is a linearly independent set.

Proof: By definition, (a) implies (b) since a basis is a spanning set with the number of vectors
equal to the dimension of the space. Theorem 5.5.3 states that a basis is a linearly independent set;
so (b) implies (c). If B is a linearly independent set of n vectors, then it spans a subspace W of
dimension n. It follows from Corollary 5.6.6(b) that W = V and that (c) implies (a).

Subspaces of R3

We can now classify all subspaces of R3. They are: the origin, lines through the origin, planes
through the origin, and R3 . All of these sets were shown to be subspaces in Example 5.1.5(a–c).

To verify that these sets are the only subspaces of R3, note that Theorem 5.5.3 implies that
proper subspaces of R3 have dimension equal either to one or two. (The zero dimensional subspace
is the origin and the only three dimensional subspace is R3 itself.) One dimensional subspaces of R3
are spanned by one nonzero vector and are just lines through the origin. See Example 5.1.5(b). We
claim that all two dimensional subspaces are planes through the origin.

Suppose that W ⊂ R3 is a subspace spanned by two non-collinear vectors w1 and w2 . We show


that W is a plane through the origin using results in Chapter 2. Observe that there is a vector
N = (N1 , N2 , N3) perpendicular to w1 = (a11, a12, a13) and w2 = (a21, a22, a23). Such a vector N
satisfies the two linear equations:

w1 · N = a11N1 + a12 N2 + a13N3 = 0


w2 · N = a21N1 + a22 N2 + a23N3 = 0.

Chapter 2, Theorem 2.4.6 implies that a system of two linear equations in three unknowns has a
nonzero solution. Let P be the plane perpendicular to N that contains the origin. We show that
W = P and hence that the claim is valid.

The choice of N shows that the vectors w1 and w2 are both in P . In fact, since P is a subspace
it contains every vector in span{w1, w2}. Thus W ⊂ P . If P contained even one vector
w3 ∈ R3 that is not in W , then the span of w1 , w2, w3 would be three dimensional, forcing P = R3 ,
which contradicts the fact that P is a plane. Therefore W = P .

Hand Exercises

In Exercises 1 – 3 you are given a pair of vectors v1 , v2 spanning a subspace of R3. Decide whether
that subspace is a line or a plane through the origin. If it is a plane, then compute a vector N that
is perpendicular to that plane.

1. v1 = (2, 1, 2) and v2 = (0, −1, 1).

2. v1 = (2, 1, −1) and v2 = (−4, −2, 2).

3. v1 = (0, 1, 0) and v2 = (4, 1, 0).

4. The pairs of vectors


v1 = (−1, 1, 0) and v2 = (1, 0, 1)

span a plane P in R3 . The pairs of vectors

w1 = (0, 1, 0) and w2 = (1, 1, 0)

span a plane Q in R3 . Show that P and Q are different and compute the subspace of R3 that is
given by the intersection P ∩ Q.

5. Let A be a 7 × 5 matrix with rank(A) = r.

(a) What is the largest value that r can have?

(b) Give a condition equivalent to the system of equations Ax = b having a solution.

(c) What is the dimension of the null space of A?

(d) If there is a solution to Ax = b, then how many parameters are needed to describe the set of
all solutions?

6. Let

A = \begin{pmatrix} 1 & 3 & -1 & 4 \\ 2 & 1 & 5 & 7 \\ 3 & 4 & 4 & 11 \end{pmatrix}.

(a) Find a basis for the subspace C ⊂ R3 spanned by the columns of A.

(b) Find a basis for the subspace R ⊂ R4 spanned by the rows of A.

(c) What is the relationship between dim C and dim R?

7. Show that the vectors


v1 = (2, 3, 1) and v2 = (1, 1, 3)

are linearly independent. Show that the span of v1 and v2 forms a plane in R3 by showing that every
linear combination is the solution to a single linear equation. Use this equation to determine the
normal vector N to this plane. Verify Lemma 5.6.4 by verifying directly that v1 , v2, N are linearly
independent vectors.

8. Let W be an infinite dimensional subspace of the vector space V . Show that V is infinite
dimensional.

Computer Exercises

9. Consider the following set of vectors

w1 = (2, −2, 1), w2 = (−1, 2, 0), w3 = (3, −2, λ), w4 = (−5, 6, −2),

where λ is a real number.

(a) Find a value for λ such that the dimension of span{w1, w2, w3, w4} is three. Then decide
whether {w1, w2, w3} or {w1, w2, w4} is a basis for R3 .

(b) Find a value for λ such that the dimension of span{w1 , w2, w3, w4} is two.

10. Find a basis for R5 as follows. Randomly choose vectors x1, x2 ∈ R5 by typing x1 = rand(5,1)
and x2 = rand(5,1). Check that these vectors are linearly independent. If not, choose another pair
of vectors until you find a linearly independent set. Next choose a vector x3 at random and check
that x1 , x2, x3 are linearly independent. If not, randomly choose another vector for x3. Continue
until you have five linearly independent vectors — which by a dimension count must be a basis and
span R5 . Verify this comment by using MATLAB to write the vector

\begin{pmatrix} 2 \\ 1 \\ 3 \\ -2 \\ 4 \end{pmatrix}

as a linear combination of x1, . . . , x5.

11. Find a basis for the subspace of R5 spanned by

u1 = (1, 1, 0, 0, 1)
u2 = (0, 2, 0, 1, −1)
u3 = (0, −1, 1, 0, 2)
u4 = (1, 4, 1, 2, 1)
u5 = (0, 0, 2, 1, 3).

Chapter 6

Linear Maps and Changes of Coordinates

The first section in this chapter, Section 6.1, defines linear mappings between abstract
vector spaces, shows how such mappings are determined by their values on a basis, and
derives basic properties of invertible linear mappings.

The notions of row rank and column rank of a matrix are discussed in Section 6.2 along
with the theorem that states that these numbers are equal to the rank of that matrix.

Section 6.3 discusses the underlying meaning of similarity — the different ways to view
the same linear mapping on Rn in different coordinates systems or bases. This discussion
makes sense only after the definitions of coordinates corresponding to bases and of changes
in coordinates are given and justified. In Section 6.4, we discuss the matrix associated to a
linear transformation between two finite dimensional vector spaces in a given set of coor-
dinates and show that changes in coordinates correspond to similarity of the corresponding
matrices.

6.1 Linear Mappings and Bases

The examples of linear mappings from Rn → Rm that we introduced in Section 3.3 were
matrix mappings. More precisely, let A be an m × n matrix. Then

LA (x) = Ax

defines the linear mapping LA : Rn → Rm . Recall that Aej is the j th column of A (see Chap-
ter 3, Lemma 3.3.4); it follows that A can be reconstructed from the vectors Ae1 , . . ., Aen .

This remark implies (Chapter 3, Lemma 3.3.3) that linear mappings of Rn to Rm are de-
termined by their values on the standard basis e1 , . . ., en . Next we show that this result
is valid in greater generality. We begin by defining what we mean for a mapping between
vector spaces to be linear.
Definition 6.1.1. Let V and W be vector spaces and let L : V → W be a mapping. The
map L is linear if

L(u + v) = L(u) + L(v)


L(cv) = cL(v)

for all u, v ∈ V and c ∈ R.

Examples of Linear Mappings

(a) Let v ∈ Rn be a fixed vector. Use the dot product to define the mapping L : Rn → R
by
L(x) = x · v.
Then L is linear. Just check that

L(x + y) = (x + y) · v = x · v + y · v = L(x) + L(y)

for every vector x and y in Rn and

L(cx) = (cx) · v = c(x · v) = cL(x)

for every scalar c ∈ R.

(b) The map L : C 1 → R defined by

L(f) = f ′(2)

is linear. Indeed,

L(f + g) = (f + g)′(2) = f ′(2) + g ′(2) = L(f) + L(g).

Similarly, L(cf ) = cL(f ).

(c) The map L : C 1 → C 1 defined by

L(f )(t) = f (t − 1)

is linear. Indeed,

L(f + g)(t) = (f + g)(t − 1) = f (t − 1) + g(t − 1) = L(f )(t) + L(g)(t).

Similarly, L(cf ) = cL(f ). It may be helpful to compute L(f )(t) when f (t) = t2 − t + 1.
That is,

L(f )(t) = (t − 1)2 − (t − 1) + 1 = t2 − 2t + 1 − t + 1 + 1 = t2 − 3t + 3.

Constructing Linear Mappings from Bases

Theorem 6.1.2. Let V and W be vector spaces. Let {v1 , . . . , vn} be a basis for V and let
{w1, . . . , wn} be n vectors in W . Then there exists a unique linear map L : V → W such
that L(vi ) = wi .

Proof: Let v ∈ V be a vector. Since span{v1, . . . , vn } = V , we may write v as

v = α1 v1 + · · · + αn vn ,

where α1 , . . . , αn are in R. Moreover, since v1 , . . . , vn are linearly independent, these scalars are
uniquely defined. More precisely, if

α1 v1 + · · · + αn vn = β1 v1 + · · · + βn vn ,

then
(α1 − β1 )v1 + · · · + (αn − βn )vn = 0.

Linear independence implies that αj − βj = 0; that is αj = βj . We can now define

L(v) = α1 w1 + · · · + αn wn . (6.1.1)

We claim that L is linear. Let v̂ ∈ V be another vector and let

v̂ = β1v1 + · · · + βn vn .

It follows that
v + v̂ = (α1 + β1)v1 + · · · + (αn + βn )vn ,

and hence by (6.1.1) that

L(v + v̂) = (α1 + β1 )w1 + · · · + (αn + βn )wn


= (α1 w1 + · · · + αn wn ) + (β1w1 + · · · + βn wn )
= L(v) + L(v̂).

Similarly

L(cv) = L((cα1)v1 + · · · + (cαn )vn )


= c(α1 w1 + · · · + αn wn )
= cL(v).

Thus L is linear.

Let M : V → W be another linear mapping such that M (vi) = wi. Then

L(v) = L(α1 v1 + . . . + αn vn )
= α1 w1 + · · · + αn wn
= α1 M (v1 ) + · · · + αn M (vn )
= M (α1 v1 + · · · + αn vn )
= M (v).

Thus L = M and the linear mapping is uniquely defined.

There are two assertions made in Theorem 6.1.2. The first is that a linear map exists
mapping vi to wi . The second is that there is only one linear mapping that accomplishes
this task. If we drop the constraint that the map be linear, then many mappings may satisfy
these conditions. For example, find a linear map from R → R that maps 1 to 4. There is
only one: y = 4x. However there are many nonlinear maps that send 1 to 4. Examples are
y = x + 3 and y = 4x2.

Finding the Matrix of a Linear Map from Rn → Rm Given by Theorem 6.1.2

Suppose that V = Rn and W = Rm . We know that every linear map L : Rn → Rm can be


defined as multiplication by an m × n matrix. The question that we next address is: How
can we find the matrix whose existence is guaranteed by Theorem 6.1.2?

More precisely, let v1 , . . . , vn be a basis for Rn and let w1 , . . ., wn be vectors in Rm . We


suppose that all of these vectors are row vectors. Then we need to find an m × n matrix A
such that Avit = wit for all i. We find A as follows. Let v ∈ Rn be a row vector. Since the
vi form a basis, there exist scalars αi such that

v = α1 v1 + · · · + αn vn .

In coordinates,

v^t = (v_1^t | · · · | v_n^t) \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix},   (6.1.2)
where (v1t | · · · |vnt ) is an n × n invertible matrix. By definition (see (6.1.1))

L(v) = α1 w1 + · · · + αn wn .

Thus the matrix A must satisfy

Av^t = (w_1^t | · · · | w_n^t) \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix},
where (w1t | · · · |wnt ) is an m × n matrix. Using (6.1.2) we see that

Av t = (w1t | · · · |wnt )(v1t | · · · |vnt )−1 v t ,

and
A = (w1t | · · · |wnt )(v1t | · · · |vnt )−1 (6.1.3)

is the desired m × n matrix.

An Example of a Linear Map from R3 to R2

As an example we illustrate Theorem 6.1.2 and (6.1.3) by defining a linear mapping from
R3 to R2 by its action on a basis. Let

v1 = (1, 4, 1) v2 = (−1, 1, 1) v3 = (0, 1, 0).

We claim that {v1 , v2, v3} is a basis of R3 and that there is a unique linear map for which
L(vi ) = wi where
w1 = (2, 0) w2 = (1, 1) w3 = (1, −1).

We can verify that {v1 , v2, v3} is a basis of R3 by showing that the matrix

(v_1^t | v_2^t | v_3^t) = \begin{pmatrix} 1 & -1 & 0 \\ 4 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix}

is invertible. This can either be done in MATLAB using the inv command or by hand by
row reducing the matrix

\begin{pmatrix} 1 & -1 & 0 & 1 & 0 & 0 \\ 4 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 \end{pmatrix}

to obtain

(v_1^t | v_2^t | v_3^t)^{-1} = \frac{1}{2}\begin{pmatrix} 1 & 0 & 1 \\ -1 & 0 & 1 \\ -3 & 2 & -5 \end{pmatrix}.
Now apply (6.1.3) to obtain

A = \frac{1}{2}\begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 1 \\ -1 & 0 & 1 \\ -3 & 2 & -5 \end{pmatrix} = \begin{pmatrix} -1 & 1 & -1 \\ 1 & -1 & 3 \end{pmatrix}.

As a check, verify by matrix multiplication that Avi = wi , as claimed.
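
The same check can be made in MATLAB. A sketch, assuming the vectors above are typed in by hand:

v1 = [ 1 4 1];  v2 = [-1 1 1];  v3 = [0  1 0];
w1 = [ 2 0];    w2 = [ 1 1];    w3 = [1 -1];
A  = [w1' w2' w3'] * inv([v1' v2' v3'])   % formula (6.1.3); returns [-1 1 -1; 1 -1 3]
A*v1'                                     % returns w1'; similarly A*v2' = w2' and A*v3' = w3'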

Properties of Linear Mappings

Lemma 6.1.3. Let U, V, W be vector spaces and L : V → W and M : U → V be linear


maps. Then L◦M : U → W is linear.

Proof: The proof of Lemma 6.1.3 is identical to that of Chapter 3, Lemma 3.5.1.

A linear map L : V → W is invertible if there exists a linear map M : W → V such that


L◦M : W → W is the identity map on W and M ◦L : V → V is the identity map on V .

Theorem 6.1.4. Let V and W be finite dimensional vector spaces and let v1 , . . . , vn be a
basis for V . Let L : V → W be a linear map. Then L is invertible if and only if w1, . . . , wn
is a basis for W where wj = L(vj ).

Proof: If w1, . . . , wn is a basis for W , then use Theorem 6.1.2 to define a linear map
M : W → V by M (wj ) = vj . Note that

L◦M (wj ) = L(vj ) = wj .

It follows by linearity (using the uniqueness part of Theorem 6.1.2) that L◦M is the identity
of W . Similarly, M ◦L is the identity map on V , and L is invertible.

Conversely, suppose that L◦M and M ◦L are identity maps and that wj = L(vj ). We
must show that w1 , . . ., wn is a basis. We use Theorem 5.5.3 and verify separately that
w1, . . . , wn are linearly independent and span W .

If there exist scalars α1 , . . . , αn such that

α1 w1 + · · · + αn wn = 0,

then apply M to both sides of this equation to obtain

0 = M (α1w1 + · · · + αn wn ) = α1 v1 + · · · + αn vn .

But the vj are linearly independent. Therefore, αj = 0 and the wj are linearly independent.

To show that the wj span W , let w be a vector in W . Since the vj are a basis for V ,
there exist scalars β1, . . . , βn such that

M (w) = β1v1 + · · · + βn vn .

Applying L to both sides of this equation yields

w = L◦M (w) = β1w1 + · · · + βn wn .

Therefore, the wj span W .

Corollary 6.1.5. Let V and W be finite dimensional vector spaces. Then there exists an
invertible linear map L : V → W if and only if dim(V ) = dim(W ).

Proof: Suppose that L : V → W is an invertible linear map. Let v1 , . . . , vn be a basis for


V where n = dim(V ). Then Theorem 6.1.4 implies that L(v1), . . . , L(vn) is a basis for W
and dim(W ) = n = dim(V ).

Conversely, suppose that dim(V ) = dim(W ) = n. Let v1 , . . . , vn be a basis for V and


let w1, . . . , wn be a basis for W . Using Theorem 6.1.2 define the linear map L : V → W by
L(vj ) = wj . Theorem 6.1.4 states that L is invertible.

Hand Exercises

1. Use the method described above to construct a linear mapping L from R3 to R2 with L(vi ) = wi,
i = 1, 2, 3, where
v1 = (1, 0, 2) v2 = (2, −1, 1) v3 = (−2, 1, 0)
and
w1 = (−1, 0) w2 = (0, 1) w3 = (3, 1).

2. Let Pn be the vector space of polynomials p(t) of degree less than or equal to n. Show that
{1, t, t2, . . . , tn} is a basis for Pn .

3. Show that

\frac{d}{dt} : P_3 \to P_2

is a linear mapping.

4. Show that

L(p) = \int_0^t p(s)\, ds

is a linear mapping of P2 → P3 .

5. Use Exercises 3, 4 and Theorem 6.1.2 to show that

\frac{d}{dt} \circ L : P_2 \to P_2

is the identity map.

6. Let C denote the set of complex numbers. Verify that C is a two-dimensional vector space. Show
that L : C → C defined by
L(z) = λz,
where λ = σ + iτ , is a linear mapping.

7. Let M(n) denote the vector space of n × n matrices and let A be an n × n matrix. Let L :
M(n) → M(n) be the mapping defined by L(X) = AX − XA where X ∈ M(n). Verify that L is
a linear mapping. Show that the null space of L, {X ∈ M : L(X) = 0}, is a subspace consisting of
all matrices that commute with A.

8. Let L : C 1 → R be defined by L(f) = \int_0^{2\pi} f(t) \cos(t)\, dt for f ∈ C 1 . Verify that L is a linear
mapping.

9. Let P be the vector space of polynomials in one variable x. Define L : P → P by
L(p)(x) = \int_0^x (t - 1)p(t)\, dt. Verify that L is a linear mapping.

6.2 Row Rank Equals Column Rank

Let A be an m × n matrix. The row space of A is the span of the row vectors of A and is a subspace
of Rn . The column space of A is the span of the columns of A and is a subspace of Rm .

Definition 6.2.1. The row rank of A is the dimension of the row space of A and the column rank
of A is the dimension of the column space of A.

Lemma 5.5.4 of Chapter 5 states that

row rank(A) = rank(A).

We show below that row ranks and column ranks are equal. We begin by continuing the discussion
of the previous section on linear maps between vector spaces.

Null Space and Range

Each linear map between vector spaces defines two subspaces. Let V and W be vector spaces and
let L : V → W be a linear map. Then

null space(L) = {v ∈ V : L(v) = 0} ⊂ V

and
range(L) = {L(v) ∈ W : v ∈ V } ⊂ W.

Lemma 6.2.2. Let L : V → W be a linear map between vector spaces. Then the null space of L is
a subspace of V and the range of L is a subspace of W .

Proof: The proof that the null space of L is a subspace of V follows from linearity in precisely
the same way that the null space of an m × n matrix is a subspace of Rn . That is, if v1 and v2 are
in the null space of L, then

L(v1 + v2) = L(v1 ) + L(v2 ) = 0 + 0 = 0,

and for c ∈ R
L(cv1 ) = cL(v1 ) = c0 = 0.

So the null space of L is closed under addition and scalar multiplication and is a subspace of V .

To prove that the range of L is a subspace of W , let w1 and w2 be in the range of L. Then, by
definition, there exist v1 and v2 in V such that L(vj ) = wj . It follows that

L(v1 + v2 ) = L(v1 ) + L(v2 ) = w1 + w2.

Therefore, w1 + w2 is in the range of L. Similarly,

L(cv1 ) = cL(v1 ) = cw1.

So the range of L is closed under addition and scalar multiplication and is a subspace of W .

Suppose that A is an m × n matrix and LA : Rn → Rm is the associated linear map. Then


the null space of LA is precisely the null space of A, as defined in Definition 5.2.1 of Chapter 5.
Moreover, the range of LA is the column space of A. To verify this, write A = (A1 | · · ·|An ) where
Aj is the j th column of A and let v = (v1, . . . vn )t . Then, LA (v) is the linear combination of columns
of A
LA (v) = Av = v1 A1 + · · · + vnAn .

There is a theorem that relates the dimensions of the null space and range with the dimension
of V .

Theorem 6.2.3. Let V and W be vector spaces with V finite dimensional and let L : V → W be a
linear map. Then
dim(V ) = dim(null space(L)) + dim(range(L)).

Proof: Since V is finite dimensional, the null space of L is finite dimensional (since the null space
is a subspace of V ) and the range of L is finite dimensional (since it is spanned by the vectors
L(vj ) where v1 , . . . , vn is a basis for V ). Let u1, . . . , uk be a basis for the null space of L and let
w1, . . . , w! be a basis for the range of L. Choose vectors yj ∈ V such that L(yj ) = wj . We claim
that u1, . . . , uk , y1, . . . , y! is a basis for V , which proves the theorem.

To verify that u1, . . . , uk , y1, . . . , y! are linear independent, suppose that

α1u1 + · · · + αk uk + β1 y1 + · · · + β! y! = 0. (6.2.1)

Apply L to both sides of (6.2.1) to obtain

β1 w1 + · · · + βℓ wℓ = 0.

Since the wj are linearly independent, it follows that βj = 0 for all j. Now (6.2.1) implies that

α1u1 + · · · + αk uk = 0.

Since the uj are linearly independent, it follows that αj = 0 for all j.

To verify that u1 , . . . , uk , y1 , . . . , yℓ span V , let v be in V . Since w1 , . . . , wℓ span W , it follows
that there exist scalars βj such that

L(v) = β1 w1 + · · · + βℓ wℓ .

Note that by choice of the yj

L(β1 y1 + · · · + βℓ yℓ ) = β1 w1 + · · · + βℓ wℓ .

It follows by linearity that

u = v − (β1 y1 + · · · + βℓ yℓ )

is in the null space of L. Hence there exist scalars αj such that

u = α1 u1 + · · · + αk uk .

Thus, v is in the span of u1 , . . . , uk , y1 , . . . , yℓ , as desired.

Row Rank and Column Rank

Recall Theorem 5.5.6 of Chapter 5 that states that the nullity plus the rank of an m × n matrix
equals n. At first glance it might seem that this theorem and Theorem 6.2.3 contain the same
information, but they do not. Theorem 5.5.6 of Chapter 5 is proved using a detailed analysis of
solutions of linear equations based on Gaussian elimination, back substitution, and reduced echelon
form, while Theorem 6.2.3 is proved using abstract properties of linear maps.

Let A be an m × n matrix. Theorem 5.5.6 of Chapter 5 states that

nullity(A) + rank(A) = n.

Meanwhile, Theorem 6.2.3 states that

dim(null space(LA )) + dim(range(LA )) = n.

But the dimension of the null space of LA equals the nullity of A and the dimension of the range of
A equals the dimension of the column space of A. Therefore,

nullity(A) + dim(column space(A)) = n.

Hence, the rank of A equals the column rank of A. Since rank and row rank are identical, we have
proved:

Theorem 6.2.4. Let A be an m × n matrix. Then

row rank A = column rank A.

Since the row rank of A equals the column rank of At , we have:

Corollary 6.2.5. Let A be an m × n matrix. Then

rank(A) = rank(At).
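
Corollary 6.2.5 is easy to test numerically in MATLAB. A quick sketch, using a randomly generated matrix for illustration:

A = rand(5,8);     % a random 5 x 8 matrix
rank(A)            % generically returns 5
rank(A')           % always returns the same value as rank(A)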

Hand Exercises

1. The 3 × 3 matrix

A = \begin{pmatrix} 1 & 2 & 5 \\ 2 & -1 & 1 \\ 3 & 1 & 6 \end{pmatrix}
has rank two. Let r1, r2, r3 be the rows of A and c1 , c2, c3 be the columns of A. Find scalars αj and
βj such that

α1r1 + α2r2 + α3r3 = 0


β1 c1 + β2 c2 + β3 c3 = 0.

2. What is the largest row rank that a 5 × 3 matrix can have?

3. Let

A = \begin{pmatrix} 1 & 1 & 0 & 1 \\ 0 & -1 & 1 & 2 \\ 1 & 2 & -1 & 3 \end{pmatrix}.

(a) Find a basis for the row space of A and the row rank of A.
(b) Find a basis for the column space of A and the column rank of A.
(c) Find a basis for the null space of A and the nullity of A.
(d) Find a basis for the null space of At and the nullity of At .

4. Let A be a nonzero 3 × 3 matrix such that A2 = 0. Show that rank(A) = 1.

5. Let B be an m × p matrix and let C be a p × n matrix. Prove that the rank of the m × n matrix
A = BC satisfies
rank(A) ≤ min{rank(B), rank(C)}.

Computer Exercises

6. Let

A = \begin{pmatrix} 1 & 1 & 2 & 2 \\ 0 & -1 & 3 & 1 \\ 2 & -1 & 1 & 0 \\ -1 & 0 & 7 & 4 \end{pmatrix}.

(a) Compute rank(A) and exhibit a basis for the row space of A.
(b) Find a basis for the column space of A.
(c) Find all solutions to the homogeneous equation Ax = 0.
(d) Does

Ax = \begin{pmatrix} 4 \\ 2 \\ 2 \\ 1 \end{pmatrix}
have a solution?

6.3 Vectors and Matrices in Coordinates

In the last half of this chapter we discuss how similarity of matrices should be thought of as change
of coordinates for linear mappings. There are three steps in this discussion.

1. Formalize the idea of coordinates for a vector in terms of basis.


2. Discuss how to write a linear map as a matrix in each coordinate system.
3. Determine how the matrices corresponding to the same linear map in two different coordinate
systems are related.

The answer to the last question is simple: the matrices are related by a change of coordinates if and
only if they are similar. We discuss these steps in this section in Rn and in Section 6.4 for general
vector spaces.

Coordinates of Vectors using Bases

Throughout, we have written vectors v ∈ Rn in coordinates as v = (v1 , . . . , vn), and we have used
this notation almost without comment. From the point of view of vector space operations, we are
just writing
v = v1 e1 + · · · + vn en
as a linear combination of the standard basis E = {e1 , . . . , en} of Rn .

More generally, each basis provides a set of coordinates for a vector space. This fact is described
by the following lemma (although its proof is identical to the first part of the proof of Theorem 6.1.2).
Lemma 6.3.1. Let W = {w1, . . . , wn} be a basis for the vector space V . Then each vector v in V
can be written uniquely as a linear combination of vectors in W; that is,

v = α1w1 + · · · + αnwn,
for uniquely defined scalars α1, . . . , αn.

Proof: Since W is a basis, Theorem 5.5.3 of Chapter 5 implies that the vectors w1 , . . ., wn span
V and are linearly independent. Therefore, we can write v in V as a linear combination of vectors
in W. That is, there are scalars α1, . . . , αn such that

v = α1w1 + · · · + αnwn.

Next we show that these scalars are uniquely defined. Suppose that we can write v as a linear
combination of the vectors in W in a second way; that is, suppose

v = β1 w1 + · · · + βn wn

for scalars β1 , . . . , βn. Then

(α1 − β1 )w1 + · · · + (αn − βn )wn = 0.

Since the vectors in W are linearly independent, it follows that αj = βj for all j.

Definition 6.3.2. Let W = {w1, . . . , wn} be a basis in a vector space V . Lemma 6.3.1 states that
we can write v ∈ V uniquely as
v = α1w1 + · · · + αnwn. (6.3.1)

The scalars α1, . . . , αn are the coordinates of v relative to the basis W, and we denote the coordinates
of v in the basis W by
[v]W = (α1, . . . , αn) ∈ Rn . (6.3.2)

We call the coordinates of a vector v ∈ Rn relative to the standard basis, the standard coordinates
of v.

Writing Linear Maps in Coordinates as Matrices

Let V be a finite dimensional vector space of dimension n and let L : V → V be a linear mapping.
We now show how each basis of V allows us to associate an n × n matrix to L. Previously we
considered this question with the standard basis on V = Rn . We showed in Chapter 3 that we can
write the linear mapping L as a matrix mapping, as follows. Let E = {e1 , . . . , en} be the standard
basis in Rn. Let A be the n × n matrix whose j th column is the n vector L(ej ). Then Chapter 3,
Theorem 3.3.5 shows that the linear map is given by matrix multiplication as

L(v) = Av.

Thus every linear mapping on Rn can be written in this matrix form.

Remark 6.3.3. Another way to think of the j th column of the matrix A is as the coordinate vector
of L(ej ) relative to the standard basis, that is, as [L(ej )]E . We denote the matrix A by [L]E ; this
notation emphasizes the fact that A is the matrix of L relative to the standard basis.

We now discuss how to write a linear map L as a matrix using different coordinates.

Definition 6.3.4. Let W = {w1, . . . , wn} be a basis for the vector space V . The n × n matrix [L]W
associated to the linear map L : V → V and the basis W is defined as follows. The j th column of
[L]W is [L(wj )]W — the coordinates of L(wj ) relative to the basis W.

Note that when V = Rn and when W = E, the standard basis of Rn, then the definition of the
matrix [L]E is exactly the same as the matrix associated with the linear map L in Remark 6.3.3.

Lemma 6.3.5. The coordinate vector of L(v) relative to the basis W is

[L(v)]W = [L]W [v]W . (6.3.3)

Proof: The process of choosing the coordinates of vectors relative to a given basis W = {w1, . . . , wn}
of a vector space V is itself linear. Indeed,

[u + v]W = [u]W + [v]W


[cv]W = c[v]W .

Thus the coordinate mapping relative to a basis W of V defined by

v ↦ [v]W   (6.3.4)

is a linear mapping of V into Rn . We denote this linear mapping by [·]W : V → Rn .

It now follows that both the left hand and right hand sides of (6.3.3) can be thought of as linear
mappings of V → Rn . In verifying this comment, we recall Lemma 6.1.3, which states
that the composition of linear maps is linear. On the left hand side we have the mapping

v ↦ L(v) ↦ [L(v)]W ,

which is the composition of the linear maps: [·]W with L. See (6.3.4). The right hand side is

v ↦ [v]W ↦ [L]W [v]W ,

which is the composition of the linear maps: multiplication by the matrix [L]W with [·]W .

Theorem 6.1.2 states that linear mappings are determined by their actions on a
basis. Thus to verify (6.3.3), we need only verify this equality for v = wj for all j. Since [wj ]W = ej ,
the right hand side of (6.3.3) is:
[L]W [wj ]W = [L]W ej ,

which is just the j th column of [L]W . The left hand side of (6.3.3) is the vector [L(wj )]W , which by
definition is also the j th column of [L]W (see Definition 6.3.4).

Computations of Vectors in Coordinates in Rn

We divide this subsection into three parts. We consider a simple example in R2 algebraically in the
first part and geometrically in the second. In the third part we formalize and extend the algebraic
discussion to Rn .

An Example of Coordinates in R2

How do we find the coordinates of a vector v in a basis? For example, choose a (nonstandard) basis
in the plane — say
w1 = (1, 1) and w2 = (1, −2).

Since {w1, w2} is a basis, we may write the vector v as a linear combination of the vectors w1 and
w2. Thus we can find scalars α1 and α2 so that

v = α1w1 + α2w2 = α1(1, 1) + α2(1, −2) = (α1 + α2, α1 − 2α2).

In standard coordinates, set v = (v1 , v2); this equation leads to the system of linear equations

v1 = α1 + α2
v2 = α1 − 2α2

in the two variables α1 and α2. As we have seen, the fact that w1 and w2 form a basis of R2 implies
that these equations do have a solution. Indeed, we can write this system in matrix form as

(v_1 , v_2 ) = (\alpha_1 , \alpha_2 ) \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix},

which is solved by inverting the matrix to obtain:

(\alpha_1 , \alpha_2 ) = \frac{1}{3}\,(v_1 , v_2 ) \begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix}.   (6.3.5)

For example, suppose v = (2.0, 0.5). Using (6.3.5) we find that (α1 , α2) = (1.5, 0.5); that is, we can
write
v = 1.5w1 + 0.5w2,
and (1.5, 0.5) are the coordinates of v in the basis {w1, w2}.
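
This computation is easily checked in MATLAB, working with column vectors. A sketch:

w1 = [1  1];
w2 = [1 -2];
v  = [2.0 0.5];
inv([w1' w2'])*v'    % returns (1.5, 0.5)', the coordinates of v in the basis {w1,w2}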

Using the notation in (6.3.2), we may rewrite (6.3.5) as

[v]_W = \frac{1}{3}\, [v]_E \begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix},

where E = {e1 , e2} is the standard basis.

Planar Coordinates Viewed Geometrically using MATLAB

Next we use MATLAB to view geometrically the notion of coordinates relative to a basis W =
{w1, w2} in the plane. Type

w1 = [1 1];
w2 = [1 -2];
bcoord

MATLAB will create a graphics window showing the two basis vectors w1 and w2 in red. Using the
mouse click on a point near (2, 0.5) in that figure. MATLAB will respond by plotting the new vector
v in yellow and the parallelogram generated by α1 w1 and α2w2 in cyan. The values of α1 and α2
are also plotted on this figure. See Figure 6.1.

Abstracting R2 to Rn

Suppose that we are given a basis W = {w1, . . . , wn} of Rn and a vector v ∈ Rn . How do we find
the coordinates [v]W of v in the basis W?

For definiteness, assume that v and the wj are row vectors. Equation (6.3.1) may be rewritten as

v^t = (w_1^t | · · · | w_n^t) \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}.

[Figure: plot titled "Coordinates in the {w1,w2} basis" showing the basis vectors w1 and w2 , the vector v, and its coordinates α1 ≈ 1.499, α2 ≈ 0.5075.]

Figure 6.1: The coordinates of v = (2.0, 0.5) in the basis w1 = (1, 1), w2 = (1, −2).

Thus,

[v]_W = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix} = P_W^{-1} v^t ,   (6.3.6)
where PW = (w1t | · · · |wnt ). Since the wj are a basis for Rn, the columns of the matrix PW are linearly
independent, and PW is invertible.

We may use (6.3.6) to compute [v]W using MATLAB. For example, let

v = (4, 1, 3)

and
w1 = (1, 4, 7) w2 = (2, 1, 0) w3 = (−4, 2, 1).

Then [v]W is found by typing

w1 = [ 1 4 7];
w2 = [ 2 1 0];
w3 = [-4 2 1];
inv([w1’ w2’ w3’])*[4 1 3]’

The answer is:

ans =
0.5306
0.3061
-0.7143

Determining the Matrix of a Linear Mapping in Coordinates

Suppose that we are given the linear map LA : Rn → Rn associated to the matrix A in standard
coordinates and a basis w1, . . . , wn of Rn . How do we find the matrix [LA ]W ? As above, we assume
that the vectors wj and the vector v are row vectors. Since LA (v) = Avt , we can rewrite (6.3.3) as

[LA ]W [v]W = [Avt ]W

As above, let PW = (w1t | · · · |wnt ). Using (6.3.6) we see that

[L_A]_W P_W^{-1} v^t = P_W^{-1} A v^t .

Setting

u = P_W^{-1} v^t

we see that

[L_A]_W u = P_W^{-1} A P_W u.

Therefore,

[L_A]_W = P_W^{-1} A P_W .

We have proved:

Theorem 6.3.6. Let A be an n × n matrix and let LA : Rn → Rn be the associated linear map. Let
W = {w1, . . . , wn} be a basis for Rn . Then the matrix [LA ]W associated to LA in the basis W is
similar to A. Therefore the determinant, trace, and eigenvalues of [LA]W are identical to those of
A.
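
Theorem 6.3.6 is easy to verify numerically. A sketch, using the basis w1 = (1, 1), w2 = (1, −2) from earlier in this section and an arbitrarily chosen matrix A (both choices are for illustration only):

w1 = [1  1];  w2 = [1 -2];     % a basis of R^2
A  = [2 1; -1 3];              % an arbitrary 2 x 2 matrix
PW = [w1' w2'];                % columns are the basis vectors
LW = inv(PW)*A*PW              % the matrix [L_A]_W
[trace(A) trace(LW)]           % the traces agree
[det(A) det(LW)]               % the determinants agree
eig(A), eig(LW)                % the eigenvalues agree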

Matrix Normal Forms in R2

If we are careful about how we choose the basis W, then we can simplify the form of the matrix
[L]W . Indeed, we have already seen examples of this process when we discussed how to find closed
form solutions to linear planar systems of ODEs in the previous chapter. For example, suppose that
L : R2 → R2 has real eigenvalues λ1 and λ2 with two linearly independent eigenvectors w1 and w2.
Then the matrix associated to L in the basis W = {w1, w2} is the diagonal matrix

[L]_W = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix},   (6.3.7)

since
[L(w1 )]W = [λ1w1 ]W = (λ1 , 0) and [L(w2)]W = [λ2w2 ]W = (0, λ2 ) .

In Chapter ?? we showed how to classify 2 × 2 matrices up to similarity (see Chapter ??,


Theorem ??) and how to use this classification to find closed form solutions to planar systems of
linear ODEs (see Section ??). We now use the ideas of coordinates and matrices associated with
bases to reinterpret the normal form result (Chapter ??, Theorem ??) in a more geometric fashion.

Theorem 6.3.7. Let L : R2 → R2 be a linear mapping. Then in an appropriate coordinate system


defined by the basis W below, the matrix LW has one of the following forms.

(a) Suppose that L has two linearly independent real eigenvectors w1 and w2 with real eigenvalues
λ1 and λ2 . Then

[L]_W = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.

(b) Suppose that L has no real eigenvectors and complex conjugate eigenvalues σ ± iτ where
τ ≠ 0. Let w1 + iw2 be a complex eigenvector of L associated with the eigenvalue σ − iτ . Then
W = {w1, w2} is a basis and

[L]_W = \begin{pmatrix} \sigma & -\tau \\ \tau & \sigma \end{pmatrix}.

(c) Suppose that L has exactly one linearly independent real eigenvector w1 with real eigenvalue
λ. Choose the generalized eigenvector w2

(L − λI2 )(w2 ) = w1. (6.3.8)

Then W = {w1, w2} is a basis and

[L]_W = \begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}.

Proof: The verification of (a) was discussed in (6.3.7). The verification of (b) follows from
Chapter ??, (??) on equating w1 with v and w2 with w. The verification of (c) follows directly from
(6.3.8) as
[L(w1)]W = λe1 and [L(w2)]W = e1 + λe2 .

Hand Exercises

1. Let
w1 = (1, 4) and w2 = (−2, 1).

Find the coordinates of v = (−1, 32) in the W basis.

2. Let w1 = (1, 2) and w2 = (0, 1) be a basis for R2 . Let LA : R2 → R2 be the linear map given by
the matrix

A = \begin{pmatrix} 2 & 1 \\ -1 & 0 \end{pmatrix}
in standard coordinates. Find the matrix [L]W .

3. Let Eij be the 2 × 3 matrix whose entry in the ith row and j th column is 1 and all of whose other
entries are 0.

(a) Show that


V = {E11, E12, E13, E21, E22, E23}

is a basis for the vector space of 2 × 3 matrices.

(b) Compute [A]V where

A = \begin{pmatrix} -1 & 0 & 2 \\ 3 & -2 & 4 \end{pmatrix}.

4. Verify that V = {p1 , p2, p3} where

p1(t) = 1 + 2t, p2 (t) = t + 2t2, and p3(t) = 2 − t2 ,

is a basis for the vector space of polynomials P2. Let p(t) = t and find [p]V .

Computer Exercises

5. Let
w1 = (1, 0, 2), w2 = (2, 1, 4), and w3 = (0, 1, −1)

be a basis for R3. Find [v]W where v = (2, 1, 5).

6. Let
w1 = (0.2, −1.3, 0.34, −1.1)
w2 = (0.5, −0.6, 0.7, 0.8)
w3 = (−1.0, 1.0, 2.0, 4.5)
w4 = (−5.1, 0.0, 1.6, −1.7)

be a basis W for R4. Find [v]W where v = (1.7, 2.3, 1.0, −5.0).

7. Find a basis W = {w1, w2} such that [LA ]W is a diagonal matrix, where LA is the linear map
associated with the matrix

A = \begin{pmatrix} -10 & -6 \\ 18 & 11 \end{pmatrix}.

8. Let A be the 4 × 4 matrix

A = \begin{pmatrix} 2 & 1 & 4 & 6 \\ 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & 4 \\ 2 & 1 & 1 & 5 \end{pmatrix}
and let W = {w1 , w2, w3, w4} where

w1 = (1, 2, 3, 4)
w2 = (0, −1, 1, 3)
w3 = (2, 0, 0, 1)
w4 = (−1, 1, 3, 0)

Verify that W is a basis of R4 and compute the matrix associated to A in the W basis.

6.4 Matrices of Linear Maps on a Vector Space

Returning to the general finite dimensional vector space V , suppose that

W = {w1, . . ., wn} and Z = {z1, . . . , zn}

are bases of V . Then we can write

v = α1w1 + · · · + αnwn and v = β1 z1 + · · · + βn zn

to obtain the coordinates

[v]W = (α1, . . . , αn) and [v]Z = (β1 , . . . , βn ) (6.4.1)

of v relative to the bases W and Z. The question that we address is: How are [v]W and [v]Z related?
We answer this question by finding an n × n matrix CWZ such that

\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix} = C_{WZ} \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix}.   (6.4.2)

We may rewrite (6.4.2) as


[v]W = CWZ [v]Z . (6.4.3)

Definition 6.4.1. Let W and Z be bases for the n-dimensional vector space V . The n × n matrix
CWZ is a transition matrix if CWZ satisfies (6.4.3).

Transition Mappings Defined

The next theorem presents a method for finding the transition matrix between coordinates associated
to bases in an n-dimensional vector space V .

Theorem 6.4.2. Let W = {w1, . . . , wn} and Z = {z1, . . . , zn} be bases for the n-dimensional vector
space V . Then

C_{WZ} = \begin{pmatrix} c_{11} & \cdots & c_{1n} \\ \vdots & \ddots & \vdots \\ c_{n1} & \cdots & c_{nn} \end{pmatrix}   (6.4.4)

is the transition matrix, where

z_1 = c_{11} w_1 + · · · + c_{n1} w_n
    ⋮
z_n = c_{1n} w_1 + · · · + c_{nn} w_n

for scalars cij .

Proof: We can restate the equations above as

[z_j]_W = \begin{pmatrix} c_{1j} \\ \vdots \\ c_{nj} \end{pmatrix}.
Note that
[zj ]Z = ej ,

by definition. Since the transition matrix satisfies [v]W = CWZ [v]Z for all vectors v ∈ V , it must
satisfy this relation for v = zj . Therefore,

[zj ]W = CWZ [zj ]Z = CWZ ej .

It follows that [zj ]W is the j th column of CWZ , which proves the theorem.

A Formula for CWZ when V = Rn

For bases in Rn, there is a formula for finding transition matrices. Let W = {w1, . . . , wn} and
Z = {z1 , . . . , zn} be bases of Rn — written as row vectors. Also, let v ∈ Rn be written as a row
vector. Then (6.3.6) implies that

[v]_W = P_W^{-1} v^t and [v]_Z = P_Z^{-1} v^t ,

where
PW = (w1t | · · · |wnt ) and PZ = (z1t | · · ·|znt ).

It follows that

[v]_W = P_W^{-1} P_Z [v]_Z

and that

C_{WZ} = P_W^{-1} P_Z .   (6.4.5)

As an example, consider the following bases of R4 . Let

w1 = [1, 4, 2, 3] z1 = [3, 2, 0, 1]
w2 = [2, 1, 1, 4] z2 = [−1, 0, 2, 3]
w3 = [0, 1, 5, 6] z3 = [3, 1, 1, 3]
w4 = [2, 5, −1, 0] z4 = [2, 2, 3, 5]

Then the matrix CWZ is obtained by typing e9_4_7 to enter the bases and

inv([w1’ w2’ w3’ w4’])*[z1’ z2’ z3’ z4’]

to compute CWZ . The answer is:

ans =
-8.0000 5.5000 -7.0000 -3.2500
-0.5000 0.7500 0.0000 0.1250
4.5000 -2.7500 4.0000 2.3750
6.0000 -4.0000 5.0000 2.5000

Coordinates Relative to Two Different Bases in R2

Recall the basis W


w1 = (1, 1) and w2 = (1, −2)
of R2 that was used in a previous example. Suppose that Z = {z1 , z2} is a second basis of R2 . Write
v = (v1 , v2) as a linear combination of the basis Z

v = β1 z1 + β2 z2 ,

obtaining the coordinates [v]Z = (β1 , β2 ).

We use MATLAB to illustrate how the coordinates of a vector v relative to two bases may be
viewed geometrically. Suppose that z1 = (1, 3) and z2 = (−1, 2). Then enter the two bases W and
Z by typing

w1 = [1 1];
w2 = [1 -2];
z1 = [1 3];
z2 = [-1 2];
ccoord

The MATLAB program ccoord opens two graphics windows representing the W and Z planes
with the basis vectors plotted in red. Clicking the left mouse button on a vector in the W plane
simultaneously plots this vector v in both planes in yellow and the coordinates of v in the respective
bases in cyan. See Figure 6.2. From this display you can visualize the coordinates of a vector relative
to two different bases.
[Figure: two panels, "Coordinates in the {w1,w2} basis" and "Coordinates in the {z1,z2} basis", showing the same vector v together with its coordinates in each basis: approximately (1.319, 0.6645) in the W basis and (0.7916, −1.192) in the Z basis.]

Figure 6.2: The coordinates of v = (1.9839, −0.0097) in the bases w1 = (1, 1), w2 = (1, −2)
and z1 = (1, 3), z2 = (−1, 2).

Note that the program ccoord prints the transition matrix CWZ in the MATLAB control window.
We can verify the calculations of the program ccoord on this example by hand. Recall that (6.4.5)

states that

C_{WZ} = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}^{-1} \begin{pmatrix} 1 & -1 \\ 3 & 2 \end{pmatrix}
= \frac{1}{3}\begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 3 & 2 \end{pmatrix}
= \frac{1}{3}\begin{pmatrix} 5 & 0 \\ -2 & -3 \end{pmatrix}.
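
The same matrix can be obtained directly from (6.4.5) in MATLAB. A sketch:

w1 = [1  1];  w2 = [ 1 -2];     % the basis W
z1 = [1  3];  z2 = [-1  2];     % the basis Z
CWZ = inv([w1' w2'])*[z1' z2']  % returns [5/3 0; -2/3 -1]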

Matrices of Linear Maps in Different Bases

Theorem 6.4.3. Let L : V → V be a linear mapping and let W and Z be bases of V . Then

[L]Z and [L]W

are similar matrices. More precisely,

[L]_W = C_{ZW}^{-1} [L]_Z C_{ZW} .   (6.4.6)

Proof: For every v ∈ V we compute

CZW [L]W [v]W = CZW [L(v)]W


= [L(v)]Z
= [L]Z [v]Z
= [L]Z CZW [v]W .

Since this computation holds for every [v]W , it follows that

CZW [L]W = [L]Z CZW .

Thus (6.4.6) is valid.

Hand Exercises

1. Let
w1 = (1, 2) and w2 = (0, 1)

and
z1 = (2, 3) and z2 = (3, 4)
be two bases of R2 . Find CWZ .

2. Let f1 (t) = cos t and f2 (t) = sin t be functions in C 1 . Let V be the two dimensional subspace
spanned by f1 , f2; so F = {f1 , f2} is a basis for V . Let L : V → V be the linear mapping defined
by L(f) = df/dt. Find [L]F .
3. Let L : V → W and M : W → V be linear mappings, and assume dim V > dim W . Show that
M ◦L : V → V is not invertible.

Computer Exercises

4. Let
w1 = (0.23, 0.56) and w2 = (0.17, −0.71)

and
z1 = (−1.4, 0.3) and z2 = (0.1, −0.2)

be two bases of R2 and let v = (0.6, 0.1). Find [v]W , [v]Z , and CWZ .

5. Consider the matrix

A = \frac{1}{3}\begin{pmatrix} 1 & 1-\sqrt{3} & 1+\sqrt{3} \\ 1+\sqrt{3} & 1 & 1-\sqrt{3} \\ 1-\sqrt{3} & 1+\sqrt{3} & 1 \end{pmatrix} = \begin{pmatrix} 0.3333 & -0.2440 & 0.9107 \\ 0.9107 & 0.3333 & -0.2440 \\ -0.2440 & 0.9107 & 0.3333 \end{pmatrix}

(a) Try to determine the way that the matrix A moves vectors in R3. For example, let
w_1 = (1, 1, 1)^t , \quad w_2 = \frac{1}{\sqrt{6}}(1, -2, 1)^t , \quad w_3 = \frac{1}{\sqrt{2}}(1, 0, -1)^t
and compute Awj .

(b) Let W = {w1, w2, w3} be the basis of R3 given in (a). Compute [LA ]W .

(c) Determine the way that the matrix [LA ]W moves vectors in R3. For example, consider how
this matrix moves the standard basis vectors e1 , e2, e3 . Compare this answer with that in part
(a).

Chapter 7

Orthogonality

In Section 7.1 we discuss orthonormal bases — bases in which each basis vector has unit
length and any two basis vectors are perpendicular. We will see that the computation of
coordinates in an orthonormal basis is particularly straightforward. We use orthonormality
in Section 7.2 to study the geometric problem of least squares approximations (given a point
v and a subspace W , find the point in W closest to v) and in Section 7.4 to study the eigen-
values and eigenvectors of symmetric matrices (the eigenvalues are real and the eigenvectors
can be chosen to be orthonormal). We present two applications of least squares approxima-
tions: the Gram-Schmidt orthonormalization process for constructing orthonormal bases
(Section 7.2) and regression or least squares fitting of data (Section 7.3). The chapter ends
with a discussion of the QR decomposition for finding orthonormal bases in Section 7.5.
This decomposition leads to an algorithm that is numerically superior to Gram-Schmidt
and is the one used in MATLAB.

7.1 Orthonormal Bases

In Section 6.3 we discussed how to write the coordinates of a vector in a basis. We now
show that finding coordinates of vectors in certain bases is a very simple task — these bases
are called orthonormal bases.

Nonzero vectors v1 , . . . , vk in Rn are orthogonal if the dot products

vi · vj = 0

when i ≠ j. These vectors are orthonormal if they are orthogonal and of unit length, that
is,
vi · vi = 1.

The standard example of a set of orthonormal vectors in Rn is the standard basis e1 , . . . , en.
Lemma 7.1.1. Nonzero orthogonal vectors are linearly independent.

Proof: Let v1 , . . . , vk be a set of nonzero orthogonal vectors in Rn and suppose that


α1 v1 + · · · + αk vk = 0.
To prove the lemma we must show that each αj = 0. Since vi · vj = 0 for i ≠ j,
αj vj · vj = α1 v1 · vj + · · · + αk vk · vj = (α1v1 + · · · + αk vk ) · vj = 0 · vj = 0.
Since vj · vj = ||vj ||2 > 0, it follows that αj = 0.
Corollary 7.1.2. A set of n nonzero orthogonal vectors in Rn is a basis.

Proof: Lemma 7.1.1 implies that the n vectors are linearly independent, and Chapter 5,
Corollary 5.6.7 states that n linearly independent vectors in Rn form a basis.

Next we discuss how to find coordinates of a vector in an orthonormal basis, that is, a
basis consisting of orthonormal vectors.
Theorem 7.1.3. Let V ⊂ Rn be a subspace and let {v1 , . . ., vk } be an orthonormal basis of
V . Let v ∈ V be a vector. Then
v = α1v1 + · · · + αk vk .
where
αi = v · vi .

Proof: Since {v1 , . . . , vk } is a basis of V , we can write


v = α1 v1 + · · · + αk vk
for some scalars αj . It follows that
v · vj = (α1v1 + · · · + αk vk ) · vj = αj ,
as claimed.

An Example in R3

Let

v_1 = \frac{1}{\sqrt{3}}(1, 1, 1), \quad v_2 = \frac{1}{\sqrt{6}}(1, -2, 1) \quad and \quad v_3 = \frac{1}{\sqrt{2}}(1, 0, -1).

It is a straightforward calculation to verify that these vectors have unit length and are
pairwise orthogonal. Let v = (1, 2, 3) be a vector and determine the coordinates of v in the
basis V = {v1, v2, v3}. Theorem 7.1.3 states that these coordinates are:

[v]_V = (v · v_1 , v · v_2 , v · v_3 ) = \left(\frac{6}{\sqrt{3}}, \frac{0}{\sqrt{6}}, \frac{-2}{\sqrt{2}}\right) = (2\sqrt{3}, 0, -\sqrt{2}).
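
The dot products can be checked in MATLAB. A sketch:

v1 = [1  1  1]/sqrt(3);
v2 = [1 -2  1]/sqrt(6);
v3 = [1  0 -1]/sqrt(2);
v  = [1  2  3];
[dot(v,v1) dot(v,v2) dot(v,v3)]   % returns (3.4641, 0, -1.4142) = (2*sqrt(3), 0, -sqrt(2))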

Matrices in Orthonormal Coordinates

Next we discuss how to find the matrix associated with a linear map in an orthonormal
basis. Let L : Rn → Rn be a linear map and let V = {v1 , . . . , vn} be an orthonormal basis
for Rn . Then the matrix associated to L in the basis V is easy to calculate in terms of dot
product. That matrix is:
[L]V = (L(vj ) · vi ). (7.1.1)

To verify this claim, recall from Definition 6.3.4 that the (i, j)th entry of [L]V
is the ith entry in the vector [L(vj )]V which is L(vj ) · vi by Theorem 7.1.3.

An Example in R2

Let V = {v1, v2} ⊂ R2 where

v_1 = \frac{1}{\sqrt{2}}(1, 1) and v_2 = \frac{1}{\sqrt{2}}(1, -1).

The set V is an orthonormal basis of R2 . Using (7.1.1) we can find the matrix associated
to the linear map

L_A(x) = \begin{pmatrix} 2 & 1 \\ -1 & 3 \end{pmatrix} x

in the basis V by straightforward calculation. That is, compute

[L]_V = \begin{pmatrix} Av_1 · v_1 & Av_2 · v_1 \\ Av_1 · v_2 & Av_2 · v_2 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 5 & -3 \\ 1 & 5 \end{pmatrix}.
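
A sketch of the same computation carried out in MATLAB using formula (7.1.1):

v1 = [1;  1]/sqrt(2);
v2 = [1; -1]/sqrt(2);
A  = [2 1; -1 3];
LV = [dot(A*v1,v1) dot(A*v2,v1);
      dot(A*v1,v2) dot(A*v2,v2)]    % returns [2.5 -1.5; 0.5 2.5]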

Remarks Concerning MATLAB

In the next section we prove that every vector subspace of Rn has an orthonormal basis (see
Theorem 7.2.3), and we present a method for constructing such a basis (the Gram-Schmidt
orthonormalization process). Here we note that certain commands in MATLAB produce
bases for vector spaces. For those commands MATLAB always produces an orthonormal
basis. For example, null(A) produces a basis for the null space of A. Take the 3 × 5 matrix
 
    [ 1  2  3  4  5 ]
A = [ 0  1  2  3  4 ].
    [ 2  3  4  0  0 ]

Since rank(A) = 3, it follows that the null space of A is two-dimensional. Typing B =
null(A) in MATLAB produces

B =
-0.4666 0
0.6945 0.4313
-0.2876 -0.3235
0.3581 -0.6470
-0.2984 0.5392

The columns of B form an orthonormal basis for the null space of A. This assertion can be
checked by first typing

v1 = B(:,1);
v2 = B(:,2);

and then typing

norm(v1)
norm(v2)
dot(v1,v2)
A*v1
A*v2

yields answers 1, 1, 0, (0, 0, 0)t, (0, 0, 0)t (to within numerical accuracy). Recall that the
MATLAB command norm(v) computes the norm of a vector v.

Hand Exercises

1. Find an orthonormal basis for the solutions to the linear equation

2x1 − x2 + x3 = 0.

2. (a) Find the coordinates of the vector v = (1, 4) in the orthonormal basis V
v1 = (1/√5)(1, 2)   and   v2 = (1/√5)(2, −1).

(b) Let A = [ 1  1 ; 2  −3 ]. Find [A]V .

Computer Exercises

3. Load the matrix

       [ 1  2  0 ]
   A = [ 0  1  0 ]
       [ 0  0  0 ]

into MATLAB. Then type the command orth(A). Verify that the result is an orthonormal basis for
the column space of A.

7.2 Least Squares Approximations

Let W ⊂ Rn be a subspace and x0 ∈ Rn be a vector. In this section we solve a basic geometric
problem and investigate some of its consequences. The problem is:

Find a vector w0 ∈ W that is the nearest vector in W to x0 .

Figure 7.1: Approximation of x0 by w0 ∈ W by least squares.

The distance between two vectors v and w is ||v−w|| and the geometric problem can be rephrased
as follows: find a vector w0 ∈ W such that

||x0 − w0|| ≤ ||x0 − w|| ∀w ∈ W. (7.2.1)

Condition (7.2.1) is called the least squares approximation. In order to see where this name comes
from we write (7.2.1) in the equivalent form

||x0 − w0||^2 ≤ ||x0 − w||^2   ∀ w ∈ W.

This form means that for w = w0 the sum of the squares of the components of the vector x0 − w is
minimal.

Before continuing, we state and prove the Law of Pythagoras. Let z1, z2 ∈ Rn be orthogonal
vectors. Then
||z1 + z2||^2 = ||z1||^2 + ||z2||^2.   (7.2.2)
To verify (7.2.2) calculate

||z1 + z2||^2 = (z1 + z2) · (z1 + z2) = z1 · z1 + 2 z1 · z2 + z2 · z2 = ||z1||^2 + 2 z1 · z2 + ||z2||^2.

Since z1 and z2 are orthogonal, z1 · z2 = 0 and the Law of Pythagoras is valid.

Using (7.2.1) and (7.2.2), we can rephrase the minimum distance problem as follows.

Lemma 7.2.1. The vector w0 ∈ W is the closest vector to x0 ∈ Rn if the vector x0 −w0 is orthogonal
to every vector in W . (See Figure 7.1.)

Proof: Write x0 − w = z1 + z2 where z1 = x0 − w0 and z2 = w0 − w. By assumption, x0 − w0 is
orthogonal to every vector in W; since z2 ∈ W, the vectors z1 and z2 are orthogonal. It follows from (7.2.2) that

||x0 − w||^2 = ||x0 − w0||^2 + ||w0 − w||^2.

Since ||w0 − w||^2 ≥ 0, (7.2.1) is valid, and w0 is the vector nearest to x0 in W.

Least Squares Distance to a Line

Suppose W is as simple a subspace as possible; that is, suppose W is one dimensional with basis
vector w. Since W is one dimensional, a vector w0 ∈ W that is closest to x0 must be a multiple of
w; that is, w0 = aw. Suppose that we can find a scalar a so that x0 − aw is orthogonal to every
vector in W . Then it follows from Lemma 7.2.1 that w0 is the closest vector in W to x0 . To find a,
calculate
0 = (x0 − aw) · w = x0 · w − aw · w.

Then
a = (x0 · w)/||w||^2
and
w0 = ((x0 · w)/||w||^2) w.   (7.2.3)
Observe that ||w||^2 ≠ 0 since w is a basis vector.

For example, let x0 = (1, 2, −1, 3) ∈ R4 and w = (0, 1, 2, 3). Then the vector w0 in the space
spanned by w that is nearest to x0 is
w0 = (9/14) w,
since x0 · w = 9 and ||w||^2 = 14.
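The same projection takes two lines in MATLAB; for example, with the vectors above:

x0 = [1 2 -1 3]'; w = [0 1 2 3]';
a = (x0'*w)/(w'*w);     % a = 9/14
w0 = a*w                % the nearest vector to x0 on the line spanned by w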

Least Squares Distance to a Subspace

Similarly, using Lemma 7.2.1 we can solve the general least squares problem by solving a system of
linear equations. Let w1, . . . , wk be a basis for W and suppose that

w0 = α1w1 + · · · + αk wk

for some scalars αi. We now show how to find these scalars.

Theorem 7.2.2. Let x0 ∈ Rn be a vector, and let {w1, . . . , wk} be a basis for the subspace W ⊂ Rn.
Then
w0 = α1w1 + · · · + αk wk

is the nearest vector in W to x0 when

(α1, . . . , αk)^t = (A^t A)^{-1} A^t x0,   (7.2.4)

where A = (w1| · · · |wk ) is the n × k matrix whose columns are the basis vectors of W .

Proof: Observe that the vector x0 − w0 is orthogonal to every vector in W precisely when x0 − w0
is orthogonal to each basis vector wj . It follows from Lemma 7.2.1 that w0 is the closest vector to
x0 in W if
(x0 − w0) · wj = 0

for every j. That is, if

w0 · wj = x0 · wj

for every j. These equations can be rewritten as a system of equations in terms of the αi , as follows:

(w1 · w1)α1 + · · · + (w1 · wk)αk = w1 · x0
          ⋮                                      (7.2.5)
(wk · w1)α1 + · · · + (wk · wk)αk = wk · x0.

Note that if u, v ∈ Rn are column vectors, then u · v = u^t v. Therefore, we can rewrite (7.2.5) as

A^t A (α1, . . . , αk)^t = A^t x0,

where A is the matrix whose columns are the wj and x0 is viewed as a column vector. Note that
the matrix AtA is a k × k matrix.

We claim that A^t A is invertible. To verify this claim, it suffices to show that the null space of
A^t A is zero; that is, if A^t A z = 0 for some z ∈ Rk, then z = 0. First, calculate

||Az||^2 = Az · Az = (Az)^t Az = z^t A^t A z = z^t 0 = 0.

It follows that Az = 0. Now if we let z = (z1 , . . . , zk )t , then the equation Az = 0 may be rewritten
as
z1 w1 + · · · + zk wk = 0.

Since the wj are linearly independent, it follows that the zj = 0. In particular, z = 0. Since At A is
invertible, (7.2.4) is valid, and the theorem is proved.
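Formula (7.2.4) can be typed into MATLAB exactly as written. As an illustration, take as a basis of W the two vectors that reappear in (7.2.11) below, together with an arbitrarily chosen x0:

w1 = [1 0 -1 0]'; w2 = [2 -1 0 1]'; x0 = [1 2 -1 3]';
A = [w1 w2];
alpha = inv(A'*A)*A'*x0;    % the coordinates of w0 in the basis {w1, w2}, as in (7.2.4)
w0 = A*alpha                % the nearest vector in W to x0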

Gram-Schmidt Orthonormalization Process

Suppose that W = {w1 , . . ., wk } is a basis for the subspace V ⊂ Rn. There is a natural process by
which the W basis can be transformed into an orthonormal basis V of V . This process proceeds
inductively on the wj ; the orthonormal vectors v1 , . . . , vk can be chosen so that

span{v1, . . . , vj } = span{w1, . . . , wj }

for each j ≤ k. Moreover, the vj are chosen using the theory of least squares that we have just
discussed.

The Case j = 2

To gain a feeling for how the induction process works, we verify the case j = 2. Set
v1 = (1/||w1||) w1;   (7.2.6)

so v1 points in the same direction as w1 and has unit length, that is, v1 · v1 = 1. The normalization
is shown in Figure 7.2.
Figure 7.2: Planar illustration of Gram-Schmidt orthonormalization.

Next, we find a unit length vector v2, lying in the plane spanned by w1 and w2, that is perpendicular
to v1. Let w0 be the vector on the line generated by v1 that is nearest to w2. It follows from (7.2.3)
that
w0 = ((w2 · v1)/||v1||^2) v1 = (w2 · v1)v1.
The vector w0 is shown in Figure 7.2 and, as Lemma 7.2.1 states, the vector v2′ = w2 − w0 is
perpendicular to v1. That is,
v2′ = w2 − (w2 · v1)v1   (7.2.7)

is orthogonal to v1.

Finally, set
v2 = (1/||v2′||) v2′   (7.2.8)

so that v2 has unit length. Since v2 and v2′ point in the same direction, v1 and v2 are orthogonal.
Note also that v1 and v2 are linear combinations of w1 and w2. Since v1 and v2 are orthogonal, they
are linearly independent. It follows that

span{v1, v2} = span{w1, w2}.

In summary: computing v1 and v2 using (7.2.6), (7.2.7) and (7.2.8) yields an orthonormal basis
for the plane spanned by w1 and w2.

The General Case

Theorem 7.2.3. (Gram-Schmidt Orthonormalization) Let w1, . . . , wk be a basis for the subspace
W ⊂ Rn . Define v1 as in (7.2.6) and then define inductively

v′j+1 = wj+1 − (wj+1 · v1)v1 − · · · − (wj+1 · vj)vj   (7.2.9)

vj+1 = (1/||v′j+1||) v′j+1.   (7.2.10)

Then v1 , . . . , vk is an orthonormal basis of W such that for each j

span{v1, . . . , vj } = span{w1, . . . , wj }

Proof: We assume that we have constructed orthonormal vectors v1 , . . . , vj such that

span{v1, . . . , vj } = span{w1, . . ., wj }.

Our purpose is to find a unit vector vj+1 that is orthogonal to each vi and that satisfies

span{v1, . . . , vj+1} = span{w1, . . ., wj+1}.

We construct vj+1 in two steps. First we find a vector v′j+1 that is orthogonal to each of the vi using
least squares. Let w0 be the vector in span{v1, . . . , vj } that is nearest to wj+1. Theorem 7.2.2 tells
us how to make this construction. Let A be the matrix whose columns are v1 , . . . , vj . Then (7.2.4)
states that the coordinates of w0 in the vi basis are given by (A^t A)^{-1} A^t wj+1. But since the vi's are
orthonormal, the matrix A^t A is just the identity matrix Ij. Hence

w0 = (wj+1 · v1)v1 + · · · + (wj+1 · vj )vj .

Note that v′j+1 = wj+1 − w0 is the vector defined in (7.2.9). We claim that v′j+1 = wj+1 − w0 is
orthogonal to vk for k ≤ j and hence to every vector in span{v1, . . . , vj}. Just calculate

v′j+1 · vk = wj+1 · vk − w0 · vk = wj+1 · vk − wj+1 · vk = 0.

Define vj+1 as in (7.2.10). It follows that v1 , . . . , vj+1 are orthonormal and that each vector is a
linear combination of w1, . . . , wj+1.
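The recursion (7.2.9) and (7.2.10) is easy to program. The following function is a minimal sketch (the name gramschmidt is ours, and the columns of the input matrix are assumed to be linearly independent):

function V = gramschmidt(W)
% Returns a matrix V whose columns are an orthonormal basis for the
% column space of W, computed by (7.2.9) and (7.2.10).
[n,k] = size(W);
V = zeros(n,k);
for j = 1:k
   vp = W(:,j);
   for i = 1:j-1
      vp = vp - (W(:,j)'*V(:,i))*V(:,i);   % subtract the projections, as in (7.2.9)
   end
   V(:,j) = vp/norm(vp);                   % normalize, as in (7.2.10)
end

Saving this sketch as gramschmidt.m makes it available for the example in the next subsection.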

An Example of Orthonormalization

Let W ⊂ R4 be the subspace spanned by the vectors

w1 = (1, 0, −1, 0), w2 = (2, −1, 0, 1), w3 = (0, 0, −2, 1). (7.2.11)

We find an orthonormal basis for W using Gram-Schmidt orthonormalization.

Step 1: Set
v1 = (1/||w1||) w1 = (1/√2)(1, 0, −1, 0).

Step 2: Following the Gram-Schmidt process, use (7.2.9) to define
v2′ = w2 − (w2 · v1)v1 = (2, −1, 0, 1) − √2 · (1/√2)(1, 0, −1, 0) = (1, −1, 1, 1).
Normalization using (7.2.10) yields
v2 = (1/||v2′||) v2′ = (1/2)(1, −1, 1, 1).

Step 3: Using (7.2.9) set

v3′ = w3 − (w3 · v1)v1 − (w3 · v2)v2
    = (0, 0, −2, 1) − √2 · (1/√2)(1, 0, −1, 0) − (−1/2) · (1/2)(1, −1, 1, 1)
    = (1/4)(−3, −1, −3, 5).

Normalization using (7.2.10) yields
v3 = (1/||v3′||) v3′ = (1/√44)(−3, −1, −3, 5).

Hence we have constructed an orthonormal basis {v1, v2, v3} for W, namely

v1 = (1/√2)(1, 0, −1, 0)     ≈ (0.7071, 0, −0.7071, 0)
v2 = (1/2)(1, −1, 1, 1)      = (0.5, −0.5, 0.5, 0.5)                    (7.2.12)
v3 = (1/√44)(−3, −1, −3, 5)  ≈ (−0.4523, −0.1508, −0.4523, 0.7538)
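With the sketch gramschmidt.m from the previous subsection on the MATLAB path, this example can be reproduced numerically:

w1 = [1 0 -1 0]'; w2 = [2 -1 0 1]'; w3 = [0 0 -2 1]';
V = gramschmidt([w1 w2 w3])    % the columns approximate v1, v2, v3 in (7.2.12)
V'*V                           % the 3 x 3 identity matrix, up to round-off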

Hand Exercises

1. Find an orthonormal basis of R2 by applying Gram-Schmidt orthonormalization to the vectors


w1 = (3, 4) and w2 = (1, 5).

2. Find an orthonormal basis of the plane W ⊂ R3 spanned by the vectors w1 = (1, 2, 3) and
w2 = (2, 5, −1) by applying Gram-Schmidt orthonormalization.

3. Let W = {w1 , . . ., wk } be an orthonormal basis of the subspace W ⊂ Rn. Prove that W can be
extended to an orthonormal basis {w1 , . . ., wn} of Rn .

Computer Exercises

4. Use Gram-Schmidt orthonormalization to find an orthonormal basis for the subspace of R5
spanned by the vectors

w1 = (2, 1, 3, 5, 7) w2 = (2, −1, 5, 2, 3) and w3 = (10, 1, −23, 2, 3).

Extend this basis to an orthonormal basis of R5 .

7.3 Least Squares Fitting of Data

We begin this section by using the method of least squares to find the best straight line fit to a set
of data. Later in the section we will discuss best fits to other curves.

An Example of Best Linear Fit to Data

Suppose that we are given data points (xi, yi) for i = 1, . . . , 10. For example, consider the ten
points
(2.0, 0.1) (3.0, 2.7) (1.5, −1.1) (−1.0, −5.5) (0.0, −3.4)
(3.6, 3.0) (0.7, −2.8) (4.1, 4.0) (1.9, −1.9) (5.0, 5.5)
The ten points (xi, yi ) are plotted in Figure 7.3 using the commands

e10_3_1
plot(X,Y,'o')
axis([-3,7,-8,8])
xlabel('x')
ylabel('y')

Next, suppose that there is a linear relation between the xi and the yi ; that is, we assume that
there are constants b1 and b2 (that do not depend on i) for which yi = b1 + b2xi for each i. But
these points are just data; errors may have been made in their measurement. So we ask: Find b01
and b02 so that the error made in fitting the data to the line y = b01 + b02x is minimal, that is, the
error that is made in that fit is less than or equal to the error made in fitting the data to the line
y = b1 + b2 x for any other choice of b1 and b2 .

We begin by discussing what that error actually is. Given constants b1 and b2 and given a data
point xi, the difference between the data value yi and the hypothesized value b1 + b2xi is the error

Figure 7.3: Scatter plot of data in (7.3).

that is made at that data point. Next, we combine the errors made at all of the data points; a
standard way to combine the errors is to use the Euclidean distance
E(b) = [ (y1 − (b1 + b2 x1))^2 + · · · + (y10 − (b1 + b2 x10))^2 ]^{1/2}.

Rewriting E(b) in vector notation leads to an economy in notation and to a conceptual advantage.
Let
X = (x1, . . . , x10)^t,   Y = (y1, . . . , y10)^t   and   F1 = (1, 1, . . . , 1)^t
be vectors in R10. Then, in coordinates,

                     [ y1 − (b1 + b2 x1)   ]
Y − (b1 F1 + b2 X) = [          ⋮          ].
                     [ y10 − (b1 + b2 x10) ]

It follows that
E(b) = ||Y − (b1 F1 + b2X)||.

The problem of making a least squares fit is to minimize E over all b1 and b2.

To solve the minimization problem, note that the vectors b1F1 + b2 X form a two dimensional
subspace W = span{F1 , X} ⊂ R10 (at least when X is not a scalar multiple of F1, which is almost
always). Minimizing E is identical to finding a vector w0 = b01F1 + b02X ∈ W that is nearest to the
vector Y ∈ R10. This is the least squares question that we solved in the Section 7.2.

We can use MATLAB to compute the values of b01 and b02 that give the best linear approximation
to Y . If we set the matrix A = (F1 |X), then Theorem 7.2.2 implies that the values of b01 and b02 are
obtained using (7.2.4). In particular, type e10_3_1 to call the vectors X, Y, F1 into MATLAB, and
then type

A = [F1 X];
b0 = inv(A'*A)*A'*Y

to obtain

235
b0(1) = -3.8597
b0(2) = 1.8845

Superimposing the line y = −3.8597 + 1.8845x on the scatter plot in Figure 7.3 yields the plot in Fig-
ure 7.4. The total error is E(b0) = 1.9634 (obtained in MATLAB by typing norm(Y-(b0(1)*F1+b0(2)*X))).
Compare this with the error E(2, −4) = 2.0928.
Figure 7.4: Scatter plot of data in (7.3) with best linear approximation.

General Linear Regression

We can summarize the previous discussion as follows. Given n data points

(x1 , y1), . . . , (xn, yn);

form the vectors

X = (x1, . . . , xn)^t,   Y = (y1, . . . , yn)^t   and   F1 = (1, . . . , 1)^t

in Rn. Find constants b01 and b02 so that b01F1 + b02X is a vector in W = span{F1, X} ⊂ Rn that is
nearest to Y . Let
A = (F1|X)

be the n × 2 matrix. This problem is solved by least squares in (7.2.4) as

(b01, b02)^t = (A^t A)^{-1} A^t Y.   (7.3.1)
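It may help to see this recipe end to end on a small made-up data set; the numbers below are invented for illustration and are not the data set e10_3_1:

x = (0:9)';                        % invented x-values
y = 2*x - 1 + 0.3*randn(10,1);     % noisy samples of the line y = 2x - 1
F1 = ones(10,1);
A = [F1 x];
b0 = inv(A'*A)*A'*y                % b0(1) is near -1 and b0(2) is near 2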

Least Squares Fit to a Quadratic Polynomial

Suppose that we want to fit the data (xi , yi) to a quadratic polynomial

y = b1 + b2 x + b3 x2

by least squares methods. We want to find constants b01, b02, b03 so that the error made in using
the quadratic polynomial y = b01 + b02 x + b03 x^2 is minimal among all possible choices of quadratic
polynomials. The least squares error is

E(b) = ||Y − (b1 F1 + b2 X + b3 X^(2))||

where
X^(2) = (x1^2, . . . , xn^2)^t
and, as before, F1 is the n-vector with all components equal to 1.

We solve the minimization problem as before. In this case, the space of possible approximations
to the data W is three dimensional; indeed, W = span{F1, X, X^(2)}. As in the case of fits to lines
we try to find a point in W that is nearest to the vector Y ∈ Rn . By (7.2.4), the answer is:

b = (At A)−1 AtY,

where A = (F1 | X | X^(2)) is an n × 3 matrix.

Suppose that we try to fit the data in (7.3) with a quadratic polynomial rather than a linear
one. Use MATLAB as follows

e10_3_1
A = [F1 X X.*X];
b = inv(A'*A)*A'*Y;

to obtain

b(1) = -3.8197
b(2) = 1.7054
b(3) = 0.0443

So the best parabolic fit to this data is y = −3.8197 + 1.7054x + 0.0443x^2. Note that the coefficient
of x^2 is small, suggesting that the data was well fit by a straight line. Note also that the error is
E(b) = 1.9098, which is only marginally smaller than the error for the best linear fit. For comparison,
in Figure 7.5 we superimpose the equation for the quadratic fit onto Figure 7.4.

General Least Squares Fit

The approximation to a quadratic polynomial shows that least squares fits can be made to any finite
dimensional function space. More precisely, Let C be a finite dimensional space of functions and let

f1 (x), . . . , fm (x)

be a basis for C. We have just considered two such spaces: C = span{f1 (x) = 1, f2(x) = x} for
linear regression and C = span{f1 (x) = 1, f2(x) = x, f3(x) = x2} for least squares fit to a quadratic
polynomial.

Figure 7.5: Scatter plot of data in (7.3) with best linear and quadratic approximations. The
best linear fit is plotted with a dashed line.

The general least squares fit of a data set

(x1, y1 ), . . . , (xn, yn)

is the function g0(x) ∈ C that is nearest to the data set in the following sense. Let

X = (x1, . . . , xn)t and Y = (y1 , . . . , yn)t

be column vectors in Rn . For any function g(x) define the column vector

G = (g(x1 ), . . . , g(xn))t ∈ Rn.

So G is the evaluation of g(x) on the data set. Then the error

E(g) = ||Y − G||

is minimal for g = g0 .

More precisely, we think of the data Y as representing the (approximate) evaluation of a function
on the xi . Then we try to find a function g0 ∈ C whose values on the xi are as near as possible to
the vector Y . This is just a least squares problem. Let W ⊂ Rn be the vector subspace spanned by
the evaluations of the functions g ∈ C on the data points xi, that is, the vectors G. The minimization
problem is to find a vector in W that is nearest to Y . This can be solved in general using (7.2.4).
That is, let A be the n × m matrix
A = (F1| · · · |Fm )
where Fj ∈ Rn is the column vector associated to the j th basis element of C, that is,

Fj = (fj (x1 ), . . . , fj (xn))t ∈ Rn .

The minimizing function g0(x) ∈ C is a linear combination of the basis functions f1(x), . . . , fm(x),
that is,
g0(x) = b1f1 (x) + · · · + bm fm (x)
for scalars bi . If we set
b = (b1, . . . , bm ) ∈ Rm ,

then least squares minimization states that

b = (A^t A)^{-1} A^t Y.   (7.3.2)

This equation can be solved easily in MATLAB. Enter the data as column n-vectors X and Y.
Compute the column vectors Fj = fj (X) and then form the matrix A = [F1 F2 · · · Fm]. Finally
compute

b = inv(A'*A)*A'*Y

Least Squares Fit to a Sinusoidal Function

We discuss a specific example of the general least squares formulation by considering the weather.
It is reasonable to expect monthly data on the weather to vary periodically in time with a period of
one year. In Table 7.1 we give average daily high and low temperatures for each month of the year
for Paris and Rio de Janeiro. We attempt to fit this data with curves of the form:
g(T) = b1 + b2 cos(2πT/12) + b3 sin(2πT/12),

where T is time measured in months and b1, b2, b3 are scalars. These functions are 12-periodic,
which seems appropriate for weather data, and form a three dimensional function space C. Recall
the trigonometric identity
a cos(ωt) + c sin(ωt) = d sin(ω(t − ϕ))
where
d = √(a^2 + c^2).
Based on this identity we call C the space of sinusoidal functions. The number d is called the
amplitude of the sinusoidal function g(T ).

           Paris        Rio de Janeiro                 Paris        Rio de Janeiro
Month      High   Low   High   Low        Month        High   Low   High   Low
1 55 39 84 73 7 81 64 75 63
2 55 41 85 73 8 81 64 76 64
3 59 45 83 72 9 77 61 75 65
4 64 46 80 69 10 70 54 77 66
5 68 55 77 66 11 63 46 79 68
6 75 61 76 64 12 55 41 82 71

Table 7.1: Monthly Average of Daily High and Low Temperatures in Paris and Rio de
Janeiro.

Note that each data set consists of twelve entries — one for each month. Let T = (1, 2, . . ., 12)t
be the vector X ∈ R12 in the general presentation. Next let Y be the data in one of the data sets
— say the high temperatures in Paris.

Now we turn to the vectors representing basis functions in C. Let

F1=[1 1 1 1 1 1 1 1 1 1 1 1]'

be the vector associated with the basis function f1 (T ) = 1. Let F2 and F3 be the column vectors
associated to the basis functions
f2(T) = cos(2πT/12)   and   f3(T) = sin(2πT/12).
These vectors are computed by typing

F2 = cos(2*pi/12*T);
F3 = sin(2*pi/12*T);

By typing temper, we enter the temperatures and the vectors T, F1, F2 and F3 into MATLAB.

To find the best fit to the data by a sinusoidal function g(T ), we use (7.2.4). Let A be the 12 × 3
matrix

A = [F1 F2 F3];

The table data is entered in column vectors ParisH and ParisL for the high and low Paris
temperatures and RioH and RioL for the high and low Rio de Janeiro temperatures. We can find
the best least squares fit of the Paris high temperatures by a sinusoidal function g0(T ) by typing

b = inv(A'*A)*A'*ParisH

obtaining

b(1) = 66.9167
b(2) = -9.4745
b(3) = -9.3688

The result is plotted in Figure 7.6 by typing

plot(T,ParisH,'o')
axis([0,13,0,100])
xlabel('time (months)')
ylabel('temperature (Fahrenheit)')
hold on
xx = linspace(0,13);
yy = b(1) + b(2)*cos(2*pi*xx/12) + b(3)*sin(2*pi*xx/12);
plot(xx,yy)

Figure 7.6: Monthly averages of daily high temperatures in Paris (left) and Rio de Janeiro
(right) with best sinusoidal approximation.

A similar exercise allows us to compute the best approximation to the Rio de Janeiro high
temperatures obtaining

b(1) = 79.0833
b(2) = 3.0877
b(3) = 3.6487

The value of b(1) is just the mean high temperature and, not surprisingly, that value is much higher
in Rio than in Paris. There is yet more information contained in these approximations. For the high
temperatures in Paris and Rio

dP = 13.3244 and dR = 4.7798.

The amplitude d measures the variation of the high temperature about its mean. It is much greater
in Paris than in Rio, indicating that the difference in temperature between winter and summer is
much greater in Paris than in Rio.
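These amplitudes come straight from the fitted coefficients via d = √(b2^2 + b3^2). For instance, with b holding the coefficients of the Paris fit, typing

dP = sqrt(b(2)^2 + b(3)^2)

returns dP = 13.3244 (up to round-off).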

Least Squares Fit in MATLAB

The general formula for a least squares fit of data (7.3.2) has been preprogrammed in MATLAB.
After setting up the matrix A whose columns are the vectors Fj just type

b = A\Y

This MATLAB command can be checked on the sinusoidal fit to the high temperature Rio de Janeiro
data by typing

b = A\RioH

and obtaining

b =
79.0833
3.0877
3.6487

Computer Exercises

1. World population data for each decade of this century (except for 1910) is given in Table 7.2.
Assume that population growth is linear P = mT + b where time T is measured in decades since the
year 1900 and P is measured in billions of people. This data can be recovered by typing e10_3_po.

(a) Find m and b to give the best linear fit to this data.

(b) Use this linear approximation to the data to make predictions of the world populations in the
year 1910 and 2000.

(c) Do you expect the prediction for the year 2000 to be high or low or on target? Explain why by
graphing the data with the best linear fit superimposed and by using the differential equation
population model discussed in Section ??.

Year Population (in millions) Year Population (in millions)


1900 1625 1950 2516
1910 n.a. 1960 3020
1920 1813 1970 3698
1930 1987 1980 4448
1940 2213 1990 5292

Table 7.2: Twentieth Century World Population Data by Decades.

2. Find the best sinusoidal approximation to the monthly average low temperatures in Paris and
Rio de Janeiro. How does the variation of these temperatures about the mean compare to the high
temperature calculations? Was this the result you expected?

3. In Table 7.3 we present weather data from ten U.S. cities. The data is the average number of days
in the year with precipitation and the percentage of sunny hours to hours when it could be sunny.
Find the best linear fit to this data.

7.4 Symmetric Matrices

Symmetric matrices have some remarkable properties that can be summarized by:

City Rainy Days Sunny (%) City Rainy Days Sunny (%)
Charleston 92 72 Kansas City 98 59
Chicago 121 54 Miami 114 85
Dallas 82 65 New Orleans 103 61
Denver 82 67 Phoenix 28 88
Duluth 136 52 Salt Lake City 99 59

Table 7.3: Precipitation Days Versus Sunny Time for Selected U.S. Cities.

Theorem 7.4.1. Let A be an n × n symmetric matrix. Then

(a) every eigenvalue of A is real, and


(b) there is an orthonormal basis of Rn consisting of eigenvectors of A.

As a consequence of Theorem 7.4.1, let V = {v1 , . . ., vn} be an orthonormal basis for Rn con-
sisting of eigenvectors of A. Indeed, suppose

Avj = λj vj

where λj ∈ R. Note that

Avj · vi = λj   if i = j,
Avj · vi = 0    if i ≠ j.

It follows from (7.1.1) that

        [ λ1        0 ]
[A]V =  [     ⋱       ]
        [ 0        λn ]

is a diagonal matrix. So every symmetric matrix is similar to a diagonal matrix.
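Both statements in Theorem 7.4.1 are easy to observe numerically. A small sketch with an arbitrarily chosen symmetric matrix is:

A = [2 1 0; 1 3 1; 0 1 2];
[S,D] = eig(A);
D        % the eigenvalues are real
S'*S     % the identity matrix: the eigenvectors returned by eig are orthonormal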

Hermitian Inner Products

The proof of Theorem 7.4.1 uses the Hermitian inner product — a generalization of dot product to
complex vectors. Let v, w ∈ Cn be two complex n-vectors. Define

⟨v, w⟩ = v1 w̄1 + · · · + vn w̄n.

Note that the coordinates wi of the second vector enter this formula with a complex conjugate.
However, if v and w are real vectors, then

⟨v, w⟩ = v · w.

A more compact notation for the Hermitian inner product is given by matrix multiplication. Suppose
that v and w are column n-vectors. Then

⟨v, w⟩ = v^t w̄.

The properties of the Hermitian inner product are similar to those of dot product. We note
three. Let c ∈ C be a complex scalar. Then

⟨v, v⟩ = ||v||^2 ≥ 0
⟨cv, w⟩ = c ⟨v, w⟩
⟨v, cw⟩ = c̄ ⟨v, w⟩

Note the complex conjugation of the complex scalar c in the previous formula.
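The Hermitian inner product defined above can be evaluated in MATLAB using the non-conjugating transpose; a one-line sketch for arbitrarily chosen complex column vectors is:

v = [1+2i; 3]; w = [2; 1-1i];
hvw = v.'*conj(w)     % the Hermitian inner product of v and w as defined above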

Let C be a complex n × n matrix. Then the main observation concerning Hermitian inner
products that we shall use is:
⟨Cv, w⟩ = ⟨v, C̄^t w⟩.

This fact is verified by calculating

⟨Cv, w⟩ = (Cv)^t w̄ = (v^t C^t) w̄ = v^t (C^t w̄) = ⟨v, C̄^t w⟩.

So if A is an n × n real symmetric matrix, then

⟨Av, w⟩ = ⟨v, Aw⟩,   (7.4.1)

since Ā^t = A^t = A.

Proof of Theorem 7.4.1(a): Let λ be an eigenvalue of A and let v be the associated eigenvector.
Since Av = λv we can use (7.4.1) to compute

λ ⟨v, v⟩ = ⟨Av, v⟩ = ⟨v, Av⟩ = λ̄ ⟨v, v⟩.

Since ⟨v, v⟩ = ||v||^2 > 0, it follows that λ = λ̄ and λ is real.

Proof of Theorem 7.4.1(b): Let A be a real symmetric n × n matrix. We want to show that
there is an orthonormal basis of Rn consisting of eigenvectors of A. The proof proceeds inductively
on n. The theorem is trivially valid for n = 1; so we assume that it is valid for n − 1.

Theorem 4.3.4 of Chapter 4 implies that A has an eigenvalue λ1 and Theorem 7.4.1(a) states
that this eigenvalue is real. Let v1 be a unit length eigenvector corresponding to the eigenvalue λ1.
Extend v1 to an orthonormal basis v1 , w2, . . . , wn of Rn and let P = (v1 |w2| · · · |wn) be the matrix
whose columns are the vectors in this orthonormal basis. Orthonormality and direct multiplication
implies that
P tP = In . (7.4.2)
Therefore P is invertible; indeed P −1 = P t.

Next, let
B = P −1AP.
By direct computation

Be1 = P −1AP e1 = P −1Av1 = λ1 P −1v1 = λ1 e1 .

It follows that B has the form

    [ λ1  ∗ ]
B = [ 0   C ]

where C is an (n − 1) × (n − 1) matrix. Since P −1 = P t, it follows that B is a symmetric matrix;
to verify this point compute

B t = (P tAP )t = P tAt(P t )t = P tAP = B.

It follows that

    [ λ1  0 ]
B = [ 0   C ]
where C is a symmetric matrix. By induction we can choose an orthonormal basis z2, . . . , zn in
{0} × Rn−1 consisting of eigenvectors of C. It follows that e1 , z2 , . . . , zn is an orthonormal basis for
Rn consisting of eigenvectors of B.

Finally, let vj = P zj for j = 2, . . . , n. Since v1 = P e1, it follows that v1, v2, . . . , vn is a basis
of Rn consisting of eigenvectors of A. We need only show that the vj form an orthonormal basis of
Rn. This is done using (7.4.1). For notational convenience let z1 = e1 and compute

⟨vi, vj⟩ = ⟨P zi, P zj⟩ = ⟨zi, P^t P zj⟩ = ⟨zi, zj⟩,

since P^t P = In. Thus the vectors vj form an orthonormal basis since the vectors zj form an
since P P t = In . Thus the vectors vj form an orthonormal basis since the vectors zj form an
orthonormal basis.

Hand Exercises

1. Let

       [ a  b ]
   A = [ b  d ]

be the general real 2 × 2 symmetric matrix.

(a) Prove directly using the discriminant of the characteristic polynomial that A has real eigen-
values.
(b) Show that A has equal eigenvalues only if A is a scalar multiple of I2 .

2. Let

       [ 1   2 ]
   A = [ 2  −2 ].

Find the eigenvalues and eigenvectors of A and verify that the eigenvectors are orthogonal.

Computer Exercises

In Exercises 3 – 5 compute the eigenvalues and the eigenvectors of the 2 × 2 matrix. Then load
the matrix into the program map in MATLAB and iterate. That is, choose an initial vector v0 and
use map to compute v1 = Av0, v2 = Av1 , . . . . How does the result of iteration compare with the
eigenvectors and eigenvalues that you have found? Hint: You may find it convenient to use the
feature Rescale in the MAP Options. Then the norm of the vectors is rescaled to 1 after each use of
the command Map and the vectors vj will not escape from the viewing screen.

3. A = [ 1  3 ; 3  1 ]

4. B = [ 11  9 ; 9  11 ]

5. C = [ 0.005  −2.005 ; −2.005  0.005 ]

6. Perform the same computational experiment as described in Exercises 3 – 5 using the matrix
A = [ 0  2 ; 2  0 ] and the program map. How do your results differ from the results in those exercises
and why?

7.5 Orthogonal Matrices and QR Decompositions

In this section we describe an alternative approach to Gram-Schmidt orthonormalization for con-
structing an orthonormal basis of a subspace W ⊂ Rn. This method is called the QR decomposition
and is numerically superior to Gram-Schmidt. Indeed, the QR decomposition is the method used
by MATLAB to compute orthonormal bases. To discuss this decomposition we need to introduce a
new type of matrices, the orthogonal matrices.

Orthogonal Matrices

Definition 7.5.1. An n × n matrix Q is orthogonal if its columns form an orthonormal basis of
Rn.

The following lemma states elementary properties of orthogonal matrices:

Lemma 7.5.2. Let Q be an n × n matrix. Then

(a) Q is orthogonal if and only if Qt Q = In ;

(b) Q is orthogonal if and only if Q−1 = Qt ;

(c) If Q1 , Q2 are orthogonal matrices, then Q1 Q2 is an orthogonal matrix.

Proof: (a) Let Q = (v1 | · · · |vn). Since Q is orthogonal, the vj form an orthonormal basis. By
direct computation note that QtQ = {(vi · vj )} = In, since the vj are orthonormal. Note that (b) is
simply a restatement of (a).

(c) Now let Q1 , Q2 be orthogonal. Then (a) implies

(Q1 Q2)^t (Q1 Q2) = Q2^t Q1^t Q1 Q2 = Q2^t Q2 = In,

thus proving (c).
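Lemma 7.5.2(a) also gives a quick numerical test for orthogonality. For example, for a rotation matrix:

Q = [cos(1) -sin(1); sin(1) cos(1)];
Q'*Q     % the 2 x 2 identity matrix, up to round-off, so Q is orthogonal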

The previous lemma together with (7.4.2) in the proof of Theorem 7.4.1(b) lead to the following
result:

Proposition 7.5.3. For each symmetric n × n matrix A, there exists an orthogonal matrix P such
that P tAP is a diagonal matrix.

Reflections Across Hyperplanes: Householder Matrices

Useful examples of orthogonal matrices are reflections across hyperplanes. An (n − 1)-dimensional
subspace of Rn is called a hyperplane. Let V be a hyperplane and let u be a nonzero vector normal
to V . Then a reflection across V is a linear map H : Rn → Rn such that

(a) Hv = v for all v ∈ V .

(b) Hu = −u.

We claim that the matrix of a reflection across a hyperplane is orthogonal and there is a simple
formula for that matrix.

Definition 7.5.4. A Householder matrix is an n × n matrix of the form

H = In − (2/(u^t u)) u u^t   (7.5.1)

where u ∈ Rn is a nonzero vector.

This definition makes sense since u^t u = ||u||^2 is a number while the product u u^t is an n × n
matrix.

Lemma 7.5.5. Let u ∈ Rn be a nonzero vector and let V be the hyperplane orthogonal to u. Then
the Householder matrix H is a reflection across V and is orthogonal.

Proof: By definition every vector v ∈ V satisfies u^t v = u · v = 0. Therefore,

Hv = v − (2/(u^t u)) u u^t v = v,

and

Hu = u − (2/(u^t u)) u u^t u = u − 2u = −u.
Hence H is a reflection across the hyperplane V . It also follows that H 2 = In since H 2 v = H(Hv) =
Hv = v for all v ∈ V and H 2 u = H(−u) = u. So H 2 acts like the identity on a basis of Rn and
H 2 = In .

To show that H is orthogonal, we first calculate

H^t = In^t − (2/(u^t u)) (u u^t)^t = In − (2/(u^t u)) u u^t = H.

Therefore In = HH = HH^t and H^t = H^{-1}. Now apply Lemma 7.5.2(b).
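Formula (7.5.1) and both properties in Lemma 7.5.5 are easy to check numerically; a sketch with an arbitrarily chosen vector u is:

u = [1; 2; 2];
H = eye(3) - 2*(u*u')/(u'*u);    % the Householder matrix (7.5.1)
H*u                              % equals -u
H'*H                             % the identity matrix: H is orthogonal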

QR Decompositions

The Gram-Schmidt process is not used in practice to find orthonormal bases as there are other
techniques available that are preferable for orthogonalization on a computer. One such procedure
for the construction of an orthonormal basis is based on QR decompositions using Householder
transformations. This method is the one implemented in MATLAB .

An n × k matrix R = {rij } is upper triangular if rij = 0 whenever i > j.

Definition 7.5.6. An n × k matrix A has a QR decomposition if

A = QR,   (7.5.2)

where Q is an n × n orthogonal matrix and R is an n × k upper triangular matrix.

QR decompositions can be used to find orthonormal bases as follows. Suppose that W =
{w1, . . . , wk} is a basis for the subspace W ⊂ Rn. Then define the n × k matrix A which has the wj
as columns, that is,
A = (w1^t | · · · | wk^t).

Suppose that A = QR is a QR decomposition. Since Q is orthogonal, the columns of Q are
orthonormal. So write
Q = (v1^t | · · · | vn^t).

On taking transposes we arrive at the equation A^t = R^t Q^t:

[ w1 ]   [ r11    0   · · ·   0    · · ·  0 ] [ v1 ]
[  ⋮ ] = [ r12   r22  · · ·   0    · · ·  0 ] [  ⋮ ]
[ wk ]   [  ⋮     ⋮           ⋮           ⋮ ] [ vn ]
         [ r1k   r2k  · · ·  rkk   · · ·  0 ]

By equating rows in this matrix equation we arrive at the system

w1 = r11 v1
w2 = r12 v1 + r22 v2
  ⋮                                             (7.5.3)
wk = r1k v1 + r2k v2 + · · · + rkk vk.

It now follows that W = span{v1, . . . , vk} and that {v1, . . . , vk} is an orthonormal basis for W.
We have proved:

Proposition 7.5.7. Suppose that there exist an orthogonal n × n matrix Q and an upper triangular
n × k matrix R such that the n × k matrix A has a QR decomposition

A = QR.

Then the first k columns v1, . . . , vk of the matrix Q form an orthonormal basis of the subspace
W = span{w1, . . ., wk }, where the wj are the columns of A. Moreover, rij = vi · wj is the coordinate
of wj in the orthonormal basis.

Conversely, we can also write down a QR decomposition for a matrix A, if we have computed an
orthonormal basis for the columns of A. Indeed, using the Gram-Schmidt process, Theorem 7.2.3,
we have shown that QR decompositions always exist. In the remainder of this section we discuss a
different way for finding QR decompositions using Householder matrices.

Construction of a QR Decomposition Using Householder Matrices

The QR decomposition by Householder transformations is based on the following observation:

Proposition 7.5.8. Let z = (z1, . . . , zn) ∈ Rn be nonzero, fix an index j, and let

r = √(zj^2 + · · · + zn^2).

Define u = (u1, . . . , un) ∈ Rn by

[ u1   ]   [ 0      ]
[  ⋮   ]   [  ⋮     ]
[ uj−1 ]   [ 0      ]
[ uj   ] = [ zj − r ].
[ uj+1 ]   [ zj+1   ]
[  ⋮   ]   [  ⋮     ]
[ un   ]   [ zn     ]
Then
2 u^t z = u^t u
and
Hz = (z1, . . . , zj−1, r, 0, . . . , 0)^t   (7.5.4)
holds for the Householder matrix H = In − (2/(u^t u)) u u^t.

Proof: Begin by computing

u^t z = uj zj + zj+1^2 + · · · + zn^2
      = zj^2 − r zj + zj+1^2 + · · · + zn^2
      = −r zj + r^2.

Next, compute

u^t u = (zj − r)(zj − r) + zj+1^2 + · · · + zn^2
      = zj^2 − 2r zj + r^2 + zj+1^2 + · · · + zn^2
      = 2(−r zj + r^2).

Hence 2 u^t z = u^t u, as claimed.

Note that z − u is the vector on the right hand side of (7.5.4). So, compute

Hz = (In − (2/(u^t u)) u u^t) z = z − (2 u^t z/(u^t u)) u = z − u

to see that (7.5.4) is valid.

An inspection of the proof of Proposition 7.5.8 shows that we could have chosen

uj = zj + r

instead of uj = zj − r. Therefore, the choice of H is not unique.
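Proposition 7.5.8 with j = 1 is exactly the first step of the construction that follows; a small numerical sketch with an arbitrarily chosen z is:

z = [3; 1; 2]; n = length(z);
r = norm(z);              % with j = 1, r = sqrt(z1^2 + ... + zn^2)
u = z; u(1) = z(1) - r;   % u agrees with z except in the jth entry
H = eye(n) - 2*(u*u')/(u'*u);
H*z                       % approximately (r, 0, 0)', as claimed in (7.5.4)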

Proposition 7.5.8 allows us to determine inductively a QR decomposition of the matrix

A = (w1^0 | · · · | wk^0),

where each wj^0 ∈ Rn. So, A is an n × k matrix and k ≤ n.

First, set z = w1^0 and use Proposition 7.5.8 to construct the Householder matrix H1 such that

H1 w1^0 = (r11, 0, . . . , 0)^t ≡ r1.

Then the matrix A1 = H1 A can be written as

A1 = (r1 | w2^1 | · · · | wk^1),

where wj^1 = H1 wj^0 for j = 2, . . . , k.

Second, set z = w2^1 in Proposition 7.5.8 and construct the Householder matrix H2 such that

H2 w2^1 = (r12, r22, 0, . . . , 0)^t ≡ r2.

Then the matrix A2 = H2 A1 = H2 H1 A can be written as

A2 = (r1 | r2 | w3^2 | · · · | wk^2),

where wj^2 = H2 wj^1 for j = 3, . . . , k. Observe that the first column r1 is not affected by the matrix
multiplication, since H2 leaves the first component of a vector unchanged.

Proceeding inductively, in the ith step, set z = w_i^{i−1} and use Proposition 7.5.8 to construct the
Householder matrix Hi such that

Hi w_i^{i−1} = (r1i, . . . , rii, 0, . . . , 0)^t ≡ ri,

and the matrix Ai = Hi Ai−1 = Hi · · · H1 A can be written as

Ai = (r1 | · · · | ri | w_{i+1}^i | · · · | w_k^i),

where w_j^i = Hi w_j^{i−1} for j = i + 1, . . . , k.

After k steps we arrive at

Hk · · · H1 A = R,

where R = (r1 | · · · | rk) is an upper triangular n × k matrix. Since the Householder matrices
H1, . . . , Hk are orthogonal, it follows from Lemma 7.5.2(c) that Q^t = Hk · · · H1 is orthogonal.
Thus, A = QR is a QR decomposition of A.

Orthonormalization with MATLAB

Given a set w1, . . . , wk of linearly independent vectors in Rn, the MATLAB command qr allows
us to compute an orthonormal basis for the subspace spanned by these vectors. As mentioned earlier,
the underlying technique MATLAB uses for the computation of the QR decomposition is based on
Householder transformations.

The syntax of the QR decomposition in MATLAB is quite simple. For example, let w1 =
(1, 0, −1, 0), w2 = (2, −1, 0, 1) and w3 = (0, 0, −2, 1) be the three vectors in (7.2.11). In Section 7.2
we computed an orthonormal basis for the subspace of R4 spanned by w1 , w2, w3. Here we use the
MATLAB command qr to find an orthonormal basis for this subspace. Let A be the matrix having
the vectors w1t , w2t and w3t as columns. So, A is:

A = [1 2 0; 0 -1 0; -1 0 -2; 0 1 1]

The command

[Q R] = qr(A,0)

leads to the answer

Q =
-0.7071 0.5000 -0.4523

0 -0.5000 -0.1508
0.7071 0.5000 -0.4523
0 0.5000 0.7538
R =
-1.4142 -1.4142 -1.4142
0 2.0000 -0.5000
0 0 1.6583

A comparison with (7.2.12) shows that the columns of the matrix Q are the elements in the or-
thonormal basis. The only difference is that the sign of the first vector is opposite. However, this is
not surprising since we know that there is some freedom in the choice of Householder matrices, as
remarked after Proposition 7.5.8.

In addition, the command qr produces the matrix R whose entries rij are the coordinates of the
vectors wj in the new orthonormal basis as in (7.5.3). For instance, the second column of R tells us
that
w2 = r12v1 + r22v2 + r32v3 = −1.4142v1 + 2.0000v2.

Hand Exercises

In Exercises 1 – 5 decide whether or not the given matrix is orthogonal.


1. [ 2  0 ; 0  1 ]

2. [ 0  1  0 ; 0  0  1 ; 1  0  0 ]

3. [ 0  −1  0 ; 0  0  1 ; −1  0  0 ]

4. [ cos(1)  −sin(1) ; sin(1)  cos(1) ]

5. [ 1  0  4 ; 0  1  0 ]

6. Let Q be an orthogonal n × n matrix. Show that Q preserves the length of vectors, that is

||Qv|| = ||v|| for all v ∈ Rn.

In Exercises 7 – 10, compute the Householder matrix H corresponding to the given vector u.

7. u = (1, 1)^t.

8. u = (0, −2)^t.

9. u = (−1, 1, 5)^t.

10. u = (1, 0, 4, −2)^t.

11. Find the matrix that reflects the plane across the line generated by the vector (1, 2).

12. Prove that the rows of an n × n orthogonal matrix form an orthonormal basis for Rn .

Computer Exercises

In Exercises 13 – 16, use the MATLAB command qr to compute an orthonormal basis for each
of the subspaces spanned by the given set of vectors.

13. w1 = (1, −1), w2 = (1, 2).

14. w1 = (1, −2, 3), w2 = (0, 1, 1).

15. w1 = (1, −2, 3), w2 = (0, 1, 1), w3 = (2, 2, 0).

16. v1 = (1, 0, −2, 0, −1), v2 = (2, −1, 4, 2, 0), v3 = (0, 3, 5, 1, −1).

17. Find the 4 × 4 Householder matrices H1 and H2 corresponding to the vectors

u1 = (1.04, 2, 0.76, −0.32)
u2 = (1.4, −1.3, 0.6, 1.2).

Compute H = H1H2 and verify that H is an orthogonal matrix.

Chapter 8

Matrix Normal Forms

In this chapter we generalize to n × n matrices the theory of matrix normal forms presented
in Chapter ?? for 2 × 2 matrices. In this theory we ask: What is the simplest form that
a matrix can have up to similarity? After first presenting several preliminary results, the
theory culminates in the Jordan normal form theorem, Theorem 8.4.2.

The first of the matrix normal form results — every matrix with n distinct real eigen-
values can be diagonalized — is presented in Section 8.1. The basic idea is that when a
matrix has n distinct real eigenvalues, then it has n linearly independent eigenvectors. In
Section 8.2 we discuss matrix normal forms when the matrix has n distinct eigenvalues
some of which are complex. When an n × n matrix has fewer than n linearly independent
eigenvectors, it must have multiple eigenvalues and generalized eigenvectors. This topic is
discussed in Section 8.3. The Jordan normal form theorem is introduced in Section 8.4 and
describes similarity of matrices when the matrix has fewer than n independent eigenvectors.
The proof is given in Appendix 8.6.

We introduced Markov matrices in Section 4.5. One of the theorems discussed there
has a proof that relies on the Jordan normal form theorem, and we prove this theorem in
Appendix 8.5.

8.1 Real Diagonalizable Matrices

An n × n matrix is real diagonalizable if it is similar to a diagonal matrix. More precisely,
an n × n matrix A is real diagonalizable if there exists an invertible n × n matrix S such
that
D = S −1AS

is a diagonal matrix. In this section we investigate when a matrix is diagonalizable. In this
discussion we assume that all matrices have real entries.

We begin with the observation that not all matrices are real diagonalizable. We saw
in Example 4.3.2 that the diagonal entries of the diagonal matrix D are the eigenvalues of
D. Theorem 4.3.8 states that similar matrices have the same eigenvalues. Thus if a matrix
is real diagonalizable, then it must have real eigenvalues. It follows, for example, that the
2 × 2 matrix

[ 0  −1 ]
[ 1   0 ]

is not real diagonalizable, since its eigenvalues are ±i.

However, even if a matrix A has real eigenvalues, it need not be diagonalizable. For
example, the only matrix similar to the identity matrix In is the identity matrix itself. To
verify this point, calculate
S −1 In S = S −1S = In .

Suppose that A is a matrix all of whose eigenvalues are equal to 1. If A is similar to a
diagonal matrix D, then D must have all of its eigenvalues equal to 1. Since the identity
matrix is the only diagonal matrix with all eigenvalues equal to 1, D = In. So, if A is
similar to a diagonal matrix, it must itself be the identity matrix. Consider, however, the
2 × 2 matrix

    [ 1  1 ]
A = [ 0  1 ].
Since A is triangular, it follows that both eigenvalues of A are equal to 1. Since A is not
the identity matrix, it cannot be diagonalizable. More generally, if N is a nonzero strictly
upper triangular n × n matrix, then the matrix In + N is not diagonalizable.

These examples show that complex eigenvalues are always obstructions to real diag-
onalization and multiple real eigenvalues are sometimes obstructions to diagonalization.
Indeed,

Theorem 8.1.1. Let A be an n × n matrix with n distinct real eigenvalues. Then A is real
diagonalizable.

There are two ideas in the proof of Theorem 8.1.1, and they are summarized in the
following lemmas.

Lemma 8.1.2. Let λ1, . . . , λk be distinct real eigenvalues for an n × n matrix A. Let vj be
eigenvectors associated with the eigenvalue λj . Then {v1, . . . , vk } is a linearly independent
set.

Proof: We prove the lemma by using induction on k. When k = 1 the proof is simple,
since v1 ≠ 0. So we can assume that {v1, . . . , vk−1} is a linearly independent set.

Let α1 , . . . , αk be scalars such that

α1 v1 + · · · + αk vk = 0. (8.1.1)

We must show that all αj = 0.

Begin by multiplying both sides of (8.1.1) by A, to obtain:

0 = A(α1 v1 + · · · + αk vk )
= α1 Av1 + · · · + αk Avk (8.1.2)
= α1 λ1v1 + · · · + αk λk vk .

Now subtract λk times (8.1.1) from (8.1.2), to obtain:

α1 (λ1 − λk )v1 + · · · + αk−1 (λk−1 − λk )vk−1 = 0.

Since {v1, . . . , vk−1 } is a linearly independent set, it follows that

αj (λj − λk ) = 0,

for j = 1, . . . , k − 1. Since all of the eigenvalues are distinct, λj − λk %= 0 and αj = 0 for


j = 1, . . . , k − 1. Substituting this information into (8.1.1) yields αk vk = 0. Since vk =
% 0,
αk is also equal to zero.

Lemma 8.1.3. Let A be an n × n matrix. Then A is real diagonalizable if and only if A
has n real linearly independent eigenvectors.

Proof: Suppose that A has n linearly independent eigenvectors v1, . . . , vn. Let λ1, . . . , λn
be the corresponding eigenvalues of A; that is, Avj = λj vj. Let S = (v1 | · · · |vn) be the n×n
matrix whose columns are the eigenvectors vj . We claim that D = S −1 AS is a diagonal
matrix. Compute

D = S −1AS = S −1 A(v1 | · · ·|vn ) = S −1(Av1 | · · · |Avn ) = S −1 (λ1v1 | · · ·|λn vn ).

It follows that
D = (λ1S −1v1 | · · · |λnS −1 vn ).

Note that
S −1vj = ej ,

since
Sej = vj .

Therefore,
D = (λ1e1 | · · · |λn en )
is a diagonal matrix.

Conversely, suppose that A is a real diagonalizable matrix. Then there exists an invert-
ible matrix S such that D = S −1 AS is diagonal. Let vj = Sej . We claim that {v1 , . . . , vn}
is a linearly independent set of eigenvectors of A.

Since D is diagonal, Dej = λj ej for some real number λj . It follows that

Avj = SDS −1vj = SDS −1 Sej = SDej = λj Sej = λj vj .

So vj is an eigenvector of A. Since the matrix S is invertible, its columns are linearly
independent. Since the columns of S are vj, the set {v1, . . . , vn} is a linearly independent
set of eigenvectors of A, as claimed.

Proof of Theorem 8.1.1: Let λ1, . . . , λn be the distinct eigenvalues of A and let v1, . . . , vn
be the corresponding eigenvectors. Lemma 8.1.2 implies that {v1 , . . . , vn} is a linearly
independent set in Rn and therefore a basis. Lemma 8.1.3 implies that A is diagonalizable.

Diagonalization Using MATLAB

Let

    [ −6    12    4 ]
A = [  8   −21   −8 ].
    [ −29   72   27 ]
We use MATLAB to answer the questions: Is A real diagonalizable and, if it is, can we find
the matrix S such that S −1 AS is diagonal? We can find the eigenvalues of A by typing
eig(A). MATLAB’s response is:

ans =
-2.0000
-1.0000
3.0000

Since the eigenvalues of A are real and distinct, Theorem 8.1.1 states that A can be diago-
nalized. That is, there is a matrix S such that
 
           [ −1   0   0 ]
S^{-1}AS = [  0  −2   0 ]
           [  0   0   3 ]

The proof of Lemma 8.1.3 tells us how to find the matrix S. We need to find the eigenvectors
v1 , v2, v3 associated with the eigenvalues −1, −2, 3, respectively. Then the matrix (v1 |v2|v3)
whose columns are the eigenvectors is the matrix S. To verify this construction we first find
the eigenvectors of A by typing

v1 = null(A+eye(3));
v2 = null(A+2*eye(3));
v3 = null(A-3*eye(3));

Now type S = [v1 v2 v3] to obtain

S =
0.8729 0.7071 0
0.4364 0.0000 0.3162
-0.2182 0.7071 -0.9487

Finally, check that S −1 AS is the desired diagonal matrix by typing inv(S)*A*S to obtain

ans =
-1.0000 0.0000 0
0.0000 -2.0000 -0.0000
0.0000 0 3.0000

It is cumbersome to use the null command to find eigenvectors and MATLAB has been
preprogrammed to do these computations automatically. We can use the eig command to
find the eigenvectors and eigenvalues of a matrix A, as follows. Type

[S,D] = eig(A)

and MATLAB responds with

S =
-0.7071 0.8729 -0.0000
-0.0000 0.4364 -0.3162
-0.7071 -0.2182 0.9487

D =
-2.0000 0 0
0 -1.0000 0
0 0 3.0000

The matrix S is the transition matrix whose columns are the eigenvectors of A and the
matrix D is a diagonal matrix whose j th diagonal entry is the eigenvalue of A corresponding
to the eigenvector in the j th column of S.

Hand Exercises
1. Let A = [ 0  3 ; 3  0 ].

(a) Find the eigenvalues and eigenvectors of A.

(b) Find an invertible matrix S such that S −1 AS is a diagonal matrix D. What is D?

2. The eigenvalues of

       [ −1   2  −1 ]
   A = [  3   0   1 ]
       [ −3  −2  −3 ]

are 2, −2, −4. Find the eigenvectors of A for each of these eigenvalues and find a 3 × 3 invertible
matrix S so that S −1 AS is diagonal.

3. Let

       [ −1   4  −2 ]
   A = [  0   3  −2 ].
       [  0   4  −3 ]

Find the eigenvalues and eigenvectors of A, and find an invertible matrix S so that S −1 AS is
diagonal.

4. Let A and B be similar n × n matrices.

(a) Show that if A is invertible, then B is invertible.

(b) Show that A + A−1 is similar to B + B −1 .

5. Let A and B be n × n matrices. Suppose that A is real diagonalizable and that B is similar to
A. Show that B is real diagonalizable.

6. Let A be an n × n real diagonalizable matrix. Show that A + αIn is also real diagonalizable.

7. Let A be an n × n matrix with a real eigenvalue λ and associated eigenvector v. Assume that all
other eigenvalues of A are different from λ. Let B be an n × n matrix that commutes with A; that
is, AB = BA. Show that v is also an eigenvector for B.

8. Let A be an n × n matrix with distinct real eigenvalues and let B be an n × n matrix that
commutes with A. Using the result of Exercise 7, show that there is a matrix S that simultaneously
diagonalizes A and B; that is, S −1 AS and S −1 BS are both diagonal matrices.

9. Let A be an n × n matrix all of whose eigenvalues equal ±1. Show that if A is diagonalizable,
then A^2 = In.

10. Let A be an n × n matrix all of whose eigenvalues equal 0 or 1. Show that if A is diagonalizable,
then A^2 = A.

Computer Exercises

11. Consider the 4 × 4 matrix


 
    [  12   48   68   88 ]
C = [ −19  −54  −57  −68 ].
    [  22   52   66   96 ]
    [ −11  −26  −41  −64 ]

Use MATLAB to show that the eigenvalues of C are real and distinct. Find a matrix S so that
S −1 CS is diagonal.

In Exercises 12 – 13 use MATLAB to decide whether or not the given matrix is real diagonalizable.

12.
        [  −2.2    4.1   −1.5   −0.2 ]
    A = [  −3.4    4.8   −1.0    0.2 ].
        [  −1.0    0.4    1.9    0.2 ]
        [ −14.5   17.8   −6.7    0.6 ]

13.
        [  1.9   2.2   1.5   −1.6   −2.8 ]
        [  0.8   2.6   1.5   −1.8   −2.0 ]
    B = [  2.6   2.8   1.6   −2.1   −3.8 ].
        [  4.8   3.6   1.5   −3.1   −5.2 ]
        [ −2.1   1.2   1.7   −0.2    0.0 ]

8.2 Simple Complex Eigenvalues

Theorem 8.1.1 states that a matrix A with real unequal eigenvalues may be diagonalized. It follows
that in an appropriately chosen basis (the basis of eigenvectors), matrix multiplication by A acts
as multiplication by these real eigenvalues. Moreover, geometrically, multiplication by A stretches
or contracts vectors in eigendirections (depending on whether the eigenvalue is greater than or less
than 1 in absolute value).

The purpose of this section is to show that a similar kind of diagonalization is possible when
the matrix has distinct complex eigenvalues. See Theorem 8.2.1 and Theorem 8.2.2. We show
that multiplication by a matrix with complex eigenvalues corresponds to multiplication by complex
numbers. We also show that multiplication by complex eigenvalues correspond geometrically to
rotations as well as expansions and contractions.

The Geometry of Complex Eigenvalues: Rotations and Dilatations

Real 2 × 2 matrices are the smallest real matrices where complex eigenvalues can possibly occur. In
Chapter ??, Theorem ??(b) we discussed the classification of such matrices up to similarity. Recall
that if the eigenvalues of a 2 × 2 matrix A are σ ± iτ , then A is similar to the matrix
[ σ  −τ ]
[ τ   σ ].   (8.2.1)

Moreover, the basis in which A has the form (8.2.1) is found as follows. Let v = w1 + iw2 be the
eigenvector of A corresponding to the eigenvalue σ − iτ . Then {w1, w2} is the desired basis.

Geometrically, multiplication of vectors in R2 by (8.2.1) is the same as a rotation followed by
a dilatation. More specifically, let r = √(σ^2 + τ^2). So the point (σ, τ) lies on the circle of radius r
about the origin, and there is an angle θ such that (σ, τ) = (r cos θ, r sin θ). Now we can rewrite
(8.2.1) as

[ σ  −τ ]     [ cos θ  −sin θ ]
[ τ   σ ] = r [ sin θ   cos θ ] = r Rθ,
where Rθ is rotation counterclockwise through angle θ. From this discussion we see that geometrically
complex eigenvalues are associated with rotations followed either by stretching (r > 1) or contracting
(r < 1).

As an example, consider the matrix

    [  2  1 ]
A = [ −2  0 ].   (8.2.2)

The characteristic polynomial of A is pA(λ) = λ^2 − 2λ + 2. Thus the eigenvalues of A are 1 ± i,
and σ = 1 and τ = 1 for this example. An eigenvector associated to the eigenvalue 1 − i is
v = (1, −1 − i)^t = (1, −1)^t + i(0, −1)^t. Therefore,

               [ 1  −1 ]                    [  1   0 ]
B = S^{-1}AS = [ 1   1 ]    where    S =    [ −1  −1 ],

as can be checked by direct calculation. Moreover, we can rewrite

       [ √2/2  −√2/2 ]
B = √2 [ √2/2   √2/2 ] = √2 R_{π/4}.

So, in an appropriately chosen coordinate system, multiplication by A rotates vectors counterclock-
wise by 45° and then expands the result by a factor of √2. See Exercise 3.
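This similarity is easy to confirm in MATLAB using the matrices above:

A = [2 1; -2 0];
S = [1 0; -1 -1];     % the columns are the real and imaginary parts of the eigenvector
B = inv(S)*A*S        % equals [1 -1; 1 1], the normal form (8.2.1) with sigma = tau = 1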

The Algebra of Complex Eigenvalues: Complex Multiplication

We have shown that the normal form (8.2.1) can be interpreted geometrically as a rotation followed
by a dilatation. There is a second algebraic interpretation of (8.2.1), and this interpretation is based
on multiplication by complex numbers.

Let λ = σ + iτ be a complex number and consider the matrix associated with complex multipli-
cation, that is, the linear mapping
z ↦ λz   (8.2.3)
on the complex plane. By identifying real and imaginary parts, we can rewrite (8.2.3) as a real 2 × 2
matrix in the following way. Let z = x + iy. Then
λz = (σ + iτ )(x + iy) = (σx − τ y) + i(τ x + σy).
Now identify z with the vector (x, y); that is, the vector whose first component is the real part of z
and whose second component is the imaginary part. Using this identification the complex number
λz is identified with the vector (σx−τ y, τ x+σy). So, in real coordinates and in matrix form, (8.2.3)
becomes

[ x ]     [ σx − τy ]   [ σ  −τ ] [ x ]
[ y ]  ↦  [ τx + σy ] = [ τ   σ ] [ y ].
That is, the matrix corresponding to multiplication of z = x + iy by the complex number λ = σ + iτ
is the one that multiplies the vector (x, y)t by the normal form matrix (8.2.1).

Direct Agreement Between the Two Interpretations of (8.2.1)

We have shown that matrix multiplication by (8.2.1) may be thought of either algebraically as
multiplication by a complex number (an eigenvalue) or geometrically as a rotation followed by a
dilatation. We now show how to go directly from the algebraic interpretation to the geometric
interpretation.

Euler's formula (Chapter ??, (??)) states that

e^{iθ} = cos θ + i sin θ

for any real number θ. It follows that we can write a complex number λ = σ + iτ in polar form as
λ = r e^{iθ},
where r^2 = λ λ̄ = σ^2 + τ^2, σ = r cos θ, and τ = r sin θ.

Now consider multiplication by λ in polar form. Write z = s e^{iϕ} in polar form, and compute
λz = r e^{iθ} s e^{iϕ} = rs e^{i(ϕ+θ)}.
It follows from polar form that multiplication of z by λ = r e^{iθ} rotates z through an angle θ and
dilates the result by the factor r. Thus Euler’s formula directly relates the geometry of rotations
and dilatations with the algebra of multiplication by a complex number.

Normal Form Matrices with Distinct Complex Eigenvalues

In the first parts of this section we have discussed a geometric and an algebraic approach to matrix
multiplication by 2 × 2 matrices with complex eigenvalues. We now turn our attention to classifying
n × n matrices that have distinct eigenvalues, whether these eigenvalues are real or complex. We will
see that there are two ways to frame this classification — one algebraic (using complex numbers)
and one geometric (using rotations and dilatations).

Algebraic Normal Forms: The Complex Case

Let A be an n × n matrix with real entries and n distinct eigenvalues λ1 , . . . , λn. Let vj be an
eigenvector associated with the eigenvalue λj . By methods that are entirely analogous to those in
Section 8.1 we can diagonalize the matrix A over the complex numbers. The resulting theorem is
analogous to Theorem 8.1.1.

More precisely, the n × n matrix A is complex diagonalizable if there is a complex n × n matrix
T such that

             [ λ1   0   · · ·   0  ]
T^{-1}AT =   [  0   λ2  · · ·   0  ]
             [  ⋮    ⋮    ⋱     ⋮  ]
             [  0   0   · · ·   λn ]
Theorem 8.2.1. Let A be an n × n matrix with n distinct eigenvalues. Then A is complex diago-
nalizable.

The proof of Theorem 8.2.1 follows from a theoretical development virtually word for word the
same as that used to prove Theorem 8.1.1 in Section 8.1. Beginning from the theory that we have
developed so far, the difficulty in proving this theorem lies in the need to base the theory of linear
algebra on complex scalars rather than real scalars. We will not pursue that development here.

As in Theorem 8.1.1, the proof of Theorem 8.2.1 shows that the complex matrix T is the matrix
whose columns are the eigenvectors vj of A; that is,
T = (v1 | · · · |vn).

Finally, we mention that the computation of inverse matrices with complex entries is the same
as that for matrices with real entries. That is, row reduction of the n × 2n matrix (T |In ) leads,
when T is invertible, to the matrix (In |T −1).

Two Examples

As a first example, consider the normal form 2 × 2 matrix (8.2.1) that has eigenvalues λ and λ̄,
where λ = σ + iτ. Let

    [ σ  −τ ]                  [ λ  0 ]
B = [ τ   σ ]    and    C =    [ 0  λ̄ ].
Since the eigenvalues of B and C are identical, Theorem 8.2.1 implies that there is a 2 × 2 complex
matrix T such that
C = T −1 BT. (8.2.4)
Moreover, the columns of T are the complex eigenvectors v1 and v2 associated to the eigenvalues λ
and λ.

It can be checked that the eigenvectors of B are v1 = (1, −i)t and v2 = (1, i)t . On setting
' (
1 1
T = ,
−i i

263
it is a straightforward calculation to verify that C = T −1 BT .
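
The verification can also be done numerically in MATLAB. In the following minimal sketch the
values σ = 2 and τ = 3 are arbitrary choices, so that λ = 2 + 3i.

    sigma = 2; tau = 3;                        % so lambda = 2 + 3i
    B = [sigma -tau; tau sigma];               % the normal form matrix (8.2.1)
    T = [1 1; -1i 1i];                         % columns are the eigenvectors (1,-i)^t and (1,i)^t
    inv(T)*B*T                                 % returns the diagonal matrix diag(lambda, conj(lambda))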

As a second example, consider the matrix


 
    A = [ 4   2   1 ]
        [ 2  −3   1 ]
        [ 1  −1  −3 ] .

Using MATLAB we find the eigenvalues of A by typing eig(A). They are:

ans =
4.6432
-3.3216 + 0.9014i
-3.3216 - 0.9014i

We can diagonalize (over the complex numbers) using MATLAB — indeed MATLAB is programmed
to do these calculations over the complex numbers. Type [T,D] = eig(A) and obtain

T =
0.9604 -0.1299 + 0.1587i -0.1299 - 0.1587i
0.2632 0.0147 - 0.5809i 0.0147 + 0.5809i
0.0912 0.7788 - 0.1173i 0.7788 + 0.1173i

D =
4.6432 0 0
0 -3.3216 + 0.9014i 0
0 0 -3.3216 - 0.9014i

This calculation can be checked by typing inv(T)*A*T to see that the diagonal matrix D appears.
One can also check that the columns of T are eigenvectors of A.

Note that the development here does not depend on the matrix A having real entries. Indeed,
this diagonalization can be completed using n × n matrices with complex entries — and MATLAB
can handle such calculations.

Geometric Normal Forms: Block Diagonalization

There is a second normal form theorem based on the geometry of rotations and dilatations for real
n × n matrices A. In this normal form we determine all matrices A that have distinct eigenvalues
— up to similarity by real n × n matrices S. The normal form results in matrices that are block
diagonal with either 1 × 1 blocks or 2 × 2 blocks of the form (8.2.1) on the diagonal.

264
A real n × n matrix is in real block diagonal form if it is a block diagonal matrix

    [ B1   0  ···   0 ]
    [  0  B2  ···   0 ]
    [  .   .   ..   . ]                                     (8.2.5)
    [  0   0  ···  Bm ] ,

where each Bj is either a 1 × 1 block

    Bj = (λj)

for some real number λj or a 2 × 2 block

    Bj = [ σj  −τj ]                                        (8.2.6)
         [ τj   σj ]

where σj and τj ≠ 0 are real numbers. A matrix is real block diagonalizable if it is similar to a real
block diagonal form matrix.

Note that the real eigenvalues of a real block diagonal form matrix are just the real numbers λj
that occur in the 1 × 1 blocks. The complex eigenvalues are the eigenvalues of the 2 × 2 blocks Bj
and are σj ± iτj .

Theorem 8.2.2. Every n × n matrix A with n distinct eigenvalues is real block diagonalizable.

We need two preliminary results.

Lemma 8.2.3. Let λ1, . . . , λq be distinct (possibly complex) eigenvalues of an n × n matrix A. Let
vj be a (possibly complex) eigenvector associated with the eigenvalue λj. Then v1, . . . , vq are linearly
independent in the sense that if

    α1 v1 + · · · + αq vq = 0                               (8.2.7)

for (possibly complex) scalars αj, then αj = 0 for all j.

Proof: The proof is identical in spirit with the proof of Lemma 8.1.2. Proceed by induction on q.
When q = 1 the lemma is trivially valid, as αv = 0 for v ≠ 0 implies that α = 0, even when α ∈ C
and v ∈ C^n.

By induction assume the lemma is valid for q − 1. Now apply A to (8.2.7) obtaining

α1λ1 v1 + · · · + αq λq vq = 0.

Subtract this identity from λq times (8.2.7), and obtain

α1(λ1 − λq )v1 + · · · + αq−1(λq−1 − λq )vq−1 = 0.

By induction
αj (λj − λq ) = 0

for j = 1, . . . , q − 1. Since the λj are distinct it follows that αj = 0 for j = 1, . . . , q − 1. Hence
(8.2.7) implies that αq vq = 0; since vq ≠ 0, αq = 0.

265
Lemma 8.2.4. Let µ1, . . . , µk be distinct real eigenvalues of an n × n matrix A and let ν1, ν̄1, . . . , νℓ, ν̄ℓ
be distinct complex conjugate eigenvalues of A. Let vj ∈ R^n be eigenvectors associated to µj and let
wj = wj^r + i wj^i be eigenvectors associated with the eigenvalues νj. Then the k + 2ℓ vectors

    v1, . . . , vk, w1^r, w1^i, . . . , wℓ^r, wℓ^i

in R^n are linearly independent.

Proof: Let w = w^r + i w^i be a vector in C^n and let β^r and β^i be real scalars. Then

    β^r w^r + β^i w^i = βw + β̄w̄,                           (8.2.8)

where β = (1/2)(β^r − iβ^i). Identity (8.2.8) is verified by direct calculation.

Suppose now that

    α1 v1 + · · · + αk vk + β1^r w1^r + β1^i w1^i + · · · + βℓ^r wℓ^r + βℓ^i wℓ^i = 0    (8.2.9)

for real scalars αj, βj^r and βj^i. Using (8.2.8) we can rewrite (8.2.9) as

    α1 v1 + · · · + αk vk + β1 w1 + β̄1 w̄1 + · · · + βℓ wℓ + β̄ℓ w̄ℓ = 0,

where βj = (1/2)(βj^r − iβj^i). Since the eigenvalues

    µ1, . . . , µk, ν1, ν̄1, . . . , νℓ, ν̄ℓ

are all distinct, we may apply Lemma 8.2.3 to conclude that αj = 0 and βj = 0. It follows that
βj^r = 0 and βj^i = 0, as well, thus proving linear independence.

Proof of Theorem 8.2.2: Let µj for j = 1, . . . , k be the real eigenvalues of A and let νj, ν̄j for
j = 1, . . . , ℓ be the complex eigenvalues of A. Since the eigenvalues are all distinct, it follows that
k + 2ℓ = n.

Let vj and wj = wj^r + i wj^i be eigenvectors associated with the eigenvalues µj and ν̄j. It follows
from Lemma 8.2.4 that the n real vectors

    v1, . . . , vk, w1^r, w1^i, . . . , wℓ^r, wℓ^i           (8.2.10)

are linearly independent and hence form a basis for Rn.

We now show that A is real block diagonalizable. Let S be the n × n matrix whose columns are
the vectors in (8.2.10). Since these vectors are linearly independent, S is invertible. We claim that
S −1 AS is real block diagonal. This statement is verified by direct calculation.

First, note that Sej = vj for j = 1, . . . , k and compute

(S −1 AS)ej = S −1 Avj = µj S −1 vj = µj ej .

It follows that the first k columns of S −1 AS are zero except for the diagonal entries, and those
diagonal entries equal µ1, . . . , µk .

266
Second, note that Sek+1 = w1^r and Sek+2 = w1^i. Write the complex eigenvalues as

    νj = σj + iτj.

Since Aw1 = ν̄1 w1, it follows that

    Aw1^r + iAw1^i = (σ1 − iτ1)(w1^r + i w1^i)
                   = (σ1 w1^r + τ1 w1^i) + i(−τ1 w1^r + σ1 w1^i).

Equating real and imaginary parts leads to

    Aw1^r = σ1 w1^r + τ1 w1^i
    Aw1^i = −τ1 w1^r + σ1 w1^i.                             (8.2.11)

Using (8.2.11), compute

    (S^{-1}AS)ek+1 = S^{-1}Aw1^r = S^{-1}(σ1 w1^r + τ1 w1^i) = σ1 ek+1 + τ1 ek+2.

Similarly,

    (S^{-1}AS)ek+2 = S^{-1}Aw1^i = S^{-1}(−τ1 w1^r + σ1 w1^i) = −τ1 ek+1 + σ1 ek+2.

Thus, the (k+1)st and (k+2)nd columns of S^{-1}AS have the desired diagonal block in the (k+1)st
and (k+2)nd rows, and have all other entries equal to zero.

The same calculation is valid for the complex eigenvalues ν2, . . . , νℓ. Thus, S^{-1}AS is real block
diagonal, as claimed.

MATLAB Calculations of Real Block Diagonal Form

Let C be the 4 × 4 matrix

    C = [  1   0  2   3 ]
        [  2   1  4   6 ]
        [ −1  −5  1   3 ]
        [  1   4  7  10 ] .
Using MATLAB enter C by typing e13 2 14 and find the eigenvalues of C by typing eig(C) to
obtain

ans =
0.5855 + 0.8861i
0.5855 - 0.8861i
-0.6399
12.4690

We see that C has two real and two complex conjugate eigenvalues. To find the complex eigenvectors
associated with these eigenvalues, type

[T,D] = eig(C)

267
MATLAB responds with

T =
-0.0787 + 0.0899i -0.0787 - 0.0899i 0.0464 0.2209
0.0772 + 0.2476i 0.0772 - 0.2476i 0.0362 0.4803
-0.5558 - 0.5945i -0.5558 + 0.5945i -0.8421 -0.0066
0.3549 + 0.3607i 0.3549 - 0.3607i 0.5361 0.8488
D =
0.5855 + 0.8861i 0 0 0
0 0.5855 - 0.8861i 0 0
0 0 -0.6399 0
0 0 0 12.4690

The 4×4 matrix T has the eigenvectors of C as columns. The j th column is the eigenvector associated
with the j th diagonal entry in the diagonal matrix D.

To find the matrix S that puts C in real block diagonal form, we need to take the real and imag-
inary parts of the eigenvectors corresponding to the complex eigenvalues and the real eigenvectors
corresponding to the real eigenvalues. In this case, type

S = [real(T(:,1)) imag(T(:,1)) T(:,3) T(:,4)]

to obtain

S =
-0.0787 0.0899 0.0464 0.2209
0.0772 0.2476 0.0362 0.4803
-0.5558 -0.5945 -0.8421 -0.0066
0.3549 0.3607 0.5361 0.8488

Note that the 1st and 2nd columns are the real and imaginary parts of the complex eigenvector.
Check that inv(S)*C*S is the matrix in real block diagonal form

ans =
0.5855 0.8861 0.0000 0.0000
-0.8861 0.5855 0.0000 -0.0000
0.0000 0.0000 -0.6399 0.0000
-0.0000 -0.0000 -0.0000 12.4690

as proved in Theorem 8.2.2.
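
The steps used in this example can be collected into a short script. The sketch below is our own
construction (not a laode toolbox command); it assumes that the matrix has distinct eigenvalues and
that eig lists each complex conjugate pair of eigenvalues in adjacent columns, as it did for C above.

    [T,D] = eig(C);
    n = size(C,1);
    S = zeros(n);
    j = 1;
    while j <= n
        if abs(imag(D(j,j))) < 1e-10          % (numerically) real eigenvalue
            S(:,j) = real(T(:,j));            % keep the corresponding eigenvector
            j = j + 1;
        else                                  % complex conjugate pair
            S(:,j)   = real(T(:,j));          % real part of the complex eigenvector
            S(:,j+1) = imag(T(:,j));          % imaginary part of the complex eigenvector
            j = j + 2;
        end
    end
    inv(S)*C*S                                % should be in real block diagonal form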

Hand Exercises

268
1. Consider the 2 × 2 matrix

    A = [  3  1 ]
        [ −2  1 ] ,

whose eigenvalues are 2 ± i and whose associated eigenvectors are:

    [ 1 − i ]        [ 1 + i ]
    [  2i   ]  and   [ −2i   ] .

Find a complex 2 × 2 matrix T such that C = T^{-1}AT is complex diagonal and a real 2 × 2 matrix
S so that B = S^{-1}AS is in real block diagonal form.

2. Let

    A = [  2  5 ]
        [ −2  0 ] .

Find a complex 2 × 2 matrix T such that T^{-1}AT is complex diagonal and a real 2 × 2 matrix S so
that S^{-1}AS is in real block diagonal form.

Computer Exercises

3. Use map to verify that the normal form matrices (8.2.1) are just rotations followed by dilatations.
In particular, use map to study the normal form matrix

    A = [ 1  −1 ]
        [ 1   1 ] .

Then compare your results with the similar matrix

    B = [  2  1 ]
        [ −2  0 ] .

4. Consider the 2 × 2 matrix

    A = [ −0.8318  −1.9755 ]
        [  0.9878   1.1437 ] .

(a) Use MATLAB to find the complex conjugate eigenvalues and eigenvectors of A.

(b) Find the real block diagonal normal form of A and describe geometrically the motion of this
normal form on the plane.

(c) Using map describe geometrically how A maps vectors in the plane to vectors in the plane.

In Exercises 5 – 8 find a square real matrix S so that S −1 AS is in real block diagonal form and a
complex square matrix T so that T −1 AT is in complex diagonal form.

5.
    A = [ 1    2    4 ]
        [ 2   −4   −5 ]
        [ 1   10  −15 ] .

269
6.
    A = [ −15.1220   12.2195   13.6098   14.9268 ]
        [ −28.7805   21.8049   25.9024   28.7317 ]
        [  60.1951  −44.9512  −53.9756  −60.6829 ]
        [ −44.5122   37.1220   43.5610   47.2927 ] .
7.
    A = [  2.2125   5.1750   8.4250  15.0000   19.2500   0.5125 ]
        [ −1.9500  −3.9000  −6.5000  −7.4000  −12.0000  −2.9500 ]
        [  2.2250   3.9500   6.0500   0.9000    1.5000   1.0250 ]
        [ −0.2000  −0.4000   0        0.1000    0       −0.2000 ]
        [ −0.7875  −0.8250  −1.5750   1.0000    2.2500   0.5125 ]
        [  1.7875   2.8250   4.5750   0         4.7500   5.4875 ] .
8.
    A = [ −12  15  0 ]
        [   1   5  2 ]
        [  −5   1  5 ] .

8.3 Multiplicity and Generalized Eigenvectors

The difficulty in generalizing the results in the previous two sections to matrices with multiple
eigenvalues stems from the fact that these matrices may not have enough (linearly independent)
eigenvectors. In this section we present the basic examples of matrices with a deficiency of eigen-
vectors, as well as the definitions of algebraic and geometric multiplicity. These matrices will be the
building blocks of the Jordan normal form theorem — the theorem that classifies all matrices up to
similarity.

Deficiency in Eigenvectors for Real Eigenvalues

An example of deficiency in eigenvectors is given by the following n × n matrix

                [ λ0   1   0  ···   0   0 ]
                [  0  λ0   1  ···   0   0 ]
                [  0   0  λ0  ···   0   0 ]
    Jn(λ0)  =   [  .   .   .   ..   .   . ]                 (8.3.1)
                [  0   0   0  ···  λ0   1 ]
                [  0   0   0  ···   0  λ0 ]

where λ0 ∈ R. Note that Jn(λ0 ) has all diagonal entries equal to λ0 , all superdiagonal entries equal
to 1, and all other entries equal to 0. Since Jn (λ0 ) is upper triangular, all n eigenvalues of Jn (λ0 ) are
equal to λ0 . However, Jn(λ0 ) has only one linearly independent eigenvector. To verify this assertion
let
N = Jn (λ0 ) − λ0 In.
Then v is an eigenvector of Jn (λ0 ) if and only if N v = 0. Therefore, Jn (λ0 ) has a unique linearly
independent eigenvector if

270
Lemma 8.3.1. nullity(N ) = 1.

Proof: In coordinates the equation N v = 0 is:

    [ 0  1  0  ···  0  0 ] [ v1   ]   [ v2   ]
    [ 0  0  1  ···  0  0 ] [ v2   ]   [ v3   ]
    [ .  .  .   ..  .  . ] [ .    ] = [ .    ] = 0.
    [ 0  0  0  ···  0  1 ] [ vn−1 ]   [ vn   ]
    [ 0  0  0  ···  0  0 ] [ vn   ]   [ 0    ]

Thus v2 = v3 = · · · = vn = 0, and the solutions are all multiples of e1. Therefore, the nullity of N is
1.

Note that we can express matrix multiplication by N as

    N e1 = 0
    N ej = ej−1,   j = 2, . . . , n.                        (8.3.2)

Note that (8.3.2) implies that N^n = 0.
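
These facts are easy to confirm in MATLAB. In the following quick check the values λ0 = 2 and
n = 5 are arbitrary choices.

    lambda0 = 2; n = 5;                        % arbitrary sample values
    J = lambda0*eye(n) + diag(ones(n-1,1),1);  % the Jordan block J_n(lambda0) of (8.3.1)
    N = J - lambda0*eye(n);
    size(null(N),2)                            % nullity of N; the answer is 1
    N^n                                        % the zero matrix, as claimed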

The n × n matrix N motivates the following definitions.

Definition 8.3.2. Let λ0 be an eigenvalue of A. The algebraic multiplicity of λ0 is the number of


times that λ0 appears as a root of the characteristic polynomial pA (λ). The geometric multiplicity
of λ0 is the number of linearly independent eigenvectors of A having eigenvalue equal to λ0.

Abstractly, the geometric multiplicity is:

nullity(A − λ0 In ).

Our previous calculations show that the matrix Jn(λ0 ) has an eigenvalue λ0 with algebraic
multiplicity equal to n and geometric multiplicity equal to 1.
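
In MATLAB the two multiplicities can be estimated numerically. The following sketch uses an
arbitrary example matrix whose eigenvalue 3 has algebraic multiplicity two and geometric
multiplicity one.

    A = [3 1 0; 0 3 0; 0 0 4];                 % arbitrary example matrix
    lambda0 = 3;
    sum(abs(eig(A) - lambda0) < 1e-6)          % algebraic multiplicity: counts eigenvalues near lambda0
    size(null(A - lambda0*eye(3)),2)           % geometric multiplicity: dimension of the eigenspace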

Lemma 8.3.3. The algebraic multiplicity of an eigenvalue is greater than or equal to its geometric
multiplicity.

Proof: For ease of notation we prove this lemma only for real eigenvalues, though the proof for
complex eigenvalues is similar. Let A be an n × n matrix and let λ0 be a real eigenvalue of A. Let
k be the geometric multiplicity of λ0 and let v1 , . . . , vk be k linearly independent eigenvectors of A
with eigenvalue λ0 . We can extend {v1 , . . ., vk } to be a basis V = {v1, . . . , vn} of Rn. In this basis,
the matrix of A is

    [A]V = [ λ0 Ik   (∗) ]
           [   0      B  ] .

The matrices A and [A]V are similar matrices. Therefore, they have the same characteristic polyno-
mials and the same eigenvalues with the same algebraic multiplicities. It follows from Lemma 4.1.9
that the characteristic polynomial of A is:

    pA(λ) = p[A]V(λ) = (λ − λ0)^k pB(λ).

271
Hence λ0 appears as a root of pA (λ) at least k times and the algebraic multiplicity of λ0 is greater
than or equal to k. The same proof works when λ0 is a complex eigenvalue — but all vectors chosen
must be complex rather than real.

Deficiency in Eigenvectors with Complex Eigenvalues

An example of a real matrix with complex conjugate eigenvalues having geometric multiplicity less
than algebraic multiplicity is the 2n × 2n block matrix
 
                [ B  I2   0  ···   0   0 ]
                [ 0   B  I2  ···   0   0 ]
                [ 0   0   B  ···   0   0 ]
    J̃n(λ0)  =   [ .   .   .   ..   .   . ]                 (8.3.3)
                [ 0   0   0  ···   B  I2 ]
                [ 0   0   0  ···   0   B ]

where λ0 = σ + iτ and B is the 2 × 2 matrix

    B = [ σ  −τ ]
        [ τ   σ ] .

Lemma 8.3.4. Let λ0 be a complex number. Then the algebraic multiplicity of the eigenvalue λ0
in the 2n × 2n matrix J̃n(λ0) is n and the geometric multiplicity is 1.

Proof: We begin by showing that the eigenvalues of J = J̃n(λ0) are λ0 and λ̄0, each with algebraic
multiplicity n. The characteristic polynomial of J is pJ(λ) = det(J − λI2n). From Lemma 4.1.9 of
Chapter 4 and induction, we see that pJ(λ) = pB(λ)^n. Since the eigenvalues of B are λ0 and λ̄0, we
have proved that the algebraic multiplicity of each of these eigenvalues in J is n.

Next, we compute the eigenvectors of J. Let Jv = λ0 v and let v = (v1, . . . , vn) where each
vj ∈ C^2. Observe that (J − λ0 I2n)v = 0 if and only if

    Qv1 + v2 = 0
        ...
    Qvn−1 + vn = 0
    Qvn = 0,

where Q = B − λ0 I2. Using the fact that λ0 = σ + iτ, it follows that

    Q = B − λ0 I2 = −τ [  i  1 ]
                       [ −1  i ] .

Hence

    Q^2 = 2τ^2 i [  i  1 ]  =  −2τ i Q.
                 [ −1  i ]

Thus

    0 = Q^2 vn−1 + Q vn = −2τ i Q vn−1,

272
from which it follows that Qvn−1 = 0 and hence, since Qvn−1 + vn = 0, that vn = 0. Similarly,
v2 = · · · = vn−1 = 0. Since there is only one nonzero complex vector v1 (up to a complex scalar
multiple) satisfying

    Qv1 = 0,

it follows that the geometric multiplicity of λ0 in the matrix J̃n(λ0) equals 1.

Definition 8.3.5. The real matrices Jn(λ0) when λ0 ∈ R and J̃n(λ0) when λ0 ∈ C are real Jordan
blocks. The matrices Jn(λ0) when λ0 ∈ C are (complex) Jordan blocks.

Generalized Eigenvectors and Generalized Eigenspaces

What happens when n × n matrices have fewer than n linearly independent eigenvectors? Answer:
The matrices gain generalized eigenvectors.

Definition 8.3.6. A vector v ∈ Cn is a generalized eigenvector for the n×n matrix A with eigenvalue
λ if
(A − λIn )k v = 0 (8.3.4)

for some positive integer k. The smallest integer k for which (8.3.4) is satisfied is called the index
of the generalized eigenvector v.

Note: Eigenvectors are generalized eigenvectors with index equal to 1.

Let λ0 be a real number and let N = Jn (λ0 ) − λ0 In . Recall that (8.3.2) implies that N n = 0.
Hence every vector in Rn is a generalized eigenvector for the matrix Jn(λ0 ). So Jn (λ0 ) provides a
good example of a matrix whose lack of eigenvectors (there is only one independent eigenvector) is
made up for by generalized eigenvectors (there are n independent generalized eigenvectors).
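
A hypothetical MATLAB computation of the index of a given generalized eigenvector is sketched
below; the matrix A, the eigenvalue lambda, and the vector v are arbitrary illustrative choices, and
the loop assumes that v really is a generalized eigenvector.

    A = [2 1; 0 2]; lambda = 2; v = [0; 1];    % arbitrary illustrative choices
    A0 = A - lambda*eye(size(A));
    k = 1;
    while norm(A0^k * v) > 1e-10               % increase k until (A - lambda*I)^k v = 0
        k = k + 1;
    end
    k                                          % the index of v; here k = 2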

Let λ0 be an eigenvalue of the n × n matrix A and let A0 = A − λ0 In . For simplicity, assume


that λ0 is real. Note that

    null space(A0) ⊂ null space(A0^2) ⊂ · · · ⊂ null space(A0^k) ⊂ · · · ⊂ R^n.

Therefore, the dimensions of the null spaces are bounded above by n and there must be a smallest
k such that

    dim null space(A0^k) = dim null space(A0^{k+1}).

It follows that

    null space(A0^k) = null space(A0^{k+1}).                (8.3.5)

Lemma 8.3.7. Let λ0 be a real eigenvalue of the n × n matrix A and let A0 = A − λ0 In. Let k be
the smallest integer for which (8.3.5) is valid. Then

    null space(A0^k) = null space(A0^{k+j})

for every integer j > 0.

273
Proof: We can prove the lemma by induction on j if we can show that

    null space(A0^{k+1}) = null space(A0^{k+2}).

Since null space(A0^{k+1}) ⊂ null space(A0^{k+2}), we need to show that

    null space(A0^{k+2}) ⊂ null space(A0^{k+1}).

Let w ∈ null space(A0^{k+2}). It follows that

    A0^{k+1} A0 w = A0^{k+2} w = 0;

so A0 w ∈ null space(A0^{k+1}) = null space(A0^k), by (8.3.5). Therefore,

    A0^{k+1} w = A0^k (A0 w) = 0,

which verifies that w ∈ null space(A0^{k+1}).

Let Vλ0 be the set of all generalized eigenvectors of A with eigenvalue λ0. Let k be the smallest
integer satisfying (8.3.5); then Lemma 8.3.7 implies that

    Vλ0 = null space(A0^k) ⊂ R^n

is a subspace called the generalized eigenspace of A associated to the eigenvalue λ0 . It will follow
from the Jordan normal form theorem (see Theorem 8.4.2) that the dimension of Vλ0 is the algebraic
multiplicity of λ0 .

An Example of Generalized Eigenvectors

Find the generalized eigenvectors of the 4 × 4 matrix

    A = [ −24  −58  −2  −8 ]
        [  15   35   1   4 ]
        [   3    5   7   4 ]
        [   3    6   0   6 ]

and their indices. When finding generalized eigenvectors of a matrix A, the first two steps are:

(i) Find the eigenvalues of A.


(ii) Find the eigenvectors of A.

After entering A into MATLAB by typing e13 3 6, we type eig(A) and find that all of the eigen-
values of A equal 6. Without additional information, there could be 1,2,3 or 4 linearly independent
eigenvectors of A corresponding to the eigenvalue 6. In MATLAB we determine the number of
linearly independent eigenvectors by typing null(A-6*eye(4)) and obtaining

ans =
0.8892 0
-0.4446 0.0000
-0.0262 0.9701
-0.1046 -0.2425

274
We now know that (numerically) there are two linearly independent eigenvectors. The next step
is to find the number of independent generalized eigenvectors of index 2. To complete this calculation,
we find a basis for the null space of (A − 6I4)^2 by typing null((A-6*eye(4))^2) obtaining

ans =
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1

Thus, for this example, all generalized eigenvectors that are not eigenvectors have index 2.

Hand Exercises

In Exercises 1 – 4 determine the eigenvalues and their geometric and algebraic multiplicities for the
given matrix.
 
1. A = [ 2  0  0  0 ]
       [ 0  3  1  0 ]
       [ 0  0  3  0 ]
       [ 0  0  0  4 ] .

2. B = [ 2  0  0  0 ]
       [ 0  2  0  0 ]
       [ 0  0  3  1 ]
       [ 0  0  0  3 ] .

3. C = [ −1   1   0  0 ]
       [  0  −1   0  0 ]
       [  0   0  −1  0 ]
       [  0   0   0  1 ] .

4. D = [ 2  −1  0  0 ]
       [ 1   2  0  0 ]
       [ 0   0  2  1 ]
       [ 0   0  0  2 ] .

In Exercises 5 – 8 find a basis consisting of the eigenvectors for the given matrix supplemented by
generalized eigenvectors. Choose the generalized eigenvectors with lowest index possible.
5. A = [ 1  −1 ]
       [ 1   3 ] .

6. B = [ −2  0  −2 ]
       [ −1  1  −2 ]
       [  0  1  −1 ] .

275
 
7. C = [ −6  31  −14 ]
       [ −1   6   −2 ]
       [  0   2    1 ] .

8. D = [   5   1  0 ]
       [  −3   1  1 ]
       [ −12  −4  0 ] .

Computer Exercises

In Exercises 9 – 10, use MATLAB to find the eigenvalues and their algebraic and geometric multi-
plicities for the given matrix.

9.
    A = [ 2  3  −21  −3 ]
        [ 2  7  −41  −5 ]
        [ 0  1   −5  −1 ]
        [ 0  0    4   4 ] .

10.
    B = [ 179  −230   0  10  −30 ]
        [ 144  −185   0   8  −24 ]
        [  30   −39  −1   3   −9 ]
        [ 192  −245   0   9  −30 ]
        [  40   −51   0   2   −7 ] .

8.4 The Jordan Normal Form Theorem

The question that we discussed in Sections 8.1 and 8.2 is: Up to similarity, what is the simplest
form that a matrix can have? We have seen that if A has real distinct eigenvalues, then A is
real diagonalizable. That is, A is similar to a diagonal matrix whose diagonal entries are the real
eigenvalues of A. Similarly, if A has distinct real and complex eigenvalues, then A is complex
diagonalizable; that is, A is similar either to a diagonal matrix whose diagonal entries are the real
and complex eigenvalues of A or to a real block diagonal matrix.

In this section we address the question of simplest form when a matrix has multiple eigenval-
ues. In much of this discussion we assume that A is an n × n matrix with only real eigenvalues.
Lemma 8.1.3 shows that if the eigenvectors of A form a basis, then A is diagonalizable. Indeed,
for A to be diagonalizable, there must be a basis of eigenvectors of A. It follows that if A is not
diagonalizable, then A must have fewer than n linearly independent eigenvectors.

The prototypical examples of matrices having fewer eigenvectors than eigenvalues are the ma-
trices Jn(λ) for λ real (see (8.3.1)) and J̃n(λ) for λ complex (see (8.3.3)).

276
Definition 8.4.1. A matrix is in Jordan normal form if it is block diagonal and the matrix in each
block on the diagonal is a Jordan block, that is, Jℓ(λ) for some integer ℓ and some real or complex
number λ.

A matrix is in real Jordan normal form if it is block diagonal and the matrix in each block on
the diagonal is a real Jordan block, that is, either Jℓ(λ) for some integer ℓ and some real number λ
or J̃ℓ(λ) for some integer ℓ and some complex number λ.

The main theorem about Jordan normal form is:

Theorem 8.4.2 (Jordan normal form). Let A be an n×n matrix. Then A is similar to a Jordan
normal form matrix and to a real Jordan normal form matrix.

This theorem is proved by constructing a basis V for Rn so that the matrix S −1 AS is in Jordan
normal form, where S is the matrix whose columns consist of vectors in V. The algorithm for
finding the basis V is complicated and is found in Appendix 8.6. In this section we construct V only
in the special and simpler case where each eigenvalue of A is real and is associated with exactly one
Jordan block.

More precisely, let λ1 , . . ., λs be the distinct eigenvalues of A and let

Aj = A − λj In.

The eigenvectors corresponding to λj are the vectors in the null space of Aj and the generalized
eigenvectors are the vectors in the null space of Akj for some k. The dimension of the null space of
Aj is precisely the number of Jordan blocks of A associated to the eigenvalue λj . So the assumption
that we make here is
nullity(Aj ) = 1
for j = 1, . . . , s.

Let kj be the integer whose existence is specified by Lemma 8.3.7. Since, by assumption, there is
only one Jordan block associated with the eigenvalue λj , it follows that kj is the algebraic multiplicity
of the eigenvalue λj .

To find a basis in which the matrix A is in Jordan normal form, we proceed as follows. First,
let wj,kj be a vector in

    null space(Aj^{kj}) − null space(Aj^{kj−1}).

Define the vectors wji by

    wj,kj−1 = Aj wj,kj
        ...
    wj,1 = Aj wj,2.

Second, when λj is real, let the kj vectors vji = wji, and when λj is complex, let the 2kj vectors vji
be defined by

vj,2i−1 = Re(wji)
vj,2i = Im(wji).

277
Let V be the set of vectors vji ∈ Rn . We will show in Appendix 8.6 that the set V consists of n
vectors and is a basis of Rn . Let S be the matrix whose columns are the vectors in V. Then S −1 AS
is in Jordan normal form.
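
The construction just described can be sketched in MATLAB as follows. This sketch is not part of
the text or the laode toolbox; it assumes that the matrix A, a real eigenvalue lambdaj, and its
algebraic multiplicity kj are given, and that lambdaj has exactly one Jordan block.

    n  = size(A,1);
    Aj = A - lambdaj*eye(n);
    B  = null(Aj^kj);                                   % basis of the generalized eigenspace
    c  = find(sqrt(sum((Aj^(kj-1)*B).^2,1)) > 1e-8, 1); % a column not in null(Aj^(kj-1))
    w  = zeros(n,kj);
    w(:,kj) = B(:,c);                                   % the vector w_{j,kj}
    for i = kj:-1:2
        w(:,i-1) = Aj*w(:,i);                           % w_{j,i-1} = Aj * w_{j,i}
    end
    % the columns w(:,1),...,w(:,kj) are the basis vectors v_{j,1},...,v_{j,kj} described above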

The Cayley Hamilton Theorem

As a corollary of the Jordan normal form theorem, we prove the Cayley Hamilton theorem which
states that a square matrix satisfies its characteristic polynomial. More precisely:
Theorem 8.4.3 (Cayley Hamilton). Let A be a square matrix and let pA (λ) be its characteristic
polynomial. Then
pA(A) = 0.

Proof: Let A be an n × n matrix. The characteristic polynomial of A is

pA (λ) = det(A − λIn ).

Suppose that B = P −1AP is a matrix similar to A. Theorem 4.3.8 states that pB = pA . Therefore

pB (B) = pA (P −1 AP ) = P −1pA(A)P.

So if the Cayley Hamilton theorem holds for a matrix similar to A, then it is valid for the matrix A.
Moreover, using the Jordan normal form theorem, we may assume that A is in Jordan normal form.

Suppose that A is block diagonal, that is

    A = [ A1   0 ]
        [  0  A2 ] ,

where A1 and A2 are square matrices. Then

    pA(λ) = pA1(λ) pA2(λ).

This observation follows directly from Lemma 4.1.9. Since

    A^k = [ A1^k    0  ]
          [   0   A2^k ] ,

it follows that

    pA(A) = [ pA(A1)     0    ]  =  [ pA1(A1) pA2(A1)         0         ]
            [   0     pA(A2)  ]     [        0         pA1(A2) pA2(A2)  ] .
It now follows from this calculation that if the Cayley Hamilton theorem is valid for Jordan blocks,
then pA1 (A1 ) = 0 = pA2 (A2 ). So pA (A) = 0 and the Cayley Hamilton theorem is valid for all
matrices.

A direct calculation shows that Jordan blocks satisfy the Cayley Hamilton theorem. To begin,
suppose that the eigenvalue of the Jordan block is real. Note that the characteristic polynomial of
the Jordan block Jn(λ0) in (8.3.1) is (λ − λ0)^n. Indeed, Jn(λ0) − λ0 In is strictly upper triangular
and (Jn(λ0) − λ0 In)^n = 0. If λ0 is complex, then either repeat this calculation using the complex
Jordan form or show by direct calculation that (A − λ0 I2n)(A − λ̄0 I2n) is strictly upper triangular
when A = J̃n(λ0) is the real Jordan form of the Jordan block in (8.3.3).
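
The Cayley Hamilton theorem can be checked numerically with the MATLAB command polyvalm,
which evaluates a polynomial at a matrix argument. The 3 × 3 matrix below is an arbitrary example.

    A = [2 1 0; -1 3 1; 0 2 4];     % an arbitrary 3 x 3 matrix
    p = poly(A);                    % coefficients of the characteristic polynomial of A
    polyvalm(p, A)                  % p evaluated at A; the entries are zero up to round-off error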

278
An Example

Consider the 4 × 4 matrix

    A = [ −147  −106  −66   −488 ]
        [  604   432  271   1992 ]
        [  621   448  279   2063 ]
        [ −169  −122  −76   −562 ] .

Using MATLAB we can compute the characteristic polynomial of A by typing

poly(A)

The output is

ans =
1.0000 -2.0000 -15.0000 -0.0000 -0.0000

Note that since A is a matrix of integers we know that the coefficients of the characteristic polynomial
of A must be integers. Thus the characteristic polynomial is exactly:

    pA(λ) = λ^4 − 2λ^3 − 15λ^2 = λ^2(λ − 5)(λ + 3).

So λ1 = 0 is an eigenvalue of A with algebraic multiplicity two, and λ2 = 5 and λ3 = −3 are simple
eigenvalues of multiplicity one.

We can find eigenvectors of A corresponding to the simple eigenvalues by typing

v2 = null(A-5*eye(4));
v3 = null(A+3*eye(4));

At this stage we do not know how many linearly independent eigenvectors have eigenvalue 0. There
are either one or two linearly independent eigenvectors and we determine which by typing null(A)
and obtaining

ans =
-0.1818
0.6365
0.7273
-0.1818

So MATLAB tells us that there is just one linearly independent eigenvector having 0 as an eigenvalue.
There must be a generalized eigenvector in V0. Indeed, the null space of A2 is two dimensional and
this fact can be checked by typing

279
null2 = null(A^2)

obtaining

null2 =
0.2193 -0.2236
-0.5149 -0.8216
-0.8139 0.4935
0.1561 0.1774

Choose one of these vectors, say the first vector, to be v12 by typing

v12 = null2(:,1);

Since the algebraic multiplicity of the eigenvalue 0 is two, we choose the fourth basis vector to be
v11 = Av12. In MATLAB we type

v11 = A*v12

obtaining

v11 =
-0.1263
0.4420
0.5051
-0.1263

Since v11 is nonzero, we have found a basis for V0 . We can now put the matrix A in Jordan normal
form by setting

S = [v11 v12 v2 v3];


J = inv(S)*A*S

to obtain

J =
-0.0000 1.0000 0.0000 -0.0000
0.0000 0.0000 0.0000 -0.0000
-0.0000 -0.0000 5.0000 0.0000
0.0000 -0.0000 -0.0000 -3.0000

280
We have only discussed a Jordan normal form example when the eigenvalues are real and mul-
tiple. The case when the eigenvalues are complex and multiple first occurs when n = 4. A sample
complex Jordan block when the matrix has algebraic multiplicity two eigenvalues σ ±iτ of geometric
multiplicity one is
 
    [ σ  −τ  1   0 ]
    [ τ   σ  0   1 ]
    [ 0   0  σ  −τ ]
    [ 0   0  τ   σ ] .

Numerical Difficulties

When a matrix has multiple eigenvalues, then numerical difficulties can arise when using the MAT-
LAB command eig(A), as we now explain.

Let p(λ) = λ^2. Solving p(λ) = 0 is very easy — in theory — as λ = 0 is a double root of p.
Suppose, however, that we want to solve p(λ) = 0 numerically. Then, numerical errors will lead to
solving the equation

    λ^2 = ε,

where ε is a small number. Note that if ε > 0, the solutions are ±√ε; while if ε < 0, the solutions
are ±i√|ε|. Since numerical errors are machine dependent, ε can be of either sign. The numerical
process of finding double roots of a characteristic polynomial (that is, double eigenvalues of a matrix)
is similar to numerically solving the equation λ^2 = 0, as we shall see.
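
One can see this effect directly by asking MATLAB for the roots of λ^2 − ε and λ^2 + ε; the value of
ε below simply stands in for a machine-dependent round-off error.

    epsilon = 1e-14;                % a stand-in for round-off error
    roots([1 0 -epsilon])           % roots of lambda^2 - epsilon: two small real roots, +/- 1e-7
    roots([1 0  epsilon])           % roots of lambda^2 + epsilon: two small imaginary roots, +/- 1e-7i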

For example, on a Sun SPARCstation 10 using MATLAB version 4.2c, the eigenvalues of the
4 × 4 matrix A in (8.4) (in format long) obtained using eig(A) are:

ans =
5.00000000001021
-0.00000000000007 + 0.00000023858927i
-0.00000000000007 - 0.00000023858927i
-3.00000000000993

That is, MATLAB computes two complex conjugate eigenvalues

±0.00000023858927i

which corresponds to an ε of -5.692483975913288e-14. On an IBM compatible 486 computer using


MATLAB version 4.2 the same computation yields eigenvalues

ans=
4.99999999999164
0.00000057761008
-0.00000057760735
-2.99999999999434

281
That is, on this computer MATLAB computes two real, near zero, eigenvalues

±0.00000057761

that corresponds to an ε of 3.336333121e-13. These errors are within round-off error in double
precision computation.

A consequence of these kinds of error, however, is that when a matrix has multiple eigenvalues,
we cannot use the command [V,D] = eig(A) with confidence. On the Sun SPARCstation, this
command yields a matrix

V =
-0.1652 0.0000 - 0.1818i 0.0000 + 0.1818i -0.1642
0.6726 -0.0001 + 0.6364i -0.0001 - 0.6364i 0.6704
0.6962 -0.0001 + 0.7273i -0.0001 - 0.7273i 0.6978
-0.1888 0.0000 - 0.1818i 0.0000 + 0.1818i -0.1915

that suggests that A has two complex eigenvectors corresponding to the ‘complex’ pair of near zero
eigenvalues. The IBM compatible yields the matrix

V =
-0.1652 0.1818 -0.1818 -0.1642
0.6726 -0.6364 0.6364 0.6704
0.6962 -0.7273 0.7273 0.6978
-0.1888 0.1818 -0.1818 -0.1915

indicating that MATLAB has found two real eigenvectors corresponding to the near zero real eigen-
values. Note that the two eigenvectors corresponding to the eigenvalues 5 and −3 are correct on
both computers.

Hand Exercises

1. Write two different 4 × 4 Jordan normal form matrices all of whose eigenvalues equal 2 for which
the geometric multiplicity is two.

2. How many different 6 × 6 Jordan form matrices have all eigenvalues equal to 3? (We say that two
Jordan form matrices are the same if they have the same number and type of Jordan block, though
not necessarily in the same order along the diagonal.)

3. A 5 × 5 matrix A has three eigenvalues equal to 4 and two eigenvalues equal to −3. List the
possible Jordan normal forms for A (up to similarity). Suppose that you can ask your computer to
compute the nullity of precisely two matrices. Can you devise a strategy for determining the Jordan
normal form of A? Explain your answer.

282
4. An 8 × 8 real matrix A has three eigenvalues equal to 2, two eigenvalues equal to 1 + i, and one
zero eigenvalue. List the possible Jordan normal forms for A (up to similarity). Suppose that you
can ask your computer to compute the nullity of precisely two matrices. Can you devise a strategy
for determining the Jordan normal form of A? Explain your answer.

In Exercises 5 – 10 find the Jordan normal forms for the given matrix.
5. A = [ 2  4 ]
       [ 1  1 ] .

6. B = [  9   25 ]
       [ −4  −11 ] .

7. C = [ −5  −8  −9 ]
       [  5   9   9 ]
       [ −1  −2  −1 ] .

8. D = [ 0  1   0 ]
       [ 0  0   1 ]
       [ 1  1  −1 ] .

9. E = [ 2  0  −1 ]
       [ 2  1  −1 ]
       [ 1  0   0 ] .

10. F = [  3  −1   2 ]
        [ −1   2  −1 ]
        [ −1   1   0 ] .

11. Compute e^{tJ} where J = [ 2   0   0 ]
                             [ 0  −1   1 ]
                             [ 0   0  −1 ] .

12. Compute e^{tJ} where J = [ 2  1  0  0  0 ]
                             [ 0  2  0  0  0 ]
                             [ 0  0  3  1  0 ]
                             [ 0  0  0  3  1 ]
                             [ 0  0  0  0  3 ] .

13. An n × n matrix N is nilpotent if N k = 0 for some positive integer k.

(a) Show that the matrix N defined in (8.3.2) is nilpotent.

(b) Show that all eigenvalues of a nilpotent matrix equal zero.

(c) Show that any matrix similar to a nilpotent matrix is also nilpotent.

(d) Let N be a matrix all of whose eigenvalues are zero. Use the Jordan normal form theorem to
show that N is nilpotent.

283
14. Let A be a 3 × 3 matrix. Use the Cayley-Hamilton theorem to show that A^{-1} is a linear
combination of I3, A, A^2. That is, there exist real scalars a, b, c such that

    A^{-1} = aI3 + bA + cA^2.

Computer Exercises

In Exercises 15 – 19, (a) determine the real Jordan normal form for the given matrix A, and (b) find
the matrix S so that S −1 AS is in real Jordan normal form.

15.
    A = [ −3   −4   −2   0 ]
        [ −9  −39  −16  −7 ]
        [ 18   64   27  10 ]
        [ 15   86   34  18 ] .

16.
    A = [   9   45   18    8 ]
        [   0   −4   −1   −1 ]
        [ −16  −69  −29  −12 ]
        [  25  123   49   23 ] .

17.
    A = [  −5  −13  17   42 ]
        [ −10  −57  66  187 ]
        [  −4  −23  26   77 ]
        [  −1   −9   9   32 ] .

18.
    A = [  1   0   −9  18 ]
        [ 12  −7  −26  77 ]
        [  5  −2  −13  32 ]
        [  2  −1   −4  11 ] .

19.
    A = [ −1  −1   1  0 ]
        [ −3   1   1  0 ]
        [ −3   2  −1  1 ]
        [ −3   2   0  0 ] .

20.
    A = [ 0   0  −1   2  2 ]
        [ 1  −2   0   2  2 ]
        [ 1  −1  −1   2  2 ]
        [ 0   0   0   1  2 ]
        [ 0   0   0  −1  3 ] .

284
8.5 *Appendix: Markov Matrix Theory

In this appendix we use the Jordan normal form theorem to study the asymptotic dynamics of
transition matrices such as those of Markov chains introduced in Section 4.5.

The basic result is the following theorem.

Theorem 8.5.1. Let A be an n × n matrix and assume that all eigenvalues λ of A satisfy |λ| < 1.
Then for every vector v0 ∈ R^n

    lim_{k→∞} A^k v0 = 0.                                   (8.5.1)

Proof: Suppose that A and B are similar matrices; that is, B = SAS^{-1} for some invertible matrix
S. Then B^k = SA^kS^{-1} and for any vector v0 ∈ R^n (8.5.1) is valid if and only if

    lim_{k→∞} B^k v0 = 0.

Thus, when proving this theorem, we may assume that A is in Jordan normal form.

Suppose that A is in block diagonal form; that is, suppose

    A = [ C  0 ]
        [ 0  D ] ,

where C is an ℓ × ℓ matrix and D is an (n − ℓ) × (n − ℓ) matrix. Then

    A^k = [ C^k    0  ]
          [  0   D^k  ] .

So for every vector v0 = (w0, u0) ∈ R^ℓ × R^{n−ℓ}, (8.5.1) is valid if and only if

    lim_{k→∞} C^k w0 = 0   and   lim_{k→∞} D^k u0 = 0.

So, when proving this theorem, we may assume that A is a Jordan block.

Consider the case of a simple Jordan block. Suppose that n = 1 and that A = (λ) where λ is
either real or complex. Then

    A^k v0 = λ^k v0.

It follows that (8.5.1) is valid precisely when |λ| < 1. Next, suppose that A is a nontrivial Jordan
block. For example, let

    A = [ λ  1 ]  =  λI2 + N
        [ 0  λ ]

where N^2 = 0. It follows by induction that

    A^k v0 = λ^k v0 + kλ^{k−1} N v0 = λ^k v0 + k λ^k (1/λ) N v0.

Thus (8.5.1) is valid precisely when |λ| < 1. The reason for this convergence is as follows. The first
term converges to 0 as before but the second term is the product of three terms: k, λ^k, and (1/λ)N v0.

285
The first increases to infinity, the second decreases to zero, and the third is constant independent of
k. In fact, geometric decay (λ^k, when |λ| < 1) always beats polynomial growth. Indeed,

    lim_{m→∞} m^j λ^m = 0                                   (8.5.2)

for any integer j. This fact can be proved using l'Hôpital's rule and induction.
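
A quick numerical illustration of (8.5.2) in MATLAB, with the arbitrary choices j = 3 and λ = 0.75, is:

    j = 3; lambda = 0.75;           % arbitrary choices
    m = [10 50 100 200]';
    [m  m.^j.*lambda.^m]            % the second column tends to zero as m grows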

So we see that when A has a nontrivial Jordan block, convergence is subtler than when A has
only simple Jordan blocks, as initially the vectors A^k v0 can grow in magnitude. For example, suppose
that λ = 0.75 and v0 = (0, 1)^t. Then A^9 v0 = (0.901, 0.075)^t is the first vector in the sequence A^k v0
whose norm is less than 1; that is, A^9 v0 is the first vector in the sequence closer to the origin than
v0.

It is also true that (8.5.1) is valid for any Jordan block A and for all v0 precisely when |λ| < 1.
To verify this fact we use the binomial theorem. We can write a nontrivial Jordan block as λIn + N
where N^{k+1} = 0 for some integer k. We just discussed the case k = 1. In this case

    (λIn + N)^m = λ^m In + mλ^{m−1} N + (m choose 2) λ^{m−2} N^2 + · · · + (m choose k) λ^{m−k} N^k,

where

    (m choose j) = m! / (j!(m − j)!) = m(m − 1) · · · (m − j + 1) / j!.
To verify that

    lim_{m→∞} (λIn + N)^m = 0

we need only verify that each term

    lim_{m→∞} (m choose j) λ^{m−j} N^j = 0.

Such terms are the product of three terms

    (1/(j! λ^j)) m(m − 1) · · · (m − j + 1)   and   λ^m   and   N^j.

The first term has polynomial growth to infinity dominated by m^j, the second term decreases to
zero geometrically, and the third term is constant independent of m. The desired convergence to
zero follows from (8.5.2).

Definition 8.5.2. The n×n matrix A has a dominant eigenvalue λ0 > 0 if λ0 is a simple eigenvalue
and all other eigenvalues λ of A satisfy |λ| < λ0 .

Theorem 8.5.3. Let P be a Markov matrix. Then 1 is a dominant eigenvalue of P .

Proof: Recall from Chapter 4, Definition 4.5.1 that a Markov matrix is a square matrix P whose
entries are nonnegative, whose rows sum to 1, and which has a power P^k all of whose entries are positive.
To prove this theorem we must show that all eigenvalues λ of P satisfy |λ| ≤ 1 and that 1 is a simple
eigenvalue of P .

286
Let λ be an eigenvalue of P and let v = (v1 , . . . , vn)t be an eigenvector corresponding to the
eigenvalue λ. We prove that |λ| ≤ 1. Choose j so that |vj | ≥ |vi | for all i. Since P v = λv, we can
equate the j th coordinates of both sides of this equality, obtaining

pj1v1 + · · · + pjnvn = λvj .

Therefore,
|λ||vj | = |pj1v1 + · · · + pjnvn | ≤ pj1|v1| + · · · + pjn|vn|,
since the pij are nonnegative. It follows that

|λ||vj | ≤ (pj1 + · · · + pjn)|vj | = |vj |,

since |vi| ≤ |vj| and rows of P sum to 1. Since |vj| > 0, it follows that |λ| ≤ 1.

Next we show that 1 is a simple eigenvalue of P . Recall, or just calculate directly, that the vector
(1, . . . , 1)t is an eigenvector of P with eigenvalue 1. Now let v = (v1 , . . . , vn)t be an eigenvector of P
with eigenvalue 1. Let Q = P k so that all entries of Q are positive. Observe that v is an eigenvector
of Q with eigenvalue 1, and hence that all rows of Q also sum to 1.

To show that 1 is a simple eigenvalue of Q, and therefore of P , we must show that all coordinates
of v are equal. Using the previous estimates (with λ = 1), we obtain

|vj | = |qj1v1 + · · · + qjnvn| ≤ qj1|v1| + · · · + qjn|vn| ≤ |vj |. (8.5.3)

Hence
|qj1v1 + · · · + qjnvn| = qj1|v1| + · · · + qjn|vn|.
This equality is valid only if all of the vi are nonnegative or all are nonpositive. Without loss of
generality, we assume that all vi ≥ 0. It follows from (8.5.3) that

vj = qj1v1 + · · · + qjnvn .

Since qji > 0, this equality can hold only if all of the vi are equal.

Theorem 8.5.4. (a) Let Q be an n × n matrix with dominant eigenvalue λ > 0 and associated
eigenvector v. Let v0 be any vector in R^n. Then

    lim_{k→∞} (1/λ^k) Q^k v0 = cv,

for some scalar c.

(b) Let P be a Markov matrix and v0 a nonzero vector in R^n with all entries nonnegative. Then

    lim_{k→∞} (P^t)^k v0 = V

where V is the eigenvector of P^t with eigenvalue 1 such that the sum of the entries in V is equal to
the sum of the entries in v0.

Proof: (a) After a similarity transformation, if needed, we can assume that Q is in Jordan normal
form. More precisely, we can assume that

    (1/λ) Q = [ 1  0 ]
              [ 0  A ]

287
where A is an (n − 1) × (n − 1) matrix with all eigenvalues µ satisfying |µ| < 1. Suppose v0 =
(c0, w0) ∈ R × R^{n−1}. It follows from Theorem 8.5.1 that

    lim_{k→∞} (1/λ^k) Q^k v0 = lim_{k→∞} ((1/λ)Q)^k v0 = lim_{k→∞} (c0, A^k w0)^t = c0 e1.

Since e1 is the eigenvector of Q with eigenvalue λ, Part (a) is proved.

(b) Theorem 8.5.3 states that a Markov matrix has a dominant eigenvalue equal to 1. The
Jordan normal form theorem implies that the eigenvalues of P^t are equal to the eigenvalues of P
with the same algebraic and geometric multiplicities. It follows that 1 is also a dominant eigenvalue
of P^t. It follows from Part (a) that

    lim_{k→∞} (P^t)^k v0 = cV

for some scalar c. But Theorem 4.5.3 in Chapter 4 implies that the sum of the entries in v0 equals the
sum of the entries in cV which, by assumption, equals the sum of the entries in V. Thus, c = 1.
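
Part (b) is easy to explore numerically. In the following sketch the Markov matrix P and the
nonnegative vector v0 are arbitrary choices.

    P  = [0.8 0.2; 0.3 0.7];        % a Markov matrix: nonnegative entries, rows sum to 1
    v0 = [10; 0];
    v  = (P')^50*v0                 % close to the limiting vector V of Theorem 8.5.4(b)
    [sum(v) sum(v0)]                % the two sums agree, as the theorem asserts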

Hand Exercises

1. Let A be an n × n matrix. Suppose that

    lim_{k→∞} A^k v0 = 0

for every vector v0 ∈ R^n. Show that the eigenvalues λ of A all satisfy |λ| < 1.

8.6 *Appendix: Proof of Jordan Normal Form

We prove the Jordan normal form theorem under the assumption that the eigenvalues of A are all
real. The proof for matrices having both real and complex eigenvalues proceeds along similar lines.

Let A be an n × n matrix, let λ1 , . . ., λs be the distinct eigenvalues of A, and let Aj = A − λj In.

Lemma 8.6.1. The linear mappings Ai and Aj commute.

Proof: Just compute

    Ai Aj = (A − λi In)(A − λj In) = A^2 − λi A − λj A + λi λj In,

and

    Aj Ai = (A − λj In)(A − λi In) = A^2 − λj A − λi A + λj λi In.

So Ai Aj = Aj Ai , as claimed.

Let Vj be the generalized eigenspace corresponding to eigenvalue λj .

Lemma 8.6.2. Ai : Vj → Vj is invertible when i ≠ j.

288
Proof: Recall from Lemma 8.3.7 that Vj = null space(Aj^k) for some k ≥ 1. Suppose that v ∈ Vj.
We first verify that Ai v is also in Vj. Using Lemma 8.6.1, just compute

    Aj^k Ai v = Ai Aj^k v = Ai 0 = 0.

Therefore, Ai v ∈ null space(Aj^k) = Vj.

Let B be the linear mapping Ai|Vj. It follows from Chapter 6, Theorem 6.2.3 that

    nullity(B) + dim range(B) = dim(Vj).

Now w ∈ null space(B) if w ∈ Vj and Ai w = 0. Since Ai w = (A − λi In)w = 0, it follows that
Aw = λi w. Hence

    Aj w = (A − λj In)w = (λi − λj)w

and

    Aj^k w = (λi − λj)^k w.

Since λi ≠ λj, it follows that Aj^k w = 0 only when w = 0. Hence the nullity of B is zero. We conclude
that

    dim range(B) = dim(Vj).

Lemma 8.6.3. Nonzero vectors taken from different generalized eigenspaces Vj are linearly inde-
pendent. More precisely, if wj ∈ Vj and

w = w1 + · · · + ws = 0,

then wj = 0.

Proof: Let Vj = null space(Aj^{kj}) for some integer kj. Let C = A2^{k2} ◦ · · · ◦ As^{ks}. Then

    0 = Cw = Cw1,

since Aj^{kj} wj = 0 for j = 2, . . . , s. But Lemma 8.6.2 implies that C|V1 is invertible. Therefore,
w1 = 0. Similarly, all of the remaining wj have to vanish.

Lemma 8.6.4. Every vector in Rn is a linear combination of vectors in the generalized eigenspaces
Vj .

Proof: Let W be the subspace of R^n consisting of all vectors of the form z1 + · · · + zs where
zj ∈ Vj. We need to verify that W = R^n. Suppose that W is a proper subspace. Then choose a
basis w1, . . . , wt of W and extend this set to a basis W of R^n. In this basis the matrix [A]W has
block form, that is,

    [A]W = [ A11  A12 ]
           [  0   A22 ] ,

where A22 is an (n − t) × (n − t) matrix. The eigenvalues of A22 are eigenvalues of A. Since all of
the distinct eigenvalues and eigenvectors of A are accounted for in W (that is, in A11), we have a
contradiction. So W = R^n, as claimed.

289
Lemma 8.6.5. Let Vj be a basis for the generalized eigenspaces Vj and let V be the union of the
sets Vj . Then V is a basis for Rn .

Proof: We first show that the vectors in V span Rn . It follows from Lemma 8.6.4 that every
vector in Rn is a linear combination of vectors in Vj . But each vector in Vj is a linear combination
of vectors in Vj . Hence, the vectors in V span Rn .

Second, we show that the vectors in V are linearly independent. Suppose that a linear combi-
nation of vectors in V sums to zero. We can write this sum as

w1 + · · · + ws = 0,

where wj is the linear combination of vectors in Vj . Lemma 8.6.3 implies that each wj = 0. Since
Vj is a basis for Vj , it follows that the coefficients in the linear combinations wj must all be zero.
Hence, the vectors in V are linearly independent.

Finally, it follows from Theorem 5.5.3 of Chapter 5 that V is a basis.

Lemma 8.6.6. In the basis V of R^n guaranteed by Lemma 8.6.5, the matrix [A]V is block diagonal,
that is,

    [A]V = [ A11        0  ]
           [      ...      ]
           [  0       Ass  ] ,

where all of the eigenvalues of Ajj equal λj.

Proof: It follows from Lemma 8.6.1 that A : Vj → Vj. Suppose that vj ∈ Vj. Then Avj is in Vj
and Avj is a linear combination of vectors in Vj. The block diagonalization of [A]V follows. Since
Vj = null space(Aj^{kj}), it follows that all eigenvalues of Ajj equal λj.

Lemma 8.6.6 implies that to prove the Jordan normal form theorem, we must find a basis in
which the matrix Ajj is in Jordan normal form. So, without loss of generality, we may assume that
all eigenvalues of A equal λ0 , and then find a basis in which A is in Jordan normal form. Moreover,
we can replace A by the matrix A − λ0 In , a matrix all of whose eigenvalues are zero. So, without
loss of generality, we assume that A is an n × n matrix all of whose eigenvalues are zero. We now
sketch the remainder of the proof of Theorem 8.4.2.

Let k be the smallest integer such that R^n = null space(A^k) and let

    s = dim null space(A^k) − dim null space(A^{k−1}) > 0.

Let z1, . . . , zn−s be a basis for null space(A^{k−1}) and extend this set to a basis for null space(A^k) by
adjoining the linearly independent vectors w1, . . . , ws. Let

    Wk = span{w1, . . . , ws}.

It follows that Wk ∩ null space(A^{k−1}) = {0}.

We claim that the ks vectors W = {wjℓ = A^ℓ(wj)}, where 0 ≤ ℓ ≤ k − 1 and 1 ≤ j ≤ s, are
linearly independent. We can write any linear combination of the vectors in W as yk + · · · + y1,

290
where yj ∈ A^{k−j}(Wk). Suppose that

    yk + · · · + y1 = 0.

Then A^{k−1}(yk + · · · + y1) = A^{k−1} yk = 0. Therefore, yk is in Wk and in null space(A^{k−1}). Hence,
yk = 0. Similarly, A^{k−2}(yk−1 + · · · + y1) = A^{k−2} yk−1 = 0. But yk−1 = Aŷk where ŷk ∈ Wk and
ŷk ∈ null space(A^{k−1}). Hence, ŷk = 0 and yk−1 = 0. Similarly, all of the yj = 0. It follows from
yj = 0 that a linear combination of the vectors A^{k−j}(w1), . . . , A^{k−j}(ws) is zero; that is,

    0 = β1 A^{k−j}(w1) + · · · + βs A^{k−j}(ws) = A^{k−j}(β1 w1 + · · · + βs ws).

Applying A^{j−1} to this expression, we see that

    β1 w1 + · · · + βs ws

is in Wk and in null space(A^{k−1}). Hence,

    β1 w1 + · · · + βs ws = 0.

Next, we find the largest integer m so that

    t = dim null space(A^m) − dim null space(A^{m−1}) > 0.

Proceed as above. Choose a basis for null space(A^{m−1}) and extend to a basis for null space(A^m)
by adjoining the vectors x1, . . . , xt. Adjoin the mt vectors A^ℓ xj to the set V and verify that these
vectors are all linearly independent. And repeat the process. Eventually, we arrive at a basis for
R^n = null space(A^k).

In this basis the matrix [A]V is block diagonal; indeed, each of the blocks is a Jordan block, since

    A(wjℓ) = wj(ℓ−1)  for 0 < ℓ ≤ k − 1,   and   A(wjℓ) = 0  for ℓ = 0.

Note the resemblance with (8.3.2).

291
MATLAB Commands

† indicates an laode toolbox command not found in MATLAB

Chapter 1: Preliminaries
Editing and Number Commands

quit Ends MATLAB session


; (a) At end of line the semicolon suppresses echo printing
(b) When entering an array the semicolon indicates a new row
↑ Displays previous MATLAB command
[] Brackets indicating the beginning and the end of a vector or a matrix
x=y Assigns x the value of y
x(j) Recalls j th entry of vector x
A(i,j) Recalls ith row, j th column of matrix A
A(i,:) Recalls ith row of matrix A
A(:,j) Recalls j th column of matrix A

Vector Commands

norm(x) The norm or length of a vector x


dot(x,y) Computes the dot product of vectors x and y
†addvec(x,y) Graphics display of vector addition in the plane
†addvec3(x,y) Graphics display of vector addition in three dimensions

Matrix Commands

A' (Conjugate) transpose of matrix


zeros(m,n) Creates an m × n matrix all of whose entries equal 0
zeros(n) Creates an n × n matrix all of whose entries equal 0
diag(x) Creates an n × n diagonal matrix whose diagonal entries
are the components of the vector x ∈ Rn
eye(n) Creates an n × n identity matrix

292
Special Numbers in MATLAB

pi The number π = 3.1415 . . .


acos(a) The inverse cosine of the number a

Chapter 2: Solving Linear Equations


Editing and Number Commands

format Changes the numbers display format to standard five digit format
format long Changes display format to 15 digits
format rational Changes display format to rational numbers
format short e Changes display to five digit floating point numbers

Vector Commands

x.*y Componentwise multiplication of the vectors x and y


x./y Componentwise division of the vectors x and y
x.^y Componentwise exponentiation of the vectors x and y

Matrix Commands

A([i j],:) = A([j i],:)


Swaps ith and j th rows of matrix A
A\b Solves the system of linear equations associated with
the augmented matrix (A|b)
x = linspace(xmin,xmax,N)
Generates a vector x whose entries are N equally spaced points
from xmin to xmax
x = xmin:xstep:xmax
Generates a vector whose entries are equally spaced points from xmin to xmax
with stepsize xstep
[x,y] = meshgrid(XMIN:XSTEP:XMAX,YMIN:YSTEP:YMAX);
Generates two vectors x and y. The entries of x are values from XMIN to XMAX
in steps of XSTEP. Similarly for y.
rand(m,n) Generates an m × n matrix whose entries are randomly and uniformly chosen
from the interval [0, 1]
rref(A) Returns the reduced row echelon form of the m × n matrix A
the matrix after each step in the row reduction process
rank(A) Returns the rank of the m × n matrix A

Graphics Commands

293
plot(x,y) Plots a graph connecting the points (x(i), y(i)) in sequence
xlabel(’labelx’) Prints labelx along the x axis
ylabel(’labely’) Prints labely along the y axis
surf(x,y,z) Plots a three dimensional graph of z(j) as a function of x(j) and y(j)
hold on Instructs MATLAB to add new graphics to the previous figure
hold off Instructs MATLAB to clear figure when new graphics are generated
grid Toggles grid lines on a figure
axis(’equal’) Forces MATLAB to use equal x and y dimensions
view([a b c]) Sets viewpoint from which an observer sees the current 3-D plot
zoom Zoom in and out on 2-D plot. On each mouse click, axes change by a factor of 2

Special Numbers and Functions in MATLAB

exp(x) The number e^x where e = exp(1) = 2.7182 . . .
sqrt(x) The number √x
i The number √−1

Chapter 3: Matrices and Linearity


Matrix Commands

A*x Performs the matrix vector product of the matrix A with the vector x
A*B Performs the matrix product of the matrices A and B
size(A) Determines the numbers of rows and columns of a matrix A
inv(A) Computes the inverse of a matrix A

Program for Matrix Mappings

†map Allows the graphic exploration of planar matrix mappings

Special Functions in MATLAB

sin(x) The number sin(x)


cos(x) The number cos(x)

Matrix Commands

eig(A) Computes the eigenvalues of the matrix A


null(A) Computes the solutions to the homogeneous equation Ax = 0

294
Chapter 4: Determinants and Eigenvalues
Matrix Commands

det(A) Computes the determinant of the matrix A


poly(A) Returns the characteristic polynomial of the matrix A
sum(v) Computes the sum of the components of the vector v
trace(A) Computes the trace of the matrix A
[V,D] = eig(A) Computes eigenvectors and eigenvalues of the matrix A

Chapter 6: Linear Maps and Changes of Coordinates


Vector Commands

†bcoord Geometric illustration of planar coordinates by vector addition


†ccoord Geometric illustration of coordinates relative to two bases

Chapter 7: Orthogonality
Matrix Commands

orth(A) Computes an orthonormal basis for the column space of the matrix A
[Q,R] = qr(A,0) Computes the QR decomposition of the matrix A

Graphics Commands

axis([xmin,xmax,ymin,ymax])
Forces MATLAB to use in a twodimensional plot the intervals
[xmin,xmax] resp. [ymin,ymax] labeling the x- resp. y-axis
plot(x,y,’o’) Same as plot but now the points (x(i), y(i)) are marked by
circles and no longer connected in sequence

Matrix Commands

[V,D] = eig(A) Computes eigenvectors and eigenvalues of the matrix A

Chapter 8: Matrix Normal Forms


Vector Commands

real(v) Returns the vector of the real parts of the components


of the vector v
imag(v) Returns the vector of the imaginary parts of the components
of the vector v

295
Answers to Selected
Odd-Numbered Problems

Chapter 1: Preliminaries

Section 1.1: Vectors and Matrices


1 (3, 2, 2) 3 (1, −1, 9) 5 (5, 0) 7 not possible 9 not possible
11 [   4  −4 ]
   [ −11  11 ]

Section 1.2: MATLAB


   
1 (a) 11; (b) (15, 3, 24)^t; (c) (−6, −2, −6, −4); (d) (15, 0, 18)^t
3 (23.1640, −3.5620, −12.8215)
5 [ −14.0300   −5.8470   7.0600 ]
  [  −9.7600   11.0570  −9.6600 ]

Section 1.3: Special Kinds of Matrices

1 symmetric 3 symmetric 5 symmetric 7 strictly upper triangular 9 not upper triangular 11 3
13 mn 15 n(n+1)/2 17 true 19 false 21 A^t = (3)

Section 1.4: The Geometry of Vector Operations



1 ||x|| = 3 3 ||x|| = 3 5 perpendicular 7 not perpendicular 9 a = 10/3 11 x · y = 4; cos θ = 2/√5
13 x · y = 13; cos θ = 13/(6√5) 15 x · y = 31; cos θ = 31/√1410 ≈ 0.8256

Chapter 2: Solving Linear Equations

Section 2.1: Systems of Linear Equations and Matrices

1 (x, y) = (2, 4) 3 (x, y) = (−4, 1) 5 (a) has an infinite number of solutions; (b) has no solutions. 7
(a) p(x) = −x2 + 5x + 1 11

296
ans =
-12.0495
-0.8889
7.8384

Section 2.2 The Geometry of Low-Dimensional Solutions

1 2x + 3y + z = −5 3 z = x 5 (a) u = (2, 2, 1) (b) v = (1, 1, 2) (c) cos θ = √26 ; θ = 35.2644◦. 7


(x, y) ≈ (2.15, −1.54) 9 (x, y, z) = (1, 3, −1) 11 The function has three relative maxima.

Section 2.3: Gaussian Elimination

1 not in reduced echelon form 3 not in reduced echelon form 5 The 1st, 3rd , and 5th columns contain
pivots. The system is inconsistent; no solutions. 9 inconsistent; no solutions 11 (a) Infinitely many
solutions; (b) one variable can be assigned arbitrary values. 13 unique solution 15 linear 17 not
linear 19 not linear 21 consistent

23 The row echelon form is:

A =
0 1.0000 2.0000 1.0000 14.0000 21.0000 0 -1.0000
0 0 0 1.0000 3.0000 5.0000 0 9.0000
0 0 0 0 1.0000 -0.5000 0 -4.7143
0 0 0 0 0 1.0000 0 0.3457
0 0 0 0 0 0 0 1.0000
0 0 0 0 0 0 0 0

Section 2.4 Reduction to Echelon Form


 
1 The reduced echelon form of the matrix A is [ 1  2  0  4 ]
                                              [ 0  0  1  2 ] ; rank(A) = 2.
                                              [ 0  0  0  0 ]

3 four 5 consistent; three parameters 7 inconsistent 9 1 11 2

Section 2.5 Linear Equations with Special Coefficients


' ( ' ( ' ( ' (
3
x1 − 12 i x1 1
1 = 2 3 = 2 5
x2 − 12 − 12 i x2 − 12

A\b =
0.3006+ 0.2462i
-0.6116+ 0.0751i

Chapter 3: Matrices and Linearity

297
Section 3.1: Matrix Multiplication of Vectors

' ( x1 ' ( ' (
2 3 −2   4 3 1
1 (4, −11) 3 (6, −10) 5 (13) 9  x2  = 11 A = 13 No
6 0 −5 1 −5 4
x3
' (
1 2
upper triangular matrix satisfies (3.1.6), but any symmetric matrix of the form satisfies
2 a22
(3.1.6). 15

b =
103.5000
175.8000
-296.9000
-450.1000
197.4000
656.6000
412.4000

17

A\b =
-2.3828
-1.0682
0.1794

Section 3.2: Matrix Mappings


' ( ' ( ' ( ' (
√1 √1 1 0 0 1 0 −1
1 (x1 , 0) 3 (x1, 3x1) 5 R(−45◦) =
t t 2 2 7 9 11 R90◦ =
− √12 √1
2
0 −1 1 0 1 0
15 A maps x = (1, 1) to twice its length and x = s(0, 1) to half its length. 17 C maps x = (1, 0) to
t t

twice its length and x = (1, 2) to − 12 times its length. 19 Matrix B rotates the plane by θ =≈ 3.0585

counterclockwise and dilatates it by a factor of c = 5.8 ≈ 2.4083. 21 A rotates the plane 30◦ clock-
wise. 23 C reflects the plane across the line y = x. 25 E maps (x, y) to a point on the line y = x;
that point is ( x+y
2
, x+y
2
).

Section 3.3: Linearity

1 (a) (−5, 11); (b) (6, 8, −16); (c) (21, 7, −10, −2) 3 α = − 75 ; β = − 35 5 α = 13
5
γ + 13
7
and
   
0 −1 −1 0 1 0
   
β = − 13 γ − 13 7 not linear 9 not linear 11 A =  1
14 4
0 2  13 A =  0 0 1  17
1 −2 0 1 0 0
The mapping rotates a 2-vector 90◦ clockwise and then it halves its length.

Section 3.4: The Principle of Superposition

298
       
−1 −1 1 0
       
1 (a) v1 =  1  ; v2 =  0  (b) w1 =  0  ; w2 =  1 
0 1 −1 −1
       
−11 1 1 −11
       
3 (a) s  7 ; (b)  1 ; (c)  1  + s  7 
1 1 1 1

Section 3.5: Composition and Multiplication of Matrices


' ( ' ( ' (
−2 0 −2 0 −11 8
1 AB = ; BA = 3 Neither AB nor BA is defined. 5
7 −1 5 −1 −3 2
   
−4 13 3 ' ( 10 −5 15
  b11 0  
7  −12 11 −11  9 B = 11  −5 6 −4  13 Neither AB nor BA is
0 b22
3 −1 4 15 −4 26
defined.

Section 3.6: Properties of Matrix Multiplication


   2 
1 1 12 1 t t2
   
3 B =  0 1 1 ; C =  0 1 t 
0 0 1 0 0 1

Section 3.7: Solving Linear Systems and Inverses


 
−8 32 −9
1  
3 a #= 0 and b #= 0 5 A−1 =  2 2 1  9 A is invertible for any a, b, and c, and
10
2 −8 1
 
1 −a −b + ac
 
A = 0
−1
1 −c .
0 0 1

11 Type N = [B eye(4)] in MATLAB and then row reduce N to obtain

ans =
1.0000 0 0 0 -1.5714 -0.4286 0 1.4286
0 1.0000 0 0 0.7429 0.0571 0.2000 -0.4571
0 0 1.0000 0 -0.9143 0.3143 -0.4000 0.4857
0 0 0 1.0000 -0.6000 -0.2000 -0.2000 0.6000

Section 3.8: Determinants of 2 × 2 Matrices


1 [  2  −1 ]   9 y = 29/11   11 A is invertible; det(A) = 4.   13 A is not invertible; det(A) = 0
  [ −3   2 ]

Chapter 5: Vector Spaces

299
Section 5.1: Vector Spaces and Subspaces

3 V1 and V3 are identical. 5 not a subspace 7 subspace 9 subspace 11 not a subspace 13 subspace
15 not a subspace 17 subspace when c = 0; not a subspace when c #= 0

Section 5.2: Construction of Subspaces

1 span{(1, 0, −4)t, (0, 1, 2)t} 3 span{(1, 0, −1)t, (0, 1, −1)t}


' (
1 1 0 0
5 span{(−2, 1, 0, 0, 0) , (−1, 0, −4, 1, 0) } 7 span{(−2, −1, 1) } 9
t t t
11 (2, 20, 0)t =
0 0 1 1
−4w1 + 6w2 13 t4 #∈ W 15 y(t) = 0.5t2 ∈ W , but {y(t), x2(t)} does not span W .

Section 5.3: Spanning Sets and MATLAB

1 span{(0.3225, 0.8931, −0.0992, 0.2977)^t, (0, −0.1961, 0.5883, 0.7845)^t}

3 span{(−0.8452, −0.1690, 0.5071)^t}. 5 span{(−1, −3, 1, 0)^t, (3/4, 2, 0, 1)^t} 7 v2 ∉ W
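
A minimal MATLAB sketch of how spanning sets of this kind can be produced (the exercise matrices are not reproduced here; A is a placeholder for a matrix whose columns are the vectors of interest, and M a placeholder for a matrix whose null space is wanted):

orth(A)        % orthonormal basis for the column space of A
null(M, 'r')   % "rational" basis for the null space of M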

Section 5.4: Linear Dependence and Linear Independence

3 linearly dependent. 9 linearly dependent

Section 5.5: Dimension and Bases

3 {(1, 1, 1, 0), (−2, −2, 0, 1)} is a basis; the dimension is two. 5 dim P2 = 3; dim Pn = n + 1

Section 5.6: The Proof of the Main Theorem

1 A plane with N = n3(−3/2, 1, 1). 3 A plane with N = n3(0, 0, 1). 5 (a) 5; (c) 5 − r; (d) 5 − r 9 (a) λ ≠ 2; (b) λ = 2 11 {(1, 0, 0, −1/2, 3/2), (0, 1, 0, 1/2, −1/2), (0, 0, 1, 1/2, 3/2)}

Chapter 4: Determinants and Eigenvalues

Section 4.1: Determinants

1 −28 3 14 7 −4 9 (a) 1 and −1/3; (b) yes


     
13 $B_{11} = \begin{pmatrix} 7 & -2 & 10 \\ 0 & 0 & -1 \\ 4 & 2 & -10 \end{pmatrix}$; $B_{23} = \begin{pmatrix} 0 & 2 & 5 \\ 0 & 0 & -1 \\ 3 & 4 & -10 \end{pmatrix}$; $B_{43} = \begin{pmatrix} 0 & 2 & 5 \\ -1 & 7 & 10 \\ 0 & 0 & -1 \end{pmatrix}$

Section 4.3: Eigenvalues

1 $p_A(\lambda) = -\lambda^3 + 2\lambda^2 + \lambda - 2$; the eigenvalues are 1, −1, and 2. 3 {(−1, 1, 0)^t, (1, 0, 1)^t} 5 (a) The eigenvalues are 3 and −2 with corresponding eigenvectors (1, −1)^t and (1, −2)^t. (c) (2x1 + x2, −x1 − x2). 9 false 11 (a) The eigenvalues are −0.5861 ± 20.2517i, −12.9416, −9.1033, and 5.2171. The trace is −18. The characteristic polynomial is $\lambda^5 + 18\lambda^4 + 433\lambda^3 + 6296\lambda^2 + 429\lambda - 252292$. 13 B is the zero matrix.

Chapter 6: Linear Maps and Changes of Coordinates

Section 6.1: Linear Mappings and Bases


1 $A = \begin{pmatrix} -7 & -11 & 3 \\ -4 & -7 & 2 \end{pmatrix}$

Section 6.2: Row Rank Equals Column Rank

1 The possible choices are α3(−1, −1, 1) and β3(−7/5, −9/5, 1).

3 (a) {(1, 0, 1, 0), (0, 1, −1, 0), (0, 0, 0, 1)} is a basis for the row space of A; the row rank of A is 3.
(b) The column rank of A is 3; {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is a basis for the column space of A. (c)
{(−1, 1, 1, 0)} is a basis for the null space; the nullity of A is 1. (d) The null space is trivial and the
nullity of A^t is 0.

Section 6.3: Vectors and Matrices in Coordinates

1 [v]_W = (7, 4) 5 [v]_W = (−2, 2, −1) 7 [L]_W is diagonal in the basis W = {(1, 2), (2, 3)}

Section 6.4: Matrices of Linear Maps on a Vector Space


1 $C_{WZ} = \begin{pmatrix} 2 & 3 \\ -1 & -2 \end{pmatrix}$   5 (a) A fixes w1, moves w2 to w3, and moves w3 to −w2. (b) $[L_A]_W = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}$ (c) $[L_A]_W$ fixes e1, moves e2 to e3, and moves e3 to −e2.

Chapter 7: Orthogonality

Section 7.1: Orthonormal Bases

1 $\frac{1}{\sqrt{3}}(1, 1, -1)$ and $\frac{1}{\sqrt{2}}(0, 1, 1)$

Section 7.2: Least Squares Approximations

1 $\frac{1}{5}(3, 4)$ and $\frac{1}{5}(-4, 3)$

Section 7.3: Least Squares Fitting of Data

1 (a) m ≈ 0.4084 and b ≈ 0.9603, where m and b are in billions. (b) In 1910 P ≈ 1369 million
people. (c) The prediction for 2000 is likely to be low. 3 Let R be the number of days in the year
with precipitation and let s be the percentage of sunny hours to daylight hours. Then the best linear
estimate of the relationship between the two is R ≈ 199.2 − 156.6s.
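
A minimal MATLAB sketch of the underlying least squares line fit (the data below are placeholders, not the exercise data):

t = [1900 1920 1940 1960 1980]';   % placeholder predictor values
y = [1.6 1.9 2.3 3.0 4.4]';        % placeholder observations
X = [t ones(size(t))];             % design matrix for the model y = m*t + b
c = X\y;                           % least squares solution: c(1) = m, c(2) = b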

Section 7.4: Symmetric Matrices

3 The eigenvectors are (1, 1) and (1, −1); the eigenvalues are 4 and −2.

5 The eigenvectors are (1, 1) and (1, −1); the eigenvalues of C are −2 and 2.01.

Section 7.5: Orthogonal Matrices and QR Decompositions


1 not orthogonal 3 orthogonal 5 not orthogonal 7 $H = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}$
 
9 $H = \frac{1}{27}\begin{pmatrix} 25 & 2 & 10 \\ 2 & 25 & -10 \\ 10 & -10 & -23 \end{pmatrix}$   11 $\begin{pmatrix} -\frac{3}{5} & \frac{4}{5} \\ \frac{4}{5} & \frac{3}{5} \end{pmatrix}$   13 The orthonormal basis generated by the command [Q R] = qr(A,0) is:

v1 = v2 =
-0.7071 0.7071
0.7071 0.7071

15 The orthonormal basis generated by the command [Q R] = qr(A,0) is:

v1 = v2 = v3 =
-0.2673 0.0514 -0.9623
0.5345 -0.8230 -0.1925
-0.8018 -0.5658 0.1925
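
A minimal sketch of the command quoted above (A is the matrix from the exercise, not reproduced here):

[Q, R] = qr(A, 0);   % economy-size QR factorization
Q                    % the columns of Q are the orthonormal basis vectors listed above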

17

H1 = H2 =
0.6245 -0.7220 -0.2744 0.1155 0.2807 0.6679 -0.3083 -0.6165
-0.7220 -0.3885 -0.5276 0.2222 0.6679 0.3798 0.2862 0.5725
-0.2744 -0.5276 0.7995 0.0844 -0.3083 0.2862 0.8679 -0.2642
0.1155 0.2222 0.0844 0.9645 -0.6165 0.5725 -0.2642 0.4716

H =
-0.2935 0.1305 -0.6678 -0.6714
-0.4365 -0.6536 -0.4053 0.4669
-0.7279 -0.1065 0.6051 -0.3043
-0.4398 0.7378 -0.1536 0.4885
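
H1 and H2 above have the form of Householder matrices; a minimal MATLAB sketch of how such a matrix is built from a nonzero vector v (the vector below is a placeholder, not the exercise data):

v = [1; -1; 2; 3];                      % placeholder vector
H = eye(length(v)) - 2*(v*v')/(v'*v)    % Householder reflection determined by v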

Chapter 8: Matrix Normal Forms

Section 8.1: Real Diagonalizable Matrices

1 (a) The eigenvalues are 3 and −3 with eigenvectors (1, 1)^t and (1, −1)^t. (b) S = (v1 | v2) 3 The eigenvalues are λ1 = 1 and λ2 = −1. The eigenvector associated with λ1 is v1 = (1, 1, 1)^t. There are two eigenvectors associated with λ2: v2 = (1, 0, 0)^t and v3 = (0, 1, 2)^t. S = (v1 | v2 | v3). 11 The eigenvalues of C are

ans =
-4.0000
-12.0000
-8.0000
-16.0000

and

S =
0.5314 -0.5547 0.0000 0.4082
-0.4871 0.5547 -0.4082 -0.8165
0.6199 -0.5547 0.8165 0.4082
-0.3100 0.2774 -0.4082 0.0000
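
A minimal sketch of how these quantities can be computed (C is the matrix from Exercise 11, not reproduced here):

[S, D] = eig(C);   % columns of S are eigenvectors; diag(D) holds the eigenvalues
diag(D)            % the eigenvalues listed above
inv(S)*C*S         % (approximately) diagonal, confirming that C is real diagonalizable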

13 not real diagonalizable

Section 8.2: Simple Complex Eigenvalues


1 $T = \begin{pmatrix} 1-i & 1+i \\ 2i & -2i \end{pmatrix}$; $S = \begin{pmatrix} 1 & -1 \\ 0 & 2 \end{pmatrix}$   5 The matrices are:

T =
0.9690 0.0197 + 0.3253i 0.0197 - 0.3253i
0.1840 0.0506 - 0.5592i 0.0506 + 0.5592i
0.1647 -0.4757 - 0.5935i -0.4757 + 0.5935i
S =
0.9690 0.0197 0.3253
0.1840 0.0506 -0.5592
0.1647 -0.4757 -0.5935

7 The matrices are:

T =
Columns 1 through 4
-0.1933-0.2068i -0.1933+0.2068i -0.6791+0.5708i -0.6791-0.5708i
-0.0362+0.4192i -0.0362-0.4192i 0.2735-0.3037i 0.2735+0.3037i
0.4084+0.1620i 0.4084-0.1620i 0.0881+0.0243i 0.0881-0.0243i
-0.0000-0.0000i -0.0000+0.0000i -0.0000+0.0000i -0.0000-0.0000i
-0.1933-0.2068i -0.1933+0.2068i -0.1321-0.0365i -0.1321+0.0365i

0.2657-0.6317i 0.2657+0.6317i 0.1321+0.0365i 0.1321-0.0365i
Columns 5 through 6
0.4205-0.1238i 0.4205+0.1238i
0.0855+0.2601i 0.0855-0.2601i
-0.1639-0.1479i -0.1639+0.1479i
-0.5203+0.1710i -0.5203-0.1710i
0.4205-0.1238i 0.4205+0.1238i
-0.4205+0.1238i -0.4205-0.1238i
S =
-0.1933 -0.2068 -0.6791 0.5708 0.4205 -0.1238
-0.0362 0.4192 0.2735 -0.3037 0.0855 0.2601
0.4084 0.1620 0.0881 0.0243 -0.1639 -0.1479
-0.0000 -0.0000 -0.0000 0.0000 -0.5203 0.1710
-0.1933 -0.2068 -0.1321 -0.0365 0.4205 -0.1238
0.2657 -0.6317 0.1321 0.0365 -0.4205 0.1238
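
A minimal sketch of how the real matrix S is obtained from the complex eigenvector matrix T, assuming a real matrix A with a complex conjugate pair of eigenvalues (the matrix below is a placeholder, not the exercise data):

A = [1 -2; 5 3];            % placeholder matrix with eigenvalues 2 ± 3i
[T, D] = eig(A);            % complex eigenvectors occur in conjugate columns of T
t1 = T(:,1);                % take one eigenvector from the conjugate pair
S = [real(t1) imag(t1)]     % its real and imaginary parts form the columns of S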

Section 8.3: Multiplicity and Generalized Eigenvectors

1 The eigenvalues of matrix A are:

Eigenvalue Algebraic Multiplicity Geometric Multiplicity


2 1 1
3 2 1
4 1 1

3 The eigenvalues of matrix C are:

Eigenvalue Algebraic Multiplicity Geometric Multiplicity


−1 3 2
1 1 1

5 v1 = (−1, 1) and v2 = (0, 1). 7 v1 = (9, 1, −1), v2 = (−2, 0, 1), and v3 = (9, 1, −2) 9 The eigenvalue
2 has algebraic multiplicity 4 and geometric multiplicity 1.
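
A minimal MATLAB sketch of how these multiplicities can be checked (the matrix and eigenvalue below are placeholders, not the exercise data):

A  = [3 1 0; 0 3 0; 0 0 2];                  % placeholder matrix
mu = 3;                                      % placeholder eigenvalue
alg = sum(abs(eig(A) - mu) < 1e-8)           % algebraic multiplicity of mu
geo = size(null(A - mu*eye(size(A,1))), 2)   % geometric multiplicity of mu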

Section 8.4: The Jordan Normal Form Theorem


   
1 Two such matrices are $\begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}$ and $\begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}$. 3 There are six different Jordan normal form matrices. 5 $\begin{pmatrix} \frac{3+\sqrt{17}}{2} & 0 \\ 0 & \frac{3-\sqrt{17}}{2} \end{pmatrix}$   7 $\begin{pmatrix} -1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}$   9 $\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$   11 $\begin{pmatrix} e^t & 0 & 0 \\ 0 & e^{-t} & te^{-t} \\ 0 & 0 & e^{-t} \end{pmatrix}$

 
15 (a) $\begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}$; (b)

S =
-0.1387 -0.1543 -0.0000 -0.5774
0.1387 -0.3086 -0.4082 0.0000
0.1387 0.9258 0.8165 0.5774
-0.9707 -0.1543 0.4082 -0.5774

 
17 (a) $\begin{pmatrix} 2i & 0 & 0 & 0 \\ 0 & -2i & 0 & 0 \\ 0 & 0 & -2+i & 0 \\ 0 & 0 & 0 & -2-i \end{pmatrix}$; (b)

S =
-0.2118- 0.0456i -0.2118+ 0.0456i 0.2211+ 0.0060i 0.2211- 0.0060i
-0.8548- 0.2507i -0.8548+ 0.2507i 0.8762+ 0.1803i 0.8762- 0.1803i
-0.3555- 0.0988i -0.3555+ 0.0988i 0.3529+ 0.0669i 0.3529- 0.0669i
-0.1437- 0.0531i -0.1437+ 0.0531i 0.1440+ 0.0344i 0.1440- 0.0344i

 
19 (a) $\begin{pmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$; (b)

S =
-1 1 0 0
-1 1 0 1
-1 0 1 1
-1 0 0 1

Section 1.4: Solutions to Problems 19, 21, 23, 25

19 (0.1244, 0.8397, −0.4167, 0.3253)

21 15.5570°

23 124.7286°

25 $\sqrt{147} \approx 12.1244$


Section 13.2: Solution to Problem 3



3 A rotates the plane 45° counterclockwise and expands it by a factor of 2.

Section 13.4: New Answer to Problem 17

 
17 (a) $J = \begin{pmatrix} 0 & 2 & 0 & 0 \\ -2 & 0 & 0 & 0 \\ 0 & 0 & -2 & 1 \\ 0 & 0 & -1 & -2 \end{pmatrix}$;

(b)

T =
-0.2118 -0.0456 0.2211 0.0060
-0.8548 -0.2507 0.8762 0.1803
-0.3555 -0.0988 0.3529 0.0669
-0.1437 -0.0531 0.1440 0.0344

