Differential Equations and Linear Algebra
Contents
1 Matrices
    1.1 First definitions
    1.2 Addition, scalar multiplication and transpose
    1.3 Matrix multiplication
    1.4 Combining operations
2 Matrix inverses
    2.1 Matrix inverses
    2.2 Determinants
    2.3 Determinants and inverses of matrices
    2.4 Powers of square matrices
A Appendix
    A.1 Inverses and Determinants
    A.2 Row reduction
1 Matrices
In this Section we will
• Define matrices.
• Define matrix addition, scalar multiplication, matrix multiplication and transposition.
• Look at different types of matrices: zero, square, diagonal, symmetric, upper / lower-triangular, identity.
In their simplest interpretation, matrices are arrays of numbers used to hold information in an orderly way. They
are a useful way to store data that can be organised into rows and columns. Can you think of everyday
examples?
These arrays can be thought of as ‘multi-dimensional’ real numbers, and we can define matrix addition, scalar
multiplication and matrix multiplication in much the same way as for ‘ordinary one-dimensional’ real numbers.
(In fact we can think about real numbers as one-dimensional matrices.) An advantage of using matrices is
precisely this; a multi-dimensional block of data can be handled and manipulated as a single entity. Many
problems that involve large amounts of data are reduced to matrix problems before being solved on a computer.
Think about modelling, computer graphics, medical imaging, economics, electrical networks, data encryption,
etc... These all rely on the mathematics of matrices.
1.1 First definitions

Definition 1.1. An m × n matrix A is a rectangular array of real numbers with m rows and n columns. We write A = (aij ) and denote the set of all m × n matrices with real entries by Mm,n (R).
• The real number aij represents the (i, j)-entry of A, that is, the entry in row i and column j.
• The ith row of A is represented by (ai1 · · · ain ) for i = 1, . . . m.
• The j th column of A is represented by
\[ \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix} \]
for j = 1, . . . , n.
Examples.
\[ \begin{pmatrix} 7 & -1 & 0 & 1 \\ 2 & 0 & 1 & -1 \end{pmatrix} \in M_{2,4}(\mathbb{R}), \qquad \begin{pmatrix} \pi \\ -1 \\ 4 \\ 0 \end{pmatrix} \in M_{4,1}(\mathbb{R}). \]
• A and B are equal if all their corresponding entries agree, that is aij = bij for all i = 1, . . . , m and
j = 1, . . . , n. We write A = B.
• The negative of matrix A is the matrix −A = (−aij ).
• The zero matrix, 0m,n , is the m × n matrix in which all entries are 0. We sometimes write 0n for 0n,n
or just 0 if we don’t need to know the size.
Definition 1.3. A matrix A is called square if it has the same number of rows and columns. The diagonal of a
square n × n matrix A = (aij ) consists of the entries aii where 1 ≤ i ≤ n, that is, the entries on the NorthWest
- SouthEast diagonal.
Definition 1.4. The following definitions only apply to square matrices.
• A diagonal matrix is a square matrix A = (aij ) such that aij = 0 for i ≠ j, that is, each of whose
non-diagonal entries is equal to zero.
• A square matrix with each entry below the main diagonal equal to zero is an upper-triangular matrix.
• A square matrix with each entry above the main diagonal equal to zero is a lower-triangular matrix.
• A square matrix is symmetric if it is unchanged by reflection in the diagonal, that is, if aij = aji for
1 ≤ i, j ≤ n.
Examples. Let
\[ A = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 2 & 0 & 0 \\ -1 & 4 & 0 \\ 5 & -2 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 8 \end{pmatrix}, \quad E = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 4 & 6 \\ 0 & 6 & 1 \end{pmatrix}. \]
Then A and D are both diagonal, upper-triangular, lower-triangular and symmetric. B is upper-triangular and
C is lower-triangular. E is symmetric.
1.2 Addition, scalar multiplication and transpose

Examples.
• Given two matrices A = (aij ), B = (bij ) ∈ M2,3 (R), we define their sum:
\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{pmatrix}, \quad A + B = \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} & a_{13}+b_{13} \\ a_{21}+b_{21} & a_{22}+b_{22} & a_{23}+b_{23} \end{pmatrix}. \]
• We have
\[ \begin{pmatrix} 1 & 4 \\ 3 & 0 \\ -9 & 7 \\ -2 & 5 \end{pmatrix} + \begin{pmatrix} -1 & 1 \\ 5 & -2 \\ 9 & 4 \\ -2 & 5 \end{pmatrix} = \begin{pmatrix} 0 & 5 \\ 8 & -2 \\ 0 & 11 \\ -4 & 10 \end{pmatrix}. \]
• The sum
\[ \begin{pmatrix} 1 & 4 \\ 3 & 0 \\ -9 & 7 \\ -2 & 5 \end{pmatrix} + \begin{pmatrix} 1 & 4 \\ 3 & 0 \\ -9 & 7 \end{pmatrix} \]
makes no sense.
Remark. Only matrices of the same size can be added together.
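These operations are easy to experiment with on a computer. The following Python sketch (using the third-party numpy library, which is not part of these notes) checks the 4 × 2 addition example above:

```python
import numpy as np

A = np.array([[1, 4], [3, 0], [-9, 7], [-2, 5]])
B = np.array([[-1, 1], [5, -2], [9, 4], [-2, 5]])

# Entrywise addition of two 4x2 matrices, exactly as in the example.
print(A + B)
# rows: [0 5], [8 -2], [0 11], [-4 10]
```

Note that numpy will refuse to add arrays of genuinely incompatible shapes, matching the remark that only matrices of the same size can be added.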
Theorem 1.6. For all matrices A, B and C of the same size and their corresponding size zero matrix 0, that
is, A, B, C, 0 ∈ Mm,n (R) for some m, n, we have
• A + B ∈ Mm,n (R) (Closure),
• A + B = B + A (Commutative),
• (A + B) + C = A + (B + C) (Associative),
• A + 0 = 0 + A = A (Identity),
• A + (−A) = (−A) + A = 0 (Inverse).
Proof. These properties all depend on corresponding properties in R. For example, suppose r, s, t ∈ R. Then
r + s = s + r and (r + s) + t = r + (s + t). We will give the first two proofs. You should attempt the other
proofs yourself.
Examples.
• Given a matrix A = (aij ) ∈ M2,3 (R) and a scalar r ∈ R we can define the scalar multiple of A as
\[ rA = \begin{pmatrix} ra_{11} & ra_{12} & ra_{13} \\ ra_{21} & ra_{22} & ra_{23} \end{pmatrix}, \quad \text{where } A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}, \ r \in \mathbb{R}. \]
• We have
\[ 3 \begin{pmatrix} 1 & 4 \\ 3 & 0 \\ -9 & 7 \\ -2 & 5 \end{pmatrix} = \begin{pmatrix} 3 & 12 \\ 9 & 0 \\ -27 & 21 \\ -6 & 15 \end{pmatrix}. \]
Theorem 1.8. For all matrices A and B of the same size (that is, A, B ∈ Mm,n (R)) and for all scalars r, s ∈ R,
we have
• rA ∈ Mm×n (R) (Closure),
• (rs)A = r(sA) (Associative),
• (r + s)A = rA + sA (Distributive),
• r(A + B) = rA + rB (Distributive).
Proof. Again, these properties follow from the properties of the real numbers. The first property follows from
the definition. We will prove the last one here. You should try and prove the others yourself.
Let A = (aij ), B = (bij ) ∈ Mm,n (R) and r ∈ R. Let D = r(A + B) and E = rA + rB. Now A + B ∈ Mm,n (R)
by Theorem 1.6 and hence r(A + B) ∈ Mm,n (R) by the first part of Theorem 1.8. To show D and E are the
same, we need to show that the entry in row i and column j of D is the same as the entry in row i and column
j of E (for 1 ≤ i ≤ m and 1 ≤ j ≤ n). Write D = (dij ) and E = (eij ) so that dij is the entry in row i
and column j of D and eij is the entry in row i and column j of E. The (i, j)-entry of A + B is aij + bij so
the (i, j)-entry of D is r(aij + bij ). The (i, j)-entry of rA is raij and the (i, j)-entry of rB is rbij , hence the
(i, j)-entry of rA + rB is raij + rbij . But by the properties of real numbers, r(aij + bij ) = raij + rbij . Hence
D = E as required.
Definition 1.9. The transpose of an m × n matrix A = (aij ) is the n × m matrix AT whose (i, j)th entry is
the (j, i)th entry of A. That is AT = (aji ).
Example. The matrix
\[ A = \begin{pmatrix} 0 & 2 & 3 \\ 4 & 1 & 0 \end{pmatrix} \]
has transpose
\[ A^{T} = \begin{pmatrix} 0 & 4 \\ 2 & 1 \\ 3 & 0 \end{pmatrix}. \]
Theorem 1.10. Let A, B ∈ Mm×n (R). Then the following results hold:
• (AT )T = A,
• (A + B)T = AT + B T .
Proof. The first identity follows from the definition. Let A = (aij ) and B = (bij ) and set D = AT + B T
and E = (A + B)T . Then AT , B T ∈ Mn,m (R) so AT + B T ∈ Mn,m (R). Also, A + B ∈ Mm,n (R) so
(A + B)T ∈ Mn,m (R). Now let 1 ≤ k ≤ n and 1 ≤ l ≤ m. We want to show that the (k, l)-entry of D is equal
to the (k, l)-entry of E. The (k, l)-entry of AT is alk and the (k, l)-entry of B T is blk . So the (k, l)-entry of
D is alk + blk . The (k, l)-entry of E = (A + B)T is the (l, k)-entry of (A + B), that is alk + blk . So indeed
D = E.
Remark. Using this definition, we see that a matrix A is symmetric if and only if AT = A. This is often taken
to be the definition of a symmetric matrix.
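This characterisation is easy to test numerically. A small Python/numpy sketch (the helper name is_symmetric is our own):

```python
import numpy as np

E = np.array([[1, -1, 0], [-1, 4, 6], [0, 6, 1]])  # symmetric matrix from the examples
B = np.array([[1, 2], [0, 1]])                     # upper-triangular, not symmetric

def is_symmetric(M):
    # A matrix is symmetric if and only if it equals its own transpose;
    # a non-square matrix can never be symmetric.
    return M.shape[0] == M.shape[1] and (M == M.T).all()

print(is_symmetric(E))  # True
print(is_symmetric(B))  # False
```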
1.3 Matrix multiplication

As a first example, given A = (aij ) ∈ M2,3 (R) and B = (bij ) ∈ M3,2 (R), the product D = AB = (dij ) ∈ M2,2 (R) has entries
\[ d_{11} = a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31}, \qquad d_{12} = a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32}, \]
\[ d_{21} = a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31}, \qquad d_{22} = a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32}. \]
Definition 1.11. In general, suppose that A = (aij ) and B = (bij ) are matrices with A ∈ Mm,n (R) and
B ∈ Mn,p (R). We define the matrix product AB to be the m × p matrix where the entry in row i and column
j is given by
\[ \sum_{k=1}^{n} a_{ik} b_{kj}. \]
Remark. It is absolutely essential that the sizes are compatible. The number of columns of A must be equal
to the number of rows of B. In the first example in this section, A is a 2 × 3 matrix and B is a 3 × 2 matrix,
yielding AB which is a 2 × 2 matrix. If the sizes are not compatible, the product is undefined.
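The definition can be implemented directly. Below is a Python/numpy sketch (the helper mat_mul is our own, written to mirror Definition 1.11 rather than for efficiency; in practice numpy's A @ B does the same job):

```python
import numpy as np

def mat_mul(A, B):
    """(AB)_ij = sum over k of a_ik * b_kj, as in Definition 1.11."""
    m, n = A.shape
    n2, p = B.shape
    if n != n2:  # sizes must be compatible: columns of A = rows of B
        raise ValueError("product undefined: %d columns vs %d rows" % (n, n2))
    D = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            D[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return D

B = np.array([[2, -1]])                  # 1 x 2
A = np.array([[-2, 2, 3], [-1, 1, 0]])   # 2 x 3
print(mat_mul(B, A))                     # the 1 x 3 matrix (-3  3  6)
# mat_mul(A, B) raises ValueError: A has 3 columns but B has 1 row.
```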
Examples. Let
\[ A = \begin{pmatrix} -2 & 2 & 3 \\ -1 & 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 2 & -1 \end{pmatrix}. \]
Then A is a 2 × 3 matrix and B is a 1 × 2 matrix.
• The matrix D = AB is not defined. A ∈ M2,3 (R) and B ∈ M1,2 (R) so that A has three columns, but B
has one row.
• The matrix D = BA is defined, as B ∈ M1,2 (R) and A ∈ M2,3 (R) so that the number of columns of B
is equal to the number of rows of A. Since B is a 1 × 2 matrix and A is a 2 × 3 matrix, BA is a 1 × 3
matrix and
\[ BA = \begin{pmatrix} 2 & -1 \end{pmatrix} \begin{pmatrix} -2 & 2 & 3 \\ -1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} -3 & 3 & 6 \end{pmatrix}. \]
The working behind this is as follows: the entries of BA are 2(−2) + (−1)(−1) = −3, 2(2) + (−1)(1) = 3 and 2(3) + (−1)(0) = 6.
A visualisation
The following diagram can help us picture our calculations for matrix multiplication. We can interpret the
coefficient dij as the matrix product of row i of A ∈ Mm,n (R) and column j of B ∈ Mn,p (R):

                        ( b11 ... b1j ... b1p )
                        (  :       :       :  )
                        ( bn1 ... bnj ... bnp )

    ( a11 ... a1n )     (  *  ...  *  ...  *  )
    (  :       :  )     (  :       :       :  )
    ( ai1 ... ain )     (  *  ... dij ...  *  )
    (  :       :  )     (  :       :       :  )
    ( am1 ... amn )     (  *  ...  *  ...  *  )

where dij = ai1 b1j + . . . + ain bnj and D = AB = (dij ) ∈ Mm,p (R).
Examples.
• Repeating the previous example, we can use the following diagram to help us:

                ( −2  2  3 )
                ( −1  1  0 )

    ( 2  −1 )   ( −3  3  6 )
• Suppose we want to find \( \begin{pmatrix} 4 & -1 \\ -8 & 5 \end{pmatrix} \begin{pmatrix} 3 & -2 \\ 4 & 1 \end{pmatrix} \). We use the diagram again:

                 (  3  −2 )
                 (  4   1 )

    (  4  −1 )   (  8  −9 )
    ( −8   5 )   ( −4  21 )
Remark. Matrix multiplication is not commutative! In general, AB ≠ BA. In fact AB could be a different
size to BA. But even if AB and BA are the same size, it is unlikely that they are equal.
Examples.
• Let
\[ A = \begin{pmatrix} 1 \\ 4 \\ 4 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 7 & 9 \end{pmatrix}. \]
Then
\[ AB = \begin{pmatrix} 1 & 7 & 9 \\ 4 & 28 & 36 \\ 4 & 28 & 36 \end{pmatrix} \quad \text{while} \quad BA = (65). \]
• Let
\[ A = \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}. \]
Then
\[ AB = \begin{pmatrix} -4 & -1 \\ -2 & -1 \end{pmatrix} \quad \text{while} \quad BA = \begin{pmatrix} 1 & 2 \\ -4 & -6 \end{pmatrix}. \]
Remark. Another interesting property is that it is perfectly possible to have AB = 0, the zero matrix, without
having A = 0 or B = 0.
Example. Let \( A = \begin{pmatrix} 1 & 2 \\ -1 & -2 \end{pmatrix} \) and \( B = \begin{pmatrix} 4 & 6 \\ -2 & -3 \end{pmatrix} \). Then \( AB = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \) and \( BA = \begin{pmatrix} -2 & -4 \\ 1 & 2 \end{pmatrix} \).
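This is easy to verify numerically (a numpy sketch, using numpy's @ operator for the matrix product):

```python
import numpy as np

# AB = 0 even though A != 0 and B != 0 (and here BA != 0 as well).
A = np.array([[1, 2], [-1, -2]])
B = np.array([[4, 6], [-2, -3]])
print(A @ B)   # the 2x2 zero matrix
print(B @ A)   # rows: [-2 -4], [1 2]
```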
Observe that if A is an m × n matrix then AT is an n × m matrix, so AAT is an m × m matrix and AT A is
an n × n matrix.
Example. The matrix
\[ A = \begin{pmatrix} 0 & 2 & 3 \\ 4 & 1 & 0 \end{pmatrix} \]
has transpose
\[ A^{T} = \begin{pmatrix} 0 & 4 \\ 2 & 1 \\ 3 & 0 \end{pmatrix}. \]
We have
\[ AA^{T} = \begin{pmatrix} 13 & 2 \\ 2 & 17 \end{pmatrix}, \qquad A^{T}A = \begin{pmatrix} 16 & 4 & 0 \\ 4 & 5 & 6 \\ 0 & 6 & 9 \end{pmatrix}. \]
We leave the following theorem as an exercise - try to prove it yourself!
Theorem 1.13. Let A ∈ Mm×n (R) and B ∈ Mn×p (R). Then (AB)T = B T AT .
Theorem 1.14. Let B ∈ Mn,n (R). Then BB T is symmetric.
Proof. Let B = (bij ) and write B T = (cij ). That is, the (i, j)-entry of B T is cij = bji for 1 ≤ i, j ≤ n. The
(x, y)-entry of BB T is given by
\[ \sum_{z=1}^{n} b_{xz} c_{zy} = \sum_{z=1}^{n} b_{xz} b_{yz}. \]
This expression is unchanged when x and y are swapped, so the (x, y)- and (y, x)-entries of BB T agree. Hence BB T is symmetric.
Earlier we defined the zero matrix 0m,n which is the additive identity when considering matrix addition. We
can also define a matrix (corresponding to the number 1 for the real numbers) as follows:
Definition 1.15. The matrix In ∈ Mn,n (R) given by
\[ I_n = (\delta_{ij}) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}, \quad \text{where } \delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise}, \end{cases} \]
is the n × n identity matrix.
Theorem 1.16. Let A ∈ Mm,n (R) and Im and In represent the m × m and n × n identity matrices respectively.
Then
Im A = AIn = A.
In particular, if A ∈ Mn,n (R) then In acts as a left and right multiplicative identity.
Proof. Let A = (aij ) and consider Im A. In particular, the (i, j)-entry of Im A is given by
\[ \sum_{k=1}^{m} \delta_{ik} a_{kj} = a_{ij}, \]
since δik = 0 unless k = i. Hence Im A = A, and the proof that AIn = A is similar.
1.4 Combining operations
It is clear that we could keep combining these operations above, provided the matrices in question have the
right size. For example, if A, B, C ∈ Mn,n (R) then we also have
(A + B T )C − 2B + (A + B T )(C + 5C T ) ∈ Mn,n (R).
The distributive rules that we’ve seen previously show this is well-defined.
Note that we can define powers of square matrices. Suppose A ∈ Mn,n (R). We have
A0 = In , A1 = A, and we recursively define Al+1 = (Al )A for l ≥ 1.
So A2 = AA, A3 = (AA)A, . . .. We will consider A−1 , A−2 , . . . later in the module.
Example. Let A ∈ M2,2 (R). Consider
(A + I2 )(A − I2 ) = A2 − AI2 + I2 A − I2 I2 = A2 − A + A − I2 = A2 − I2 .
2 Matrix inverses
In this Section we will
• Define what is meant by an invertible matrix.
• Define the determinant of a matrix and look at ways of computing it.
• Learn to determine whether or not a given matrix is invertible in terms of its determinant. Learn how to
compute the inverse of an invertible matrix in terms of its adjugate.
• Consider powers of square matrices, in particular powers of invertible matrices.
2.1 Matrix inverses

Definition 2.1. Let A ∈ Mn,n (R). We say that A is invertible if there is a matrix B ∈ Mn,n (R) such that AB = BA = In . Such a matrix B is called an inverse of A.
Remark. We will return later to the problem of deciding whether or not a matrix is invertible.
Theorem 2.2. If B and B′ are inverses of A then B = B′. In other words, if an inverse of A exists then it is
unique and will be denoted by A−1 .
Proof. Suppose that B and B′ are both inverses of A. Then
B = In B (by Theorem 1.16)
  = (B′A)B (by definition of the inverse)
  = B′(AB) (by associativity)
  = B′In (by definition of the inverse)
  = B′ (by Theorem 1.16).
Suppose A ∈ Mn,n (R) is invertible. Then
AX = AX′ =⇒ X = X′ and Y A = Y ′A =⇒ Y = Y ′.
Proof. Suppose AX = AX′. By the property of the identity matrix, and the associative property of matrix
multiplication, we have
X = In X = (A−1 A)X = A−1 (AX) = A−1 (AX′) = (A−1 A)X′ = In X′ = X′.
The proof of the second implication is similar.
Now suppose that A, B ∈ Mn,n (R) are both invertible. Then AB is invertible with
\[ (AB)^{-1} = B^{-1} A^{-1}. \]
Proof. We have
\[ (AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = A I_n A^{-1} = A A^{-1} = I_n \]
by associativity and the property of the identity matrix. Similarly (B −1 A−1 )(AB) = In . Hence AB is invertible
with (AB)−1 = B −1 A−1 .
Remark. This result can be extended. Let A1 , A2 , . . . , Ak be invertible matrices of the same size. Then the
product A1 A2 · · · Ak is invertible with
\[ (A_1 A_2 \cdots A_k)^{-1} = A_k^{-1} \cdots A_2^{-1} A_1^{-1}. \]
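The reversal of the order is easy to check numerically. A numpy sketch (the two matrices here are arbitrary invertible examples of our own choosing):

```python
import numpy as np

A = np.array([[2.0, 4.0], [4.0, 1.0]])
B = np.array([[1.0, 2.0], [3.0, 4.0]])

# (AB)^(-1) equals B^(-1) A^(-1) -- note the reversed order of the factors.
lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))   # True
```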
2.2 Determinants
The determinant assigns to each square matrix a number which characterises certain properties of the matrix. In
particular, we will prove that a matrix is invertible if and only if its determinant is non-zero. We will
see another characterisation of the determinant in Section 6.1.
Definition 2.8. The determinant of a 1 × 1 matrix A = (a) is det A = a.
Definition 2.9. The determinant of a 2 × 2 matrix \( A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \) is
\[ \det A = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc. \]
Example. If \( A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \) then
\[ \det A = \begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} = (1 \times 4) - (2 \times 3) = -2. \]
Inverses of 2 × 2 matrices
We shall return to proving this later, but as many of you know, if a 2 × 2 matrix has non-zero determinant then it is
invertible, since we can calculate the inverse using determinants as follows.
To find the inverse of a 2 × 2 matrix \( A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \) with det A ≠ 0:
\[ A^{-1} = \frac{1}{\det A} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}. \]
Example.
\[ \begin{vmatrix} 1 & 2 & 1 \\ 3 & 1 & -1 \\ -2 & 1 & 1 \end{vmatrix} = 1 \begin{vmatrix} 1 & -1 \\ 1 & 1 \end{vmatrix} - 2 \begin{vmatrix} 3 & -1 \\ -2 & 1 \end{vmatrix} + 1 \begin{vmatrix} 3 & 1 \\ -2 & 1 \end{vmatrix} = 1(2) - 2(1) + 1(5) = 5. \]
Determinant of an n × n matrix
In order to formally define the determinant function for any square n × n matrix we need to introduce a couple
more things first. As you may have guessed from the definitions of 2 × 2 and 3 × 3 matrices, the definition can be
thought of as inductive, in that the determinant of an n × n matrix is made up of a combination of (n − 1) × (n − 1)
matrices, which are each in turn made up of combinations of (n − 2) × (n − 2) matrices, and so on!
Definition 2.11. Let A = (aij ) ∈ Mn,n (R). For 1 ≤ i, j ≤ n, we define the ij th minor of A to be the
(n − 1) × (n − 1) matrix obtained by deleting the ith row and the j th column from A. We denote the minor by
Aij . In some texts the minor is called the submatrix.
Example. We find some minors for the matrix \( A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \\ 3 & 4 & 1 & 2 \\ 4 & 1 & 2 & 3 \end{pmatrix} \). For example,
\[ A_{11} = \begin{pmatrix} 3 & 4 & 1 \\ 4 & 1 & 2 \\ 1 & 2 & 3 \end{pmatrix} \quad \text{and} \quad A_{23} = \begin{pmatrix} 1 & 2 & 4 \\ 3 & 4 & 2 \\ 4 & 1 & 3 \end{pmatrix}. \]
Definition 2.12. Let A = (aij ) ∈ Mn,n (R). The cofactor Cij associated with the entry aij is Cij = (−1)i+j det Aij .
Definition 2.13. If A = (a) is a 1 × 1-matrix then its determinant is det A = |a| = a. Suppose n ≥ 2. The
determinant of an n × n-matrix A = (aij ) is
\[ \det A = \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix} = \sum_{j=1}^{n} a_{1j} C_{1j}. \]
Remark. In this definition we are expanding along the top row and using the cofactors of the entries in the top
row. There are alternative expansions for the determinant that collect the terms in different ways which we will
see shortly.
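The recursive definition translates directly into code. The following Python sketch (the helper det is our own, written for clarity; it is far slower than library routines and is for illustration only) expands along the top row exactly as in Definition 2.13:

```python
import numpy as np

def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0
    for j in range(n):
        # The minor A_1j: delete row 1 and column j+1 (0-indexed: row 0, column j).
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        # (-1)**j is the sign (-1)^(1+j) of the cofactor C_1j in 1-indexed notation.
        total += (-1) ** j * A[0, j] * det(minor)
    return total

print(det(np.array([[1, 2], [3, 4]])))                     # -2
print(det(np.array([[1, 2, 1], [3, 1, -1], [-2, 1, 1]])))  # 5
```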
Examples.
• Suppose \( A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \). Then
\[ \det A = a C_{11} + b C_{12} = ad - bc. \]
• Suppose \( A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \). Then
\[ \det A = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}. \]
Example.
\[ \begin{vmatrix} 1 & -1 & 0 \\ 5 & 1 & 0 \\ 0 & -1 & -2 \end{vmatrix} = 1 \begin{vmatrix} 1 & 0 \\ -1 & -2 \end{vmatrix} - (-1) \begin{vmatrix} 5 & 0 \\ 0 & -2 \end{vmatrix} + 0 \begin{vmatrix} 5 & 1 \\ 0 & -1 \end{vmatrix} = 1(-2) + 1(-10) + 0 = -12. \]
We now look at ways of simplifying the calculation of determinants. We are not yet in a position to prove the
second result below, so for the moment we shall simply accept that it is true. We shall give a proof later in the
lecture course when we are familiar with elementary matrices.
Theorem 2.14. Suppose A, B ∈ Mn,n (R) and let In denote the identity matrix. Then the following hold:
1. det In = 1
2. det AT = det A.
3. det(AB) = (det A)(det B)
Proof. The first statement is clear, using induction on n. The proof of (2) is quite lengthy so we give it as the
proof of Theorem A.7 in the Appendix. To prove (3), we need to consider elementary matrices which we shall
define later. We prove (3) in Theorem 5.13.
The following operations are the elementary row operations. We will be seeing quite a lot more of them when
we look at solving systems of linear equations and performing Gauss-Jordan elimination later in the course. Note
that in this definition, A does not need to be square.
Definition 2.15. Suppose that A ∈ Mm,n (R). There are three elementary row operations that we can perform
on A.
1. Interchange two rows.
2. Multiply one row by a non-zero real number.
3. Change one row by adding to it a multiple of another.
Theorem 2.16. Let A be an n×n matrix and let B be the matrix obtained from A after applying an elementary
row operation.
1. If B is obtained from A after interchanging two rows of A, then det B = − det A.
2. If B is obtained from A after multiplying a row of A by 0 ≠ k ∈ R, then det B = k det A.
3. If B is obtained from A after adding k times one row of A to another, then det B = det A.
Proof. Again, these proofs are quite complicated. You can find them as Theorems A.4, A.5 and A.6 in the
Appendix.
By transposing the matrix, we have the same results for column operations.
Theorem 2.17. Let A be an n × n matrix.
1. If B is obtained from A after interchanging two columns of A, then det B = − det A.
2. If B is obtained from A after multiplying a column of A by 0 ≠ k ∈ R, then det B = k det A.
3. If B is obtained from A after adding k times one column of A to another, then det B = det A.
Proof. Combine Theorem 2.16 with Theorem 2.14 (2).
Examples. Consider the matrix \( A = \begin{pmatrix} 1 & -1 & 1 \\ 2 & 0 & -1 \\ 1 & 1 & 2 \end{pmatrix} \). Calculating the determinant we get det A = 8. Consider
what happens when we consider the determinants of matrices that are obtained from A after performing an
elementary row operation.
• Let \( B = \begin{pmatrix} 1 & -1 & 1 \\ 4 & 0 & -2 \\ 1 & 1 & 2 \end{pmatrix} \) be the matrix obtained from A after multiplying the second row by 2 (which we
write as 2r2 ). We have det B = 16 = 2 det A.
• Let \( C = \begin{pmatrix} 2 & 0 & -1 \\ 1 & -1 & 1 \\ 1 & 1 & 2 \end{pmatrix} \) be the matrix obtained from A after the elementary row operation of swapping
rows 1 and 2 (which we write as r1 ↔ r2 ). We have det C = −8 = − det A.
• Let \( D = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 2 & -3 \\ 1 & 1 & 2 \end{pmatrix} \) be the matrix obtained from A after the elementary row operation of subtracting
2 copies of row 1 from row 2 (which we write as r2 − 2r1 ). We have det D = 8 = det A.
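These three rules are easy to check numerically. A Python/numpy sketch of the examples above (numpy computes determinants in floating point, hence the rounding):

```python
import numpy as np

A = np.array([[1.0, -1.0, 1.0], [2.0, 0.0, -1.0], [1.0, 1.0, 2.0]])

B = A.copy(); B[1] *= 2          # 2r2: multiply row 2 by 2
C = A[[1, 0, 2]]                 # r1 <-> r2: swap rows 1 and 2
D = A.copy(); D[1] -= 2 * D[0]   # r2 - 2r1: add a multiple of one row to another

print(round(np.linalg.det(A)))   # 8
print(round(np.linalg.det(B)))   # 16 = 2 det A
print(round(np.linalg.det(C)))   # -8 = -det A
print(round(np.linalg.det(D)))   # 8 = det A
```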
There are other rules for computing determinants. Although we don’t need them to compute determinants,
they often make our lives easier. First recall the recursive definition:
\[ \det A = \sum_{j=1}^{n} a_{1j} C_{1j}. \]
Using Theorem 2.16 (1), we realise that we don’t need to expand along the first row; any row will do so long
as we change the sign appropriately. Furthermore, using Theorem 2.14 (2), we realise that we can expand down
columns instead.
Theorem 2.18. Let A ∈ Mn,n (R). Suppose 1 ≤ i ≤ n. Then
\[ \det A = \sum_{j=1}^{n} a_{ij} C_{ij}. \]
Example. Expanding along the second row:
\[ \begin{vmatrix} 2 & -3 & 5 \\ 0 & 1 & 0 \\ 3 & -1 & -3 \end{vmatrix} = -0 \begin{vmatrix} -3 & 5 \\ -1 & -3 \end{vmatrix} + 1 \begin{vmatrix} 2 & 5 \\ 3 & -3 \end{vmatrix} - 0 \begin{vmatrix} 2 & -3 \\ 3 & -1 \end{vmatrix} = -21. \]
Note the change of sign because we are going along row 2. Also note that we don’t need to compute the
cofactors C21 and C23 because of the 0 in front of them.
Sometimes it is easy to tell if a matrix has zero determinant, which can save us a lot of work.
Theorem 2.19. Let A be a square matrix. Then det A = 0 if any of the following hold:
1. A has a row (or column) of zeros.
2. One row (or column) of A is a multiple of another row (or column).
Proof. The first result follows from Theorem 2.18. To get the last, apply Theorem 2.16 (3) (or Theorem 2.17
(3)) to get a matrix with a row (or column) of zeros.
Example. Subtracting the second row from the first row of a matrix does not change its determinant.
\[ \begin{vmatrix} 3 & -2 & 3 \\ 2 & -2 & 3 \\ 1 & 0 & 11 \end{vmatrix} = \begin{vmatrix} 1 & 0 & 0 \\ 2 & -2 & 3 \\ 1 & 0 & 11 \end{vmatrix} = \begin{vmatrix} -2 & 3 \\ 0 & 11 \end{vmatrix} = -22. \]
2.3 Determinants and inverses of matrices

Theorem 2.20. Let A be an n × n matrix. Then A is invertible if and only if det A ≠ 0. Furthermore if A is
invertible and has inverse A−1 then \( \det A^{-1} = \frac{1}{\det A} \).
Remark. We are not yet able to prove this result. The first part will be proved as Theorem 5.12 and the second
part is Corollary 5.14. But we can prove part of the Theorem, that is, if det A 6= 0, we will now show that A is
invertible.
Definition 2.21. For an n × n matrix A we can define the adjugate of A, denoted by adj A, as follows
\[ \operatorname{adj} A = (C_{ji}) = \begin{pmatrix} C_{11} & C_{21} & \dots & C_{n1} \\ C_{12} & C_{22} & \dots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \dots & C_{nn} \end{pmatrix}. \]
Remark. In the definition of adj A, note the transposition, that is, note that the (i, j)-entry of adj A is Cji .
Theorem 2.22. Suppose that A ∈ Mn,n (R) and that det A ≠ 0. Then we can find the inverse of A as follows
\[ A^{-1} = \frac{1}{\det A} \operatorname{adj} A. \]
Proof. We give a self-contained proof of this in Theorem A.2 in the Appendix.
Example. Find the inverse of \( A = \begin{pmatrix} -2 & 3 & 2 \\ 6 & 0 & 3 \\ 4 & 1 & -1 \end{pmatrix} \). We need to find adj A and det A. We have det A = 72.
We calculate the first three cofactors and leave the rest for you to check.
\[ C_{11} = \begin{vmatrix} 0 & 3 \\ 1 & -1 \end{vmatrix} = -3, \quad C_{12} = -\begin{vmatrix} 6 & 3 \\ 4 & -1 \end{vmatrix} = 18, \quad C_{13} = \begin{vmatrix} 6 & 0 \\ 4 & 1 \end{vmatrix} = 6, \]
and continuing with these calculations we get
\[ \operatorname{adj} A = \begin{pmatrix} -3 & 5 & 9 \\ 18 & -6 & 18 \\ 6 & 14 & -18 \end{pmatrix}. \]
Therefore,
\[ A^{-1} = \frac{1}{72} \begin{pmatrix} -3 & 5 & 9 \\ 18 & -6 & 18 \\ 6 & 14 & -18 \end{pmatrix}. \]
Note that the entries −3, 18, 6 which we calculated above, the cofactors of the first row of A, appear in
the first column of adj A. It is always a good idea to check your solution: verify that
\[ A^{-1} A = \frac{1}{72} \begin{pmatrix} -3 & 5 & 9 \\ 18 & -6 & 18 \\ 6 & 14 & -18 \end{pmatrix} \begin{pmatrix} -2 & 3 & 2 \\ 6 & 0 & 3 \\ 4 & 1 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
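The adjugate construction can be sketched in Python/numpy (the helper adjugate is our own; it uses numpy's determinant for the cofactors, so the results are floating point):

```python
import numpy as np

def adjugate(A):
    """adj A = (C_ji): the transpose of the matrix of cofactors (Definition 2.21)."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Minor A_ij: delete row i and column j, then apply the cofactor sign.
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T  # note the transposition

A = np.array([[-2.0, 3.0, 2.0], [6.0, 0.0, 3.0], [4.0, 1.0, -1.0]])
adjA = adjugate(A)
print(np.round(adjA))                                        # matches adj A above
print(np.allclose((adjA / np.linalg.det(A)) @ A, np.eye(3)))  # True: (adj A)/det A is the inverse
```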
We’ll see another way of computing the inverse of a matrix in Section 5.2.
2.4 Powers of square matrices

Let A ∈ Mn,n (R). We define
\[ A^0 := I_n, \quad A^1 := A, \quad A^2 := AA, \quad A^3 := A^2 A, \]
and so on, that is, Ak+1 := Ak A, for any k ≥ 0. If in addition A is invertible and k ≥ 1 then define
A−k := (A−1 )k .
Notice, as we have
\[ A^k A^{-k} = \underbrace{A \cdots A}_{k} \, \underbrace{A^{-1} \cdots A^{-1}}_{k} = I_n, \]
we conclude that A−k is the inverse of Ak .
Lemma 2.24. Suppose A is a square matrix and r and s are both integers. Then
\[ A^r A^s = A^{r+s} \quad \text{and} \quad (A^r)^s = A^{rs}, \]
whenever the powers involved are defined (negative powers require A to be invertible).
Examples.
• \( \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}^{2} = \begin{pmatrix} 7 & 10 \\ 15 & 22 \end{pmatrix} \).
• \( \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}^{2} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \).
• Simplify (A + B)2 − (A − B)2 . We have
\[ (A+B)^2 - (A-B)^2 = (A^2 + AB + BA + B^2) - (A^2 - AB - BA + B^2) = 2AB + 2BA. \]
Note that this need not equal 4AB, since AB and BA may differ.
• Let \( A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \). Let us find Ak for all k ≥ 1. (Note that A is not invertible.) We have
\[ A^2 = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & -2 \\ -2 & 2 \end{pmatrix} = 2 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \]
\[ A^3 = \begin{pmatrix} 2 & -2 \\ -2 & 2 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 4 & -4 \\ -4 & 4 \end{pmatrix} = 4 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}. \]
So we conjecture that Ak = 2k−1 A. To give a proof, we use induction. For n ≥ 1, set P (n) to be the
statement:
\[ \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}^{n} = 2^{n-1} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}. \]
Base Step: Suppose that k = 1. The left hand side is \( \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}^{1} = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \) and the right hand
side is \( 2^{0} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \). So P (1) is true.
Inductive Step: Suppose that P (k) holds. Then
\[ \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}^{k+1} = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}^{k} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} = 2^{k-1} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} = 2^{k-1} \begin{pmatrix} 2 & -2 \\ -2 & 2 \end{pmatrix} = 2^{(k+1)-1} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \]
so P (k + 1) holds. Hence, by induction, P (n) is true for all n ≥ 1, that is, Ak = 2k−1 A for all k ≥ 1.
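A quick numerical spot-check of the conjecture (not a substitute for the induction proof, just a sanity check in numpy):

```python
import numpy as np

A = np.array([[1, -1], [-1, 1]])
# Check A^k = 2^(k-1) A for several values of k; integer arithmetic here is exact.
for k in range(1, 8):
    assert (np.linalg.matrix_power(A, k) == 2 ** (k - 1) * A).all()
print("A^k = 2^(k-1) A verified for k = 1, ..., 7")
```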
If a matrix satisfies certain equations, we can deduce facts about other matrices related to it.
Example. Let A be an n×n matrix and let In denote the n×n identity matrix. Suppose that A3 +2A2 +3A = In .
Then A is invertible, and
A−1 = A2 + 2A + 3In .
To see this, note that A(A2 + 2A + 3In ) = A3 + 2A2 + 3A = In , and similarly (A2 + 2A + 3In )A = In . So A2 + 2A + 3In satisfies the definition of the inverse of A.
Example. Let A be an n × n matrix and let In denote the n × n identity matrix and 0n the n × n zero matrix.
Suppose that A2 − 2A + In = 0n . Then A − In is not invertible.
To see this, note that
0n = A2 − 2A + In = (A − In )2 .
Now it is not necessarily true that A − In = 0n . (This is a very common misconception! However, recall
from Section 1.3 that we can have CD = 0n even if C ≠ 0n and D ≠ 0n .) Suppose for a contradiction that
A − In is invertible with inverse B. Then
\[ 0_n = 0_n B^2 = (A - I_n)^2 B^2 = (A - I_n)\big((A - I_n)B\big)B = (A - I_n)B = I_n, \]
which is a contradiction. Hence A − In is not invertible.
3 Simultaneous Linear Equations
In this section we shall begin with familiar linear equations in two or three unknowns, that is, linear equations
in R2 and R3 . So for R2 we are thinking about straight lines we can draw on x–y axes in two dimensions,
and for R3 we are thinking about planes that we (or a computer package like Maple!) can plot on
x–y–z axes in three dimensions. (The diagrams in this section were drawn using the LaTeX package TikZ.)
In this Section we will
• Use examples to understand the connection between the solutions of systems of simultaneous linear
equations and the intersection of lines and planes in R2 and R3 .
• Define what is meant by a system of simultaneous linear equations in Rn and its solution set.
• See examples of consistent, inconsistent, homogeneous and inhomogenous systems and find the solution
set in some easy cases.
• Define the matrix form of a system of linear equations.
• Learn Cramer’s rule for finding the solution to a consistent set of n linear equations in n unknowns.
Examples.
• The system
x − y = −1
x + 2y = 5
has the unique solution x = 1, y = 2, corresponding to the unique point of intersection (1, 2) of the two
lines in R2 .
• The system
−4x − 2y = −8
2x + y = 4
has infinitely many solutions, (k, 4 − 2k) where k ∈ R. The two equations represent the same line in R2 .
• The system
x + y + z = 1
2x + 2y + 2z = 2
has infinitely many solutions, as the two equations represent the same plane in R3 . Each set of values
for x, y and z satisfying x + y + z = 1 is a solution to this system, such as x = 1, y = 0, z = 0 and
x = −2, y = 4, z = −1.
• The system
x + y + z = 1
x + y = 0
has infinitely many solutions - the planes in R3 intersect in a line. The z-coordinate of each point is 1, so
the line lies above the (x, y)-plane. Each set of values for x, y and z satisfying x + y = 0 and z = 1 is a
solution to this system, such as x = 1, y = −1, z = 1 and x = −3, y = −3, z = 1.
Three equations
Given three linear equations ax + by + cz = d, ex + f y + gz = h and ix + jy + kz = l it is possible to solve
these simultaneously to find values for x, y and z. These values (or coordinates if you prefer) correspond to
the points of intersection of these three planes in R3 . We use the following notation to represent our system of
simultaneous linear equations:
ax + by + cz = d
ex + f y + gz = h
ix + jy + kz = l
Thinking about three planes in R3 , what are the situations that can occur?
Examples.
• The system
x + y + z = 1
x + y = 0
x − z = 0
has the unique solution x = 1, y = −1, z = 1 corresponding to the unique point of intersection (1, −1, 1)
of the three planes in R3 .
• The system
x + y + z = 1
2x + 2y + 2z = 2
−x − y − z = −1
has infinitely many solutions, as the three equations represent the same plane in R3 .
• The system
x + y + z = 1
x + y = 0
x + y − z = −1
has infinitely many solutions: the first two equations force z = 1, and the third is then automatically satisfied, so the three planes intersect in a common line.
• The system
x + y + z = 1
x + y + z = 2
x + y − z = −1
has no solutions, since no values of x, y and z can satisfy both of the first two equations (they represent parallel planes).
• The system
x + y = 1
x + z = 1
− y + z = 1
has no solutions. The planes intersect in pairs and so there are no points common to all three planes.
Definition 3.1. An equation of the form
\[ a_1 x_1 + a_2 x_2 + \dots + a_n x_n = b, \]
where a1 , a2 , . . . , an , b ∈ R and a1 , a2 , . . . , an are not all zero, is a linear equation in the n unknowns x1 , x2 , . . . , xn .
The coefficients of the equation are represented by ai ∈ R. The constant term of the equation is represented
by b ∈ R. The unknowns of the equation are represented by xi ∈ R.
Examples. The following are not linear equations:
• a1 x1 + a2 x2 2 + . . . + a5 x5 5 = b
• sin x1 + cos x2 = 1
Definition 3.2. A system of m simultaneous linear equations in n unknowns, x1 , . . . xn , consists of m
linear equations
a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
. . .
am1 x1 + am2 x2 + . . . + amn xn = bm
where aij , bi ∈ R for i = 1, . . . m and j = 1, . . . n. The coefficients are represented by aij ∈ R. The constant
terms are represented by bi ∈ R. The unknowns are represented by xj ∈ R.
Remark. As we have seen in our examples of systems of linear equations in R2 and R3 the number m of
equations need not be the same as the number n of unknowns.
Examples.
• The system
x1 − 3x2 = 2
4x1 − x2 = −4
−2x1 − 2x2 = 1
is a system of three linear equations in the two unknowns x1 and x2 .
• The solutions of the system
x + y = 1
z = 0
are the triples of values for x, y and z satisfying x + y = 1 and z = 0. We can rearrange the equation
to y = 1 − x, so given any value of x we can find the corresponding value for y. We write this general
solution as
x = k, y = 1 − k, z = 0, k ∈ R.
The solution set can be written more formally as
{(k, 1 − k, 0) : k ∈ R}
and has infinitely many members.
Definition 3.4. A system of simultaneous linear equations is consistent when it has at least one solution. The
system is inconsistent when it has no solutions.
So, in our examples above the first and third systems are consistent, the second system is inconsistent.
Definition 3.5. A homogeneous system of simultaneous linear equations is a system in which each constant
term is equal to zero, that is bi = 0 for all i = 1, . . . m. A system containing at least one non-zero constant
term is a non-homogeneous system.
Definition 3.6. The trivial solution to a homogeneous system of simultaneous linear equations is the solution
with each unknown equal to zero, that is, xj = 0 for 1 ≤ j ≤ n. A solution with at least one non-zero unknown
is a non-trivial solution.
Examples. Consider the homogeneous system
x + y = 0
z = 0.
There are infinitely many solutions, including the trivial solution x = y = z = 0 and the non-trivial
solutions x = 1, y = −1, z = 0 and x = −2, y = 2, z = 0.
Remark. Note that if the system is non-homogeneous, setting xj = 0 for all j cannot be a solution. So the
trivial solution is never a solution of a non-homogeneous system. But if the system is homogeneous, the trivial
solution is always a solution.
Remark. Given any system of m linear equations in n unknowns, the solution set of the system will be of one
of three forms: it contains exactly one solution, it contains infinitely many solutions or it is the empty set.
Remark. If our equations are in n ≤ 3 unknowns, we can think about them as representing points (one
unknown), lines (two unknowns) or planes (three unknowns). Then solving the simultaneous equations corresponds
to finding the intersection of these objects, as in the first examples in this section. However, for more than three unknowns we lose this geometric picture, and we need algebraic methods instead.
The methods we will see in the next section for solving systems of linear equations work equally well if we work
over C instead of R. However, in this course we’ll just be working over R.
3.3 Matrix form of a system of linear equations
We shall now consider the matrix form of a system of linear equations.
Given any system of simultaneous linear equations
a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
. . .
am1 x1 + am2 x2 + . . . + amn xn = bm
we can express these as a matrix product Ax = b where we have the coefficient matrix denoted by
\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix} = (a_{ij}), \]
the column of unknowns x = (x1 , . . . , xn )T ∈ Mn,1 (R) and the column of constant terms b = (b1 , . . . , bm )T ∈ Mm,1 (R).
We can use matrix inverses to give us another method for solving certain systems of linear equations.
Example. Consider the system
2x + 4y = 10
4x + y = 6.
This system may be expressed in matrix form as
\[ \begin{pmatrix} 2 & 4 \\ 4 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 10 \\ 6 \end{pmatrix}. \]
The coefficient matrix is invertible, with inverse
\[ \begin{pmatrix} 2 & 4 \\ 4 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} -\frac{1}{14} & \frac{2}{7} \\ \frac{2}{7} & -\frac{1}{7} \end{pmatrix}. \]
Multiplying both sides of the matrix form of the system on the left by the inverse of the coefficient matrix we
obtain
\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -\frac{1}{14} & \frac{2}{7} \\ \frac{2}{7} & -\frac{1}{7} \end{pmatrix} \begin{pmatrix} 10 \\ 6 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \]
which gives us the unique solution x = 1, y = 2.
Theorem 3.7. Let A be an invertible matrix. Then the system of linear equations Ax = b has the unique
solution x = A−1 b.
Proof. Since A is invertible, there exists A−1 such that A−1 A = In . Then A(A−1 b) = (AA−1 )b = b, so x = A−1 b is indeed a solution. Conversely, if Ax = b then x = In x = (A−1 A)x = A−1 (Ax) = A−1 b, so the solution is unique.
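A numpy sketch of Theorem 3.7 applied to the example above:

```python
import numpy as np

# Solve 2x + 4y = 10, 4x + y = 6 as x = A^(-1) b.
A = np.array([[2.0, 4.0], [4.0, 1.0]])
b = np.array([10.0, 6.0])

x = np.linalg.inv(A) @ b
print(x)   # the unique solution x = 1, y = 2
# In numerical practice np.linalg.solve(A, b) is preferred over forming A^(-1).
```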
3.4 Cramer’s Rule
Cramer’s rule is another method for solving n equations in n unknowns which works when the system has a
unique solution. So if you are solving the system Ax = b, you should check that det A ≠ 0 before trying to
apply it.
We won’t prove that Cramer’s rule works, but if you are interested you can find a proof online.
Cramer’s Rule for systems of two linear equations in two unknowns
We shall state Cramer’s Rule in general as Theorem 3.9, but first let us consider it for simple systems. Let our
system be given by
ax + by = e
cx + dy = f
Theorem 3.8. If Ax = b is a system of two equations in two unknowns such that det A ≠ 0, then the system
has a unique solution given by:
\[ x = \frac{ed - bf}{ad - bc}, \qquad y = \frac{af - ce}{ad - bc}. \]
This is easy to verify: we know there is a unique solution, and substituting the values of x and y above shows
that these values are indeed a solution. You will observe that the denominators in both cases are det A and the
numerators are the determinants of the two matrices
\[ A_1 = \begin{pmatrix} e & b \\ f & d \end{pmatrix} \quad \text{and} \quad A_2 = \begin{pmatrix} a & e \\ c & f \end{pmatrix} \]
respectively.
Example. Consider the system
$$x + 2y = 5$$
$$3x + 4y = 9.$$
We have
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad A_1 = \begin{pmatrix} 5 & 2 \\ 9 & 4 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 1 & 5 \\ 3 & 9 \end{pmatrix}.$$
Calculating det A = −2, det A1 = 2 and det A2 = −6, we get
$$x = \frac{\det A_1}{\det A} = \frac{2}{-2} = -1, \qquad y = \frac{\det A_2}{\det A} = \frac{-6}{-2} = 3.$$
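The 2 × 2 rule translates directly into a small Python function (our own illustration, with exact arithmetic via `fractions`; the name `cramer_2x2` is ours). It solves the system with coefficient matrix $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ and right-hand side (5, 9) from the example above.

```python
from fractions import Fraction

def cramer_2x2(a, b, c, d, e, f):
    """Solve ax + by = e, cx + dy = f by Cramer's rule.

    Requires ad - bc != 0 (the unique-solution case)."""
    det_A = a * d - b * c
    if det_A == 0:
        raise ValueError("Cramer's rule needs det A != 0")
    x = Fraction(e * d - b * f, det_A)   # det A_1 / det A
    y = Fraction(a * f - e * c, det_A)   # det A_2 / det A
    return x, y

x, y = cramer_2x2(1, 2, 3, 4, 5, 9)
```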
We shall now extend Theorem 3.8, which stated Cramer's Rule for 2 × 2 matrices. As mentioned previously, Cramer's Rule allows us to calculate the unique solution of a system of linear equations when the determinant of the coefficient matrix A is non-zero.
Theorem 3.9. If Ax = b is a system of n linear equations in n unknowns such that det A ≠ 0, then the system has a unique solution. This solution is
$$x_j = \frac{\det A_j}{\det A}, \qquad j = 1, 2, \dots, n,$$
where Aj is the matrix obtained by replacing the entries in the j th column of A by the entries in the matrix
$$b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.$$
We have
$$A = \begin{pmatrix} 1 & 0 & 2 \\ -3 & 4 & 6 \\ -1 & -2 & 3 \end{pmatrix}, \quad A_1 = \begin{pmatrix} 6 & 0 & 2 \\ 30 & 4 & 6 \\ 8 & -2 & 3 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 1 & 6 & 2 \\ -3 & 30 & 6 \\ -1 & 8 & 3 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 1 & 0 & 6 \\ -3 & 4 & 30 \\ -1 & -2 & 8 \end{pmatrix}.$$
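Cramer's Rule for general n translates directly into code. The sketch below (our own illustration, not part of the module) computes determinants by cofactor expansion along the first row and then applies $x_j = \det A_j / \det A$ to the 3 × 3 system above, whose constants are b = (6, 30, 8).

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def cramer(A, b):
    """Solve Ax = b by Cramer's rule: x_j = det(A_j) / det(A)."""
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's rule needs det A != 0")
    x = []
    for j in range(len(A)):
        # A_j: replace column j of A by the constants b.
        Aj = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(A)]
        x.append(Fraction(det(Aj), d))
    return x

A = [[1, 0, 2], [-3, 4, 6], [-1, -2, 3]]
b = [6, 30, 8]
x = cramer(A, b)
```

Cofactor expansion takes O(n!) operations, so this is only sensible for small n; row reduction, covered in the next sections, scales far better.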
In this Section we will
• Look at performing elementary row operations to transform a system of linear equations and hence solve them.
• Define the augmented matrix corresponding to a system of linear equations.
• Define when a matrix is in row-reduced echelon form.
• Describe an algorithm (Gauss-Jordan elimination) to turn a matrix into a matrix in row-reduced echelon
form using elementary row operations.
• Explain how to find the solution space of a system of linear equations by applying Gauss-Jordan elimination
to the augmented matrix.
• Define the row rank of a matrix and see how this relates to the solution set of a system of equations.
In this section, we shall explore ways of solving systems. We will look at a systematic method - an algorithm -
for solving systems in the next section.
Lemma 4.1. The following operations do not change the solution set of a system of linear equations.
1. It is clear that changing the order that we write the equations in does not change the solution set.
2. Let k ∈ R \ {0}. Consider the equations
If x1 , x2 , . . . , xn satisfy
Conversely if x1 , x2 , . . . , xn satisfy
We can check that x1 , . . . , xn satisfy (1) if and only if they satisfy (2).
Below are some examples: We will explain in the next section an algorithm that will always work to give the
solution set.
Systems with exactly one solution
We shall begin by using examples of systems which yield exactly one solution. These are the best systems to
start practising with as you will see.
Examples.
• Solve the following system of two linear equations in two unknowns:
x − y = −1 r1
2x + y = 4 r2
We have labelled the equations r1 , r2 in order that we can describe what we are doing at each stage.
Firstly, we want to eliminate x from equation r2 :
x − y = −1
3y = 6 r2 − 2r1
We have transformed our original system into an equivalent simpler system and we can read off our
unique solution x = 1, y = 2.
• Solve the following system of three linear equations in three unknowns:
x + y + z = 1 r1
x + y = 1 r2
x − z = 0 r3
x + y + z = 1
− z = 0 r2 − r1
− y − 2z = −1 r3 − r1
We interchange r2 and r3 :
x + y + z = 1
− y − 2z = −1 r2 ↔ r3
− z = 0
By using the elementary operations we have transformed our original system into an equivalent simple
system. The unique solution is x = 0, y = 1, z = 0.
Systems with infinitely many solutions
So far, the examples we have done have yielded exactly one solution and it has been easy to read off the solutions
for the unknowns. What happens when we perform the elementary operations on a system with infinitely many
solutions?
Examples.
• Solve the following system of two linear equations in two unknowns:
−4x − 2y = −8 r1
2x + y = 4 r2
We can multiply r1 by −1/2 :
2x + y = 4 −(1/2)r1
2x + y = 4
Eliminate both x and y by subtracting r1 from r2 :
2x + y = 4
0x + 0y = 0 r2 − r1
We are left with a single equation in terms of x and y. Rearranging we get y = 4 − 2x. So for any value
of x we are able to calculate a corresponding value of y. We can write the general solution as:
x = k, y = 4 − 2k, k ∈ R.
Formally, the solution set to the system is written:
{(k, 4 − 2k) : k ∈ R}
• Solve the following system of three linear equations in three unknowns:
x + y + z = 1 r1
x + y = 1 r2
x + y − z = 1 r3
Subtract r2 from r1 and from r3 , then add the new r1 to the new r3 :
z = 0 r1 − r2
x + y = 1
0x + 0y + 0z = 0 r3 − r2 + r1
We see that we have z = 0 and an equation in terms of x and y. Rearranging this equation we get
y = 1 − x. So for any value of x we are able to calculate a corresponding value of y. We can write the
general solution as:
x = k, y = 1 − k, z = 0, k ∈ R.
Formally, the solution set to the system is written:
{(k, 1 − k, 0) : k ∈ R}
Systems with no solutions
Examples.
• Solve the following system of two linear equations in two unknowns:
x − y = −1 r1
x − y = 1 r2
Subtract r1 from r2 :
x − y = −1
0 = 2 r2 − r1
We are left with an impossible equation 0 = 2 in r2 which can never be true, hence there are no solutions
to this system.
• Solve the following system of three linear equations in three unknowns:
x + y + z = 1 r1
x + y + z = 2 r2
x + y − z = 0 r3
Subtracting r1 from r2 gives the impossible equation 0 = 1, which can never be true; hence there are no solutions to this system.
4.2 Solving systems of equations II
Using elementary operations on a small system of equations is doable but, as you have probably noticed already, somewhat cumbersome and prone to error. Luckily we can abbreviate a system of linear equations
by writing its coefficients and constants in the form of a matrix, called the augmented matrix of the system.
Using this matrix allows us (and computers) to more efficiently solve systems of simultaneous linear equations.
(Some texts refer to it as the extended matrix of the system.) The word augmented reflects the fact that this is made up of a matrix formed by the coefficients of the system augmented by a matrix formed by the constants of the system.
Remark. We draw a line down the matrix to separate the coefficients aij from the constants bi . This won’t
affect any of the maths that we do.
Examples.
• The system
$$-x + 7y - z = 1, \qquad x - y = -1, \qquad -3y - z = 2$$
has augmented matrix
$$\left(\begin{array}{ccc|c} -1 & 7 & -1 & 1 \\ 1 & -1 & 0 & -1 \\ 0 & -3 & -1 & 2 \end{array}\right).$$
• The system
$$-x_1 + 7x_2 - x_3 + 2x_4 = 3, \qquad x_1 - x_2 - x_4 = 0$$
has augmented matrix
$$\left(\begin{array}{cccc|c} -1 & 7 & -1 & 2 & 3 \\ 1 & -1 & 0 & -1 & 0 \end{array}\right).$$
The three elementary operations that we use on a system of linear equations correspond exactly to the three
elementary row operations on the rows of the augmented matrix of the system. Recall Definition 2.15:
Suppose that A ∈ Mm,n (R). The elementary row operations on A are the following operations:
We use the notation ri ↔ rj to indicate that we are swapping row i and row j. We use the notation ri ↦ kri to indicate that we are multiplying row i by k ≠ 0. We use the notation ri ↦ ri + lrj to indicate that we are adding l copies of row j to row i, for some l ∈ R. Sometimes we can perform multiple operations at once (but be careful with this and go slowly if you're not sure).
Definition 4.3. We say that two matrices A and B are row equivalent if matrix B can be obtained from
matrix A by a finite number of elementary row operations.
Example. The matrices
$$\begin{pmatrix} 2 & 0 & -1 \\ 4 & -4 & 2 \\ 6 & -1 & 3 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 2 & -1 & 1 \\ 4 & -4 & 2 \\ 0 & 2 & -2 \end{pmatrix}$$
are row equivalent. We can see this by performing the following elementary row operations:
$$\begin{pmatrix} 2 & 0 & -1 \\ 4 & -4 & 2 \\ 6 & -1 & 3 \end{pmatrix} \xrightarrow[r_3 \mapsto r_3 + r_2]{r_1 \mapsto r_1 - \frac{1}{2}r_2} \begin{pmatrix} 0 & 2 & -2 \\ 4 & -4 & 2 \\ 10 & -5 & 5 \end{pmatrix} \xrightarrow{r_3 \mapsto \frac{1}{5}r_3} \begin{pmatrix} 0 & 2 & -2 \\ 4 & -4 & 2 \\ 2 & -1 & 1 \end{pmatrix} \xrightarrow{r_1 \leftrightarrow r_3} \begin{pmatrix} 2 & -1 & 1 \\ 4 & -4 & 2 \\ 0 & 2 & -2 \end{pmatrix}$$
The elementary row-operations will come in useful when we want to solve systems of linear equations. Informally,
this is because
• Performing elementary row operations on an augmented matrix does not change the solution set.
• If an augmented matrix is in row-reduced echelon form then it is easy to read off the solutions.
• The first non-zero entry in any row of A is called the leading entry of that row. (Also referred to as the
leading coefficient or as the pivot of the row.)
• A is in row echelon form if the zero rows (if any) lie below all the non-zero rows, and the leading entry of each non-zero row lies strictly to the right of the leading entry of the row above it. A is in row-reduced echelon form if, in addition, every leading entry is equal to 1 and is the only non-zero entry in its column.
Examples. The following matrices are all in row-reduced echelon form.
$$\begin{pmatrix} 0 & 1 & 0 & 2 & 0 & -1 \\ 0 & 0 & 1 & -2 & 0 & 2 \\ 0 & 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & -5 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
Examples. The following matrices are all in row echelon form but NOT in row-reduced echelon form.
$$\begin{pmatrix} 0 & 1 & 0 & 2 & -1 & -1 \\ 0 & 0 & 2 & -2 & 0 & 2 \\ 0 & 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & -5 & 3 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
Examples. The following matrices are not in row echelon form (and therefore not in row-reduced echelon form either).
$$\begin{pmatrix} 0 & 1 & 0 & 2 & -1 & -1 \\ 0 & 0 & 1 & -2 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & -5 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
Although the next result is simple, we shall use it many times in the next section.
Lemma 4.5. Suppose A ∈ Mn,n (R) is in row-reduced echelon form. Then A = In or A contains a row of zeros.
Proof. If every row of A contains a leading entry then, since A is square and the leading entries move strictly to the right as we go down the rows, the leading entries must lie on the diagonal, so A = In ; otherwise A contains a row of zeros.
Remark. Using elementary row operations to turn a matrix into a matrix in row-reduced echelon form is called
Gauss-Jordan elimination. Using elementary row operations to turn a matrix into a matrix in row echelon form
is called Gauss elimination (or sometimes Gaussian elimination).
The next theorem just restates Lemma 4.1 using our new language.
Theorem 4.6. Suppose that S1 and S2 are two systems of m equations in n unknowns. Let A1 be the
augmented matrix of S1 and let A2 be the augmented matrix of S2 . Then S1 and S2 have the same solution
set if and only if A1 is row equivalent to A2 .
Theorem 4.7. Every matrix is row equivalent to a unique matrix in row-reduced echelon form.
In Theorem 4.8 below, we will see an algorithm showing that we can use row operations to convert any matrix into a matrix in row-reduced echelon form. So it only remains to consider uniqueness. The proof is quite technical, so we have put it in the Appendix. You can find it as Theorem A.9.
Remark. By Theorem 4.7, we see that two matrices are row equivalent if and only if they are row-equivalent
to the same row-reduced echelon matrix.
Putting Theorem 4.7 and Theorem 4.6 together, we see that we can take a system of equations, write the
augmented matrix for that system and then use elementary row operations to get the matrix into a matrix in
row-reduced echelon form where the corresponding equations have the same solution set as the original system
of equations. We will soon see
• An algorithm for performing elementary row operations which will turn any matrix into a matrix in row-
reduced echelon form.
• How to read off the solution set from a system of equations where the augmented matrix is in row-reduced
echelon form.
Example. Consider again the system x + y + z = 1, x + y = 1, x − z = 0 that we solved in the last section. We're going to solve it again. Compare the steps we went through last
time with the steps below. We begin by writing down the augmented matrix of the system and then we apply
elementary row operations until the matrix is in row-reduced echelon form.
$$\left(\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \end{array}\right) \xrightarrow[r_3 \mapsto r_3 - r_1]{r_2 \mapsto r_2 - r_1} \left(\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 0 & 0 & -1 & 0 \\ 0 & -1 & -2 & -1 \end{array}\right) \xrightarrow{r_2 \leftrightarrow r_3} \left(\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 0 & -1 & -2 & -1 \\ 0 & 0 & -1 & 0 \end{array}\right)$$
$$\xrightarrow[r_3 \mapsto -r_3]{r_2 \mapsto -r_2} \left(\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right) \xrightarrow{r_1 \mapsto r_1 - r_2} \left(\begin{array}{ccc|c} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right) \xrightarrow[r_2 \mapsto r_2 - 2r_3]{r_1 \mapsto r_1 + r_3} \left(\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right)$$
This augmented matrix clearly corresponds to the system of equations x = 0, y = 1, z = 0. So our original
system has (still) the unique solution {(0, 1, 0)}.
We’ve said that every matrix is row equivalent to a (unique) matrix in row-reduced echelon form. Now we
describe a method for finding that matirx.
Theorem 4.8. Carry out the following fours steps, first with row 1 as the current row, then with row 2, and so
on until EITHER every row has been the current row, OR step 1) is not possible.
33
1. Select the first column from the left that has at least one non-zero entry in or below the current row.
2. If the current row has a 0 in the selected column, interchange it with a row below which has a non-zero
entry in that column.
3. If the entry now in the current row and the selected column is c, multiply the current row by 1/c to create
a leading 1.
4. Add suitable multiples of the current row to the other rows to make each entry above and below the
leading 1 into a 0.
Example. We perform elementary row operations on the matrix $\begin{pmatrix} 1 & 1 & 2 & 3 \\ 2 & 2 & 3 & 5 \\ 1 & -1 & 0 & 5 \end{pmatrix}$ until it is in row-reduced echelon form.
$$\begin{pmatrix} 1 & 1 & 2 & 3 \\ 2 & 2 & 3 & 5 \\ 1 & -1 & 0 & 5 \end{pmatrix} \xrightarrow[r_3 \mapsto r_3 - r_1]{r_2 \mapsto r_2 - 2r_1} \begin{pmatrix} 1 & 1 & 2 & 3 \\ 0 & 0 & -1 & -1 \\ 0 & -2 & -2 & 2 \end{pmatrix} \xrightarrow{r_2 \leftrightarrow r_3} \begin{pmatrix} 1 & 1 & 2 & 3 \\ 0 & -2 & -2 & 2 \\ 0 & 0 & -1 & -1 \end{pmatrix}$$
$$\xrightarrow{r_2 \mapsto -\frac{1}{2}r_2} \begin{pmatrix} 1 & 1 & 2 & 3 \\ 0 & 1 & 1 & -1 \\ 0 & 0 & -1 & -1 \end{pmatrix} \xrightarrow{r_1 \mapsto r_1 - r_2} \begin{pmatrix} 1 & 0 & 1 & 4 \\ 0 & 1 & 1 & -1 \\ 0 & 0 & -1 & -1 \end{pmatrix}$$
$$\xrightarrow{r_3 \mapsto -r_3} \begin{pmatrix} 1 & 0 & 1 & 4 \\ 0 & 1 & 1 & -1 \\ 0 & 0 & 1 & 1 \end{pmatrix} \xrightarrow[r_2 \mapsto r_2 - r_3]{r_1 \mapsto r_1 - r_3} \begin{pmatrix} 1 & 0 & 0 & 3 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$
• We start with the current row as row 1. The first column with a non-zero entry in row 1 or below is
column 1. Since the current row (row 1) does not have 0 in column 1, we do not need to do anything for
step 2. Similarly, because the entry is 1, we don’t need to do anything for step 3. We now perform step
4, adding multiples of row 1 to the other rows so that they have zeros in column 1.
• We now go back to step 1 with row 2 as the current row. The leftmost column in row 2 or 3 which has a
non-zero entry is column 2. But column 2 has zero in row 2, so we swap rows 2 and 3 so that there is a
non-zero entry in row 2 and column 2. Now the leading entry in row 2 and column 2 is -2, so we multiply
row 2 by −1/2. We now perform step 4, adding multiples of row 2 to the other rows so that they have
zero in column 2. (Row 3 already has zero in column 2 so we don’t need to do anything.)
• We now go back to step 1 with row 3 as the current row. The leftmost column in row 3 which has a
non-zero entry is column 3. The entry in the current row and column 3 is −1 so we do not need to do
anything for step 2 and for step 3 we multiply row 3 by −1. We now perform step 4, adding multiples of
row 3 to the other rows so that they have zeros in column 3.
• We can’t do step 1, so we are done. We check (just in case we made a mistake) that our matrix is in
row-reduced echelon form. Since it is, we are done.
Remark. The algorithm we describe is not always the quickest or easiest way of row-reducing the matrix. If
you see a quicker way, you should feel free to use it! However, using the algorithm will get you to the answer,
whereas trying random elementary row operations can lead you round in circles.
We would like to know that this process really does give us a matrix in row-reduced echelon form. The proof is
given in Proposition A.2 in the Appendix.
Now suppose that we have a system of simultaneous linear equations where the augmented matrix is in row-
reduced echelon form. We want to know how to read off the solution set from this matrix.
Proposition 4.9. Suppose that we have a system of m simultaneous equations in n unknowns and that the
corresponding augmented matrix A is in row-reduced echelon form.
• If A has a row (0 0 . . . 0 | B) with B ≠ 0 then the system is inconsistent. (That is, there are no solutions.)
34
• If the first n columns of the matrix all contain a leading entry, the system has a unique solution.
• If neither of the conditions above hold, the system has infinitely many solutions.
Remark.
• The first case corresponds to the equation 0x1 + 0x2 + . . . + 0xn = B, which has no solution if B ≠ 0.
• The second case holds if and only if m ≥ n and the matrix consisting of the top n rows and n columns
of A is In . It is then straightforward to read off the solution.
• In the third case, at least one of the first n columns contains no leading entry. This is the most complicated case and we will look at it in more detail below.
Examples. Suppose we have got the augmented matrix to one of the following forms.
We’ve decided to set x2 = k and x5 = l. Rewriting the four equations above, we find
x1 = 2 − 2k + 3l, x2 = k x3 = −2 − l,
x4 = −2l, x5 = l, x6 = 3.
We are finally ready to put all these results together and describe how to solve systems of linear equations.
Proposition 4.10. Suppose that we want to solve a system of m simultaneous equations in n unknowns:
$$a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n = b_1$$
$$a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n = b_2$$
$$\vdots$$
$$a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n = b_m$$
1. Write down the augmented matrix
$$\left(\begin{array}{cccc|c} a_{11} & a_{12} & \dots & a_{1n} & b_1 \\ a_{21} & a_{22} & \dots & a_{2n} & b_2 \\ \vdots & & & \vdots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} & b_m \end{array}\right)$$
2. Bring the matrix into row-reduced echelon form using elementary row operations.
3. If at any stage you get a row
(0 0 0 · · · 0 | B)
where B ≠ 0, your system is inconsistent and has no solutions. Stop here - you do not need to finish row-reducing.
4. Read off the solutions, if any exist.
5. If you have a solution or solutions, put it back into the initial equations and check that it works.
• If it works, you know you have got the right answer. Sigh of relief.
• If it doesn't work, you have gone wrong somewhere. Work backwards until you find a step where your solution satisfies one system of equations but not the one before it. That is where (one of) the errors occurred.
We write down the augmented matrix and row-reduce it.
$$\left(\begin{array}{ccc|c} 7 & 1 & 6 & 0 \\ -2 & 4 & -3 & 4 \\ 31 & -17 & 33 & -20 \end{array}\right) \xrightarrow{r_1 \leftrightarrow r_2} \left(\begin{array}{ccc|c} -2 & 4 & -3 & 4 \\ 7 & 1 & 6 & 0 \\ 31 & -17 & 33 & -20 \end{array}\right) \xrightarrow{r_1 \mapsto -\frac{1}{2}r_1} \left(\begin{array}{ccc|c} 1 & -2 & 3/2 & -2 \\ 7 & 1 & 6 & 0 \\ 31 & -17 & 33 & -20 \end{array}\right)$$
$$\xrightarrow[r_3 \mapsto r_3 - 31r_1]{r_2 \mapsto r_2 - 7r_1} \left(\begin{array}{ccc|c} 1 & -2 & 3/2 & -2 \\ 0 & 15 & -9/2 & 14 \\ 0 & 45 & -27/2 & 42 \end{array}\right) \xrightarrow{r_3 \mapsto r_3 - 3r_2} \left(\begin{array}{ccc|c} 1 & -2 & 3/2 & -2 \\ 0 & 15 & -9/2 & 14 \\ 0 & 0 & 0 & 0 \end{array}\right)$$
$$\xrightarrow{r_2 \mapsto \frac{1}{15}r_2} \left(\begin{array}{ccc|c} 1 & -2 & 3/2 & -2 \\ 0 & 1 & -3/10 & 14/15 \\ 0 & 0 & 0 & 0 \end{array}\right) \xrightarrow{r_1 \mapsto r_1 + 2r_2} \left(\begin{array}{ccc|c} 1 & 0 & 9/10 & -2/15 \\ 0 & 1 & -3/10 & 14/15 \\ 0 & 0 & 0 & 0 \end{array}\right)$$
Remark. If a solution set has infinitely many solutions, there will be more than one way of writing the solution set. For example, the solution set above can be seen as the line going through (−2/15, 14/15, 0) with direction vector (−9/10, 3/10, 1). You could also write (for example)
$$\left\{(-2/15 - (9/10)k,\ 14/15 + (3/10)k,\ k) : k \in \mathbb{R}\right\} = \left\{(1 + 9l,\ 5/9 - 3l,\ -34/27 - 10l) : l \in \mathbb{R}\right\}.$$
While this answer is also correct, it is much easier just to write down the solution space using the row-reduced echelon matrix as we have done above.
Example. Determine the values of λ ∈ R for which the following system of linear equations is consistent and, for these values of λ, find all possible solutions:
$$2x + (1 + 2\lambda)y + z = 3$$
$$x + \lambda y + \lambda^2 z = \lambda + 1$$
$$x + \lambda y + z = 2$$
We perform row reduction on the augmented matrix, being careful not to divide by zero:
$$\left(\begin{array}{ccc|c} 2 & 1+2\lambda & 1 & 3 \\ 1 & \lambda & \lambda^2 & \lambda+1 \\ 1 & \lambda & 1 & 2 \end{array}\right) \xrightarrow{r_1 \leftrightarrow r_2} \left(\begin{array}{ccc|c} 1 & \lambda & \lambda^2 & \lambda+1 \\ 2 & 1+2\lambda & 1 & 3 \\ 1 & \lambda & 1 & 2 \end{array}\right) \xrightarrow[r_3 \mapsto r_3 - r_1]{r_2 \mapsto r_2 - 2r_1} \left(\begin{array}{ccc|c} 1 & \lambda & \lambda^2 & \lambda+1 \\ 0 & 1 & 1-2\lambda^2 & 1-2\lambda \\ 0 & 0 & 1-\lambda^2 & 1-\lambda \end{array}\right)$$
$$\xrightarrow{r_2 \mapsto r_2 - 2r_3} \left(\begin{array}{ccc|c} 1 & \lambda & \lambda^2 & \lambda+1 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 1-\lambda^2 & 1-\lambda \end{array}\right) \xrightarrow{r_1 \mapsto r_1 - \lambda r_2} \left(\begin{array}{ccc|c} 1 & 0 & \lambda^2+\lambda & 2\lambda+1 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 1-\lambda^2 & 1-\lambda \end{array}\right) \xrightarrow{r_1 \mapsto r_1 + r_3} \left(\begin{array}{ccc|c} 1 & 0 & 1+\lambda & 2+\lambda \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 1-\lambda^2 & 1-\lambda \end{array}\right) \quad (\ddagger)$$
Case 1: 1 − λ² ≠ 0, that is λ ≠ ±1. In this case, we can divide by 1 − λ² = (1 − λ)(1 + λ), so we can continue the row reduction to obtain an equivalent matrix in row-reduced echelon form:
$$\xrightarrow{r_3 \mapsto \frac{1}{1-\lambda^2}r_3} \left(\begin{array}{ccc|c} 1 & 0 & 1+\lambda & 2+\lambda \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 1 & \frac{1}{1+\lambda} \end{array}\right) \xrightarrow[r_2 \mapsto r_2 + r_3]{r_1 \mapsto r_1 - (1+\lambda)r_3} \left(\begin{array}{ccc|c} 1 & 0 & 0 & \lambda+1 \\ 0 & 1 & 0 & \frac{1}{1+\lambda} - 1 \\ 0 & 0 & 1 & \frac{1}{1+\lambda} \end{array}\right)$$
From this we can see that the system is consistent, with unique solution
$$x = \lambda + 1, \qquad y = \frac{1}{1+\lambda} - 1 = -\frac{\lambda}{1+\lambda}, \qquad z = \frac{1}{1+\lambda}.$$
Case 2: λ = 1. The final matrix in (‡) is
$$\left(\begin{array}{ccc|c} 1 & 0 & 2 & 3 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & 0 \end{array}\right)$$
so the equations are equivalent to
$$x + 2z = 3, \qquad y - z = -1, \qquad 0 = 0.$$
The system is consistent, with infinitely many solutions. The general solution is
x = 3 − 2k, y = k − 1, z = k, k∈R
Case 3: λ = −1. The last row of the final matrix in (‡) becomes (0 0 0 | 2), which corresponds to the impossible equation 0 = 2, so the system is inconsistent and has no solutions.
In summary, the system is consistent precisely when λ ≠ −1.
Proof. Let B + be the unique m × (n + 1) matrix in row-reduced echelon form which is row equivalent to (A|b)
and let B be the matrix obtained by removing the last column from B + . Then B is in row-reduced echelon
form and the elementary row operations that turn (A|b) into B + also turn A into B. Hence rk(A) is the
number of non-zero rows of B and rk(A|b) is the number of non-zero rows of B + . Let r = rk(A). Then either
the last column of B + has 1 in row r + 1 and 0 elsewhere; or it contains only 0 in rows r + 1, . . . , m. In the
former case, the system is inconsistent and rk(A) 6= rk(A|b); in the latter case, the system is consistent and
rk(A) = rk(A|b).
Now assume the system is consistent, so that rk(A) = rk(A|b) and rows r + 1, r + 2, . . . , m of B + are all zero
rows. There is a unique solution if and only if every column of B contains a leading entry, that is, if and only if n ≤ m and the first n rows of B form the identity matrix In . This also occurs if and only if rk(A) = n.
Examples. Calculating the ranks of the coefficient matrix and augmented matrix for the system
7x + 6y + 5z = 4
8x + 7y + 6z = 5
9x + 8y + 7z = 6
gives rk(A) = rk(A|b) = 2, so we know from the theorem the system is consistent. We can also conclude, as
2 < n = 3, that we have infinitely many solutions.
Calculating the ranks of the coefficient matrix and augmented matrix for the system
7x + 6y + 5z = 4
8x + 7y + 6z = 5
9x + 8y + 7z = 0
gives rk(A) = 2 and rk(A|b) = 3, so we know from the theorem the system is inconsistent.
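These rank computations can be checked by machine, for instance with NumPy (illustration only; `np.linalg.matrix_rank` computes the rank numerically via a singular value decomposition):

```python
import numpy as np

A = np.array([[7, 6, 5],
              [8, 7, 6],
              [9, 8, 7]])
b1 = np.array([[4], [5], [6]])   # constants of the first (consistent) system
b2 = np.array([[4], [5], [0]])   # constants of the second (inconsistent) system

rk = np.linalg.matrix_rank
# rk(A) == rk(A|b1) == 2 < 3: consistent with infinitely many solutions.
# rk(A|b2) == 3 > rk(A): inconsistent.
```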
Theorem 4.14. Let A be an n × n matrix. Then the following statements are equivalent.
1. A is invertible.
2. rk(A) = n
3. The system Ax = b has a unique solution for each n × 1 matrix b.
4. The system Ax = v0 has only the trivial solution.
Proof. These results all follow by noting that A is invertible if and only if A is row equivalent to In .
So there will be three different kinds of elementary matrices. Rather than writing out the details we give an example that illustrates these three types. For instance, if n = 3 and k, l are real numbers with l ≠ 0 then
$$r_1 \leftrightarrow r_2 \text{ corresponds to } E = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
$$r_2 \mapsto r_2 + kr_1 \text{ corresponds to } E = \begin{pmatrix} 1 & 0 & 0 \\ k & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
$$r_3 \mapsto lr_3 \text{ corresponds to } E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & l \end{pmatrix}.$$
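Each elementary matrix is obtained by applying the corresponding row operation to the identity matrix, and left-multiplying by it then performs that operation on any matrix. A small NumPy sketch (the helper names are our own):

```python
import numpy as np

def e_swap(n, i, j):
    """Elementary matrix for r_i <-> r_j (0-indexed rows)."""
    E = np.eye(n)
    E[[i, j]] = E[[j, i]]
    return E

def e_add(n, i, j, k):
    """Elementary matrix for r_i -> r_i + k * r_j."""
    E = np.eye(n)
    E[i, j] = k
    return E

def e_scale(n, i, l):
    """Elementary matrix for r_i -> l * r_i (l != 0)."""
    E = np.eye(n)
    E[i, i] = l
    return E

A = np.array([[1., 2, 3], [4, 5, 6], [7, 8, 9]])
# Left-multiplying by an elementary matrix performs the row operation.
swapped = e_swap(3, 0, 1) @ A
```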
Theorem 5.3. An elementary matrix is invertible and its inverse is also an elementary matrix.
Proof. Let In denote the identity matrix of the relevant size. Going back to the definition of elementary operations, note that each can be undone by an elementary operation: for ri ↔ rj apply the same again, for ri ↦ ri + krj apply ri ↦ ri − krj , and for ri ↦ lri apply ri ↦ l−1 ri . Therefore, if F is the elementary matrix corresponding to the row operation which undoes the one that gives rise to E, then
F (EIn ) = In
by the definition. Hence In = F (EIn ) = (F E)In = F E and similarly we get In = EF . Hence E is invertible with inverse F .
Recall that Theorem 4.7 says that any matrix is row equivalent to a unique matrix in row-reduced echelon form.
We need this result in mind for our following results.
Lemma 5.4. Suppose that A ∈ Mn,n (R) and that the last row of A contains only 0s. Then A is not invertible.
Proof. Suppose for a contradiction that A is invertible and that AB = BA = In . Then the (n, n)-entry in the matrix AB is equal to 1. But this entry is equal to
$$\sum_{k=1}^{n} a_{nk}b_{kn} = 0,$$
since every entry ank in the last row of A is 0. This contradiction shows that A is not invertible.
Theorem 5.5. Suppose that A ∈ Mn,n (R). Then A is invertible if and only if A is row equivalent to In .
Proof. By Theorem 4.7 there exists a unique matrix M in row-reduced echelon form such that A is row equivalent
to M . Then by Lemma 5.2 there exist elementary matrices E1 , E2 , . . . , Et such that M = E1 E2 . . . Et A. Since
M is in row-reduced echelon form and is square, either M = In or the last row of M consists of 0s by Lemma 4.5.
Suppose A is invertible. Then M = E1 E2 . . . Et A is also invertible, since each elementary matrix Ei is invertible.
So M = In , since otherwise M is not invertible, by Lemma 5.4.
Suppose M = In . Then (E1 E2 . . . Et )A = In and A is invertible with inverse (E1 E2 . . . Et ) by Proposition 2.6.
Corollary 5.6. Any sequence of elementary row operations that transforms A to In also transforms In to A−1 .
Proof. Suppose that
In = (E1 E2 . . . Et )A
where E1 , E2 , . . . , Et are elementary matrices. Then
A−1 = E1 E2 . . . Et = E1 E2 . . . Et In
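Corollary 5.6 gives a practical method: row-reduce the block (A | In ) until the left half is In ; the right half is then A−1 . A Python sketch of this (the function name is ours, exact arithmetic via `fractions`), tested on the 2 × 2 matrix from the example in Section 3:

```python
from fractions import Fraction

def inverse_via_row_reduction(A):
    """Invert A by row-reducing the augmented block (A | I_n) to
    (I_n | A^{-1}), as in Corollary 5.6."""
    n = len(A)
    # Build the augmented block (A | I_n) with exact entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        c = M[col][col]
        M[col] = [x / c for x in M[col]]        # leading entry becomes 1
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]               # right half is A^{-1}

A = [[2, 4], [4, 1]]
A_inv = inverse_via_row_reduction(A)
```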
Corollary 5.7. Suppose that A is row equivalent to a matrix B. Then we can write A as a product
$$A = E_1' E_2' \dots E_t' B$$
where each Ei′ is an elementary matrix. In particular, suppose that A is row equivalent to In . Then we can write A as a product
$$A = E_1' E_2' \dots E_t'$$
of elementary matrices.
Proof. By Lemma 5.2 there exist elementary matrices E1 , E2 , . . . , Et such that B = E1 E2 . . . Et A, where each elementary matrix is invertible by Theorem 5.3. Set Ei′ = E−1 over t−i+1 , that is, $E_i' = E_{t-i+1}^{-1}$, so that Ei′ is also invertible. Then
$$A = E_t^{-1} \dots E_2^{-1} E_1^{-1} B = E_1' E_2' \dots E_t' B.$$
So we get
$$A^{-1} = \frac{1}{4}\begin{pmatrix} 1 & 2 & -3 \\ 1 & -2 & 1 \\ 1 & 2 & 1 \end{pmatrix},$$
which you should check by multiplying by A.
• Let B = $\begin{pmatrix} 1 & 5 & 9 \\ 2 & -1 & 1 \\ -1 & 17 & 25 \end{pmatrix}$. We determine if B is invertible.
$$\left(\begin{array}{ccc|ccc} 1 & 5 & 9 & 1 & 0 & 0 \\ 2 & -1 & 1 & 0 & 1 & 0 \\ -1 & 17 & 25 & 0 & 0 & 1 \end{array}\right) \xrightarrow[r_3 \mapsto r_3 + r_1]{r_2 \mapsto r_2 - 2r_1} \left(\begin{array}{ccc|ccc} 1 & 5 & 9 & 1 & 0 & 0 \\ 0 & -11 & -17 & -2 & 1 & 0 \\ 0 & 22 & 34 & 1 & 0 & 1 \end{array}\right)$$
$$\xrightarrow{r_3 \mapsto r_3 + 2r_2} \left(\begin{array}{ccc|ccc} 1 & 5 & 9 & 1 & 0 & 0 \\ 0 & -11 & -17 & -2 & 1 & 0 \\ 0 & 0 & 0 & -3 & 2 & 1 \end{array}\right)$$
The left-hand block now has a row of zeros, so it cannot be row-reduced to I3 . So B is not invertible.
Now let us consider how the determinant changes for equivalent matrices. What can we deduce about the
determinant of a matrix A from the elementary operations we apply to find the equivalent matrix in row-reduced
echelon form?
Theorem 5.9. Let E be an elementary matrix, let k ∈ R with k ≠ 0, and let l ∈ R.
1. If E results from interchanging two rows of In , then det E = −1.
2. If E results from multiplying a row of In by k, then det E = k.
3. If E results from adding l times one row of In to another row, then det E = 1.
Proof. This follows from Theorem 2.16, noting that det In = 1.
We are almost in a position to prove Theorem 2.14 (3). We begin with a case.
Theorem 5.10. Suppose that A, E ∈ Mn,n (R) with E an elementary matrix. Then
det(EA) = (det E)(det A).
Proof. By Theorem 5.2, EA is the matrix obtained by applying the relevant elementary row operation to A.
Then det(EA) is described in Theorem 2.16 and det E is described in Theorem 5.9. Comparing the determinants
gives the result.
Theorem 5.11. Let A ∈ Mn,n (R). Then A is row equivalent to In if and only if det A ≠ 0.
Proof. Suppose that A is row equivalent to a matrix B which is in row-reduced echelon form. By Corollary 5.7 there exist elementary matrices E1′ , E2′ , . . . , Et′ such that A = E1′ E2′ . . . Et′ B. By repeatedly applying Theorem 5.10, we have
$$\det A = \det E_1' \det E_2' \dots \det E_t' \det B.$$
If B = In then det B = 1 and det Ei′ ≠ 0 for any i, since by Theorem 5.9 none of the elementary matrices have determinant zero. Hence det A ≠ 0. If B ≠ In then by Lemma 4.5 the last row of B consists of zeros, and hence det B = 0. Hence det A = 0.
We can now prove that det(AB) = (det A)(det B). If A is invertible then, by Theorem 5.5 and Corollary 5.7, we can write A = E1 E2 . . . Et as a product of elementary matrices, and repeatedly applying Theorem 5.10 gives det(AB) = det E1 . . . det Et det B = (det A)(det B). Now suppose A is not invertible or B is not invertible, so by Theorem 5.12, det A = 0 or det B = 0 and so (det A)(det B) = 0. Now AB is not invertible by Lemma 2.7, hence by Theorem 5.12 again, det(AB) = 0. Hence det(AB) = (det A)(det B).
6 Eigenvalues and eigenvectors
In this Section we will
• See how matrices map n-dimensional space.
• Define the eigenvalues, eigenvectors and eigenspace of a square matrix and learn how to compute them.
Remark. We have that det A = 0 if and only if the vectors $\begin{pmatrix} a \\ c \end{pmatrix}$ and $\begin{pmatrix} b \\ d \end{pmatrix}$ are parallel.
Proposition 6.3. Let A ∈ M3,3 (R). The map v 7→ Av for v ∈ R3 maps the unit cube to a parallelepiped with
volume | det A|.
Remark. Similar results hold for Rn , if you can imagine n-dimensional space. In fact, we can define the determinant in this way. Let A ∈ Mn,n (R). The map v ↦ Av from Rn to Rn takes the unit cube in Rn to a shape of volume | det A|. If we have A1 , A2 ∈ Mn,n (R) then we can compose the maps
$$\mathbb{R}^n \xrightarrow{A_1} \mathbb{R}^n \xrightarrow{A_2} \mathbb{R}^n$$
so that v ↦ A2 (A1 v) = (A2 A1 )v. So on one hand, this sends the unit cube to a shape of volume | det(A2 A1 )|; but on the other hand, it first sends the unit cube to a shape of volume | det A1 | and then to a shape of volume | det A1 || det A2 |.
Theorem 6.5. Let A be an n × n matrix over R and let λ ∈ R. Then λ is an eigenvalue of A if and only if it satisfies the equation det(A − λIn ) = 0.
Proof. By definition, λ is an eigenvalue of A if and only if there is a non-zero vector v ∈ Rn such that Av = λv, that is, such that
(A − λIn )v = 0n .
So, saying that λ is an eigenvalue of A is precisely the same as saying that there is a non-trivial solution to the equation (A − λIn )v = 0n which, by Theorem 4.14, happens if and only if the matrix (A − λIn ) is not invertible, which by Theorem 5.12 happens if and only if det(A − λIn ) = 0.
The polynomial charA (λ) = det(A − λIn ) is called the characteristic polynomial of the matrix A. It is a polynomial in the variable λ of degree n.
Combining this definition with Theorem 6.5, we have the following result.
Proposition 6.7. Let A ∈ Mn,n (R). Then λ is an eigenvalue of A if and only if
charA (λ) = 0.
Example. With A = $\begin{pmatrix} 2 & 3 \\ -1 & -2 \end{pmatrix}$ as in the example above, we have the characteristic polynomial
$$\det(A - \lambda I_2) = \det\left(\begin{pmatrix} 2 & 3 \\ -1 & -2 \end{pmatrix} - \lambda\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right) = \det\begin{pmatrix} 2-\lambda & 3 \\ -1 & -2-\lambda \end{pmatrix} = (2-\lambda)(-2-\lambda) + 3 = \lambda^2 - 1.$$
We have that charA (λ) = λ² − 1 = 0 if and only if λ = ±1, so the eigenvalues of A are λ = 1 and λ = −1.
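For 2 × 2 matrices the characteristic polynomial is always λ² − (a + d)λ + (ad − bc), so the eigenvalues come from the quadratic formula. A quick Python check on the example above (the helper name `eigenvalues_2x2` is ours):

```python
import math

def eigenvalues_2x2(A):
    """Real eigenvalues of a 2x2 matrix [[a, b], [c, d]].

    The characteristic polynomial is
    lambda^2 - (a + d) lambda + (ad - bc)."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:
        return []                       # no real eigenvalues
    r = math.sqrt(disc)
    return sorted({(tr - r) / 2, (tr + r) / 2})

A = [[2, 3], [-1, -2]]
```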
Let us find all eigenvectors corresponding to eigenvalue 1. This means that we want to find all v = $\begin{pmatrix} x \\ y \end{pmatrix}$ such that
$$\begin{pmatrix} 2 & 3 \\ -1 & -2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = 1\begin{pmatrix} x \\ y \end{pmatrix}$$
or equivalently, such that
$$\begin{pmatrix} 2-1 & 3 \\ -1 & -2-1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$
that is, we need to solve the pair of simultaneous equations
$$x + 3y = 0$$
$$-x - 3y = 0$$
which we can all do easily! This has general solution x = −3k, y = k, k ∈ R, BUT we must recall from the definition that eigenvectors are non-zero, so we also need k ≠ 0. So the eigenvectors of A with eigenvalue 1 are all vectors of the form
$$v = \begin{pmatrix} -3k \\ k \end{pmatrix}, \qquad k \in \mathbb{R} \setminus \{0\}.$$
Exercise: Find all the eigenvectors corresponding to −1.
Remark. The zero vector v0 will satisfy Av0 = λv0 for any value of λ. So v0 is never an eigenvector. (It's kind of similar to the way that we don't consider 1 to be a prime number.) However, as we saw above, it's then a bit clumsy to write down the eigenvectors as we have to explicitly exclude v0 . We get around this by introducing eigenspaces. You will see more of these if you take Algebra II next year.
Definition 6.8. Suppose that A ∈ Mn,n (R) and that λ is an eigenvalue of A. The eigenspace of λ is defined
to be
Eλ = {v ∈ Rn : Av = λv}.
Note that v0 is always an element of Eλ . So the eigenspace Eλ consists of all the eigenvectors for λ together with the zero vector.
Example. Find the eigenvalues of the matrix A below and for each eigenvalue, find the corresponding eigenspace.
$$A = \begin{pmatrix} 2 & 2 & -1 \\ -1 & 3 & 0 \\ -1 & 3 & 1 \end{pmatrix}$$
We first compute the characteristic polynomial:
$$\begin{aligned} \operatorname{char}_A(\lambda) = \det\begin{pmatrix} 2-\lambda & 2 & -1 \\ -1 & 3-\lambda & 0 \\ -1 & 3 & 1-\lambda \end{pmatrix} &= (2-\lambda)(3-\lambda)(1-\lambda) - 2(-(1-\lambda)) - 1(-3 + (3-\lambda)) \\ &= (2-\lambda)(3-\lambda)(1-\lambda) + 2(1-\lambda) + \lambda \\ &= (2-\lambda)(3-\lambda)(1-\lambda) + (2-\lambda) \\ &= (2-\lambda)(3 - 4\lambda + \lambda^2 + 1) \\ &= (2-\lambda)(\lambda-2)^2 \end{aligned}$$
Hence charA (λ) = 0 if and only if λ = 2, so λ = 2 is the only eigenvalue of A. We now compute E2 .
$$E_2 = \{v \in \mathbb{R}^3 : Av = 2v\} = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} : \begin{array}{r} 2x + 2y - z = 2x \\ -x + 3y = 2y \\ -x + 3y + z = 2z \end{array} \right\} = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} : \begin{array}{r} 2y - z = 0 \\ -x + y = 0 \\ -x + 3y - z = 0 \end{array} \right\}$$
To solve this homogeneous system of equations, we go back to the techniques in Section 4.2. Since the system is homogeneous (that is, all the constants are 0) we don't bother writing down the last column. We consider the matrix
$$\begin{pmatrix} 0 & 2 & -1 \\ -1 & 1 & 0 \\ -1 & 3 & -1 \end{pmatrix} = \begin{pmatrix} 2-2 & 2 & -1 \\ -1 & 3-2 & 0 \\ -1 & 3 & 1-2 \end{pmatrix}.$$
After applying elementary row operations we get
$$\begin{pmatrix} 0 & 2 & -1 \\ -1 & 1 & 0 \\ -1 & 3 & -1 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1 & 0 & -1/2 \\ 0 & 1 & -1/2 \\ 0 & 0 & 0 \end{pmatrix}.$$
Hence the solution set, that is to say E2 , is given by
$$E_2 = \left\{ \begin{pmatrix} \frac{1}{2}k \\ \frac{1}{2}k \\ k \end{pmatrix} : k \in \mathbb{R} \right\}.$$
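We can sanity-check the eigenspace computation directly: taking k = 2 in E2 above gives v = (1, 1, 2), and multiplying out Av should return 2v. In Python (plain lists, our own illustration):

```python
A = [[2, 2, -1],
     [-1, 3, 0],
     [-1, 3, 1]]
# Taking k = 2 in E_2 gives the candidate eigenvector v = (1, 1, 2).
v = [1, 1, 2]
# Compute Av by taking the dot product of each row with v.
Av = [sum(a * x for a, x in zip(row, v)) for row in A]
# Av should equal 2v = (2, 2, 4), confirming that v is an eigenvector
# for the eigenvalue 2.
```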
Remark. In the example above, we showed all the working. However, all we really need to solve to find the eigenspace Eλ is the system (A − λIn )v = 0n . We didn't really need to justify the steps that led us to the matrix
$$\begin{pmatrix} 0 & 2 & -1 \\ -1 & 1 & 0 \\ -1 & 3 & -1 \end{pmatrix}.$$
Example. Find the eigenvalues of the matrix A below and for each eigenvalue, find the corresponding eigenspace.
$$A = \begin{pmatrix} 2 & 1 & -1 \\ 1 & 2 & -1 \\ 1 & 1 & 0 \end{pmatrix}$$
We first compute the characteristic polynomial:
$$\operatorname{char}_A(\lambda) = \det\begin{pmatrix} 2-\lambda & 1 & -1 \\ 1 & 2-\lambda & -1 \\ 1 & 1 & -\lambda \end{pmatrix} = (2-\lambda)(\lambda-1)^2$$
so the eigenvalues are λ = 2 and λ = 1. To find E2 , consider the matrix
$$\begin{pmatrix} 0 & 1 & -1 \\ 1 & 0 & -1 \\ 1 & 1 & -2 \end{pmatrix} \xrightarrow{\text{EROs}} \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{pmatrix}.$$
Hence
$$E_2 = \left\{ \begin{pmatrix} k \\ k \\ k \end{pmatrix} : k \in \mathbb{R} \right\}.$$
To find E1 , consider the matrix
$$\begin{pmatrix} 1 & 1 & -1 \\ 1 & 1 & -1 \\ 1 & 1 & -1 \end{pmatrix} \xrightarrow{\text{EROs}} \begin{pmatrix} 1 & 1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
Hence
$$E_1 = \left\{ \begin{pmatrix} l_2 - l_1 \\ l_1 \\ l_2 \end{pmatrix} : l_1, l_2 \in \mathbb{R} \right\}.$$
In summary, to find the eigenvalues and eigenspaces of a matrix A ∈ Mn,n (R):
1. Compute the characteristic polynomial charA (λ) = det(A − λIn ) and find the values of λ for which charA (λ) = 0. These are the eigenvalues of A.
2. For each eigenvalue λ of A: solve the homogeneous system (A − λIn )v = 0n . The solution set is the eigenspace Eλ , and the eigenvectors for λ are the non-zero elements of Eλ .
A Appendix
In the appendix, we collect together some of the long proofs of results that we have used. This is not part of
the module - just some extra information. If you want to know why certain things we claimed in the lecture
course are actually true, you may find this section interesting.
Unfortunately this is NOT enough to conclude that (In − BA) = 0n or B = 0n - recall that two matrices can multiply together to give the zero matrix even if neither of them is the zero matrix. Instead, consider the linear transformation θ : Rn → Rn given by θ(x) = Bx for all x ∈ Rn . This is injective, since if Bx = By then
x = (AB)x = (AB)y = y, and any injective linear map from an n-dimensional vector space to itself is also
surjective. So we have
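A minimal two-by-two illustration of the warning above: neither factor below is the zero matrix, yet their product is.

```python
import numpy as np

# Neither X nor Y is the zero matrix, but X @ Y is:
# XY = 0 does not force X = 0 or Y = 0.
X = np.array([[0, 1],
              [0, 0]])
Y = np.array([[1, 0],
              [0, 0]])
print(X @ Y)   # the 2x2 zero matrix
```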
where $C_{ij} = (-1)^{i+j}\det A_{ij}$ is the $(i,j)$-cofactor and $A_{ij}$ is the matrix obtained by deleting row $i$ and column $j$ of $A$. Below, we write $C^A_{ij}$ to mean the $(i,j)$-cofactor of the matrix $A$ (since we will be looking at cofactors of more than one matrix). We write $A_{ii'jj'}$ to denote the matrix obtained from $A$ by deleting rows $i$ and $i'$ and columns $j$ and $j'$.
Theorem A.2. Suppose that det A ≠ 0. Then we can find the inverse of A as follows:
\[
A^{-1} = \frac{1}{\det A}\,\operatorname{adj} A.
\]
Proof. We want to consider the matrix A(adj A). First, suppose 1 ≤ i ≤ n and consider the (i, i)-entry of
A(adj A). This is given by
\[
\sum_{k=1}^{n} a_{ik} C_{ik} = \sum_{k=1}^{n} (-1)^{i+k} a_{ik} \det A_{ik} = \det A.
\]
Now suppose i ≠ j. The (i, j)-entry of A(adj A) is $\sum_{k=1}^{n} a_{ik} C_{jk}$, which is the cofactor expansion of the determinant of the matrix obtained from A by replacing row j with row i. That matrix has two equal rows, so this entry is 0. Hence A(adj A) = (det A)In , and dividing by det A gives the result.
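The theorem can be tested numerically. The sketch below builds the adjugate directly from cofactors, as in the proof, and checks both A(adj A) = (det A)I and the inverse formula on the matrix from the eigenvalue example (NumPy floating-point arithmetic, so comparisons use allclose):

```python
import numpy as np

def adjugate(A):
    """Adjugate of A: the transpose of the matrix of cofactors
    C_ij = (-1)**(i+j) * det(A with row i and column j deleted)."""
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

# The matrix from the eigenvalue example above (det A = 2, so A is invertible)
A = np.array([[2.0, 1.0, -1.0],
              [1.0, 2.0, -1.0],
              [1.0, 1.0, 0.0]])

d = np.linalg.det(A)
adjA = adjugate(A)
print(np.allclose(A @ adjA, d * np.eye(3)))     # True: A (adj A) = (det A) I
print(np.allclose(np.linalg.inv(A), adjA / d))  # True: A^{-1} = (1/det A) adj A
```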
We now look at how performing elementary row operations on square matrices affects their determinants.
Lemma A.3. Let A ∈ Mn,n (R) where n ≥ 2. Let B be the matrix obtained by swapping rows 1 and 2 of A.
Then det B = − det A.
Proof. Write $A_{ii'jj'}$ (resp. $B_{ii'jj'}$) to denote the matrix obtained by deleting rows $i$ and $i'$ and columns $j$ and $j'$ of $A$ (resp. $B$). Then
\[
\begin{aligned}
\det B &= \sum_{j=1}^{n} b_{1j}(-1)^{j+1}\det B_{1j}\\
&= \sum_{j=1}^{n} b_{1j}(-1)^{j+1}\left(\sum_{k=1}^{j-1} b_{2k}(-1)^{k+1}\det B_{12kj} + \sum_{k=j+1}^{n} b_{2k}(-1)^{k}\det B_{12jk}\right)\\
&= \sum_{j=1}^{n} a_{2j}(-1)^{j+1}\left(\sum_{k=1}^{j-1} a_{1k}(-1)^{k+1}\det A_{12kj} + \sum_{k=j+1}^{n} a_{1k}(-1)^{k}\det A_{12jk}\right)\\
&= \sum_{1\le k<j\le n} (-1)^{j+k}\, a_{2j}a_{1k}\det A_{12kj} + \sum_{1\le j<k\le n} (-1)^{j+k+1}\, a_{2j}a_{1k}\det A_{12jk},
\end{aligned}
\]
where we used $b_{1j} = a_{2j}$, $b_{2k} = a_{1k}$ and $B_{12jk} = A_{12jk}$ (deleting both of the swapped rows leaves the same matrix). Similarly,
\[
\begin{aligned}
\det A &= \sum_{k=1}^{n} a_{1k}(-1)^{k+1}\det A_{1k}\\
&= \sum_{k=1}^{n} a_{1k}(-1)^{k+1}\left(\sum_{j=1}^{k-1} a_{2j}(-1)^{j+1}\det A_{12jk} + \sum_{j=k+1}^{n} a_{2j}(-1)^{j}\det A_{12kj}\right)\\
&= \sum_{1\le j<k\le n} (-1)^{j+k}\, a_{1k}a_{2j}\det A_{12jk} + \sum_{1\le k<j\le n} (-1)^{j+k+1}\, a_{1k}a_{2j}\det A_{12kj}.
\end{aligned}
\]
Comparing the two expressions term by term, every summand of det B appears in det A with the opposite sign, so det B = − det A.
Lemma A.4. Let A ∈ Mn,n (R) where n ≥ 2 and suppose that 1 ≤ x < y ≤ n. Let B be the matrix obtained
by swapping rows x and y of A. Then det B = − det A.
Proof. We use induction on n. If n = 2 then
\[
\det\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix} = a_{11}a_{22} - a_{12}a_{21} = -(a_{21}a_{12} - a_{22}a_{11}) = -\det\begin{pmatrix} a_{21} & a_{22}\\ a_{11} & a_{12} \end{pmatrix},
\]
so the Lemma holds for n = 2. So suppose that n > 2 and the Lemma holds for all A0 ∈ Mn−1,n−1 (R). Let
A ∈ Mn,n (R) and let B be the matrix obtained by swapping rows x and y of A. We have 3 cases to consider.
• If x ≠ 1 then
\[
\det B = \sum_{j=1}^{n} b_{1j} C^{B}_{1j} = \sum_{j=1}^{n} a_{1j} C^{B}_{1j} = -\sum_{j=1}^{n} a_{1j} C^{A}_{1j} = -\det A,
\]
where we have used the inductive hypothesis to deduce that $C^{B}_{1j} = -C^{A}_{1j}$.
• If x = 1 and y = 2 then the result is Lemma A.3.
• Suppose x = 1 and y > 2. Let D be the matrix obtained by swapping rows 2 and y of A, and D′ the
matrix obtained by swapping rows 2 and y of B. Then D′ is D with rows 1 and 2 swapped, so using the first two parts of the Lemma,
\[
\det B = -\det D' = \det D = -\det A.
\]
Lemma A.5. Let A ∈ Mn,n (R) and let k ∈ R. Suppose that 1 ≤ x ≤ n and let B be the matrix obtained by
multiplying row x by k. Then det B = k det A.
Proof. We use induction on n. If n = 1 and A = (a) then det(ka) = ka = k det(a), so the Lemma holds. So
suppose that n > 1 and the Lemma holds for all A′ ∈ Mn−1,n−1 (R). Let A ∈ Mn,n (R) and let B be the matrix
obtained by multiplying row x by k. If x = 1 then $b_{1j} = ka_{1j}$ and $C^{B}_{1j} = C^{A}_{1j}$, so
\[
\det B = \sum_{j=1}^{n} b_{1j} C^{B}_{1j} = \sum_{j=1}^{n} ka_{1j} C^{A}_{1j} = k\sum_{j=1}^{n} a_{1j} C^{A}_{1j} = k\det A.
\]
If x ≠ 1 then $b_{1j} = a_{1j}$ and, by the inductive hypothesis applied to the minors, $C^{B}_{1j} = kC^{A}_{1j}$, so again det B = k det A. Hence the Lemma holds for all
A ∈ Mn,n (R), and by induction it holds for all n.
Lemma A.6. Let A ∈ Mn,n (R) with n ≥ 2 and let k ∈ R. Suppose that 1 ≤ x, y ≤ n with x ≠ y and let B
be the matrix obtained by adding k copies of row y to row x. Then det B = det A.
Proof. We use induction on n. If n = 2 then
\[
\det\begin{pmatrix} a_{11}+ka_{21} & a_{12}+ka_{22}\\ a_{21} & a_{22} \end{pmatrix} = (a_{11}+ka_{21})a_{22} - (a_{12}+ka_{22})a_{21} = a_{11}a_{22} - a_{12}a_{21} = \det\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix}
\]
and
\[
\det\begin{pmatrix} a_{11} & a_{12}\\ a_{21}+ka_{11} & a_{22}+ka_{12} \end{pmatrix} = a_{11}(a_{22}+ka_{12}) - a_{12}(a_{21}+ka_{11}) = a_{11}a_{22} - a_{12}a_{21} = \det\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix},
\]
so the result holds for n = 2. So suppose that n > 2 and the Lemma holds for all A′ ∈ Mn−1,n−1 (R). Let
A ∈ Mn,n (R) and let B be the matrix obtained by adding k copies of row y to row x. If x ≠ 1 then
\[
\det B = \sum_{j=1}^{n} b_{1j} C^{B}_{1j} = \sum_{j=1}^{n} a_{1j} C^{A}_{1j} = \det A,
\]
where we have used the inductive hypothesis to deduce that $C^{B}_{1j} = C^{A}_{1j}$. So suppose that x = 1. Let D be the
matrix obtained by swapping rows 2 and y of A, and D′ the matrix obtained by swapping rows 2 and y of B.
Then by the last result
\[
\det B = -\det D' = -\det D = \det A
\]
as required.
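The three lemmas above can be spot-checked numerically on a random matrix (a NumPy sketch; floating-point comparisons use isclose):

```python
import numpy as np

# A random 4x4 integer matrix; the three lemmas predict how each
# elementary row operation changes the determinant.
rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 4)).astype(float)
d = np.linalg.det(A)
k = 5.0

# Swapping two rows negates the determinant (Lemma A.4)
B = A.copy(); B[[0, 2]] = B[[2, 0]]
assert np.isclose(np.linalg.det(B), -d)

# Multiplying a row by k multiplies the determinant by k (Lemma A.5)
B = A.copy(); B[1] *= k
assert np.isclose(np.linalg.det(B), k * d)

# Adding k copies of one row to another leaves the determinant unchanged (Lemma A.6)
B = A.copy(); B[0] += k * B[3]
assert np.isclose(np.linalg.det(B), d)
print("all three row-operation rules hold for this matrix")
```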
Proof. We use induction on n. If n = 1 then A = AT so the result is true, and if n = 2 the result is immediate
from the formula for the determinant of a 2 × 2 matrix. Suppose that n > 2 and the Lemma holds for all
square matrices of size smaller than n. Let A ∈ Mn,n (R) and let B = AT . Then
\[
\begin{aligned}
\det B &= \sum_{j=1}^{n} b_{1j}(-1)^{j+1}\det B_{1j}\\
&= b_{11}\det B_{11} + \sum_{j=2}^{n} b_{1j}(-1)^{j+1}\det B_{1j}\\
&= a_{11}\det A_{11} + \sum_{j=2}^{n} a_{j1}(-1)^{j+1}\det A_{j1} && \text{using the inductive hypothesis, since } B_{1j} = (A_{j1})^{T}\\
&= a_{11}\det A_{11} + \sum_{j=2}^{n} a_{j1}(-1)^{j+1}\sum_{k=2}^{n} (-1)^{k} a_{1k}\det A_{1j1k}\\
&= a_{11}\det A_{11} + \sum_{j=2}^{n}\sum_{k=2}^{n} (-1)^{j+k+1}\, a_{j1}a_{1k}\det A_{1j1k}.
\end{aligned}
\]
On the other hand,
\[
\begin{aligned}
\det A &= \sum_{k=1}^{n} (-1)^{k+1} a_{1k}\det A_{1k}\\
&= a_{11}\det A_{11} + \sum_{k=2}^{n} (-1)^{k+1} a_{1k}\det (A_{1k})^{T} && \text{using the inductive hypothesis}\\
&= a_{11}\det A_{11} + \sum_{k=2}^{n} (-1)^{k+1} a_{1k}\det B_{k1}\\
&= a_{11}\det A_{11} + \sum_{k=2}^{n} (-1)^{k+1} a_{1k}\sum_{j=2}^{n} (-1)^{j} b_{1j}\det B_{1k1j}\\
&= a_{11}\det A_{11} + \sum_{k=2}^{n}\sum_{j=2}^{n} (-1)^{k+1}(-1)^{j} a_{1k} a_{j1}\det A_{1j1k} && \text{using the inductive hypothesis}\\
&= a_{11}\det A_{11} + \sum_{j=2}^{n}\sum_{k=2}^{n} (-1)^{j+k+1}\, a_{j1}a_{1k}\det A_{1j1k}.
\end{aligned}
\]
The two expressions agree, so det B = det A; that is, det AT = det A.
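The identity det AT = det A established above is easy to spot-check numerically:

```python
import numpy as np

# det(A^T) = det(A) for a randomly chosen square matrix
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))   # True
```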
Proof. Let M be an m × n matrix. If every entry of M is 0 then we are done. So we suppose that M has
some non-zero entries. The proof is by induction on m, the number of rows of M.
Base step m = 1: Let d be the leading entry of the (only) row of M. Multiply the row by d^{-1} to get
leading entry 1 and we are done.
Inductive step: Suppose the result holds for all matrices with (m − 1) rows. Carry out the following:
(1) If necessary, interchange rows so that the first column which does not consist entirely of zeros (call it column
j) has a non-zero entry d in the first row. The matrix looks something like
\[
\begin{pmatrix}
0 & \cdots & 0 & d & \ast & \cdots & \ast\\
\vdots & & \vdots & \vdots & \vdots & & \vdots\\
0 & \cdots & 0 & \ast & \ast & \cdots & \ast
\end{pmatrix},
\]
where ∗ represents entries that we are not interested in for the moment.
(2) Multiply the first row by d^{-1}. We denote the remaining entries in column j by c2 , ..., cm , so the matrix
looks like
\[
\begin{pmatrix}
0 & \cdots & 0 & 1 & \ast & \cdots & \ast\\
0 & \cdots & 0 & c_2 & \ast & \cdots & \ast\\
\vdots & & \vdots & \vdots & \vdots & & \vdots\\
0 & \cdots & 0 & c_m & \ast & \cdots & \ast
\end{pmatrix}.
\]
(3) We clear the j th column by subtracting ci copies of row 1 from row i, for each i = 2, . . . , m. The matrix
formed by rows 2, . . . , m has m − 1 rows, so by the inductive hypothesis it can be put into row-reduced echelon
form using elementary row operations. Denote the entries of row 1 in columns j + 1, . . . , n by ej+1 , . . . , en .
For k = j + 1, . . . , n proceed as follows - IF ek ≠ 0 AND ek appears above a leading entry 1 in row ri , then
perform the elementary row operation r1 − ek ri . After performing these finitely many row operations, our matrix
M will be in row-reduced echelon form.
Hence, by induction the result holds for all m × n matrices.
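The algorithm in steps (1)-(3) can be sketched in Python. This version uses exact Fraction arithmetic and clears the entries above each leading 1 as it goes, rather than in a separate final pass; running it on the matrix from the first eigenvalue example reproduces the reduction shown there:

```python
from fractions import Fraction

def rref(rows):
    """Bring a matrix (list of lists) into row-reduced echelon form by
    elementary row operations, in the spirit of steps (1)-(3) above.
    Clears above each leading 1 as it goes."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n = len(M), len(M[0]) if rows else 0
    pivot_row, pivots = 0, []
    for j in range(n):
        # (1) bring a row with a non-zero entry in column j up to pivot_row
        for i in range(pivot_row, m):
            if M[i][j] != 0:
                M[pivot_row], M[i] = M[i], M[pivot_row]
                break
        else:
            continue  # column j has no usable pivot; move on
        # (2) scale so the leading entry is 1
        d = M[pivot_row][j]
        M[pivot_row] = [x / d for x in M[pivot_row]]
        # (3) subtract multiples of the pivot row to clear column j elsewhere
        for i in range(m):
            if i != pivot_row and M[i][j] != 0:
                c = M[i][j]
                M[i] = [a - c * b for a, b in zip(M[i], M[pivot_row])]
        pivots.append(j)
        pivot_row += 1
    return M, pivots

# The matrix A - 2I from the first eigenvalue example
R, pivots = rref([[0, 2, -1], [-1, 1, 0], [-1, 3, -1]])
print(R)   # rows (1, 0, -1/2), (0, 1, -1/2), (0, 0, 0), matching the text
```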
Theorem A.9. Every matrix is row equivalent to a unique matrix in row-reduced echelon form.
Proof. In the previous theorem we saw an algorithm showing that we can use row operations to convert any
matrix into a matrix in row-reduced echelon form, so we need only consider uniqueness. We use induction on the
number of columns. If there is only one column, there are only two matrices in row-reduced echelon form: the
zero matrix and the matrix with 1 in the top left corner and 0 elsewhere, and these are clearly not row-equivalent.
So suppose that A ∈ Mm,n (R) and that the result holds for matrices in Mm,n−1 (R). Suppose that B and C are
matrices in row-reduced echelon form which are both row equivalent to A.
Let Ā (resp. B̄, C̄) be the matrices obtained by removing the last column from A (resp. from B, C). Any
sequence of elementary row operations that turns A into a matrix in row-reduced echelon form also turns Ā
into a matrix in row-reduced echelon form, so B̄ and C̄ are in row-reduced echelon form and therefore by the
induction hypothesis, B̄ = C̄. Hence B and C agree, except possibly in the last column. Let bi (resp. ci ) be
the entry in row i and column n of B (resp. C).
Consider A, B and C as augmented matrices for a system of m equations in n − 1 unknowns. By Theorem 4.6,
the systems all have the same solution space. Suppose that B̄ (and hence C̄) has r non-zero rows and that
the leading 1s are in columns f1 , f2 , . . . , fr .
Suppose that bi ≠ ci for some 1 ≤ i ≤ r. Then setting xfj = bj for each 1 ≤ j ≤ r and xk = 0 for every other
k gives a solution to the system indexed by B but not to the system indexed by C, giving a contradiction. So
bi = ci for all 1 ≤ i ≤ r.
Now the system given by B (resp. by C) is inconsistent if and only if bi ≠ 0 for some i > r (resp. ci ≠ 0
for some i > r), and since B (resp. C) is in row-reduced echelon form, in this case it must be that br+1 = 1,
with bi = 0 for all i ≠ r + 1 (resp. cr+1 = 1, with ci = 0 for all i ≠ r + 1). Thus if the system given by B
is inconsistent then so is the system given by C (by Theorem 4.6) and B = C. Otherwise both systems are
consistent and bi = ci = 0 for i ≥ r + 1, and so again B = C.
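The uniqueness statement can be illustrated with SymPy: matrices obtained from A by different elementary row operations all reduce to the same row-reduced echelon form (SymPy's `rref` returns the form together with the pivot columns):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [2, 4, 7],
               [3, 6, 10]])

# Apply different elementary row operations to A
B = A.copy(); B.row_swap(0, 2)                 # swap rows 1 and 3
C = A.copy(); C[1, :] = C[1, :] + 5 * C[0, :]  # add 5 copies of row 1 to row 2

# All three row-equivalent matrices share one row-reduced echelon form
print(A.rref()[0] == B.rref()[0] == C.rref()[0])   # True
```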