Contents
1 Vectors
1.1 Bound Vectors and Free Vectors
1.2 Vector Negation
1.3 Vector Addition
1.4 Scalar Multiplication
1.5 Position Vectors
1.6 The Definition of u + v does not depend on A
2 Coordinates
2.1 Unit Vectors
2.2 Sums and Scalar Multiples in Coordinates
2.3 Equations of Lines
5 Matrices
5.1 Matrices and basic properties
5.2 Transpose of a matrix
5.3 Special types of square matrices
5.4 Column vectors of dimension n
5.5 Linear systems in matrix notation
5.6 Elementary matrices and the Invertible Matrix Theorem
5.7 Gauss-Jordan inversion
6 Determinants
6.1 Determinants of 2 × 2 and 3 × 3 matrices
6.2 General definition of determinants
6.3 Properties of determinants
Chapter 1
Vectors
As the notation $\overrightarrow{AB}$ suggests, an ordered pair of points A, B in 3-space determines a bound vector. Alternatively, a bound vector is determined by its:
starting point,
length,
direction.
We denote the length of the bound vector $\overrightarrow{AB}$ by $|\overrightarrow{AB}|$. If a bound vector has length 0 then it is of the form $\overrightarrow{AA}$ (where A is some point in 3-space) and has undefined direction.
If we ignore the starting point we get the notion of a free vector (or simply a vector). So a free vector is determined by its:
length,
direction.
We will use letters in bold type for free vectors (u, v, w etc.)¹ The length of the free vector v will be denoted by |v|.
Definition 1.2. We say that the bound vector $\overrightarrow{AB}$ represents the free vector v if it has the same length and direction as v.
¹ In handwritten notes underlining would be used: u, v, w etc.
In the figure below, the two bound vectors $\overrightarrow{AB}$ and $\overrightarrow{CD}$ represent the same free vector v.
[Figure: two parallel arrows, one from A to B and one from C to D, both labelled v.]
Definition 1.3. The zero vector is the free vector with length 0 and undefined direction. It
is denoted by 0.
For any point A, the bound vector $\overrightarrow{AA}$ represents 0.
It is important to be aware of the difference between bound vectors and free vectors. In particular you should never write something like $\overrightarrow{AB}$ = v. The problem with this is that the two objects being asserted to be equal are different types of mathematical object. It would be correct to say that $\overrightarrow{AB}$ represents v.
One informal analogy that might be helpful is that a free vector is a bit like an instruction
(go 20 miles Northeast say) while a bound vector is like the path you trace out when you
follow that instruction. Notice that there is no way to draw the instruction on a map (sim-
ilarly we cannot really draw a free vector) and that if you follow the same instruction from
different starting points you get different paths (just as we have many different bound vectors
representing the same free vector).
In what follows we will mainly be working with free vectors and when we write vector we
will always mean free vector.
Definition 1.6. Given vectors u and v we define the sum u + v as follows. Pick any point A and let B, C, D be points such that $\overrightarrow{AB}$ represents u, $\overrightarrow{AD}$ represents v and ABCD is a parallelogram. Then u + v is the vector represented by $\overrightarrow{AC}$.
[Figure: parallelogram ABCD with $\overrightarrow{AB}$ and $\overrightarrow{DC}$ labelled u, $\overrightarrow{AD}$ and $\overrightarrow{BC}$ labelled v, and diagonal $\overrightarrow{AC}$.]
In the figure, $\overrightarrow{DC}$ represents u because we chose C in order to make ABCD a parallelogram. Now by the parallelogram axiom $\overrightarrow{AD}$ and $\overrightarrow{BC}$ represent the same vector v. By definition $\overrightarrow{AC}$ represents u + v.
Remark 1.7. In order for this to be a sensible definition it needs to specify u + v in a completely unambiguous way. In other words, if two people follow the recipe above to find u + v they should come up with the same vector. For this to be true we need to check that B, C, D are uniquely specified by the rule we gave (this is obvious) and that the answer we get does not depend on the choice of A. This last point requires a bit of checking which I give as an exercise (with hints) at the end of the chapter.
From this definition we get the following useful interpretation of vector addition²:
Proposition 1.8 (The Triangle Rule for Vector Addition). If $\overrightarrow{AB}$ represents u and $\overrightarrow{BC}$ represents v then $\overrightarrow{AC}$ represents u + v.
Proof. Suppose that $\overrightarrow{AB}$ represents u and $\overrightarrow{BC}$ represents v. Let D be the point such that $\overrightarrow{DC}$ represents u. Since $\overrightarrow{AB}$ and $\overrightarrow{DC}$ both represent u, the figure ABCD is a parallelogram. By the parallelogram axiom $\overrightarrow{BC}$ and $\overrightarrow{AD}$ represent the same free vector and so $\overrightarrow{AD}$ represents v. It follows that the figure ABCD is precisely the parallelogram constructed in the definition of u + v and so by definition $\overrightarrow{AC}$ represents u + v as required.
Vector addition shares several properties with ordinary addition of numbers.
Proposition 1.9 (Properties of Vector Addition³). If u, v, w are vectors then:
1. u + v = v + u (that is vector addition is commutative),
2. u + 0 = u (that is 0 is an identity for vector addition),
3. u + (v + w) = (u + v) + w (that is vector addition is associative),
4. u + (−u) = 0 (that is −u is an additive inverse for u).
We usually write u − v for u + (−v) (this defines vector subtraction) and so part (iv)
could be written as u − u = 0.
Proof. 1. If $\overrightarrow{AB}$ represents u, $\overrightarrow{AD}$ represents v and ABCD is a parallelogram then $\overrightarrow{DC}$ represents u. It follows from the triangle rule applied to the triangle ADC that $\overrightarrow{AC}$ represents v + u. We know (by the definition of vector addition) that $\overrightarrow{AC}$ represents u + v. Hence u + v = v + u.
² Some people regard this as the definition of u + v, in which case the parallelogram interpretation becomes a result that needs a proof.
³ If you take the module Introduction To Algebra you will recognise that these properties mean that the set of free vectors forms an Abelian group under vector addition.
2. Let $\overrightarrow{AB}$ represent u. Since $\overrightarrow{BB}$ represents 0, the triangle rule applied to the (degenerate) triangle ABB gives that $\overrightarrow{AB}$ represents u + 0. Hence u + 0 = u.
The last two properties in this proposition are called distributive laws.
We will show parts of the proofs of these but will not go through all cases in detail.
(ii) Let $\overrightarrow{AB}$ represent αu and $\overrightarrow{BC}$ represent βu. Then by the triangle rule $\overrightarrow{AC}$ represents αu + βu.
If α > 0, β > 0 then $\overrightarrow{AC}$ is a bound vector of length α|u| + β|u| = (α + β)|u| in the same direction as u. That is, $\overrightarrow{AC}$ represents (α + β)u. It follows that αu + βu = (α + β)u.
The remaining cases are similar.
(iv) [This proof was not given in lectures, and so the proof is non-examinable]
If α = 0 then both sides are equal to 0.
Suppose that α > 0. Let $\overrightarrow{AB}$ represent u, $\overrightarrow{BC}$ represent v, $\overrightarrow{AD}$ represent αu, and $\overrightarrow{DE}$ represent αv (draw a picture).
The triangles ABC and ADE are similar triangles and the edge AB is in the same direction as the edge AD. It follows that the bound vector $\overrightarrow{AE}$ is in the same direction as $\overrightarrow{AC}$ and its length differs by a factor of α. But $\overrightarrow{AC}$ represents u + v and $\overrightarrow{AE}$ represents αu + αv. It follows that αu + αv = α(u + v).
The α < 0 case is similar.
Definition 1.11. If P is a point, the position vector of P is the free vector represented by the bound vector $\overrightarrow{OP}$.
We will usually write p for the position vector of P , q for the position vector of Q and so
on.
Each point in space has a unique position vector and each vector is the position vector of
a unique point in space.
If A and B are points with position vectors a and b respectively then by the triangle rule applied to the triangle AOB we get that $\overrightarrow{AB}$ represents the vector b − a.
Theorem 1.12. Let A, B be points with position vectors a and b respectively. Let P be the point on the line segment AB with |AP| = λ|AB|. The position vector p of P is (1 − λ)a + λb.
Proof. Define u to be the free vector such that $\overrightarrow{AB}$ represents u. It follows that the bound vector $\overrightarrow{AP}$ represents λu. The triangle rule applied to OAP gives that p = a + λu. Also u = b − a. Putting these together gives that p = a + λ(b − a) which after some manipulation (using the distributive laws of Proposition 1.3(iii,iv)) gives the result.
In lectures we used this theorem to prove the following geometric fact about parallelograms.
Consider the parallelogram needed to define u + v with respect to point A (name the
points).
Fill in the gaps: “In order for it not to matter whether we used the parallelogram based
at A or the one based at E in the definition of u + v, our task is to show that the bound
vectors . . . and . . . represent the same free vector”.
Fill in the gaps: “The figure . . . is a parallelogram because . . . and . . . both represent the
vector u and so by the parallelogram axiom . . . and . . . represent the same vector [give
it a name].”
Fill in the gaps: “The figure . . . is a parallelogram because . . . and . . . both represent the
vector v and so by the parallelogram axiom . . . and . . . represent the same vector [what
is that vector].”
Make one more application of the parallelogram axiom to show that the bound vectors
from step 3 really do represent the same free vector.
Chapter 2
Coordinates
Suppose now that we choose an origin O and 3 mutually perpendicular axes (the x-, y- and z-axes) arranged in a right-handed system as in the figures below:
[Figures: three views of a right-handed coordinate system, each showing the x-, y- and z-axes meeting at the origin O.]
Let i, j, k denote vectors of unit length (i.e. length 1) in the directions of the x-, y- and z-axes respectively.
We say that R is the point with coordinates (a, b, c) if the position vector of R is
r = ai + bj + ck.
If Q is the point with position vector ai + bj and P is the point with position vector ai then OPQ is a right-angled triangle and $\overrightarrow{PQ}$ represents bj. It follows from Pythagoras's Theorem that
|OQ|² = |ai|² + |bj|² = a² + b².
Further, OQR is a right-angled triangle and $\overrightarrow{QR}$ represents ck. So
|r|² = |OR|² = |OQ|² + |QR|² = a² + b² + c².
It follows that
|r| = √(a² + b² + c²).
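For readers who like to check such formulas numerically, here is a short Python sketch (assuming the NumPy library; the particular coordinates are chosen only for illustration) that computes the length of a position vector exactly as above.

import numpy as np

# Position vector r = a i + b j + c k with, say, (a, b, c) = (1, 2, 2).
r = np.array([1.0, 2.0, 2.0])
print(np.linalg.norm(r))        # sqrt(1 + 4 + 4) = 3.0
print(np.sqrt(np.sum(r**2)))    # the same value, computed directly from the formula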
|u + v|² = |u|² + |v|² + 2u · v.
Secondly, in coordinates u + v has entries u1 + v1, u2 + v2, u3 + v3, and so
|u + v|² = (u1 + v1)² + (u2 + v2)² + (u3 + v3)²
= (u1² + u2² + u3²) + (v1² + v2² + v3²) + 2(u1 v1 + u2 v2 + u3 v3).
Equating these two expressions for |u + v|² and rearranging gives the result.
Theorem 3.3 can be used to find the angle θ between two non-zero vectors given in coordinates. Rearranging the definition of u · v and substituting the formula of Theorem 3.3 we get
cos θ = (u · v) / (|u||v|) = (u1 v1 + u2 v2 + u3 v3) / √((u1² + u2² + u3²)(v1² + v2² + v3²)).
This also shows that for non-zero u and v:
u · v is positive if and only if 0 ≤ θ < π/2,
u · v is zero if and only if θ = π/2,
u · v is negative if and only if π/2 < θ ≤ π.
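As a quick numerical illustration of this formula (a sketch only, assuming NumPy; the vectors u and v below are arbitrary choices), one can compute cos θ and θ directly:

import numpy as np

u = np.array([1.0, 2.0, -1.0])
v = np.array([-1.0, 3.0, 4.0])

cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
theta = np.arccos(cos_theta)            # angle in radians
print(cos_theta, np.degrees(theta))     # cos(theta) is positive here, so theta < pi/2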
Proposition 3.4 (Properties of Scalar Product). For any vectors u, v, w and α ∈ R we have
1. u · v = v · u,
2. u · (v + w) = u · v + u · w,
3. (u + v) · w = u · w + v · w,
4. (αu) · v = u · (αv) = α(u · v).
Proof. These are all easy consequences of Theorem 3.3.
Note in particular that the vector product of two vectors is itself a vector, and that in
general u × v = −v × u.
Example 3.7. If u = (1, 2, −1) and v = (−1, 3, 4) (written as column vectors) then u × v = (11, −3, 5) and v × u = (−11, 3, −5).
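The computation in Example 3.7 can be checked with NumPy's built-in cross product, which evaluates the same component formula used in the proof of Proposition 3.8 below (a numerical check only):

import numpy as np

u = np.array([1, 2, -1])
v = np.array([-1, 3, 4])

print(np.cross(u, v))     # [11 -3  5]
print(np.cross(v, u))     # [-11  3 -5], i.e. the negative of u x v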
The slightly strange-looking definition of the vector product can be explained geometrically,
by the following result:
Proposition 3.8. Given vectors u and v, the vector product u × v is orthogonal to both u and v, and its length |u × v| satisfies
|u × v| = |u||v| sin θ if u ≠ 0 and v ≠ 0, and |u × v| = 0 if u = 0 or v = 0,
where θ denotes the angle between u and v (in the case that they are both non-zero).
Proof. To prove orthogonality we need to show that (u × v) · u = 0 and (u × v) · v = 0. We calculate
(u × v) · u = (u2 v3 − u3 v2, u3 v1 − u1 v3, u1 v2 − u2 v1) · (u1, u2, u3) = (u2 v3 − u3 v2)u1 + (u3 v1 − u1 v3)u2 + (u1 v2 − u2 v1)u3 = 0,
as required, and the calculation showing (u × v) · v = 0 is similar and left as an exercise.
If u = 0 or v = 0 then it is easily seen that u × v = 0, so that |u × v| = 0. If both u and v are non-zero then we note that
|u × v|² = |u|²|v|² − (u · v)²,    (3.1)
because
|u × v|² = (u2 v3 − u3 v2)² + (u3 v1 − u1 v3)² + (u1 v2 − u2 v1)²
= u2²v3² + u3²v2² + u3²v1² + u1²v3² + u1²v2² + u2²v1² − 2(u2 v3 u3 v2 + u3 v1 u1 v3 + u1 v2 u2 v1)
= (u1² + u2² + u3²)(v1² + v2² + v3²) − (u1 v1 + u2 v2 + u3 v3)²
= |u|²|v|² − (u · v)².
Now
|u|²|v|² − (u · v)² = |u|²|v|² − (|u||v| cos θ)² = |u|²|v|²(1 − cos²θ) = |u|²|v|² sin²θ,
so substituting into (3.1) gives |u × v|² = |u|²|v|² sin²θ, and hence |u × v| = |u||v| sin θ.
|MX| = |v| sin θ = |u × v| / |u| = |u × (x − p)| / |u|.
Note that when u and v are parallel X lies on l and so the formula above is still valid (it correctly gives the distance as 0).
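A minimal numerical sketch of this distance formula (assuming NumPy; the line through the point with position vector p in direction u and the point X with position vector x below are invented for illustration):

import numpy as np

p = np.array([0.0, 0.0, 0.0])    # a point on the line l
u = np.array([1.0, 0.0, 0.0])    # direction vector of l
x = np.array([2.0, 3.0, 4.0])    # position vector of the point X

dist = np.linalg.norm(np.cross(u, x - p)) / np.linalg.norm(u)
print(dist)                      # here sqrt(3^2 + 4^2) = 5.0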
w = b − a = q + µv − p − λu
α(u × v) · (u × v) = (q + µv − p − λu) · (u × v) = (q − p) · (u × v)
(note that the µv · (u × v) and λu · (u × v) terms are 0 because u and v are orthogonal to u × v).
Dividing by (u × v) · (u × v) = |u × v|² in the above equality gives us
α = (q − p) · (u × v) / |u × v|²
and therefore
|w| = |α| |u × v| = |(q − p) · (u × v)| / |u × v|
is the distance between the two lines l1 and l2.
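The same formula is easy to evaluate numerically. A sketch (assuming NumPy; the two lines below, l1 through p with direction u and l2 through q with direction v, are illustrative and not parallel):

import numpy as np

p, u = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])   # l1: r = p + lambda*u
q, v = np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])   # l2: r = q + mu*v

n = np.cross(u, v)                                  # u x v, orthogonal to both lines
dist = abs(np.dot(q - p, n)) / np.linalg.norm(n)
print(dist)                                         # 1.0 for this data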
we solve
a(p1 + λu1 ) + b(p2 + λu2 ) + c(p3 + λu3 ) = d
for λ. Usually there will be a unique solution (reflecting the fact that a plane and a line
in 3-space typically intersect in a single point). However there are some conditions on
a, b, c, d, p, u (can you work out these conditions?) which mean that either there are no
solutions (corresponding to the case when the line is parallel to the plane), or that every
point on the line gives a solution (corresponding to the case when the line is a subset
of the plane). Substituting the obtained value of λ back into the parametric equations
for the line then gives the coordinates of the point of intersection.
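In coordinates this is a single linear equation in λ, so it can be coded directly. A sketch (assuming NumPy; the plane ax + by + cz = d and the line r = p + λu below are illustrative), including the degenerate check mentioned above:

import numpy as np

a, b, c, d = 1.0, 1.0, 1.0, 6.0          # the plane x + y + z = 6
p = np.array([0.0, 0.0, 0.0])            # point on the line
u = np.array([1.0, 2.0, 3.0])            # direction of the line

n = np.array([a, b, c])
denom = np.dot(n, u)
if np.isclose(denom, 0.0):
    # the line is parallel to the plane: either no solutions or every point is a solution
    print("no unique intersection point")
else:
    lam = (d - np.dot(n, p)) / denom
    print(p + lam * u)                   # the point of intersection, here [1. 2. 3.]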
we solve
p1 + λu1 = q1 + µv1
p2 + λu2 = q2 + µv2
p3 + λu3 = q3 + µv3
or equivalently solve
λu1 − µv1 = q1 − p1
λu2 − µv2 = q2 − p2
λu3 − µv3 = q3 − p3
for λ and µ. As there are three equations in two unknowns, there will typically be no
solutions (reflecting the fact that two lines in 3-space typically do not intersect).
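One way to handle the three equations in the two unknowns λ and µ on a computer is to solve them in the least-squares sense and then check whether the residual vanishes; if it does not, the lines do not meet. A sketch, assuming NumPy and illustrative data:

import numpy as np

p, u = np.array([0.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])   # l1: r = p + lambda*u
q, v = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])   # l2: r = q + mu*v

A = np.column_stack([u, -v])                  # coefficient matrix of the 3 x 2 system
rhs = q - p
sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
lam, mu = sol
if np.allclose(A @ sol, rhs):
    print("intersection point:", p + lam * u)
else:
    print("the lines do not intersect")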
Chapter 4
Systems of Linear Equations
Systems of linear equations arise frequently in many areas of the sciences, including physics,
engineering, business, economics, and sociology. Their systematic study also provided part of
the motivation for the development of modern linear algebra at the end of the 19th century.
Linear equations are extremely important and, particularly in higher dimensions, one aims to have a systematic and efficient way of solving them.
a1x1 + a2x2 + · · · + anxn = b,
and a system of m such equations in n unknowns x1, . . . , xn has the form
a11x1 + · · · + a1nxn = b1
...
am1x1 + · · · + amnxn = bm
where the aij's and bi's are all real numbers. We also call such systems m × n systems.
Example 4.1.
(a)
2x1 + x2 = 4
3x1 + 2x2 = 7
(b)
x1 + x2 − x3 = 3
2x1 − x2 + x3 = 6
(c)
x1 − x2 = 0
x1 + x2 = 3
x2 = 1
A system with no solution is called inconsistent, while a system with at least one solution
is called consistent.
The set of all solutions of a system is called its solution set, which may be empty if the
system is inconsistent.
The basic problem we want to address in this section is the following: given an arbitrary
m × n system, determine its solution set. Later on, we will discuss a procedure that provides
a complete and practical solution to this problem (the so-called ‘Gaussian algorithm’). Before
we encounter this procedure, we require a bit more terminology.
Definition 4.3. Two m × n systems are said to be equivalent, if they have the same solution
set.
Example 4.4. Consider the two systems
System (a) is easy to solve: looking at the last equation we find first that x3 = 2; the
second from bottom equation implies x2 = 2; and finally the first equation yields x1 =
(−3 + x2 − 2x3 )/5 = −1. So the solution set of this system is {(−1, 2, 2)}.
To find the solution of system (b), add the first and the second equation. Then x2 = 2,
while subtracting the first from the third equation gives 3x3 = 6, that is x3 = 2. Finally,
the first equation now gives x1 = (−3 + x2 − 2x3 )/5 = −1, so the solution set is again
{(−1, 2, 2)}.
Thus the systems (a) and (b) are equivalent.
In solving system (b) above we have implicitly used the following important observation:
Lemma 4.5. The following operations do not change the solution set of a linear system:
(i) interchanging two equations;
(ii) multiplying an equation by a non-zero number;
(iii) adding a multiple of one equation to another equation.
With the general m × n system
a11x1 + · · · + a1nxn = b1
...
am1x1 + · · · + amnxn = bm
we associate its augmented matrix, obtained by writing down, row by row, the coefficients of each equation together with its right-hand side.
Example 4.6.
system:
3x1 + 2x2 − x3 = 5
2x1 + x3 = −1
augmented matrix:
[ 3 2 −1 5 ]
[ 2 0 1 −1 ]
Definition 4.8. A matrix is said to be in row echelon form if it satisfies the following three
conditions:
(i) All zero rows (consisting entirely of zeros) are at the bottom.
(ii) The first non-zero entry from the left in each nonzero row is a 1, called the leading 1
for that row.
(iii) Each leading 1 is to the right of all leading 1’s in the rows above it.
A row echelon matrix is said to be in reduced row echelon form if, in addition, it satisfies the following condition:
(iv) Each leading 1 is the only non-zero entry in its column.
Roughly speaking, a matrix is in row echelon form if the leading 1’s form an echelon (that
is, a ‘steplike’) pattern.
The variables corresponding to the leading 1’s of the augmented matrix in row echelon
form will be referred to as the leading variables, the remaining ones as the free variables.
Example 4.10.
(a)
[ 1 2 3 −4 6 ]
[ 0 0 1 2 3 ]
Leading variables: x1 and x3; free variables: x2 and x4.
(b)
[ 1 0 5 ]
[ 0 1 3 ]
Leading variables: x1 and x2; no free variables.
Note that if the augmented matrix of a system is in row echelon form, the solution set is
easily obtained.
Example 4.11. Determine the solution set of the systems given by the following augmented
matrices in row echelon form:
(a)
[ 1 3 0 2 ]
[ 0 0 0 1 ]
(b)
[ 1 −2 0 1 2 ]
[ 0 0 1 −2 1 ]
[ 0 0 0 0 0 ]
Solution. The system corresponding to (a) is
x1 + 3x2 = 2
0 = 1
The last equation cannot be satisfied, so system (a) is inconsistent and its solution set is empty.
The system corresponding to (b) is
x1 − 2x2 + x4 = 2
x3 − 2x4 = 1
0 = 0
We can express the leading variables in terms of the free variables x2 and x4. So set x2 = α and x4 = β, where α and β are arbitrary real numbers. The second line now tells us that x3 = 1 + 2x4 = 1 + 2β, and then the first line that x1 = 2 + 2x2 − x4 = 2 + 2α − β. Thus the solution set is { (2 + 2α − β, α, 1 + 2β, β) | α, β ∈ R }.
It turns out that every matrix can be brought into row echelon form using only elementary
row operations. The procedure is known as the
Gaussian algorithm:
Step 1 If the matrix consists entirely of zeros, stop — it is already in row echelon form.
Step 2 Otherwise, find the first column from the left containing a non-zero entry (call it a),
and move the row containing that entry to the top position.
Step 3 Multiply that row by 1/a in order to create a leading 1.
Step 4 By subtracting multiples of that row from rows below it, make each entry below the
leading 1 zero.
This completes the first row. All further operations are carried out on the other rows.
Step 5 Repeat steps 1-4 on the matrix consisting of the remaining rows
The process stops when either no rows remain at Step 5 or the remaining rows consist of
zeros.
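The steps above translate almost line by line into code. The following Python function is a simplified sketch (assuming NumPy; library routines normally use partial pivoting rather than this textbook recipe) that brings a matrix to row echelon form and, applied to the augmented matrix of Example 4.12 below, reproduces the row echelon form found there.

import numpy as np

def row_echelon(M, tol=1e-12):
    """Return a row echelon form of M, following Steps 1-5 of the Gaussian algorithm."""
    A = M.astype(float).copy()
    rows, cols = A.shape
    r = 0                                        # row we are currently completing
    for c in range(cols):
        # Step 2: look for a non-zero entry in column c at or below row r
        pivot = next((i for i in range(r, rows) if abs(A[i, c]) > tol), None)
        if pivot is None:
            continue
        A[[r, pivot]] = A[[pivot, r]]            # move that row to the top position
        A[r] = A[r] / A[r, c]                    # Step 3: create a leading 1
        for i in range(r + 1, rows):             # Step 4: clear the entries below it
            A[i] = A[i] - A[i, c] * A[r]
        r += 1                                   # Step 5: repeat on the remaining rows
        if r == rows:
            break
    return A

print(row_echelon(np.array([[0, 1, 6, 4], [3, -3, 9, -3], [2, 2, 18, 8]])))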
Example 4.12. Solve the following system using the Gaussian algorithm:
x2 + 6x3 = 4
3x1 − 3x2 + 9x3 = −3
2x1 + 2x2 + 18x3 = 8
Solution. Write down the augmented matrix, interchange rows 1 and 2, and multiply the new row 1 by 1/3 to get
[ 1 −1 3 −1 ]
[ 0 1 6 4 ]
[ 2 2 18 8 ]
Then R3 − 2R1 gives
[ 1 −1 3 −1 ]
[ 0 1 6 4 ]
[ 0 4 12 10 ]
then R3 − 4R2 gives
[ 1 −1 3 −1 ]
[ 0 1 6 4 ]
[ 0 0 −12 −6 ]
and finally −(1/12)R3 gives
[ 1 −1 3 −1 ]
[ 0 1 6 4 ]
[ 0 0 1 1/2 ]
where the last matrix is now in row echelon form. The corresponding system reads:
x1 − x2 + 3x3 = −1
x2 + 6x3 = 4
x3 = 1/2
Leading variables are x1, x2 and x3; there are no free variables. The last equation now implies x3 = 1/2; the second equation from the bottom yields x2 = 4 − 6x3 = 1; and finally the first equation yields x1 = −1 + x2 − 3x3 = −3/2. Thus the solution is (−3/2, 1, 1/2).
A variant of the Gauss algorithm is the Gauss-Jordan algorithm, which brings a matrix to
reduced row echelon form:
Gauss-Jordan algorithm
Step 1 Bring matrix to row echelon form using the Gaussian algorithm.
Step 2 Find the row containing the first leading 1 from the right, and add suitable multiples
of this row to the rows above it to make each entry above the leading 1 zero.
This completes the first non-zero row from the bottom. All further operations are carried out
on the rows above it.
Step 3 Repeat steps 1-2 on the matrix consisting of the remaining rows.
Example 4.13. Solve the following system using the Gauss-Jordan algorithm:
x1 + x2 + x3 + x4 + x5 = 4
x1 + x2 + x3 + 2x4 + 2x5 = 5
x1 + x2 + x3 + 2x4 + 3x5 = 7
Solution. Performing the Gauss-Jordan algorithm on the augmented matrix gives:
[ 1 1 1 1 1 4 ]
[ 1 1 1 2 2 5 ]
[ 1 1 1 2 3 7 ]
∼ (R2 − R1 and R3 − R1)
[ 1 1 1 1 1 4 ]
[ 0 0 0 1 1 1 ]
[ 0 0 0 1 2 3 ]
∼ (R3 − R2)
[ 1 1 1 1 1 4 ]
[ 0 0 0 1 1 1 ]
[ 0 0 0 0 1 2 ]
∼ (R1 − R3 and R2 − R3)
[ 1 1 1 1 0 2 ]
[ 0 0 0 1 0 −1 ]
[ 0 0 0 0 1 2 ]
∼ (R1 − R2)
[ 1 1 1 0 0 3 ]
[ 0 0 0 1 0 −1 ]
[ 0 0 0 0 1 2 ]
where the last matrix is now in reduced row echelon form. The corresponding system reads:
x1 + x2 + x3 = 3
x4 = −1
x5 = 2
Leading variables are x1 , x4 , and x5 ; free variables x2 and x3 . Now set x2 = α and
x3 = β, and solve for the leading variables starting from the last equation. This yields
x5 = 2, x4 = −1, and finally x1 = 3 − x2 − x3 = 3 − α − β. Thus the solution set is
{ (3 − α − β, α, β, −1, 2) | α, β ∈ R }.
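If exact arithmetic is wanted, the SymPy library can produce the reduced row echelon form directly; the following sketch (assuming SymPy is installed) reproduces the final matrix of Example 4.13 together with the positions of the leading 1's.

import sympy as sp

M = sp.Matrix([[1, 1, 1, 1, 1, 4],
               [1, 1, 1, 2, 2, 5],
               [1, 1, 1, 2, 3, 7]])
R, pivots = M.rref()     # reduced row echelon form and the pivot column indices
print(R)                 # Matrix([[1, 1, 1, 0, 0, 3], [0, 0, 0, 1, 0, -1], [0, 0, 0, 0, 1, 2]])
print(pivots)            # (0, 3, 4): the leading variables are x1, x4 and x5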
We have just seen that any matrix can be brought to (reduced) row echelon form using
only elementary row operations, and moreover that there is an explicit procedure to achieve
this (namely the Gaussian and Gauss-Jordan algorithm). We record this important insight for
later use:
Theorem 4.14.
(a) Every matrix can be brought to row echelon form by a series of elementary row opera-
tions.
(b) Every matrix can be brought to reduced row echelon form by a series of elementary row
operations.
Proof. For (a): apply the Gaussian algorithm; for (b): apply the Gauss-Jordan algorithm.
Remark 4.15. It can be shown (but not in this module) that the reduced row echelon form of a matrix is unique. By contrast, this is not the case for the (non-reduced) row echelon form.
The remark above implies that if a matrix is brought to reduced row echelon form by
any sequence of elementary row operations (that is, not necessarily by those prescribed by the
Gauss-Jordan algorithm) the leading ones will nevertheless always appear in the same positions.
Note that overdetermined systems are usually (but not necessarily) inconsistent. Underdetermined systems may or may not be consistent. However, if they are consistent, then they necessarily have infinitely many solutions:
Theorem 4.17. If an underdetermined m × n system (that is, one with m < n) is consistent, then it has infinitely many solutions.
Proof. Note that the row echelon form of the augmented matrix of the system has r ≤ m non-zero rows. Thus there are r leading variables, and consequently n − r ≥ n − m > 0 free variables.
Another useful classification of linear systems is the following: a system is called homogeneous if all of its right-hand sides are zero, that is, if b1 = b2 = · · · = bm = 0; otherwise it is called inhomogeneous.
Example 4.19.
The first observation about homogeneous systems is that they always have a solution, the
so-called trivial or zero solution: (0, 0, . . . , 0).
For later use we record the following useful consequence of the previous theorem on consistent homogeneous systems:
Corollary 4.20. An underdetermined homogeneous system has a non-zero solution.
Proof. We just observed that a homogeneous system is consistent. Thus, if the system is
underdetermined and homogeneous, it must have infinitely many solutions by Theorem 4.17,
hence, in particular, it must have a non-zero solution.
Our final result in this section is devoted to the special case of n × n systems. For such
systems there is a delightful characterisation of the existence and uniqueness of solutions of a
given system in terms of the associated homogeneous systems. At the same time, the proof of
this result serves as another illustration of the usefulness of the row echelon form for theoretical
purposes.
Theorem 4.21. An n × n system is consistent and has a unique solution, if and only if the
only solution of the associated homogeneous system is the zero solution.
Proof. We use two observations:
The same sequence of elementary row operations that brings the augmented matrix of a system to row echelon form also brings the augmented matrix of the associated homogeneous system to row echelon form, and vice versa.
An n×n system in row echelon form has a unique solution precisely if there are n leading
variables.
Thus, if an n × n system is consistent and has a unique solution, the corresponding homoge-
neous system must have a unique solution, which is necessarily the zero solution.
Conversely, if the associated homogeneous system of a given system has the zero solution
as its unique solution, then the original inhomogeneous system must have a solution, and this
solution must be unique.
Chapter 5
Matrices
In this chapter we give basic rules and definitions that are necessary for doing calculations with
matrices in an efficient way. We will then consider the inverse of a matrix, the transpose of
a matrix, and what is meant by the concept of a symmetric matrix. A highlight in the later
sections is the Invertible Matrix Theorem.
Definition 5.6 (Zero matrix). We write Om×n or simply O (if the size is clear from the
context) for the m × n matrix all of whose entries are zero, and call it a zero matrix.
Scalar multiplication and addition of matrices satisfy the following rules:
Theorem 5.7. Let A, B and C be matrices of the same size, and let α and β be scalars.
Then:
(a) A + B = B + A;
(b) A + (B + C) = (A + B) + C;
(c) A + O = A;
(d) A + (−A) = O, where −A = (−1)A;
(e) α(A + B) = αA + αB;
(f) (α + β)A = αA + βA;
(g) (αβ)A = α(βA);
(h) 1A = A.
Proof. We prove part (b) only, leaving the other parts as exercises.
For part (b), B + C is an m × n matrix and so A + (B + C) is an m × n matrix.
The ij-entry of B + C is bij + cij and so the ij-entry of A + (B + C) is aij + (bij + cij ).
Similarly, A + B is an m × n matrix and so (A + B) + C is an m × n matrix.
The ij-entry of A + B is aij + bij and so the ij-entry of (A + B) + C is (aij + bij ) + cij .
Since aij + (bij + cij ) = (aij + bij ) + cij we have that A + (B + C) = (A + B) + C.
Example 5.8. Simplify 2(A + 3B) − 3(C + 2B), where A, B, and C are matrices with the
same size.
Solution.
2(A + 3B) − 3(C + 2B) = 2A + 2 · 3B − 3C − 3 · 2B = 2A + 6B − 3C − 6B = 2A − 3C .
Example 5.10. Compute the (1, 3)-entry and the (2, 4)-entry of AB, where
A = [ 3 −1 2 ]
    [ 0 1 4 ]
and
B = [ 2 1 6 0 ]
    [ 0 2 3 4 ]
    [ −1 0 5 8 ]
Solution.
(1, 3)-entry: 3 · 6 + (−1) · 3 + 2 · 5 = 25;
(2, 4)-entry: 0 · 0 + 1 · 4 + 4 · 8 = 36.
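The same entries can be checked with NumPy (assuming it is available); note that NumPy numbers rows and columns from 0, so the (1, 3)- and (2, 4)-entries of the notes correspond to indices [0, 2] and [1, 3].

import numpy as np

A = np.array([[3, -1, 2],
              [0,  1, 4]])
B = np.array([[ 2, 1, 6, 0],
              [ 0, 2, 3, 4],
              [-1, 0, 5, 8]])

AB = A @ B                      # matrix product
print(AB[0, 2], AB[1, 3])       # 25 36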
Definition 5.11 (Identity matrix). An identity matrix I is a square matrix with 1’s on the
diagonal and zeros elsewhere. If we want to emphasise its size we write In for the n × n
identity matrix.
(a) Let A + B = M = (mij )m×n so mij = aij + bij . Now, M C is an m × p matrix. Also
AC and BC are m × p matrices and so AC + BC is an m × p matrix.
The ij-entry of MC is
Σ_{k=1}^{n} mik ckj = Σ_{k=1}^{n} (aik + bik)ckj = Σ_{k=1}^{n} aik ckj + Σ_{k=1}^{n} bik ckj
= (ij-entry of AC) + (ij-entry of BC)
= (ij-entry of AC + BC).
It follows that (A + B)C = AC + BC.
The second identity in part (a) is proved in a similar way
(c) Im A is an m × n matrix with ij-entry
0 × a1j + 0 × a2j + · · · + 1 × aij + · · · + 0 × amj = aij
(where we are multiplying the entries of column j of A by the entries in row i of Im). So Im A = A.
Similarly AIn is an m × n matrix with ij-entry
ai1 × 0 + ai2 × 0 + · · · + aij × 1 + · · · + ain × 0 = aij .
(where we are multiplying the aij by the entries in column j of In ). So AIn = A.
(d) Both (XY)Z and X(YZ) are m × q matrices.
Let XY = T = (tij)m×p so
tij = xi1 y1j + xi2 y2j + · · · + xin ynj.
Now (XY)Z = TZ has ij-entry
ti1 z1j + ti2 z2j + · · · + tip zpj = (xi1 y11 + xi2 y21 + · · · + xin yn1)z1j
+ (xi1 y12 + xi2 y22 + · · · + xin yn2)z2j
+ · · ·
+ (xi1 y1p + xi2 y2p + · · · + xin ynp)zpj.
Expanding out the brackets we get that this sum consists of all terms xir yrs zsj where r ranges over 1, . . . , n and s ranges over 1, . . . , p. Equivalently,
the ij-entry of TZ = Σ_{r=1}^{n} Σ_{s=1}^{p} xir yrs zsj.
Notation 5.13.
Since X(Y Z) = (XY )Z, we can omit the brackets and simply write XY Z and similarly
for products of more than three factors.
Example 5.14.
[ 1 0 ] [ 0 1 ]   [ 0 1 ]
[ 0 0 ] [ 0 0 ] = [ 0 0 ]
but
[ 0 1 ] [ 1 0 ]   [ 0 0 ]
[ 0 0 ] [ 0 0 ] = [ 0 0 ]
Definition 5.15. If A and B are two matrices with AB = BA, then A and B are said to
commute.
We now come to the important notion of an inverse of a matrix.
Definition 5.16. If A is a square matrix, a matrix B is called an inverse of A if
AB = I and BA = I .
Later on in this chapter we shall discuss an algorithm that lets us decide whether a matrix
is invertible. If the matrix is invertible then this algorithm also tells us exactly what the inverse
is. If a matrix is invertible then its inverse is unique, by the following result:
Theorem 5.17. A matrix has at most one inverse.
Proof. Suppose that B and C are both inverses of A. Then
B = IB = (CA)B = C(AB) = CI = C.
If A is an invertible matrix, the unique inverse of A is denoted by A−1 . Hence A−1 (if it
exists!) is a square matrix of the same size as A with the property that
AA−1 = A−1 A = I .
Note that the above equality implies that if A is invertible, then its inverse A−1 is also invertible
with inverse A, that is,
(A−1 )−1 = A .
Slightly deeper is the following result:
Theorem 5.18. If A and B are invertible matrices of the same size, then AB is invertible
and
(AB)−1 = B −1 A−1 .
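A quick numerical illustration of Theorem 5.18 (a sketch assuming NumPy; the two matrices are arbitrary invertible examples):

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 5.0]])
B = np.array([[2.0, 1.0], [1.0, 1.0]])

lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))                                     # True
print(np.allclose(np.linalg.inv(A) @ np.linalg.inv(B), lhs))     # False here: the order matters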
Example 5.20.
(a) A = [ 1 2 3 ]   ⇒   AT = [ 1 4 ]
        [ 4 5 6 ]            [ 2 5 ]
                             [ 3 6 ]
(b) B = [ 1 2 ]   ⇒   BT = [ 1 3 ]
        [ 3 −1 ]            [ 2 −1 ]
Theorem 5.21. Assume that α is a scalar and that A, B, and C are matrices so that the
indicated operations can be performed. Then:
(a) (AT)T = A;
(b) (αA)T = αAT;
(c) (A + B)T = AT + BT;
(d) (AB)T = BT AT.
Proof. (a) is obvious while (b) and (c) are proved as a Coursework exercise. For the proof
of (d) assume A = (aij )m×n and B = (bij )n×p and write AT = (ãij )n×m and B T = (b̃ij )p×n
where
ãij = aji and b̃ij = bji .
Notice that (AB)T and B T AT have the same size, so it suffices to show that they have the
same entries. Now, the (i, j)-entry of B T AT is
Σ_{k=1}^{n} b̃ik ãkj = Σ_{k=1}^{n} bki ajk = Σ_{k=1}^{n} ajk bki,
which is the (j, i)-entry of AB, that is, the (i, j)-entry of (AB)T . Thus B T AT = (AB)T .
Example 5.24.
symmetric: [ 1 2 4 ]   and   [ 5 2 ]
           [ 2 −1 3 ]        [ 2 −1 ]
           [ 4 3 0 ]
not symmetric: [ 2 2 4 ]   and   [ 1 1 1 ]
               [ 2 2 3 ]         [ 1 1 1 ]
               [ 1 3 5 ]
Symmetric matrices play an important role in many parts of pure and applied Mathematics
as well as in some other areas of science, for example in quantum physics. Some of the reasons
for this will become clearer towards the end of this course, when we shall study symmetric
matrices in much more detail.
Some other useful classes of square matrices are the triangular ones, which will also play
a role later on in the course.
Definition 5.25. A square matrix A = (aij ) is said to be
upper triangular if aij = 0 for i > j;
strictly upper triangular if aij = 0 for i ≥ j;
lower triangular if aij = 0 for i < j;
strictly lower triangular if aij = 0 for i ≤ j;
diagonal if aij = 0 for i ̸= j.
If A = (aij) is a square matrix of size n × n, we call a11, a22, . . . , ann the diagonal entries of A. So, informally speaking, a matrix is upper triangular if all the entries below the diagonal entries are zero, and it is strictly upper triangular if all entries below the diagonal entries and the diagonal entries themselves are zero. Similarly for (strictly) lower triangular matrices.
Example 5.26.
1 0 0 0
1 2 0 3 0 0
upper triangular: , diagonal:
0 3 0 0 5 0
0 0 0 3
0 0 0
strictly lower triangular: −1 0 0 .
2 3 0
We close this section with the following two observations:
Theorem 5.27. The sum and product of two upper triangular matrices of the same size is
upper triangular.
Proof. See a Coursework exercise.
We call this the set of all (column) vectors of dimension n (or the set of n-dimensional
(column) vectors). In particular, a column vector of dimension n is just an n × 1 matrix.
We can extend our definitions of how to add two vectors and multiply a vector by a scalar
to Rn by letting:
[ a1 ]   [ b1 ]   [ a1 + b1 ]           [ a1 ]   [ αa1 ]
[ a2 ] + [ b2 ] = [ a2 + b2 ]   and   α [ a2 ] = [ αa2 ]
[ .. ]   [ .. ]   [   ..    ]           [ .. ]   [  ..  ]
[ an ]   [ bn ]   [ an + bn ]           [ an ]   [ αan ]
The reformulation is based on the observation that we can write this system as a single matrix
equation
Ax = b,    (5.2)
where
A = [ a11 · · · a1n ]         [ x1 ]             [ b1 ]
    [  ..        .. ] ,   x = [ .. ] ∈ Rn,   b = [ .. ] ∈ Rm,
    [ am1 · · · amn ]         [ xn ]             [ bm ]
For example, the system
2x1 − 3x2 + x3 = 2
3x1 − x3 = −1
can be written
[ 2 −3 1 ] [ x1 ]   [ 2 ]
[ 3 0 −1 ] [ x2 ] = [ −1 ]
           [ x3 ]
with the 2 × 3 matrix on the left playing the role of A, the column of unknowns that of x, and the right-hand side that of b.
Apart from obvious notational economy, writing (5.1) in the form (5.2) has a number of other advantages which will become clearer shortly.
Lemma 5.29. Suppose that M is an invertible m × m matrix. Then x satisfies
Ax = b    (5.3)
if and only if x satisfies
MAx = Mb.    (5.4)
Proof. Note that if x satisfies (5.3), then it clearly satisfies (5.4). Conversely, suppose that x satisfies (5.4), that is, MAx = Mb. Since M is invertible, we may multiply both sides of the above equation by M⁻¹ from the left to obtain M⁻¹MAx = M⁻¹Mb, that is, Ax = b, so x satisfies (5.3).
We now come back to the idea outlined at the beginning of this section. It turns out that
we can ‘algebraize’ the process of applying an elementary row operation to a matrix A by
left-multiplying A by a certain type of matrix, defined as follows:
Definition 5.30. An elementary matrix of type I (respectively, type II, type III) is a
matrix obtained by applying an elementary row operation of type I (respectively, type II, type
III) to an identity matrix.
Example 5.31.
type I:   E1 = [ 0 1 0 ]
               [ 1 0 0 ]   (take I3 and swap rows 1 and 2)
               [ 0 0 1 ]
type II:  E2 = [ 1 0 0 ]
               [ 0 1 0 ]   (take I3 and multiply row 3 by 4)
               [ 0 0 4 ]
type III: E3 = [ 1 0 2 ]
               [ 0 1 0 ]   (take I3 and add 2 times row 3 to row 1)
               [ 0 0 1 ]
Example 5.32. Let A = (aij)3×3 and let El (l = 1, 2, 3) be defined as in the previous example. Then
E1 A = [ a21 a22 a23 ]
       [ a11 a12 a13 ]
       [ a31 a32 a33 ]
E2 A = [ a11 a12 a13 ]
       [ a21 a22 a23 ]
       [ 4a31 4a32 4a33 ]
E3 A = [ a11 + 2a31  a12 + 2a32  a13 + 2a33 ]
       [ a21         a22         a23        ]
       [ a31         a32         a33        ]
You should now pause and marvel at the following observation: interchanging rows 1 and 2
of A produces E1 A, multiplying row 3 of A by 4 produces E2 A, and adding 2 times row 3 to
row 1 of A produces E3 A.
This example should convince you of the truth of the following theorem, the proof of which will be omitted as it is straightforward, slightly lengthy and not particularly instructive.
Theorem 5.33. Let E be the elementary matrix obtained by performing a particular elementary row operation on Im. Then, for any m × n matrix A, the product EA is the matrix obtained by performing that same row operation on A.
Theorem 5.34. Every elementary matrix E is invertible, and E⁻¹ is an elementary matrix of the same type.
Proof. The assertion follows from the previous theorem and the observation that an elementary
row operation can be reversed by an elementary row operation of the same type. More precisely,
if two rows of a matrix are interchanged, then interchanging them again restores the
original matrix;
if a row is multiplied by α ̸= 0, then multiplying the same row by 1/α restores the
original matrix;
if α times row q has been added to row r, then adding −α times row q to row r restores
the original matrix.
Now, suppose that E was obtained from I by a certain row operation. Then, as we just
observed, there is another row operation of the same type that changes E back to I. Thus
there is an elementary matrix F of the same type as E such that F E = I. A moment’s
thought shows that EF = I as well, since E and F correspond to reverse operations. All in
all, we have now shown that E is invertible and its inverse E −1 = F is an elementary matrix
of the same type.
Example 5.35. Determine the inverses of the elementary matrices E1 , E2 , and E3 in Exam-
ple 5.31.
Solution. In order to transform E1 into I we need to swap rows 1 and 2 of E1. The elementary matrix that performs this feat is
E1⁻¹ = [ 0 1 0 ]
       [ 1 0 0 ]
       [ 0 0 1 ]
Similarly, in order to transform E2 into I we need to multiply row 3 by 1/4, and so
E2⁻¹ = [ 1 0 0 ]
       [ 0 1 0 ]
       [ 0 0 1/4 ]
Finally, in order to transform E3 into I we need to add −2 times row 3 to row 1, and so
E3⁻¹ = [ 1 0 −2 ]
       [ 0 1 0 ]
       [ 0 0 1 ]
Before we come to the main result of this chapter we need some more terminology:
Definition 5.36. A matrix B is row equivalent to a matrix A if there exists a finite sequence
E1 , E2 , . . . , Ek of elementary matrices such that
B = Ek Ek−1 · · · E1 A .
Property (b) follows from Theorem 5.34. Details of the proof of (a), (b), and (c) are left
as an exercise.
We are now able to formulate and prove a delightful characterisation of invertibility of
matrices. More precisely, the following theorem provides three equivalent conditions for a
matrix to be invertible (and later on in this module we will encounter one further equivalent
condition).
Before stating the theorem we recall that the zero vector, denoted by 0, is the column
vector all of whose entries are zero.
Theorem 5.38 (Invertible Matrix Theorem). Let A be a square n × n matrix. The following are equivalent:
(a) A is invertible;
(b) the only solution of Ax = 0 is the zero vector x = 0;
(c) A is row equivalent to In;
(d) A is a product of elementary matrices.
Proof. We shall prove this theorem using a cyclic argument: we shall first show that (a)
implies (b), then (b) implies (c), then (c) implies (d), and finally that (d) implies (a). This is
a frequently used trick to show the logical equivalence of a list of assertions.
(a) ⇒ (b): Suppose that A is invertible. If x satisfies Ax = 0, then x = Ix = (A⁻¹A)x = A⁻¹(Ax) = A⁻¹0 = 0, so (b) holds.
Corollary 5.39. Suppose that A and C are square matrices such that CA = I. Then also
AC = I; in particular, both A and C are invertible with C = A−1 and A = C −1 .
Proof. To show that A is invertible, by the Invertible Matrix Theorem it suffices to show
that the only solution of Ax = 0 is the trivial one. To show this, note that if Ax = 0
then x = Ix = CAx = C0 = 0, as required, so A is indeed invertible. Then note that
C = CI = CAA−1 = IA−1 = A−1 , so both A and C are invertible, and are the inverses of
each other.
What is surprising about this result is the following: suppose we are given a square matrix
A. If we want to check that A is invertible, then, by the definition of invertibility, we need
to produce a matrix B such that AB = I and BA = I. The above corollary tells us that
if we have a candidate C for an inverse of A it is enough to check that either AC = I or
CA = I in order to guarantee that A is invertible with inverse C. This is a non-trivial fact
about matrices, which is often useful.
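Gauss-Jordan inversion (the topic of Section 5.7) row reduces the block matrix [A | I] to reduced row echelon form; if the left-hand block becomes I, the right-hand block is A⁻¹. A small exact-arithmetic sketch, assuming SymPy and an invertible matrix chosen only for illustration:

import sympy as sp

A = sp.Matrix([[2, 1], [5, 3]])
n = A.shape[0]

augmented = A.row_join(sp.eye(n))     # the block matrix [A | I]
R, _ = augmented.rref()               # row reduce to reduced row echelon form
A_inv = R[:, n:]                      # right-hand block

print(A_inv)                          # Matrix([[3, -1], [-5, 2]])
print(A * A_inv == sp.eye(n))         # True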
Chapter 6
Determinants
We will define the important concept of a determinant, which is a useful invariant for general
n × n matrices. We will discuss the most important properties of determinants, and illustrate
what they are good for and how calculations involving determinants can be simplified.
Our goal in this chapter is to introduce determinants for square matrices of any size, study
some of their properties, and then prove the generalisation of the above theorem. However,
before considering this very general definition, let us move to the case of 3 × 3 determinants:
Notation 6.4. For any square matrix A, let Aij denote the submatrix formed by deleting the
i-th row and the j-th column of A.
Example 6.5. If
A = [ 3 2 5 −1 ]
    [ −2 9 0 6 ]
    [ 7 −2 −3 1 ]
    [ 4 −5 8 −4 ]
then
A23 = [ 3 2 −1 ]
      [ 7 −2 1 ]
      [ 4 −5 −4 ]
If n > 1 then det(A) is the sum of n terms of the form ±ai1 det(Ai1 ), with plus and
minus signs alternating, and where the entries a11 , a21 , . . . , an1 are from the first column
of A. In symbols:
det(A) = a11 det(A11) − a21 det(A21) + · · · + (−1)^(n+1) an1 det(An1) = Σ_{i=1}^{n} (−1)^(i+1) ai1 det(Ai1).
To state the next theorem, it will be convenient to write the definition of det(A) in a
slightly different form.
Definition 6.8. Given a square matrix A = (aij ), the (i, j)-cofactor of A is the number Cij
defined by
Cij = (−1)^(i+j) det(Aij).
Thus, the definition of det(A) reads
det(A) = a11 C11 + a21 C21 + · · · + an1 Cn1 .
This is called the cofactor expansion down the first column of A. There is nothing special
about the first column, as the next theorem shows:
Theorem 6.9 (Cofactor Expansion Theorem). The determinant of an n × n matrix A can be
computed by a cofactor expansion across any column or row. The expansion down the j-th
column is
det(A) = a1j C1j + a2j C2j + · · · + anj Cnj
and the cofactor expansion across the i-th row is
det(A) = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin .
Although this theorem is fundamental for the development of determinants, we shall not
prove it here, as it would lead to a rather lengthy workout.
Before moving on, notice that the plus or minus sign in the (i, j)-cofactor depends on the position of aij in the matrix, regardless of aij itself. The factor (−1)^(i+j) determines the following checkerboard pattern of signs:
+ − + · · ·
− + − · · ·
+ − + · · ·
...
Example 6.10. Use a cofactor expansion across the second row to compute det(A), where
A = [ 4 −1 3 ]
    [ 0 0 2 ]
    [ 1 0 7 ]
Solution. Expanding across the second row, only the entry a23 = 2 contributes, and its cofactor is C23 = (−1)^(2+3) det(A23), where
A23 = [ 4 −1 ]
      [ 1 0 ]
Hence det(A) = 2 · C23 = −2 · (4 · 0 − (−1) · 1) = −2.
Solution. Notice that all entries but the first of row 1 are 0. Thus it will shorten our labours if we expand across the first row:
det(A) = 3 · | 5 0 0 0   |
             | −6 4 −1 3 |
             | 4 0 0 2   |
             | 3 1 0 7   |
= 3 · 5 · | 4 −1 3 |
          | 0 0 2  |
          | 1 0 7  |
We have already computed the value of the above 3 × 3 determinant in the previous example and found it to be equal to −2. Thus det(A) = 3 · 5 · (−2) = −30.
Notice that the matrix in the previous example was almost lower triangular. The method
of this example is easily generalised to prove the following theorem:
Theorem 6.12. If A is either an upper or a lower triangular matrix, then det(A) is the product
of the diagonal entries of A.
Theorem 6.13. Let A be a square matrix.
(a) If two rows of A are interchanged to produce a matrix B, then det(B) = − det(A).
(b) If one row of A is multiplied by α to produce a matrix B, then det(B) = α det(A).
(c) If a multiple of one row of A is added to another row to produce a matrix B then det(B) = det(A).
Proof. These assertions follow from a slightly stronger result to be proved later in this chapter
(see Theorem 6.23).
Example 6.14.
(a) | 1 2 3 |     | 4 5 6 |
    | 4 5 6 | = − | 1 2 3 |    by (a) of the previous theorem.
    | 7 8 9 |     | 7 8 9 |
(b) | 0 1 2  |       | 0 1 2 |
    | 3 12 9 | = 3 · | 1 4 3 |    by (b) of the previous theorem.
    | 1 2 1  |       | 1 2 1 |
(c) | 3 1 0  |   | 3 1 0  |
    | 4 2 9  | = | 7 3 9  |    by (c) of the previous theorem.
    | 0 −2 1 |   | 0 −2 1 |
The following examples show how to use the previous theorem for the effective computation
of determinants:
Solution. Perhaps the easiest way to compute this determinant is to spot that when adding
two times row 1 to row 3 we get two identical rows, which, by another application of the
previous theorem, implies that the determinant is zero:
| 3 −1 2 −5  |   | 3 −1 2 −5 |   | 3 −1 2 −5 |
| 0 5 −3 −6  |   | 0 5 −3 −6 |   | 0 5 −3 −6 |
| −6 7 −7 4  | = | 0 5 −3 −6 | = | 0 0 0 0   | = 0
| −5 −8 0 9  |   | −5 −8 0 9 |   | −5 −8 0 9 |
(the second determinant is obtained by R3 + 2R1 and the third by R3 − R2).
Solution. Here we see that the first column already has two zero entries. Using the previous theorem we can introduce another zero in this column by adding row 2 to row 4. Thus
det(A) = | 0 1 2 −1   |   | 0 1 2 −1 |
         | 2 5 −7 3   | = | 2 5 −7 3 |
         | 0 3 6 2    |   | 0 3 6 2  |
         | −2 −5 4 −2 |   | 0 0 −3 1 |
Expanding down the first column (only the entry 2 in row 2 is non-zero, and its cofactor carries the sign (−1)^(2+1) = −1) gives
det(A) = −2 · | 1 2 −1 |
              | 3 6 2  |
              | 0 −3 1 |
The 3 × 3 determinant above can be further simplified by subtracting 3 times row 1 from row 2. Thus
det(A) = −2 · | 1 2 −1 |
              | 0 0 5  |
              | 0 −3 1 |
Finally we notice that the above determinant can be brought to triangular form by swapping row 2 and row 3, which changes the sign of the determinant by the previous theorem. Thus
det(A) = (−2) · (−1) · | 1 2 −1 |
                       | 0 −3 1 |
                       | 0 0 5  |
= (−2) · (−1) · 1 · (−3) · 5 = −30,
by Theorem 6.12.
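As a numerical cross-check of this example (assuming NumPy; floating-point arithmetic gives the answer only up to rounding), the built-in determinant routine agrees:

import numpy as np

A = np.array([[ 0,  1,  2, -1],
              [ 2,  5, -7,  3],
              [ 0,  3,  6,  2],
              [-2, -5,  4, -2]])
print(np.linalg.det(A))    # approximately -30.0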
We are now able to prove the first important general result about determinants, allowing
us to decide whether a matrix is invertible or not by computing its determinant (as such it is
a generalisation of the 2 × 2 case treated in Theorem 6.2(b)).
Theorem 6.17. A square matrix A is invertible if and only if det(A) ≠ 0.
Proof. Bring A to row echelon form U (which is then necessarily upper triangular) using
elementary row operations. In the process we only ever multiply a row by a non-zero scalar,
so Theorem 6.13 implies that det(A) = γ det(U ) for some γ ̸= 0. If A is invertible, then
det(U ) = 1 by Theorem 6.12, since U is upper triangular with 1’s on the diagonal, and hence
det(A) = γ det(U ) = γ ̸= 0. If A is not invertible then at least one diagonal entry of U is
zero, so det(U ) = 0 by Theorem 6.12, and hence det(A) = γ det(U ) = 0.
Our next result shows what effect transposing a matrix has on its determinant:
Theorem 6.20. If A is an n × n matrix, then det(A) = det(AT ).
Proof. The proof is by induction on n (that is, the size of A). The theorem is obvious for
n = 1. Suppose now that it has already been proved for k × k matrices for some integer k.
Our aim now is to show that the assertion of the theorem is true for (k + 1) × (k + 1) matrices
as well. Let A be a (k + 1) × (k + 1) matrix. Note that the (i, j)-cofactor of A equals the
(i, j)-cofactor of AT , because the cofactors involve k × k determinants only, for which we
assumed that the assertion of the theorem holds. Hence
cofactor expansion of det(A) across the first row
= cofactor expansion of det(AT) down the first column,
so det(A) = det(AT ).
Let’s summarise: the theorem is true for 1 × 1 matrices, and the truth of the theorem for
k × k matrices for some k implies the truth of the theorem for (k + 1) × (k + 1) matrices.
Thus, the theorem must be true for 2 × 2 matrices (choose k = 1); but since we now know
that it is true for 2 × 2 matrices, it must be true for 3 × 3 matrices as well (choose k = 2);
continuing with this process, we see that the theorem must be true for matrices of arbitrary
size.
By the previous theorem, each statement of the theorem on the behaviour of determinants
under row operations (Theorem 6.13) is also true if the word ‘row’ is replaced by ‘column’,
since a row operation on AT amounts to a column operation on A.
Theorem 6.21. Let A be a square matrix.
(a) If two columns of A are interchanged to produce B, then det(B) = − det(A).
(b) If one column of A is multiplied by α to produce B, then det(B) = α det(A).
(c) If a multiple of one column of A is added to another column to produce a matrix B
then det(B) = det(A).
Example 6.22. Find det(A) where
A = [ 1 3 4 8   ]
    [ −1 2 1 9  ]
    [ 2 5 7 0   ]
    [ 3 −4 −1 5 ]
Solution. Adding column 1 to column 2 gives
det(A) = | 1 3 4 8   |   | 1 4 4 8   |
         | −1 2 1 9  | = | −1 1 1 9  |
         | 2 5 7 0   |   | 2 7 7 0   |
         | 3 −4 −1 5 |   | 3 −1 −1 5 |
Now subtracting column 3 from column 2, the determinant is seen to vanish by a cofactor expansion down column 2:
det(A) = | 1 0 4 8   |
         | −1 0 1 9  | = 0.
         | 2 0 7 0   |
         | 3 0 −1 5  |
Our next aim is to prove that determinants are multiplicative, that is, det(AB) = det(A) det(B)
for any two square matrices A and B of the same size. We start by establishing a baby-version
of this result, which, at the same time, proves the theorem on the behaviour of determinants
under row operations stated earlier (see Theorem 6.13).
Theorem 6.23. Let A be a square matrix and let E be an elementary matrix of the same size. Then
det(EA) = det(E) det(A),
with
det(E) = −1 if E is of type I (interchanging two rows),
det(E) = α if E is of type II (multiplying a row by α),
det(E) = 1 if E is of type III (adding a multiple of one row to another).
Proof. By induction on the size of A. The case where A is a 2 × 2 matrix follows from
Theorem 6.2(a). Suppose now that the theorem has been verified for determinants of k × k
matrices for some k with k ≥ 2. Let A be a (k + 1) × (k + 1) matrix and write B = EA.
Expand det(EA) across a row that is unaffected by the action of E on A, say, row i. Note
that Bij is obtained from Aij by the same type of elementary row operation that E performs
on A. But since these matrices are only k × k, our hypothesis implies that
det(Bij) = r det(Aij), where r denotes the value −1, α or 1 associated with E as in the statement of the theorem. Since row i of B equals row i of A, comparing the cofactor expansions of det(B) and det(A) across row i gives
det(B) = r det(A).
In particular, taking A = Ik+1 we see that det(E) = −1, α, 1 depending on the nature of E.
To summarise: the theorem is true for 2 × 2 matrices and the truth of the theorem for
k × k matrices for some k ≥ 2 implies the truth of the theorem for (k + 1) × (k + 1) matrices.
By the principle of induction the theorem is true for matrices of any size.
Using the previous theorem we are now able to prove the second important general result
of this chapter (and a generalisation of the 2 × 2 case treated in Theorem 6.2(a)):
Theorem 6.24. If A and B are square matrices of the same size, then det(AB) = det(A) det(B).
Case II: If A is invertible, then by the Invertible Matrix Theorem A is a product of elementary
matrices, that is, there exist elementary matrices E1 , . . . , Ek , such that
A = Ek Ek−1 · · · E1 .
For brevity, write |A| for det(A). Then, by the previous theorem,
|AB| = |Ek Ek−1 · · · E1 B| = |Ek| |Ek−1 · · · E1 B| = · · · = |Ek| · · · |E1| |B|,
and applying the same argument to A = Ek Ek−1 · · · E1 I shows that |Ek| · · · |E1| = |A|, so |AB| = |A||B|.
Corollary 6.25. If A is an invertible matrix, then det(A⁻¹) = 1/det(A).
Proof. Since A is invertible, we have A⁻¹A = I. Taking determinants of both sides gives det(A⁻¹A) = det(I) = 1. By Theorem 6.24 we know that det(A⁻¹A) = det(A⁻¹) det(A), and so in fact we have det(A⁻¹) det(A) = 1. Moreover, det(A) ≠ 0 because A is invertible (by Theorem 6.17), and so we can divide both sides of the preceding equation by det(A) to obtain the required property
det(A⁻¹) = 1/det(A).