
Vectors & Matrices, 2024–25

Prof. Oliver Jenkinson

January 20, 2025


Contents

1 Vectors
  1.1 Bound Vectors and Free Vectors
  1.2 Vector Negation
  1.3 Vector Addition
  1.4 Scalar Multiplication
  1.5 Position Vectors
  1.6 The Definition of u + v does not depend on A

2 Coordinates
  2.1 Unit Vectors
  2.2 Sums and Scalar Multiples in Coordinates
  2.3 Equations of Lines

3 Scalar Product and Vector Product
  3.1 The scalar product
  3.2 The Equation of a Plane
  3.3 Distance from a Point to a Plane
  3.4 The vector product
  3.5 Vector equation of a plane given 3 points on it
  3.6 Distance from a point to a line
  3.7 Distance between two lines
  3.8 Intersections of Planes and Systems of Linear Equations
  3.9 Intersections of other geometric objects

4 Systems of Linear Equations
  4.1 Basic terminology and examples
  4.2 Gaussian elimination
  4.3 Special classes of linear systems

5 Matrices
  5.1 Matrices and basic properties
  5.2 Transpose of a matrix
  5.3 Special types of square matrices
  5.4 Column vectors of dimension n
  5.5 Linear systems in matrix notation
  5.6 Elementary matrices and the Invertible Matrix Theorem
  5.7 Gauss-Jordan inversion

6 Determinants
  6.1 Determinants of 2 × 2 and 3 × 3 matrices
  6.2 General definition of determinants
  6.3 Properties of determinants
Chapter 1

Vectors

1.1 Bound Vectors and Free Vectors


Definition 1.1. A bound vector is a directed line segment in 3-space. If A and B are points in 3-space, we denote the bound vector with starting point A and endpoint B by \overrightarrow{AB}.

As the notation \overrightarrow{AB} suggests, an ordered pair of points A, B in 3-space determines a bound vector. Alternatively, a bound vector is determined by its:

• starting point,

• length,

• direction (provided that the length is not 0).

We denote the length of the bound vector \overrightarrow{AB} by |\overrightarrow{AB}|. If a bound vector has length 0 then it is of the form \overrightarrow{AA} (where A is some point in 3-space) and has undefined direction.
If we ignore the starting point we get the notion of a free vector (or simply a vector).
So a free vector is determined by its:

• length,

• direction (provided that the length is not 0).

We will use letters in bold type for free vectors (u, v, w, etc.).[1] The length of the free vector
v will be denoted by |v|.

Definition 1.2. We say that the bound vector \overrightarrow{AB} represents the free vector v if it has the same length and direction as v.

[1] In handwritten notes underlining would be used: u, v, w, etc.

In the figure below, the two bound vectors \overrightarrow{AB} and \overrightarrow{CD} represent the same free vector v.

[Figure: two parallel directed segments, one from A to B and one from C to D, each labelled v.]

Definition 1.3. The zero vector is the free vector with length 0 and undefined direction. It is denoted by 0.
For any point A, the bound vector \overrightarrow{AA} represents 0.
It is important to be aware of the difference between bound vectors and free vectors. In particular, you should never write something like \overrightarrow{AB} = v. The problem with this is that the two things we are asserting to be equal are different types of mathematical object. It would be correct to say that \overrightarrow{AB} represents v.
One informal analogy that might be helpful is that a free vector is a bit like an instruction
(go 20 miles Northeast say) while a bound vector is like the path you trace out when you
follow that instruction. Notice that there is no way to draw the instruction on a map (sim-
ilarly we cannot really draw a free vector) and that if you follow the same instruction from
different starting points you get different paths (just as we have many different bound vectors
representing the same free vector).
In what follows we will mainly be working with free vectors and when we write vector we
will always mean free vector.

1.2 Vector Negation


If v is a non-zero vector we define its negation −v to be the vector with the same length as v and opposite direction. We define −0 = 0. Negation is a function from the set of vectors to itself. If \overrightarrow{AB} represents v then \overrightarrow{BA} represents −v.

1.3 Vector Addition


To define the sum of two vectors we need the notion of a parallelogram.
Definition 1.4. The figure ABCD is a parallelogram if \overrightarrow{AB} and \overrightarrow{DC} represent the same vector.

Fact 1.5 (The Parallelogram Axiom). If \overrightarrow{AB} and \overrightarrow{DC} represent the same vector (u, say), then \overrightarrow{BC} and \overrightarrow{AD} represent the same vector (v, say). Note that we need not have u = v.

Definition 1.6. Given vectors u and v we define the sum u + v as follows. Pick any point A and let B, C, D be points such that \overrightarrow{AB} represents u, \overrightarrow{AD} represents v and ABCD is a parallelogram. Then u + v is the vector represented by \overrightarrow{AC}.
[Figure: parallelogram ABCD, with \overrightarrow{AB} and \overrightarrow{DC} labelled u, and \overrightarrow{AD} and \overrightarrow{BC} labelled v; the diagonal \overrightarrow{AC} represents u + v.]

In the figure, \overrightarrow{DC} represents u because we chose C in order to make ABCD a parallelogram. Now by the parallelogram axiom \overrightarrow{AD} and \overrightarrow{BC} represent the same vector v. By definition \overrightarrow{AC} represents u + v.
Remark 1.7. In order for this to be a sensible definition it needs to specify u + v in a
completely unambiguous way. In other words, if two people follow the recipe above to find
u + v they should come up with the same vector. For this to be true we need to check that
B, C, D are uniquely specified by the rule we gave (this is obvious) and that the answer we
get does not depend on the choice of A. This last point requires a bit of checking, which I give
as an exercise (with hints) at the end of the chapter.
From this definition we get the following useful interpretation of vector addition.[2]

Proposition 1.8 (The Triangle Rule for Vector Addition). If \overrightarrow{AB} represents u and \overrightarrow{BC} represents v then \overrightarrow{AC} represents u + v.

Proof. Suppose that \overrightarrow{AB} represents u and \overrightarrow{BC} represents v. Let D be the point such that \overrightarrow{DC} represents u. Since \overrightarrow{AB} and \overrightarrow{DC} both represent u, the figure ABCD is a parallelogram. By the parallelogram axiom \overrightarrow{BC} and \overrightarrow{AD} represent the same free vector, and so \overrightarrow{AD} represents v. It follows that the figure ABCD is precisely the parallelogram constructed in the definition of u + v, and so by definition \overrightarrow{AC} represents u + v, as required.
Vector addition shares several properties with ordinary addition of numbers.
Proposition 1.9 (Properties of Vector Addition).[3] If u, v, w are vectors then:

1. u + v = v + u (that is, vector addition is commutative),
2. u + 0 = u (that is, 0 is an identity for vector addition),
3. u + (v + w) = (u + v) + w (that is, vector addition is associative),
4. u + (−u) = 0 (that is, −u is an additive inverse for u).

We usually write u − v for u + (−v) (this defines vector subtraction), and so part 4 could be written as u − u = 0.

Proof. 1. If \overrightarrow{AB} represents u, \overrightarrow{AD} represents v and ABCD is a parallelogram, then \overrightarrow{DC} represents u. It follows from the triangle rule applied to the triangle ADC that \overrightarrow{AC} represents v + u. We know (by the definition of vector addition) that \overrightarrow{AC} represents u + v. Hence u + v = v + u.
[2] Some people regard this as the definition of u + v, in which case the parallelogram interpretation becomes a result that needs a proof.
[3] If you take the module Introduction To Algebra you will recognise that these properties mean that the set of free vectors forms an Abelian group under vector addition.

2. Let \overrightarrow{AB} represent u. Since \overrightarrow{BB} represents 0, the triangle rule applied to the (degenerate) triangle ABB gives that \overrightarrow{AB} represents u + 0. Hence u + 0 = u.

3. See problem sheet 1.

4. Let \overrightarrow{AB} represent u. Since \overrightarrow{BA} represents −u, the triangle rule applied to the (degenerate) triangle ABA gives that \overrightarrow{AA} represents u + (−u). But \overrightarrow{AA} represents 0 and so u − u = 0.

1.4 Scalar Multiplication


If α ∈ R and v is a vector, we define αv to be the vector with length[4] |α||v| and direction
the same as v if α > 0, opposite to v if α < 0, and undefined if α = 0.
Multiplication of a vector by a scalar also has some nice properties:

Proposition 1.10. [Properties of Scalar Multiplication] For any α, β ∈ R and vectors u, v


we have:

(i) 0u = 0, α0 = 0, 1u = u, −1u = −u,

(ii) α(βu) = (αβ)u,

(iii) (α + β)u = αu + βu,

(iv) α(u + v) = αu + αv.

The last two properties in this proposition are called distributive laws.
We will show parts of the proofs of these but will not go through all cases in detail.

Proof. (i) Trivial.

(ii) By definition of scalar multiplication:

|α(βu)| = |α||βu| = |α||β||u| = |αβ||u| = |(αβ)u|.

So α(βu) and (αβ)u have the same length.


If α = 0 or β = 0 (or both) then both sides are equal to 0.
We consider cases according to whether α and β are positive or negative.
If α > 0, β > 0. Then both α(βu) and (αβ)u have the same direction as u and so are
equal.
If α < 0, β > 0. Then βu has the same direction as u and α(βu) has direction opposite
to u. Also αβ < 0 so (αβ)u has direction opposite to u. It follows that both α(βu)
and (αβ)u have the same direction as −u and so are equal.
The remaining cases of α > 0, β < 0 and α < 0, β < 0 are similar.
[4] Be careful with the notation here. In this expression |α| is the absolute value of the scalar α, while |v| is the length of the vector v.

(iii) Let \overrightarrow{AB} represent αu and \overrightarrow{BC} represent βu. Then by the triangle rule \overrightarrow{AC} represents αu + βu.
If α > 0, β > 0 then \overrightarrow{AC} is a bound vector of length α|u| + β|u| = (α + β)|u| in the same direction as u. That is, \overrightarrow{AC} represents (α + β)u. It follows that αu + βu = (α + β)u.
The remaining cases are similar.

(iv) [This proof was not given in lectures, and so the proof is non-examinable]
If α = 0 then both sides are equal to 0.
Suppose that α > 0. Let \overrightarrow{AB} represent u, \overrightarrow{BC} represent v, \overrightarrow{AD} represent αu, and \overrightarrow{DE} represent αv (draw a picture).
The triangles ABC and ADE are similar triangles and the edge AB is in the same direction as the edge AD. It follows that the bound vector \overrightarrow{AE} is in the same direction as \overrightarrow{AC} and its length differs by a factor of α. But \overrightarrow{AC} represents u + v and \overrightarrow{AE} represents αu + αv. It follows that αu + αv = α(u + v).
The α < 0 case is similar.

1.5 Position Vectors


Suppose now that we fix a special point in space called the origin and denoted by O.

Definition 1.11. If P is a point, the position vector of P is the free vector represented by the bound vector \overrightarrow{OP}.

We will usually write p for the position vector of P , q for the position vector of Q and so
on.
Each point in space has a unique position vector and each vector is the position vector of
a unique point in space.
If A and B are points with position vectors a and b respectively, then by the triangle rule applied to the triangle AOB we get that \overrightarrow{AB} represents the vector b − a.

Theorem 1.12. Let A, B be points with position vectors a and b respectively. Let P be the point on the line segment AB with |\overrightarrow{AP}| = λ|\overrightarrow{AB}|. The position vector p of P is (1 − λ)a + λb.

Proof. Define u to be the free vector such that \overrightarrow{AB} represents u. It follows that the bound vector \overrightarrow{AP} represents λu. The triangle rule applied to OAP gives that p = a + λu. Also u = b − a. Putting these together gives that p = a + λ(b − a), which after some manipulation (using the distributive laws of Proposition 1.10(iii),(iv)) gives the result.

In lectures we used this theorem to prove the following geometric fact about parallelograms.

Example 1.13. The diagonals of a parallelogram intersect at their midpoints.
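The argument was given in lectures; one possible write-up, using Theorem 1.12 with λ = 1/2, runs as follows (the labelling of the parallelogram here is my own). Let ABCD be a parallelogram with position vectors a, b, c, d. Since \overrightarrow{AB} and \overrightarrow{DC} represent the same vector, b − a = c − d, i.e. a + c = b + d. By Theorem 1.12 with λ = 1/2, the midpoint of the diagonal AC has position vector (1/2)(a + c) and the midpoint of the diagonal BD has position vector (1/2)(b + d). These two position vectors are equal, so the two midpoints coincide; hence the diagonals intersect at their common midpoint.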



1.6 The Definition of u + v does not depend on A


This section is non-examinable but I encourage you to work through the argument as an
exercise.
If you look back to the definition of vector addition you will see that we started by picking
an arbitrary point A. This question leads you through the proof that the definition of vector
addition does not depend on which point is chosen. The steps are roughly indicated with some
gaps. Your task is to expand each step and fill in the gaps to get a complete proof:

• Consider the parallelogram needed to define u + v with respect to point A (name the points).

• Consider the parallelogram needed to define u + v with respect to a different point E (name the points).

• Fill in the gaps: “In order for it not to matter whether we used the parallelogram based at A or the one based at E in the definition of u + v, our task is to show that the bound vectors . . . and . . . represent the same free vector”.

• Fill in the gaps: “The figure . . . is a parallelogram because . . . and . . . both represent the vector u and so by the parallelogram axiom . . . and . . . represent the same vector [give it a name].”

• Fill in the gaps: “The figure . . . is a parallelogram because . . . and . . . both represent the vector v and so by the parallelogram axiom . . . and . . . represent the same vector [what is that vector].”

• Make one more application of the parallelogram axiom to show that the bound vectors from step 3 really do represent the same free vector.
Chapter 2

Coordinates

Suppose now that we choose an origin O and 3 mutually perpendicular axes (the x−, y− and
z− axes) arranged in a right-handed system as in the figures below:

[Figure: three sketches of a right-handed coordinate system, each showing the x-, y- and z-axes meeting at the origin O.]

Let i, j, k denote vectors of unit length (i.e. length 1) in the directions of the x-, y- and z-axes respectively.
We say that R is the point with coordinates (a, b, c) if the position vector of R is

r = ai + bj + ck.

If Q is the point with position vector ai + bj and P is the point with position vector ai, then OPQ is a right-angled triangle and \overrightarrow{PQ} represents bj. It follows from Pythagoras's Theorem that

|\overrightarrow{OQ}|^2 = |ai|^2 + |bj|^2 = a^2 + b^2.

Further, OQR is a right-angled triangle and \overrightarrow{QR} represents ck. So

|r|^2 = |\overrightarrow{OR}|^2 = |\overrightarrow{OQ}|^2 + |\overrightarrow{QR}|^2 = a^2 + b^2 + c^2.

It follows that

|r| = \sqrt{a^2 + b^2 + c^2}.

2.1 Unit Vectors


Definition 2.1. A unit vector is a vector of length 1.

For instance i, j and k are unit vectors.


 
If u is any non-zero vector then û = (1/|u|)u is a unit vector in the same direction as u. We often write u/|u| for (1/|u|)u.


2.2 Sums and Scalar Multiples in Coordinates


 
We will write \begin{pmatrix} a \\ b \\ c \end{pmatrix} for the vector ai + bj + ck, where a, b, c ∈ R.

Let u = \begin{pmatrix} a \\ b \\ c \end{pmatrix}, v = \begin{pmatrix} d \\ e \\ f \end{pmatrix} and α ∈ R. Then

u + v = (ai + bj + ck) + (di + ej + fk)
      = (ai + di) + (bj + ej) + (ck + fk)
      = (a + d)i + (b + e)j + (c + f)k
      = \begin{pmatrix} a + d \\ b + e \\ c + f \end{pmatrix}.

Also,

αu = α(ai + bj + ck)
   = α(ai) + α(bj) + α(ck)
   = (αa)i + (αb)j + (αc)k
   = \begin{pmatrix} αa \\ αb \\ αc \end{pmatrix}.
So vector addition and scalar multiplication can be nicely expressed in coordinates.
Note that in deriving these expressions, we used our properties of vector addition and scalar
multiplication (Propositions 1.9 and 1.10) repeatedly.
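As a quick check of these coordinate rules, here is a minimal Python sketch; the use of NumPy, and the particular vectors, are my own additions and not part of the notes.

```python
import numpy as np

u = np.array([1.0, 2.0, -1.0])   # the vector i + 2j - k, written in coordinates
v = np.array([3.0, 0.0, 4.0])    # the vector 3i + 4k
alpha = 2.0

print(u + v)                     # component-wise sum: [4. 2. 3.]
print(alpha * u)                 # component-wise scalar multiple: [2. 4. -2.]

length = np.linalg.norm(u)       # |u| = sqrt(1 + 4 + 1)
print(length, u / length)        # length of u, and the unit vector u/|u| of Section 2.1
```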

2.3 Equations of Lines


Let l be the line through the point P in the direction of the non-zero vector u.
The point R with position vector r is on the line l if and only if the vector represented
−→
by P R is a multiple of u. That is, R is on l if and only if r − p = λu for some λ ∈ R, or
equivalently if and only if r = p + λu. This is called the vector equation for l.
Note that in this equation, p and u are constant vectors (depending on the line), while r is
a (vector) variable depending on the (real number) variable λ. The equation gives a condition
which r satisfies if and only if R lies on the line l. Specifically, suppose that R is a point with
position vector r. If there is some λ for which r = p + λu then R lies on l; if there is no such
λ then R does not lie on l.

Working in coordinates, let r = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, p = \begin{pmatrix} p1 \\ p2 \\ p3 \end{pmatrix} and u = \begin{pmatrix} u1 \\ u2 \\ u3 \end{pmatrix}. We get that R is on l if and only if

\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} p1 \\ p2 \\ p3 \end{pmatrix} + λ \begin{pmatrix} u1 \\ u2 \\ u3 \end{pmatrix} = \begin{pmatrix} p1 + λu1 \\ p2 + λu2 \\ p3 + λu3 \end{pmatrix}.

This is equivalent to the system of equations:

x = p1 + λu1
y = p2 + λu2
z = p3 + λu3.
These are called the parametric equations for the line l.


The variable λ is referred to as a parameter. Note that it appears in the above 3 equations
(which we have called the parametric equations), but it also appears in the equation r = p+λu
(which we called the vector equation, but could equally well be called a parametric vector
equation).
If u1 ≠ 0, u2 ≠ 0, u3 ≠ 0 we can eliminate the parameter λ from the parametric equations to get

(x − p1)/u1 = (y − p2)/u2 = (z − p3)/u3,

called the Cartesian equations for the line l.
If u1 = 0, u2 ≠ 0, u3 ≠ 0 then the Cartesian equations are

x = p1,   (y − p2)/u2 = (z − p3)/u3.

If u1 = u2 = 0, u3 ≠ 0 then the Cartesian equations are

x = p1,   y = p2

(with no constraint on z).
Note that we cannot have u1 = u2 = u3 = 0 because we insisted that u was a non-zero
vector.
Another natural way of describing a line is by giving two points that lie on it. If P and Q
are distinct points[1] with position vectors p and q respectively, then the line containing P and
Q is in direction q − p. We can now use the method above with u = q − p. For instance the
line through P and Q has vector equation
r = p + λ(q − p).
We found this equation by noting that the line in question is the line through P in direction
(q − p). However we could equally have identified the same line as the line through Q in
direction (q − p). This yields the vector equation
r = q + λ(q − p).
In general there are many possible different vector equations all determining the same line.
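The following short Python sketch illustrates the vector equation r = p + λ(q − p) of the line through two points; the points chosen here are hypothetical examples, not taken from the notes.

```python
import numpy as np

p = np.array([1.0, 0.0, 2.0])    # position vector of P (hypothetical)
q = np.array([3.0, 1.0, 2.0])    # position vector of Q (hypothetical)
u = q - p                        # direction of the line through P and Q

def point_on_line(lam):
    """Return r = p + lam * u, a point on the line, for the parameter lam."""
    return p + lam * u

print(point_on_line(0.0))        # p itself
print(point_on_line(1.0))        # q itself
print(point_on_line(0.5))        # the midpoint of the segment PQ
```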
Whether we choose to use the vector, parametric or Cartesian equations for the line, in
each case we have described the geometric object (in this case a line) by giving a condition
that position vectors of points on the line must satisfy. If we think of the line as being the set
of points on it then this set is determined by the following set of position vectors:
{r : r = p + λu for some λ ∈ R}.
 
More generally, any set of position vectors defined by giving a condition on r (or on its coordinates x, y, z) determines a geometric object in 3-space.

[1] Can you see what goes wrong if P = Q?
Chapter 3

Scalar Product and Vector Product

3.1 The scalar product


If u and v are non-zero vectors with \overrightarrow{AB} representing u and \overrightarrow{AC} representing v, we define the angle between u and v to be the angle θ (in radians) between the line segments AB and AC, with 0 ≤ θ ≤ π.

Definition 3.1. The scalar product of u and v is denoted by u · v and defined by

u · v = |u||v| cos θ   if u ≠ 0 and v ≠ 0,
u · v = 0              if u = 0 or v = 0,

where θ is the angle between u and v.
Definition 3.2. We say that u and v are orthogonal if u · v = 0.
Note that u and v are orthogonal if either one or both of them is the zero vector, or they
are perpendicular (the angle between them is π/2).
Working in coordinates we have the following very useful formula for the scalar product:
   
Theorem 3.3. If u = \begin{pmatrix} u1 \\ u2 \\ u3 \end{pmatrix} and v = \begin{pmatrix} v1 \\ v2 \\ v3 \end{pmatrix}, then

u · v = u1 v1 + u2 v2 + u3 v3.

Proof. If u = 0 or v = 0 then u · v = 0 by definition and u1 v1 + u2 v2 + u3 v3 = 0, and so the result is true.
Suppose that u ̸= 0 and v ̸= 0 and let θ be the angle between u and v.
We will use the fact that if we calculate |u + v|2 in two different ways we must get the
same answer.
First, let \overrightarrow{AB} represent u, \overrightarrow{AD} represent v, and ABCD be a parallelogram. Let E be the point on the line through AB with \overrightarrow{EC} perpendicular to \overrightarrow{AE} (draw a picture!).
By the definition of vector addition we have that \overrightarrow{AC} represents u + v, and AEC is a right-angled triangle, so

|u + v|^2 = |\overrightarrow{AE}|^2 + |\overrightarrow{EC}|^2
          = (|u| + |v| cos θ)^2 + (|v| sin θ)^2
          = |u|^2 + |v|^2((sin θ)^2 + (cos θ)^2) + 2|u||v| cos θ
          = |u|^2 + |v|^2 + 2u · v.


Secondly, in coordinates,

|u + v|^2 = (u1 + v1)^2 + (u2 + v2)^2 + (u3 + v3)^2
          = (u1^2 + u2^2 + u3^2) + (v1^2 + v2^2 + v3^2) + 2(u1 v1 + u2 v2 + u3 v3).

Equating these two expressions for |u + v|^2 and rearranging gives the result.
Theorem 3.3 can be used to find the angle θ between two non-zero vectors given in
coordinates. Rearranging the definition of u · v and substituting the formula of Theorem 3.3
we get
cos θ = (u · v)/(|u||v|) = (u1 v1 + u2 v2 + u3 v3) / \sqrt{(u1^2 + u2^2 + u3^2)(v1^2 + v2^2 + v3^2)}.

This also shows that, for non-zero u and v, the scalar product u · v is positive if and only if 0 ≤ θ < π/2; zero if and only if θ = π/2; and negative if and only if π/2 < θ ≤ π.

Proposition 3.4 (Properties of Scalar Product). For any vectors u, v, w and α ∈ R we have
1. u · v = v · u,
2. u · (v + w) = u · v + u · w,
3. (u + v) · w = u · w + v · w,
4. (αu) · v = u · (αv) = α(u · v).
Proof. These are all easy consequences of Theorem 3.3.
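For readers who want to experiment, here is a small NumPy illustration of the angle formula cos θ = u · v/(|u||v|); the two vectors are arbitrary examples of my own, not from the notes.

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, 0.0, 0.0])

cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))   # clip guards against rounding error

print(cos_theta)                 # (1*2 + 0 + 0) / (3 * 2) = 1/3
print(np.degrees(theta))         # approximately 70.5 degrees
```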

3.2 The Equation of a Plane


A plane Π in 3-space can be specified by giving
• a point P on Π,

• a non-zero vector n orthogonal to Π.
By n being orthogonal to Π we mean that for any two points A and B on Π, the vector represented by \overrightarrow{AB} is orthogonal to n.
Let R be a point with position vector r. The point R is on Π if and only if the vector represented by \overrightarrow{PR} is orthogonal to n. That is, if and only if (r − p) · n = 0. Rearranging
this we get the vector equation for the plane through P and orthogonal to n to be
r · n = p · n.
    
In coordinates, letting r = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, n = \begin{pmatrix} a \\ b \\ c \end{pmatrix} (with a, b, c not all 0) and p = \begin{pmatrix} p1 \\ p2 \\ p3 \end{pmatrix}, we get the Cartesian equation for the plane to be

ax + by + cz = d,

where d = ap1 + bp2 + cp3. That is to say, the point with coordinates (x, y, z) is on Π if and only if it satisfies ax + by + cz = d.

3.3 Distance from a Point to a Plane


Let Π be a plane and Q be a point with position vector q. We would like to determine the
distance from Q to Π; that is the distance from Q to M where M is the point on Π which is
closest to Q.
Suppose that Π has equation r · n = d (or equivalently ax + by + cz = d). For M to be the point of Π closest to Q we need that the vector represented by \overrightarrow{MQ} is orthogonal to the plane Π and so is a scalar multiple of n. That is, writing m for the position vector of M, we need q − m = αn for some α ∈ R. This means that

(q − m) · n = (αn) · n,
q · n − m · n = α|n|^2,

but M is on Π and so m · n = d. We get
α = (q · n − d)/|n|^2.

Now, the distance from M to Q is

|\overrightarrow{MQ}| = |q − m| = |α||n| = |q · n − d| / |n|.

Note that, as you would expect, this is 0 if q · n = d, since in this case Q lies on the plane Π.
We could also use this method to find the position vector of M, the point on Π closest to Q, by

m = q − αn = q − ((q · n − d)/|n|^2) n.
Summarising the above, we have proved the following result:
Proposition 3.5. If the plane Π has equation r · n = d, and the point Q has position vector
q, then the distance between Q and Π is
|q · n − d| / |n|,

and the point on Π that is closest to Q has position vector

q − ((q · n − d)/|n|^2) n.
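Proposition 3.5 translates directly into a few lines of NumPy. The plane and point below are invented for illustration; neither NumPy nor these numbers appear in the notes.

```python
import numpy as np

n = np.array([1.0, 2.0, 2.0])    # normal vector of the plane r . n = d
d = 3.0
q = np.array([4.0, 0.0, 1.0])    # position vector of the point Q

alpha = (np.dot(q, n) - d) / np.dot(n, n)
distance = abs(np.dot(q, n) - d) / np.linalg.norm(n)
closest = q - alpha * n          # point of the plane closest to Q (Proposition 3.5)

print(distance)                  # |6 - 3| / 3 = 1.0
print(closest, np.dot(closest, n))  # the closest point satisfies r . n = d
```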

3.4 The vector product



  
Definition 3.6. Given vectors u = \begin{pmatrix} u1 \\ u2 \\ u3 \end{pmatrix} and v = \begin{pmatrix} v1 \\ v2 \\ v3 \end{pmatrix}, the vector product u × v is defined to be

u × v = \begin{pmatrix} u2 v3 − u3 v2 \\ u3 v1 − u1 v3 \\ u1 v2 − u2 v1 \end{pmatrix}.
In other words,
u × v = (u2 v3 − u3 v2 )i − (u1 v3 − u3 v1 )j + (u1 v2 − u2 v1 )k .

Note in particular that the vector product of two vectors is itself a vector, and that in
general u × v = −v × u.
     
Example 3.7. If u = \begin{pmatrix} 1 \\ 2 \\ −1 \end{pmatrix} and v = \begin{pmatrix} −1 \\ 3 \\ 4 \end{pmatrix}, then u × v = \begin{pmatrix} 11 \\ −3 \\ 5 \end{pmatrix} and v × u = \begin{pmatrix} −11 \\ 3 \\ −5 \end{pmatrix}.
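The numbers in Example 3.7 can be checked with NumPy, whose cross routine uses the same component formula as Definition 3.6 (the use of NumPy here is, of course, my own addition).

```python
import numpy as np

u = np.array([1, 2, -1])
v = np.array([-1, 3, 4])

print(np.cross(u, v))            # [11 -3  5], as in Example 3.7
print(np.cross(v, u))            # [-11  3 -5], i.e. -(u x v)
print(np.dot(np.cross(u, v), u), np.dot(np.cross(u, v), v))   # 0 0: orthogonality
```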
The slightly strange-looking definition of the vector product can be explained geometrically,
by the following result:
Proposition 3.8. Given vectors u and v, the vector product u × v is orthogonal to both u
and v, and its length |u × v| satisfies

|u × v| = |u||v| sin θ   if u ≠ 0 and v ≠ 0,
|u × v| = 0              if u = 0 or v = 0,
where θ denotes the angle between u and v (in the case that they are both non-zero).
Proof. To prove orthogonality we need to show that (u × v) · u = 0 and (u × v) · v = 0. We
calculate
  
(u × v) · u = (u2 v3 − u3 v2)u1 + (u3 v1 − u1 v3)u2 + (u1 v2 − u2 v1)u3 = 0,
as required, and the calculation showing (u × v) · v = 0 is similar and left as an exercise.
If u = 0 or v = 0 then it is easily seen that u × v = 0, so that |u × v| = 0. If both u and v are non-zero then we note that

|u × v|^2 = |u|^2 |v|^2 − (u · v)^2,    (3.1)

because

|u × v|^2 = (u2 v3 − u3 v2)^2 + (u3 v1 − u1 v3)^2 + (u1 v2 − u2 v1)^2
          = u2^2 v3^2 + u3^2 v2^2 + u3^2 v1^2 + u1^2 v3^2 + u1^2 v2^2 + u2^2 v1^2 − 2(u2 v3 u3 v2 + u3 v1 u1 v3 + u1 v2 u2 v1)
          = (u1^2 + u2^2 + u3^2)(v1^2 + v2^2 + v3^2) − (u1 v1 + u2 v2 + u3 v3)^2
          = |u|^2 |v|^2 − (u · v)^2.

Now

|u|^2 |v|^2 − (u · v)^2 = |u|^2 |v|^2 − (|u||v| cos θ)^2 = |u|^2 |v|^2 (1 − cos^2 θ) = |u|^2 |v|^2 sin^2 θ,

so substituting into (3.1) gives |u × v|^2 = |u|^2 |v|^2 sin^2 θ, and hence |u × v| = |u||v| sin θ.

3.5 Vector equation of a plane given 3 points on it


Let A, B, C be points in 3-space which do not all lie on a common line. Let a, b, c be their
respective position vectors. Let Π be the plane containing the three points A, B, C. Then
n = (b − a) × (c − a) is orthogonal to Π.
A vector equation for Π is r · n = a · n (since A is on Π and n is orthogonal to Π). This
equation can be written as
r · ((b − a) × (c − a)) = a · ((b − a) × (c − a)) ,
or in other words
(r − a) · ((b − a) × (c − a)) = 0 .
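A minimal NumPy sketch of this construction, with three non-collinear points chosen arbitrarily for illustration (not taken from the notes):

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0])    # position vectors of three non-collinear points
b = np.array([0.0, 1.0, 0.0])
c = np.array([0.0, 0.0, 1.0])

n = np.cross(b - a, c - a)       # normal vector to the plane through A, B, C
d = np.dot(a, n)                 # the plane is r . n = d

print(n, d)                      # here n = [1 1 1] and d = 1, i.e. x + y + z = 1
for point in (a, b, c):
    print(np.dot(point, n))      # each point satisfies r . n = d
```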

3.6 Distance from a point to a line


Let l be the line with vector equation r = p + λu. (Recall that this means the point R with position vector r lies on l if and only if this equation is satisfied.) The line l has the same direction as u and goes through the point P with position vector p. Let X be a point with position vector x. We wish to find the distance from l to X. This means that if M is the point on l which is closest to X, we need to find |\overrightarrow{MX}|. (Draw a picture.)
Let v be the vector represented by \overrightarrow{PX}, so v = x − p. We may assume that v ≠ 0, since if v = 0 then X lies on l and we conclude that the distance in question is 0. We now let θ be the angle between u and v. We have that

|\overrightarrow{MX}| = |v| sin θ = |u × v|/|u| = |u × (x − p)|/|u|.

Note that when u and v are parallel X lies on l, and so the formula above is still valid (it correctly gives the distance as 0).

3.7 Distance between two lines


Let l1 be the line with vector equation r = p + λu and l2 be the line with vector equation
r = q + µv. We wish to find the distance between l1 and l2 . Note that in contrast to lines
in 2-space, two lines in 3-space will typically neither intersect nor be parallel.
If u = αv for some α ∈ R then l1 and l2 lie in the same direction and we can find the
distance between them by choosing any point A on l1 , and then finding the distance from the
point A to the line l2 (e.g. by using the method of §3.6).
Suppose that u and v are such that we cannot write u = αv for α ∈ R. We wish to choose the point A on l1, and the point B on l2, so that |\overrightarrow{AB}| is minimised. Let w be the vector represented by \overrightarrow{AB}. Ensuring that |\overrightarrow{AB}| is as small as possible means that w is orthogonal to both u and v, and so w = α(u × v) for some α ∈ R. Also

w = b − a = q + µv − p − λu

for some λ, µ ∈ R (since A is on l1 and B is on l2 ).


Putting this together and taking the scalar product with u × v we get

α(u × v) · (u × v) = (q + µv − p − λu) · (u × v) = (q − p) · (u × v)

(note that the µv · (u × v) and λu · (u × v) terms are 0 because u and v are orthogonal to
u × v.)
Dividing by (u × v) · (u × v) = |u × v|^2 in the above equality gives us

α = ((q − p) · (u × v)) / |u × v|^2,

and therefore

|w| = |α||u × v| = |(q − p) · (u × v)| / |u × v|

is the distance between the two lines l1 and l2.
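The formula just derived is easy to check numerically; the two (non-parallel) lines below are invented examples, and NumPy is an assumption of this sketch.

```python
import numpy as np

p, u = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])   # l1: r = p + lam * u
q, v = np.array([0.0, 1.0, 1.0]), np.array([0.0, 1.0, 0.0])   # l2: r = q + mu * v

w = np.cross(u, v)
distance = abs(np.dot(q - p, w)) / np.linalg.norm(w)

print(distance)   # here u x v = [0 0 1] and (q - p) . (u x v) = 1, so the distance is 1.0
```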

3.8 Intersections of Planes and Systems of Linear Equations
We will write (x, y, z) to mean the point with coordinates (x, y, z), or equivalently the point with position vector \begin{pmatrix} x \\ y \\ z \end{pmatrix}.
Let Π be the plane with Cartesian equation ax + by + cz = d. The set of points on Π is
{(x, y, z) : ax + by + cz = d}.
We will be interested in the intersection of a collection of planes, that is the set of points
which lie in all of them.
As a warm-up think about how a collection of lines in 2-space may intersect: two lines
will typically intersect in one point but may not intersect (if they are parallel) or intersect in a
whole line (if they are both the same line), three or more lines will typically not intersect but
other configurations are possible.
Similarly, suppose we have k planes in 3-space and want to find all points which lie on all
k of them. This intersection will typically be a line if k = 2, a point if k = 3, and empty if
k ≥ 4, although (as in the lines in 2-space example) there are other possibilities.
Algebraically, we have k planes Π1, . . . , Πk with Cartesian equations

a1 x + b1 y + c1 z = d1
a2 x + b2 y + c2 z = d2
   ⋮
ak x + bk y + ck z = dk.

A point (p, q, r) is in the intersection of the k planes precisely if it is a common solution to these k equations.

3.9 Intersections of other geometric objects


Finding intersections of geometric objects often reduces to finding solutions to collections of
equations of various kinds. Here are two more instances.
• To find the intersection of the plane Π with equation ax + by + cz = d and the line l with parametric equations

  x = p1 + λu1
  y = p2 + λu2
  z = p3 + λu3

we solve
a(p1 + λu1 ) + b(p2 + λu2 ) + c(p3 + λu3 ) = d
for λ. Usually there will be a unique solution (reflecting the fact that a plane and a line
in 3-space typically intersect in a single point). However there are some conditions on
a, b, c, d, p, u (can you work out these conditions?) which mean that either there are no
solutions (corresponding to the case when the line is parallel to the plane), or that every
point on the line gives a solution (corresponding to the case when the line is a subset
of the plane). Substituting the obtained value of λ back into the parametric equations
for the line then gives the coordinates of the point of intersection.

• To find the intersection of the line l1 with parametric equations

  x = p1 + λu1
  y = p2 + λu2
  z = p3 + λu3

and the line l2 with parametric equations

  x = q1 + µv1
  y = q2 + µv2
  z = q3 + µv3

we solve

  p1 + λu1 = q1 + µv1
  p2 + λu2 = q2 + µv2
  p3 + λu3 = q3 + µv3

or equivalently solve

  λu1 − µv1 = q1 − p1
  λu2 − µv2 = q2 − p2
  λu3 − µv3 = q3 − p3

for λ and µ. As there are three equations in two unknowns, there will typically be no
solutions (reflecting the fact that two lines in 3-space typically do not intersect).
Chapter 4

Systems of Linear Equations

Systems of linear equations arise frequently in many areas of the sciences, including physics,
engineering, business, economics, and sociology. Their systematic study also provided part of
the motivation for the development of modern linear algebra at the end of the 19th century.
Linear equations are extremely important, and, particularly in higher dimensions, one aims to have a systematic and efficient way of solving them.

4.1 Basic terminology and examples


A linear equation in n unknowns is an equation of the form

a1 x 1 + a2 x 2 + · · · + an x n = b ,

where a1 , . . . , an and b are given real numbers and x1 , . . . , xn are variables.


A system of m linear equations in n unknowns is a collection of equations of the form

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
   ⋮
am1 x1 + am2 x2 + · · · + amn xn = bm

where the aij ’s and bi ’s are all real numbers. We also call such systems m × n systems.

Example 4.1.

(a)  2x1 + x2 = 4
     3x1 + 2x2 = 7

(b)  x1 + x2 − x3 = 3
     2x1 − x2 + x3 = 6

(c)  x1 − x2 = 0
     x1 + x2 = 3
     x2 = 1

(a) is a 2 × 2 system, (b) is a 2 × 3 system, and (c) is a 3 × 2 system.

A solution of an m × n system is an ordered n-tuple (x1 , x2 , . . . , xn ) of specific real values


of x1 , . . . , xn that satisfies all equations of the system.

Example 4.2. (1, 2) is a solution of Example 4.1 (a).


For each α ∈ R, the 3-tuple (3, α, α) is a solution of Example 4.1 (b) (Exercise: check this!).
Example 4.1 (c) has no solution, since, on the one hand x2 = 1 by the last equation, but the
first equation implies x1 = 1, while the second equation implies x1 = 2, which is impossible.


A system with no solution is called inconsistent, while a system with at least one solution
is called consistent.
The set of all solutions of a system is called its solution set, which may be empty if the
system is inconsistent.
The basic problem we want to address in this section is the following: given an arbitrary
m × n system, determine its solution set. Later on, we will discuss a procedure that provides
a complete and practical solution to this problem (the so-called ‘Gaussian algorithm’). Before
we encounter this procedure, we require a bit more terminology.
Definition 4.3. Two m × n systems are said to be equivalent, if they have the same solution
set.
Example 4.4. Consider the two systems

(a)  5x1 − x2 + 2x3 = −3
     x2 = 2
     3x3 = 6

(b)  5x1 − x2 + 2x3 = −3
     −5x1 + 2x2 − 2x3 = 5
     5x1 − x2 + 5x3 = 3
System (a) is easy to solve: looking at the last equation we find first that x3 = 2; the
second from bottom equation implies x2 = 2; and finally the first equation yields x1 =
(−3 + x2 − 2x3 )/5 = −1. So the solution set of this system is {(−1, 2, 2)}.
To find the solution of system (b), add the first and the second equation. Then x2 = 2,
while subtracting the first from the third equation gives 3x3 = 6, that is x3 = 2. Finally,
the first equation now gives x1 = (−3 + x2 − 2x3 )/5 = −1, so the solution set is again
{(−1, 2, 2)}.
Thus the systems (a) and (b) are equivalent.
In solving system (b) above we have implicitly used the following important observation:
Lemma 4.5. The following operations do not change the solution set of a linear system:
(i) interchanging two equations;

(ii) multiplying an equation by a non-zero scalar;

(iii) adding a multiple of one equation to another.


Proof. (i) and (ii) are obvious. (iii) is a simple consequence of the fact that these equations
are linear equations.
We shall see shortly how to use the above operations systematically to obtain the solution
set of any given linear system. Before doing so, however, we introduce a useful short-hand.
An m × n matrix is a rectangular array of real numbers:
 
\begin{pmatrix} c11 & \cdots & c1n \\ \vdots & & \vdots \\ cm1 & \cdots & cmn \end{pmatrix}.

Given an m × n linear system

a11 x1 + · · · + a1n xn = b1
   ⋮
am1 x1 + · · · + amn xn = bm

we call the array

\begin{pmatrix} a11 & \cdots & a1n & b1 \\ \vdots & & \vdots & \vdots \\ am1 & \cdots & amn & bm \end{pmatrix}

the augmented matrix of the linear system, and the m × n matrix

\begin{pmatrix} a11 & \cdots & a1n \\ \vdots & & \vdots \\ am1 & \cdots & amn \end{pmatrix}

the coefficient matrix of the linear system.

Example 4.6.
 
system:
  3x1 + 2x2 − x3 = 5
  2x1 + x3 = −1

augmented matrix:
  \begin{pmatrix} 3 & 2 & −1 & 5 \\ 2 & 0 & 1 & −1 \end{pmatrix}.

A system can be solved by performing operations on the augmented matrix. Corresponding


to the three operations given in Lemma 4.5 we have the following three operations that can
be applied to the augmented matrix, called elementary row operations.

Definition 4.7 (Elementary row operations).


Type I interchanging two rows;
Type II multiplying a row by a non-zero scalar;
Type III adding a multiple of one row to another row.

4.2 Gaussian elimination


Gaussian elimination is a systematic procedure to determine the solution set of a given lin-
ear system. The basic idea is to perform elementary row operations on the corresponding
augmented matrix bringing it to a simpler form from which the solution set is readily obtained.
The simple form alluded to above is given in the following definition.

Definition 4.8. A matrix is said to be in row echelon form if it satisfies the following three
conditions:

(i) All zero rows (consisting entirely of zeros) are at the bottom.

(ii) The first non-zero entry from the left in each nonzero row is a 1, called the leading 1
for that row.

(iii) Each leading 1 is to the right of all leading 1’s in the rows above it.

A row echelon matrix is said to be in reduced row echelon form if, in addition it satisfies
the following condition:

(iv) Each leading 1 is the only nonzero entry in its column

Roughly speaking, a matrix is in row echelon form if the leading 1’s form an echelon (that
is, a ‘steplike’) pattern.

Example 4.9. Matrices in row echelon form:


   
\begin{pmatrix} 1 & 4 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{pmatrix},   \begin{pmatrix} 1 & 3 & 1 & 0 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix},   \begin{pmatrix} 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}.

Matrices in reduced row echelon form:

\begin{pmatrix} 1 & 2 & 0 & 1 & 0 \\ 0 & 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix},   \begin{pmatrix} 1 & 5 & 0 & 2 \\ 0 & 0 & 1 & −1 \\ 0 & 0 & 0 & 0 \end{pmatrix},   \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}.

The variables corresponding to the leading 1’s of the augmented matrix in row echelon
form will be referred to as the leading variables, the remaining ones as the free variables.

Example 4.10.
 
(a) \begin{pmatrix} 1 & 2 & 3 & −4 & 6 \\ 0 & 0 & 1 & 2 & 3 \end{pmatrix}.
Leading variables: x1 and x3; free variables: x2 and x4.

(b) \begin{pmatrix} 1 & 0 & 5 \\ 0 & 1 & 3 \end{pmatrix}.
Leading variables: x1 and x2; no free variables.

Note that if the augmented matrix of a system is in row echelon form, the solution set is
easily obtained.

Example 4.11. Determine the solution set of the systems given by the following augmented
matrices in row echelon form:
 
(a) \begin{pmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 0 & 1 \end{pmatrix},    (b) \begin{pmatrix} 1 & −2 & 0 & 1 & 2 \\ 0 & 0 & 1 & −2 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.

Solution. (a) The corresponding system is

x1 + 3x2 = 2
0 = 1

so the system is inconsistent and the solution set is empty.


(b) The corresponding system is

x1 − 2x2 + x4 = 2
x3 − 2x4 = 1
0 = 0

We can express the leading variables in terms of the free variables x2 and x4 . So set x2 = α
and x4 = β, where α and β are arbitrary real numbers. The second line now tells us that
x3 = 1 + 2x4 = 1 + 2β, and then the first line that x1 = 2 + 2x2 − x4 = 2 + 2α − β. Thus
the solution set is { (2 + 2α − β, α, 1 + 2β, β) | α, β ∈ R }.
It turns out that every matrix can be brought into row echelon form using only elementary
row operations. The procedure is known as the

Gaussian algorithm:

Step 1 If the matrix consists entirely of zeros, stop — it is already in row echelon form.

Step 2 Otherwise, find the first column from the left containing a non-zero entry (call it a),
and move the row containing that entry to the top position.

Step 3 Now multiply that row by 1/a to create a leading 1.

Step 4 By subtracting multiples of that row from rows below it, make each entry below the
leading 1 zero.

This completes the first row. All further operations are carried out on the other rows.

Step 5 Repeat steps 1-4 on the matrix consisting of the remaining rows

The process stops when either no rows remain at Step 5 or the remaining rows consist of
zeros.

Example 4.12. Solve the following system using the Gaussian algorithm:

x2 + 6x3 =4
3x1 − 3x2 + 9x3 = −3
2x1 + 2x2 + 18x3 = 8

Solution. Performing the Gaussian algorithm on the augmented matrix gives:

\begin{pmatrix} 0 & 1 & 6 & 4 \\ 3 & −3 & 9 & −3 \\ 2 & 2 & 18 & 8 \end{pmatrix}
  ~ (R1 ↔ R2) ~
\begin{pmatrix} 3 & −3 & 9 & −3 \\ 0 & 1 & 6 & 4 \\ 2 & 2 & 18 & 8 \end{pmatrix}
  ~ ((1/3)R1) ~
\begin{pmatrix} 1 & −1 & 3 & −1 \\ 0 & 1 & 6 & 4 \\ 2 & 2 & 18 & 8 \end{pmatrix}
  ~ (R3 − 2R1) ~
\begin{pmatrix} 1 & −1 & 3 & −1 \\ 0 & 1 & 6 & 4 \\ 0 & 4 & 12 & 10 \end{pmatrix}
  ~ (R3 − 4R2) ~
\begin{pmatrix} 1 & −1 & 3 & −1 \\ 0 & 1 & 6 & 4 \\ 0 & 0 & −12 & −6 \end{pmatrix}
  ~ (−(1/12)R3) ~
\begin{pmatrix} 1 & −1 & 3 & −1 \\ 0 & 1 & 6 & 4 \\ 0 & 0 & 1 & 1/2 \end{pmatrix},

where the last matrix is now in row echelon form. The corresponding system reads:

x1 − x2 + 3x3 = −1
x2 + 6x3 = 4
x3 = 1/2

Leading variables are x1, x2 and x3; there are no free variables. The last equation now implies x3 = 1/2; the second equation from bottom yields x2 = 4 − 6x3 = 1; and finally the first equation yields x1 = −1 + x2 − 3x3 = −3/2. Thus the solution is (−3/2, 1, 1/2).
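For readers who like to experiment, here is a short Python sketch of the Gaussian algorithm applied to the augmented matrix of Example 4.12. It is an illustrative simplification (using floating-point arithmetic and no partial pivoting), not part of the module, and NumPy is an assumption of the sketch.

```python
import numpy as np

def row_echelon(M):
    """Bring M to row echelon form using the three elementary row operations."""
    A = M.astype(float)
    rows, cols = A.shape
    r = 0                                   # row we are currently completing
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i, c] != 0), None)
        if pivot is None:                   # no non-zero entry in this column
            continue
        A[[r, pivot]] = A[[pivot, r]]       # Type I: interchange two rows
        A[r] = A[r] / A[r, c]               # Type II: scale to create a leading 1
        for i in range(r + 1, rows):        # Type III: clear entries below the leading 1
            A[i] = A[i] - A[i, c] * A[r]
        r += 1
        if r == rows:
            break
    return A

aug = np.array([[0, 1, 6, 4],               # augmented matrix of Example 4.12
                [3, -3, 9, -3],
                [2, 2, 18, 8]])
print(row_echelon(aug))                     # final row is [0, 0, 1, 0.5], as above
```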

A variant of the Gaussian algorithm is the Gauss-Jordan algorithm, which brings a matrix to reduced row echelon form:

Gauss-Jordan algorithm

Step 1 Bring matrix to row echelon form using the Gaussian algorithm.

Step 2 Find the row containing the first leading 1 from the right, and add suitable multiples
of this row to the rows above it to make each entry above the leading 1 zero.

This completes the first non-zero row from the bottom. All further operations are carried out
on the rows above it.

Step 3 Repeat steps 1-2 on the matrix consisting of the remaining rows.
Example 4.13. Solve the following system using the Gauss-Jordan algorithm:
x1 + x2 + x3 + x4 + x5 = 4
x1 + x2 + x3 + 2x4 + 2x5 = 5
x1 + x2 + x3 + 2x4 + 3x5 = 7
Solution. Performing the Gauss-Jordan algorithm on the augmented matrix gives:

\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 4 \\ 1 & 1 & 1 & 2 & 2 & 5 \\ 1 & 1 & 1 & 2 & 3 & 7 \end{pmatrix}
  ~ (R2 − R1, R3 − R1) ~
\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 4 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 2 & 3 \end{pmatrix}
  ~ (R3 − R2) ~
\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 4 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{pmatrix}
  ~ (R1 − R3, R2 − R3) ~
\begin{pmatrix} 1 & 1 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 0 & −1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{pmatrix}
  ~ (R1 − R2) ~
\begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 3 \\ 0 & 0 & 0 & 1 & 0 & −1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{pmatrix},
where the last matrix is now in reduced row echelon form. The corresponding system reads:
x1 + x2 + x3 = 3
x4 = −1
x5 = 2
Leading variables are x1 , x4 , and x5 ; free variables x2 and x3 . Now set x2 = α and
x3 = β, and solve for the leading variables starting from the last equation. This yields
x5 = 2, x4 = −1, and finally x1 = 3 − x2 − x3 = 3 − α − β. Thus the solution set is
{ (3 − α − β, α, β, −1, 2) | α, β ∈ R }.
We have just seen that any matrix can be brought to (reduced) row echelon form using
only elementary row operations, and moreover that there is an explicit procedure to achieve
this (namely the Gaussian and Gauss-Jordan algorithm). We record this important insight for
later use:
Theorem 4.14.
(a) Every matrix can be brought to row echelon form by a series of elementary row opera-
tions.
(b) Every matrix can be brought to reduced row echelon form by a series of elementary row
operations.
Proof. For (a): apply the Gaussian algorithm; for (b): apply the Gauss-Jordan algorithm.
Remark 4.15. It can be shown (but not in this module) that the reduced row echelon form
of a matrix is unique. By contrast, this is not the case for the (non-reduced) row echelon form.
The remark above implies that if a matrix is brought to reduced row echelon form by
any sequence of elementary row operations (that is, not necessarily by those prescribed by the
Gauss-Jordan algorithm) the leading ones will nevertheless always appear in the same positions.

4.3 Special classes of linear systems


In this last section of the chapter we’ll have a look at a number of special types of linear
systems and derive the first important consequences of the fact that every matrix can be
brought to row echelon form by a series of elementary row operations.
We start with the following classification of linear systems:

Definition 4.16. An m × n linear system is said to be

• overdetermined if it has more equations than unknowns (i.e. m > n);

• underdetermined if it has fewer equations than unknowns (i.e. m < n).

Note that overdetermined systems are usually (but not necessarily) inconsistent. Under-
determined systems may or may not be consistent. However, if they are consistent, then they
necessarily have infinitely many solutions:

Theorem 4.17. If an underdetermined system is consistent, it must have infinitely many


solutions.

Proof. Note that the row echelon form of the augmented matrix of the system has r ≤ m
non-zero rows. Thus there are r leading variables, and consequently n − r ≥ n − m > 0 free
variables.
Another useful classification of linear systems is the following:

Definition 4.18. A linear system

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
   ⋮                                        (4.1)
am1 x1 + am2 x2 + · · · + amn xn = bm

is said to be homogeneous if bi = 0 for all i. Otherwise it is said to be inhomogeneous.


Given an inhomogeneous system (4.1), we call the system obtained by setting all bi's to zero the associated homogeneous system.

Example 4.19.

inhomogeneous system:
  3x1 + 2x2 + 5x3 = 2
  2x1 − x2 + x3 = 5

associated homogeneous system:
  3x1 + 2x2 + 5x3 = 0
  2x1 − x2 + x3 = 0

The first observation about homogeneous systems is that they always have a solution, the
so-called trivial or zero solution: (0, 0, . . . , 0).
For later use we record the following useful consequence of the previous theorem on con-
sistent homogeneous systems:

Theorem 4.20. An underdetermined homogeneous system always has non-trivial solutions.

Proof. We just observed that a homogeneous system is consistent. Thus, if the system is
underdetermined and homogeneous, it must have infinitely many solutions by Theorem 4.17,
hence, in particular, it must have a non-zero solution.

Our final result in this section is devoted to the special case of n × n systems. For such
systems there is a delightful characterisation of the existence and uniqueness of solutions of a
given system in terms of the associated homogeneous systems. At the same time, the proof of
this result serves as another illustration of the usefulness of the row echelon form for theoretical
purposes.

Theorem 4.21. An n × n system is consistent and has a unique solution, if and only if the
only solution of the associated homogeneous system is the zero solution.

Proof. Follows from the following two observations:

• The same sequence of elementary row operations that brings the augmented matrix of a system to row echelon form also brings the augmented matrix of the associated homogeneous system to row echelon form, and vice versa.

• An n × n system in row echelon form has a unique solution precisely if there are n leading variables.

Thus, if an n × n system is consistent and has a unique solution, the corresponding homoge-
neous system must have a unique solution, which is necessarily the zero solution.
Conversely, if the associated homogeneous system of a given system has the zero solution
as its unique solution, then the original inhomogeneous system must have a solution, and this
solution must be unique.
Chapter 5

Matrices

In this chapter we give basic rules and definitions that are necessary for doing calculations with
matrices in an efficient way. We will then consider the inverse of a matrix, the transpose of
a matrix, and what is meant by the concept of a symmetric matrix. A highlight in the later
sections is the Invertible Matrix Theorem.

5.1 Matrices and basic properties


An m × n matrix A is a rectangular array of scalars (real numbers)
 
\begin{pmatrix} a11 & \cdots & a1n \\ \vdots & & \vdots \\ am1 & \cdots & amn \end{pmatrix}.

We write A = (aij)m×n or simply A = (aij) to denote an m × n matrix whose (i, j)-entry is aij, i.e. aij is the entry in the i-th row and the j-th column.
If A = (aij )m×n we say that A has size m × n. An n × n matrix is said to be square.
Example 5.1. If

A = \begin{pmatrix} 1 & 3 & 2 \\ −2 & 4 & 0 \end{pmatrix},
then A is a matrix of size 2 × 3. The (1, 2)-entry of A is 3 and the (2, 3)-entry of A is 0.
Definition 5.2 (Equality). Two matrices A and B are equal, and we write A = B, if they
have the same size and aij = bij where A = (aij ) and B = (bij ).
Definition 5.3 (Scalar multiplication). If A = (aij )m×n and α is a scalar, then αA is the
m × n matrix whose (i, j)-entry is αaij .
Definition 5.4 (Addition). If A = (aij )m×n and B = (bij )m×n then the sum A + B of A
and B is the m × n matrix whose (i, j)-entry is aij + bij .
Example 5.5. Let

A = \begin{pmatrix} 2 & 3 \\ −1 & 2 \\ 4 & 0 \end{pmatrix}  and  B = \begin{pmatrix} 0 & 1 \\ 2 & 3 \\ −2 & 1 \end{pmatrix}.

Then

3A + 2B = \begin{pmatrix} 6 & 9 \\ −3 & 6 \\ 12 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 2 \\ 4 & 6 \\ −4 & 2 \end{pmatrix} = \begin{pmatrix} 6 & 11 \\ 1 & 12 \\ 8 & 2 \end{pmatrix}.


Definition 5.6 (Zero matrix). We write Om×n or simply O (if the size is clear from the
context) for the m × n matrix all of whose entries are zero, and call it a zero matrix.
Scalar multiplication and addition of matrices satisfy the following rules:
Theorem 5.7. Let A, B and C be matrices of the same size, and let α and β be scalars.
Then:
(a) A + B = B + A;
(b) A + (B + C) = (A + B) + C;
(c) A + O = A;
(d) A + (−A) = O, where −A = (−1)A;
(e) α(A + B) = αA + αB;
(f) (α + β)A = αA + βA;
(g) (αβ)A = α(βA);
(h) 1A = A.
Proof. We prove part (b) only, leaving the other parts as exercises.
For part (b), B + C is an m × n matrix and so A + (B + C) is an m × n matrix.
The ij-entry of B + C is bij + cij and so the ij-entry of A + (B + C) is aij + (bij + cij ).
Similarly, A + B is an m × n matrix and so (A + B) + C is an m × n matrix.
The ij-entry of A + B is aij + bij and so the ij-entry of (A + B) + C is (aij + bij ) + cij .
Since aij + (bij + cij ) = (aij + bij ) + cij we have that A + (B + C) = (A + B) + C.
Example 5.8. Simplify 2(A + 3B) − 3(C + 2B), where A, B, and C are matrices with the
same size.
Solution.
2(A + 3B) − 3(C + 2B) = 2A + 2 · 3B − 3C − 3 · 2B = 2A + 6B − 3C − 6B = 2A − 3C .

Definition 5.9 (Matrix multiplication). If A = (aij ) is an m × n matrix and B = (bij ) is an


n × p matrix then the product AB of A and B is the m × p matrix C = (cij ) with
cij = \sum_{k=1}^{n} aik bkj.

Example 5.10. Compute the (1, 3)-entry and the (2, 4)-entry of AB, where
 
A = \begin{pmatrix} 3 & −1 & 2 \\ 0 & 1 & 4 \end{pmatrix}  and  B = \begin{pmatrix} 2 & 1 & 6 & 0 \\ 0 & 2 & 3 & 4 \\ −1 & 0 & 5 & 8 \end{pmatrix}.
Solution.
(1, 3)-entry: 3 · 6 + (−1) · 3 + 2 · 5 = 25;
(2, 4)-entry: 0 · 0 + 1 · 4 + 4 · 8 = 36.
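The entries in Example 5.10 can be checked by implementing Definition 5.9 directly; the comparison with NumPy's built-in product is simply a sanity check and an assumption of this sketch.

```python
import numpy as np

A = np.array([[3, -1, 2],
              [0,  1, 4]])
B = np.array([[ 2, 1, 6, 0],
              [ 0, 2, 3, 4],
              [-1, 0, 5, 8]])

# c_ij = sum over k of a_ik * b_kj, exactly as in Definition 5.9
C = np.array([[sum(A[i, k] * B[k, j] for k in range(A.shape[1]))
               for j in range(B.shape[1])]
              for i in range(A.shape[0])])

print(C[0, 2], C[1, 3])          # 25 36, the entries computed in Example 5.10
print(np.array_equal(C, A @ B))  # True: the explicit sum agrees with NumPy's product
```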
Definition 5.11 (Identity matrix). An identity matrix I is a square matrix with 1’s on the
diagonal and zeros elsewhere. If we want to emphasise its size we write In for the n × n
identity matrix.

Matrix multiplication satisfies the following rules:


Theorem 5.12. Let A = (aij )m×n , B = (bij )m×n , C = (cij )n×p and D = (dij )n×p be
matrices and α ∈ R. Then,
(a) (A + B)C = AC + BC and A(C + D) = AC + AD;
(b) α(AC) = (αA)C = A(αC);
(c) Im A = AIn = A;
Let X = (xij )m×n , Y = (yij )n×p and Z = (zij )p×q . Then,
(d) (XY )Z = X(Y Z).
Proof. Again we will just prove selected parts, the remainder are similar and left as exercises.

(a) Let A + B = M = (mij )m×n so mij = aij + bij . Now, M C is an m × p matrix. Also
AC and BC are m × p matrices and so AC + BC is an m × p matrix.
The ij-entry of MC is

\sum_{k=1}^{n} mik ckj = \sum_{k=1}^{n} (aik + bik)ckj
                       = \sum_{k=1}^{n} aik ckj + \sum_{k=1}^{n} bik ckj
                       = (ij-entry of AC) + (ij-entry of BC)
                       = (ij-entry of AC + BC).
It follows that (A + B)C = AC + BC.
The second identity in part (a) is proved in a similar way
(c) Im A is an m × n matrix with ij-entry
0 × a1j + 0 × a2j + · · · + 1 × aij + · · · + 0 × amj = aij .
(where we are multiplying the aij by the entries in row i of In ). So Im A = A.
Similarly AIn is an m × n matrix with ij-entry
ai1 × 0 + ai2 × 0 + · · · + aij × 1 + · · · + ain × 0 = aij .
(where we are multiplying the aij by the entries in column j of In ). So AIn = A.
(d) Both (XY )Z and X(Y Z) are m × q matrices.
Let XY = T = (tij )m×p so
tij = xi1 y1j + xi2 y2j + · · · + xin ynj .
Now (XY )Z = T Z has ij-entry
ti1 z1j + ti2 z2j + · · · + tip zpj = (xi1 y11 + xi2 y21 + · · · + xin yn1)z1j
                                   + (xi1 y12 + xi2 y22 + · · · + xin yn2)z2j
                                   + · · ·
                                   + (xi1 y1p + xi2 y2p + · · · + xin ynp)zpj.

Expanding out the brackets we get that this sum consists of all terms xir yrs zsj where r
ranges over 1, . . . , n and s ranges over 1, . . . , p. Equivalently,
The ij-entry of TZ is \sum_{r=1}^{n} \sum_{s=1}^{p} xir yrs zsj.

A similar calculation of X(Y Z) = XS, where Y Z = S, gives that the ij-entry of X(Y Z) is the same sum.
This completes the proof.

Notation 5.13.

• Since X(Y Z) = (XY )Z, we can omit the brackets and simply write XY Z, and similarly for products of more than three factors.

• If A is a square matrix we write A^k = AA · · · A (k factors) for the k-th power of A.

Warning: In general AB ̸= BA, even if AB and BA have the same size!

Example 5.14.

\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}

but

\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
Definition 5.15. If A and B are two matrices with AB = BA, then A and B are said to
commute.
We now come to the important notion of an inverse of a matrix.
Definition 5.16. If A is a square matrix, a matrix B is called an inverse of A if

AB = I and BA = I .

A matrix that has an inverse is called invertible.


Note that not every matrix is invertible. For example the matrix
 
A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}

cannot have an inverse, since for any 2 × 2 matrix B = (bij) we have

AB = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} b11 & b12 \\ b21 & b22 \end{pmatrix} = \begin{pmatrix} b11 & b12 \\ 0 & 0 \end{pmatrix} ≠ I2.

Later on in this chapter we shall discuss an algorithm that lets us decide whether a matrix
is invertible. If the matrix is invertible then this algorithm also tells us exactly what the inverse
is. If a matrix is invertible then its inverse is unique, by the following result:

Theorem 5.17. If B and C are both inverses of A, then B = C.

Proof. Since B and C are inverses of A we have AB = I and CA = I. Thus

B = IB = (CA)B = C(AB) = CI = C .

If A is an invertible matrix, the unique inverse of A is denoted by A−1 . Hence A−1 (if it
exists!) is a square matrix of the same size as A with the property that

AA−1 = A−1 A = I .

Note that the above equality implies that if A is invertible, then its inverse A−1 is also invertible
with inverse A, that is,
(A−1 )−1 = A .
Slightly deeper is the following result:

Theorem 5.18. If A and B are invertible matrices of the same size, then AB is invertible
and
(AB)−1 = B −1 A−1 .

Proof. Observe that

(AB)(B −1 A−1 ) = A(BB −1 )A−1 = AIA−1 = AA−1 = I ,

(B −1 A−1 )(AB) = B −1 (A−1 A)B = B −1 IB = B −1 B = I .


Thus, by definition of invertibility, AB is invertible with inverse B −1 A−1 .
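A quick numerical illustration of Theorem 5.18, using NumPy's inverse routine on two invertible 2 × 2 matrices chosen for the example (the numbers and the use of NumPy are my own additions):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.array([[1.0, 3.0],
              [0.0, 1.0]])

lhs = np.linalg.inv(A @ B)                   # (AB)^(-1)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)    # B^(-1) A^(-1)

print(np.allclose(lhs, rhs))                 # True, as Theorem 5.18 predicts
print(np.allclose((A @ B) @ lhs, np.eye(2))) # (AB)(AB)^(-1) = I, up to rounding
```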

5.2 Transpose of a matrix


The first new concept we encounter is the following:

Definition 5.19. The transpose of an m × n matrix A = (aij ) is the n × m matrix B = (bij )


given by
bij = aji
The transpose of A is denoted by AT .

Example 5.20.

(a) A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}  ⇒  A^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}

(b) B = \begin{pmatrix} 1 & 2 \\ 3 & −1 \end{pmatrix}  ⇒  B^T = \begin{pmatrix} 1 & 3 \\ 2 & −1 \end{pmatrix}

Matrix transposition satisfies the following rules:

Theorem 5.21. Assume that α is a scalar and that A, B, and C are matrices so that the
indicated operations can be performed. Then:

(a) (AT )T = A;

(b) (αA)T = α(AT );

(c) (A + B)T = AT + B T ;

(d) (AB)T = B T AT .

Proof. (a) is obvious while (b) and (c) are proved as a Coursework exercise. For the proof
of (d) assume A = (aij )m×n and B = (bij )n×p and write AT = (ãij )n×m and B T = (b̃ij )p×n
where
ãij = aji and b̃ij = bji .
Notice that (AB)T and B T AT have the same size, so it suffices to show that they have the
same entries. Now, the (i, j)-entry of B T AT is
\sum_{k=1}^{n} b̃ik ãkj = \sum_{k=1}^{n} bki ajk = \sum_{k=1}^{n} ajk bki,

which is the (j, i)-entry of AB, that is, the (i, j)-entry of (AB)T . Thus B T AT = (AB)T .

Transposition ties in nicely with invertibility:

Theorem 5.22. Let A be invertible. Then AT is invertible and

(AT )−1 = (A−1 )T .

Proof. See a Coursework exercise.

5.3 Special types of square matrices


In this section we briefly introduce a number of special classes of matrices which will be studied
in more detail later in this course.

Definition 5.23. A matrix is said to be symmetric if AT = A.

Note that a symmetric matrix is necessarily square.

Example 5.24.

symmetric:      (1  2 4)        (5  2)
                (2 −1 3) ,      (2 −1) .
                (4  3 0)

not symmetric:  (2 2 4)         (1 1 1)
                (2 2 3) ,       (1 1 1) .
                (1 3 5)

Symmetric matrices play an important role in many parts of pure and applied Mathematics
as well as in some other areas of science, for example in quantum physics. Some of the reasons
for this will become clearer towards the end of this course, when we shall study symmetric
matrices in much more detail.
Some other useful classes of square matrices are the triangular ones, which will also play
a role later on in the course.
Definition 5.25. A square matrix A = (aij ) is said to be
upper triangular if aij = 0 for i > j;
strictly upper triangular if aij = 0 for i ≥ j;
lower triangular if aij = 0 for i < j;
strictly lower triangular if aij = 0 for i ≤ j;
diagonal if aij = 0 for i ̸= j.
If A = (aij ) is a square matrix of size n × n, we call a11 , a22 , . . . , ann the diagonal entries
of A. So, informally speaking, a matrix is upper triangular if all the entries below the diagonal
entries are zero, and it is strictly upper triangular if all entries below the diagonal entries and
the diagonal entries themselves are zero. Similarly for (strictly) lower triangular matrices.
Example 5.26.

upper triangular:  (1 2) ,      diagonal:  (1 0 0 0)
                   (0 3)                   (0 3 0 0)
                                           (0 0 5 0)
                                           (0 0 0 3)

strictly lower triangular:  ( 0 0 0)
                            (−1 0 0) .
                            ( 2 3 0)
We close this section with the following two observations:
Theorem 5.27. The sum and product of two upper triangular matrices of the same size is
upper triangular.
Proof. See a Coursework exercise.
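For instance,

    (1 2)(4 5)   (4 17)
    (0 3)(0 6) = (0 18) ,

which is again upper triangular; note also that the diagonal entries of the product are just the products 1 · 4 and 3 · 6 of the corresponding diagonal entries.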

5.4 Column vectors of dimension n


Although vectors in 3-space were originally defined geometrically, recall that the introduction of coordinates allowed us to think of vectors as lists of numbers.
Let us write

    R3 = { (a, b, c)T : a, b, c ∈ R }

for the set of all vectors in 3-space thought of in coordinate form. More generally, let us write

    Rn = { (a1 , a2 , . . . , an )T : a1 , a2 , . . . , an ∈ R } .
We call this the set of all (column) vectors of dimension n (or the set of n-dimensional
(column) vectors). In particular, a column vector of dimension n is just an n × 1 matrix.

We can extend our definitions of how to add two vectors and multiply a vector by a scalar to Rn by letting

    (a1)   (b1)   (a1 + b1)                 (a1)   (αa1)
    (a2)   (b2)   (a2 + b2)                 (a2)   (αa2)
    ( ..) + ( ..) = (  ..  )     and     α  ( ..) = ( .. )
    (an)   (bn)   (an + bn)                 (an)   (αan)

(that is, we define addition and scalar multiplication coordinate-wise).


We denote the zero vector in Rn by 0n (or simply 0 if n is clear from the context).
Note that our definition of the scalar product u·v was only made for vectors in R3 although
we could extend it to work in Rn (using the formula in coordinates). Our definition of the
vector product u × v was only made for vectors in R3 and, in contrast, cannot be extended
to Rn .
Working in Rn for n > 3 we lose some geometric intuition. However, the mathematics still makes sense and can be useful. For instance, under some circumstances we may want to use the vector (1/2, 1/4, 1/8, 1/8)T to represent the probability distribution

    k            1     2     3     4
    P (X = k)   1/2   1/4   1/8   1/8

5.5 Linear systems in matrix notation


We shall now have another look at systems of linear equations, this time using the language
of matrices to study them.
Suppose that we are given an m × n linear system

    a11 x1 + a12 x2 + · · · + a1n xn = b1
    a21 x1 + a22 x2 + · · · + a2n xn = b2
                   ...                          (5.1)
    am1 x1 + am2 x2 + · · · + amn xn = bm

The reformulation is based on the observation that we can write this system as a single matrix
equation
Ax = b , (5.2)
where
     
    A = ( a11  · · ·  a1n )
        (  ..         ..  ) ,    x = (x1 , . . . , xn )T ∈ Rn ,    and    b = (b1 , . . . , bm )T ∈ Rm ,
        ( am1  · · ·  amn )

and where Ax is interpreted as the matrix product of A and x.


Example 5.28. Using matrix notation the system

2x1 − 3x2 + x3 = 2
3x1 − x3 = −1

can be written

    (2 −3  1) (x1)   ( 2)
    (3  0 −1) (x2) = (−1) ,
              (x3)

where A is the 2 × 3 coefficient matrix on the left, x = (x1 , x2 , x3 )T , and b = (2, −1)T .
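Indeed, carrying out the matrix product on the left gives the column vector (2x1 − 3x2 + x3 , 3x1 − x3 )T , so the single equation Ax = b says precisely that both original equations hold.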

Apart from obvious notational economy, writing (5.1) in the form (5.2) has a number of
other advantages which will become clearer shortly.

5.6 Elementary matrices and the Invertible Matrix Theorem
Using the reformulation of linear systems discussed in the previous section we shall now have
another look at the process of solving them. Instead of performing elementary row operations
we shall now view this process in terms of matrix multiplication. This will shed some light
on both matrices and linear systems and will be useful for formulating and proving the main
result of this chapter, the Invertible Matrix Theorem, which will be presented towards the end
of this section. Before doing so, however, we shall consider the effect of multiplying both sides
of a linear system in matrix form by an invertible matrix.

Lemma 5.29. Let A be an m × n matrix and let b ∈ Rm . Suppose that M is an invertible


m × m matrix. The following two systems are equivalent (i.e. they have the same set of
solutions):
Ax = b (5.3)

M Ax = M b (5.4)

Proof. Note that if x satisfies (5.3), then it clearly satisfies (5.4). Conversely, suppose that x
satisfies (5.4), that is,
M Ax = M b .

Since M is invertible, we may multiply both sides of the above equation by M −1 from the left
to obtain
M −1 M Ax = M −1 M b ,

so IAx = Ib, and hence Ax = b, that is, x satisfies (5.3).
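The assumption that M is invertible cannot be dropped: taking M to be the zero matrix, the system M Ax = M b becomes 0 = 0, which every x satisfies, regardless of the solutions of Ax = b.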

We now come back to the idea outlined at the beginning of this section. It turns out that
we can ‘algebraize’ the process of applying an elementary row operation to a matrix A by
left-multiplying A by a certain type of matrix, defined as follows:

Definition 5.30. An elementary matrix of type I (respectively, type II, type III) is a
matrix obtained by applying an elementary row operation of type I (respectively, type II, type
III) to an identity matrix.

Example 5.31.

type I:    E1 = (0 1 0)      (take I3 and swap rows 1 and 2)
                (1 0 0)
                (0 0 1)

type II:   E2 = (1 0 0)      (take I3 and multiply row 3 by 4)
                (0 1 0)
                (0 0 4)

type III:  E3 = (1 0 2)      (take I3 and add 2 times row 3 to row 1)
                (0 1 0)
                (0 0 1)

Let us now consider the effect of left-multiplying an arbitrary 3 × 3 matrix A in turn by


each of the three elementary matrices given in the previous example.

Example 5.32. Let A = (aij )3×3 and let El (l = 1, 2, 3) be defined as in the previous example. Then

    E1 A = (0 1 0)(a11 a12 a13)   (a21 a22 a23)
           (1 0 0)(a21 a22 a23) = (a11 a12 a13) ,
           (0 0 1)(a31 a32 a33)   (a31 a32 a33)

    E2 A = (1 0 0)(a11 a12 a13)   ( a11  a12  a13)
           (0 1 0)(a21 a22 a23) = ( a21  a22  a23) ,
           (0 0 4)(a31 a32 a33)   (4a31 4a32 4a33)

    E3 A = (1 0 2)(a11 a12 a13)   (a11 + 2a31   a12 + 2a32   a13 + 2a33)
           (0 1 0)(a21 a22 a23) = (    a21          a22          a23   ) .
           (0 0 1)(a31 a32 a33)   (    a31          a32          a33   )

You should now pause and marvel at the following observation: interchanging rows 1 and 2
of A produces E1 A, multiplying row 3 of A by 4 produces E2 A, and adding 2 times row 3 to
row 1 of A produces E3 A.

This example should convince you of the truth of the following theorem, the proof of which
will be omitted as it is straightforward, slightly lengthy and not particularly instructive.

Theorem 5.33. If E is an m × m elementary matrix obtained from I by an elementary row


operation, then left-multiplying an m × n matrix A by E has the effect of performing that
same row operation on A.

Slightly deeper is the following:

Theorem 5.34. If E is an elementary matrix, then E is invertible and E −1 is an elementary


matrix of the same type.

Proof. The assertion follows from the previous theorem and the observation that an elementary
row operation can be reversed by an elementary row operation of the same type. More precisely,

• if two rows of a matrix are interchanged, then interchanging them again restores the original matrix;

• if a row is multiplied by α ̸= 0, then multiplying the same row by 1/α restores the original matrix;

• if α times row q has been added to row r, then adding −α times row q to row r restores the original matrix.
Now, suppose that E was obtained from I by a certain row operation. Then, as we just
observed, there is another row operation of the same type that changes E back to I. Thus
there is an elementary matrix F of the same type as E such that F E = I. A moment’s
thought shows that EF = I as well, since E and F correspond to reverse operations. All in
all, we have now shown that E is invertible and its inverse E −1 = F is an elementary matrix
of the same type.
Example 5.35. Determine the inverses of the elementary matrices E1 , E2 , and E3 in Example 5.31.

Solution. In order to transform E1 into I we need to swap rows 1 and 2 of E1 . The elementary matrix that performs this feat is

    E1−1 = (0 1 0)
           (1 0 0) .
           (0 0 1)

Similarly, in order to transform E2 into I we need to multiply row 3 of E2 by 1/4. Thus

    E2−1 = (1 0  0 )
           (0 1  0 ) .
           (0 0 1/4)

Finally, in order to transform E3 into I we need to add −2 times row 3 to row 1, and so

    E3−1 = (1 0 −2)
           (0 1  0) .
           (0 0  1)
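As a quick check, multiplying E3 by the matrix found for E3−1 gives

    (1 0 2)(1 0 −2)   (1 0 0)
    (0 1 0)(0 1  0) = (0 1 0) ,
    (0 0 1)(0 0  1)   (0 0 1)

as expected: adding 2 times row 3 to row 1 and then subtracting it again returns the identity.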

Before we come to the main result of this chapter we need some more terminology:
Definition 5.36. A matrix B is row equivalent to a matrix A if there exists a finite sequence
E1 , E2 , . . . , Ek of elementary matrices such that

B = Ek Ek−1 · · · E1 A .

In other words, B is row equivalent to A if and only if B can be obtained from A by a


finite number of row operations. In particular, two augmented matrices (A|b) and (B|c) are
row equivalent if and only if Ax = b and Bx = c are equivalent systems.
The following properties of row equivalent matrices are easily established:
Fact 5.37.

(a) A is row equivalent to itself;

(b) if A is row equivalent to B, then B is row equivalent to A;

(c) if A is row equivalent to B, and B is row equivalent to C, then A is row equivalent to


C.

Property (b) follows from Theorem 5.34. Details of the proof of (a), (b), and (c) are left
as an exercise.
We are now able to formulate and prove a delightful characterisation of invertibility of
matrices. More precisely, the following theorem provides three equivalent conditions for a
matrix to be invertible (and later on in this module we will encounter one further equivalent
condition).
Before stating the theorem we recall that the zero vector, denoted by 0, is the column
vector all of whose entries are zero.

Theorem 5.38 (Invertible Matrix Theorem). Let A be a square n × n matrix. The following
are equivalent:

(a) A is invertible;

(b) Ax = 0 has only the trivial solution;

(c) A is row equivalent to I;

(d) A is a product of elementary matrices.

Proof. We shall prove this theorem using a cyclic argument: we shall first show that (a)
implies (b), then (b) implies (c), then (c) implies (d), and finally that (d) implies (a). This is
a frequently used trick to show the logical equivalence of a list of assertions.
(a) ⇒ (b): Suppose that A is invertible. If x satisfies Ax = 0, then

x = Ix = (A−1 A)x = A−1 0 = 0 ,

so the only solution of Ax = 0 is the trivial solution.


(b) ⇒ (c): Use elementary row operations to bring the system Ax = 0 to the form
U x = 0, where U is in row echelon form. Since, by hypothesis, the solution of Ax = 0 and
hence the solution of U x = 0 is unique, there must be exactly n leading variables. Thus U is
upper triangular with 1’s on the diagonal, and hence, the reduced row echelon form of U is I.
Thus A is row equivalent to I.
(c) ⇒ (d): If A is row equivalent to I, then there is a sequence E1 , . . . , Ek of elementary
matrices such that
A = Ek Ek−1 · · · E1 I = Ek Ek−1 · · · E1 ,
that is, A is a product of elementary matrices.
(d) ⇒ (a). If A is a product of elementary matrices, then A must be invertible, since
elementary matrices are invertible by Theorem 5.34 and since the product of invertible matrices
is invertible by Theorem 5.18.
An immediate consequence of the previous theorem is the following perhaps surprising
result:

Corollary 5.39. Suppose that A and C are square matrices such that CA = I. Then also
AC = I; in particular, both A and C are invertible with C = A−1 and A = C −1 .

Proof. To show that A is invertible, by the Invertible Matrix Theorem it suffices to show
that the only solution of Ax = 0 is the trivial one. To show this, note that if Ax = 0
then x = Ix = CAx = C0 = 0, as required, so A is indeed invertible. Then note that
C = CI = CAA−1 = IA−1 = A−1 , so both A and C are invertible, and are the inverses of
each other.

What is surprising about this result is the following: suppose we are given a square matrix
A. If we want to check that A is invertible, then, by the definition of invertibility, we need
to produce a matrix B such that AB = I and BA = I. The above corollary tells us that
if we have a candidate C for an inverse of A it is enough to check that either AC = I or
CA = I in order to guarantee that A is invertible with inverse C. This is a non-trivial fact
about matrices, which is often useful.

5.7 Gauss-Jordan inversion


The Invertible Matrix Theorem provides a simple method for inverting matrices. Recall that
the theorem states (amongst other things) that if A is invertible, then A is row equivalent to
I. Thus there is a sequence E1 , . . . , Ek of elementary matrices such that
Ek Ek−1 · · · E1 A = I .
Multiplying both sides of the above equation by A−1 from the right yields
Ek Ek−1 · · · E1 = A−1 ,
that is,
Ek Ek−1 · · · E1 I = A−1 .
Thus, the same sequence of elementary row operations that brings an invertible matrix to
I, will bring I to A−1 . This gives a practical algorithm for inverting matrices, known as
Gauss-Jordan inversion.
Note that in the following we use a slight generalisation of the augmented matrix notation.
Given an m × n matrix A and an m-dimensional vector b we currently use (A|b) to denote
the m × (n + 1) matrix consisting of A with b attached as an extra column to the right of A,
and a vertical line in between them. Suppose now that B is an m × r matrix then we write
(A|B) for the m × (n + r) matrix consisting of A with B attached to the right of A, and a
vertical line separating them.
Gauss-Jordan inversion
Bring the augmented matrix (A|I) to reduced row echelon form. If A is row equivalent to I,
then (A|I) is row equivalent to (I|A−1 ). Otherwise, A does not have an inverse.
Example 5.40. Show that

    A = (1 2 0)
        (2 5 3)
        (0 3 8)

is invertible and compute A−1 .

Solution. Using Gauss-Jordan inversion we find

    (1 2 0 | 1 0 0)               (1 2 0 |  1 0 0)
    (2 5 3 | 0 1 0)  ∼ R2 − 2R1   (0 1 3 | −2 1 0)
    (0 3 8 | 0 0 1)               (0 3 8 |  0 0 1)

                 (1 2 0 |  1  0 0)             (1 2 0 |  1 0  0)
    ∼ R3 − 3R2   (0 1 3 | −2  1 0)  ∼ (−1)R3   (0 1 3 | −2 1  0)
                 (0 0 −1|  6 −3 1)             (0 0 1 | −6 3 −1)

                 (1 2 0 |  1  0  0)   R1 − 2R2   (1 0 0 | −31 16 −6)
    ∼ R2 − 3R3   (0 1 0 | 16 −8  3)      ∼       (0 1 0 |  16 −8  3) .
                 (0 0 1 | −6  3 −1)              (0 0 1 |  −6  3 −1)

Thus A is invertible (because it is row equivalent to I3 ) and

    A−1 = (−31 16 −6)
          ( 16 −8  3) .
          ( −6  3 −1)
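It is good practice to check the answer by multiplying back: for instance, the (1, 1)-entry of AA−1 is 1 · (−31) + 2 · 16 + 0 · (−6) = 1, and computing the remaining entries in the same way confirms that AA−1 = I3 .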
Chapter 6

Determinants

We will define the important concept of a determinant, which is a useful invariant for general
n × n matrices. We will discuss the most important properties of determinants, and illustrate
what they are good for and how calculations involving determinants can be simplified.

6.1 Determinants of 2 × 2 and 3 × 3 matrices


To every 2 × 2 matrix A we associate a scalar, called the determinant of A, which is given by
a certain sum of products of the entries of A:

Definition 6.1. Let A = (aij ) be a 2 × 2 matrix. The determinant of A, denoted det(A), is defined by

    det(A) = |a11 a12| = a11 a22 − a21 a12 .        (6.1)
             |a21 a22|
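For example,

    |1 2|
    |3 4| = 1 · 4 − 3 · 2 = −2 .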
Although this definition of the determinant may look strange and non-intuitive, one of the
main motivations for introducing it is that it allows us to decide whether a matrix is invertible
or not.

Theorem 6.2. If A and B are 2 × 2 matrices then

(a) det(AB) = det(A) det(B),

(b) det(A) ̸= 0 if and only if A is invertible,


 
(c) If

        A = (a c)
            (b d)

    is invertible then

        A−1 = 1/det(A) · ( d −c) .        (6.2)
                         (−b  a)

Proof. (a) can be proved by a direct calculation.


To prove (b) first note that if det(A) = 0 but A is invertible then 1 = det(I) =
det(AA−1 ) = det(A) det(A−1 ) = 0, which is a contradiction, so A is not invertible. If
on the other hand det(A) ̸= 0 then the matrix C given by the righthand side of (6.2) is
well-defined, and we can calculate directly that CA = I. From Corollary 5.39 it follows that
AC = I (or alternatively we could show that AC = I by direct calculation as well). Thus A
is invertible, so (b) is proved. In fact we have proved (c) as well: the fact that AC = I = CA
means that C = A−1 .
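For example, for

    A = (2 1)
        (1 1)

we have det(A) = 2 · 1 − 1 · 1 = 1, so (6.2) gives

    A−1 = ( 1 −1) ,
          (−1  2)

and one checks directly that AA−1 = I2 .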


Our goal in this chapter is to introduce determinants for square matrices of any size, study
some of their properties, and then prove the generalisation of the above theorem. However,
before considering this very general definition, let us move to the case of 3 × 3 determinants:

Definition 6.3. If A = (aij ) is a 3 × 3 matrix, its determinant det(A) is defined by

    det(A) = |a11 a12 a13|
             |a21 a22 a23|
             |a31 a32 a33|

           = a11 |a22 a23| − a21 |a12 a13| + a31 |a12 a13|        (6.3)
                 |a32 a33|       |a32 a33|       |a22 a23|

           = a11 a22 a33 − a11 a32 a23 − a21 a12 a33 + a21 a32 a13 + a31 a12 a23 − a31 a22 a13 .

Notice that the determinant of a 3 × 3 matrix A is given in terms of the determinants of


certain 2 × 2 submatrices of A.

6.2 General definition of determinants


In general, we shall see that the determinant of a 4 × 4 matrix is given in terms of the
determinants of 3 × 3 submatrices, and so forth. Before stating the general definition we
introduce a convenient short-hand:

Notation 6.4. For any square matrix A, let Aij denote the submatrix formed by deleting the
i-th row and the j-th column of A.

Example 6.5. If

    A = ( 3  2  5 −1)
        (−2  9  0  6)
        ( 7 −2 −3  1) ,
        ( 4 −5  8 −4)

then

    A23 = (3  2 −1)
          (7 −2  1) .
          (4 −5 −4)

If we now define the determinant of a 1 × 1 matrix A = (aij ) by det(A) = a11 , we can re-write (6.1) and (6.3) as follows:

• if A = (aij )2×2 then

    det(A) = a11 det(A11 ) − a21 det(A21 ) ;

• if A = (aij )3×3 then

    det(A) = a11 det(A11 ) − a21 det(A21 ) + a31 det(A31 ) .

This observation motivates the following recursive definition:

Definition 6.6. Let A = (aij ) be an n × n matrix. The determinant of A, written det(A), is defined as follows:

• If n = 1, then det(A) = a11 .

• If n > 1 then det(A) is the sum of n terms of the form ±ai1 det(Ai1 ), with plus and minus signs alternating, and where the entries a11 , a21 , . . . , an1 are from the first column of A. In symbols:

    det(A) = a11 det(A11 ) − a21 det(A21 ) + · · · + (−1)^{n+1} an1 det(An1 ) = ∑_{i=1}^{n} (−1)^{i+1} ai1 det(Ai1 ) .

Example 6.7. Compute the determinant of

    A = ( 0 0  7 −5)
        (−2 9  6 −8)
        ( 0 0 −3  2) .
        ( 0 3 −1  4)

Solution.

    | 0 0  7 −5|
    |−2 9  6 −8|             |0  7 −5|
    | 0 0 −3  2| = −(−2) ·   |0 −3  2| = 2 · 3 · | 7 −5| = 2 · 3 · [7 · 2 − (−3) · (−5)] = −6 ,
    | 0 3 −1  4|             |3 −1  4|           |−3  2|

where the 4 × 4 determinant was expanded down its first column (only a21 = −2 is non-zero there) and the 3 × 3 determinant down its first column (only the entry 3 in its bottom-left corner is non-zero there).

To state the next theorem, it will be convenient to write the definition of det(A) in a
slightly different form.
Definition 6.8. Given a square matrix A = (aij ), the (i, j)-cofactor of A is the number Cij
defined by
Cij = (−1)i+j det(Aij ) .
Thus, the definition of det(A) reads
det(A) = a11 C11 + a21 C21 + · · · + an1 Cn1 .
This is called the cofactor expansion down the first column of A. There is nothing special
about the first column, as the next theorem shows:
Theorem 6.9 (Cofactor Expansion Theorem). The determinant of an n × n matrix A can be
computed by a cofactor expansion across any column or row. The expansion down the j-th
column is
det(A) = a1j C1j + a2j C2j + · · · + anj Cnj
and the cofactor expansion across the i-th row is
det(A) = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin .
Although this theorem is fundamental for the development of determinants, we shall not
prove it here, as it would lead to a rather lengthy workout.
Before moving on, notice that the plus or minus sign attached to the (i, j)-cofactor depends only on the position (i, j), not on the value of the entry aij itself. The factor (−1)^{i+j} determines the following checkerboard pattern of signs:

    + − + · · ·
    − + − · · ·
    + − + · · ·
     .  .  .

Example 6.10. Use a cofactor expansion across the second row to compute det(A), where

    A = (4 −1 3)
        (0  0 2) .
        (1  0 7)

Solution.

    det(A) = a21 C21 + a22 C22 + a23 C23
           = (−1)^{2+1} a21 det(A21 ) + (−1)^{2+2} a22 det(A22 ) + (−1)^{2+3} a23 det(A23 )
           = −0 · |−1 3| + 0 · |4 3| − 2 · |4 −1|
                  | 0 7|       |1 7|       |1  0|
           = −2 · [4 · 0 − 1 · (−1)] = −2 .

Example 6.11. Compute det(A), where

    A = ( 3  0 0  0 0)
        (−2  5 0  0 0)
        ( 9 −6 4 −1 3) .
        ( 2  4 0  0 2)
        ( 8  3 1  0 7)

Solution. Notice that all entries but the first of row 1 are 0. Thus it will shorten our labours if we expand across the first row:

    det(A) = 3 · | 5 0  0 0|
                 |−6 4 −1 3| .
                 | 4 0  0 2|
                 | 3 1  0 7|

Again it is advantageous to expand this 4 × 4 determinant across the first row:

    det(A) = 3 · 5 · |4 −1 3|
                     |0  0 2| .
                     |1  0 7|

We have already computed the value of the above 3 × 3 determinant in the previous example
and found it to be equal to −2. Thus det(A) = 3 · 5 · (−2) = −30.
Notice that the matrix in the previous example was almost lower triangular. The method
of this example is easily generalised to prove the following theorem:
Theorem 6.12. If A is either an upper or a lower triangular matrix, then det(A) is the product
of the diagonal entries of A.
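For example,

    |2 7 −1|
    |0 3  5| = 2 · 3 · 4 = 24 ,
    |0 0  4|

as can also be verified by a cofactor expansion down the first column.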

6.3 Properties of determinants


At several points in this module we have seen that elementary row operations play a fun-
damental role in matrix theory. It is only natural to enquire how det(A) behaves when an
elementary row operation is applied to A.

Theorem 6.13. Let A be a square matrix.

(a) If two rows of A are interchanged to produce B, then det(B) = − det(A).

(b) If one row of A is multiplied by α to produce B, then det(B) = α det(A).

(c) If a multiple of one row of A is added to another row to produce a matrix B then
det(B) = det(A).

Proof. These assertions follow from a slightly stronger result to be proved later in this chapter
(see Theorem 6.23).

Example 6.14.

(a) |1 2 3|     |4 5 6|
    |4 5 6| = − |1 2 3|    by (a) of the previous theorem.
    |7 8 9|     |7 8 9|

(b) |0  1 2|       |0 1 2|
    |3 12 9| = 3 · |1 4 3|    by (b) of the previous theorem.
    |1  2 1|       |1 2 1|

(c) |3  1 0|   |3  1 0|
    |4  2 9| = |7  3 9|    by (c) of the previous theorem.
    |0 −2 1|   |0 −2 1|

The following examples show how to use the previous theorem for the effective computation
of determinants:

Example 6.15. Compute

    | 3 −1  2 −5|
    | 0  5 −3 −6|
    |−6  7 −7  4| .
    |−5 −8  0  9|

Solution. Perhaps the easiest way to compute this determinant is to spot that when adding two times row 1 to row 3 we get two identical rows, which, by another application of the previous theorem, implies that the determinant is zero:

    | 3 −1  2 −5|             | 3 −1  2 −5|            | 3 −1  2 −5|
    | 0  5 −3 −6|             | 0  5 −3 −6|            | 0  5 −3 −6|
    |−6  7 −7  4| = R3 + 2R1  | 0  5 −3 −6| = R3 − R2  | 0  0  0  0| = 0 ,
    |−5 −8  0  9|             |−5 −8  0  9|            |−5 −8  0  9|

by a cofactor expansion across the third row.



Example 6.16. Compute det(A), where

    A = ( 0  1  2 −1)
        ( 2  5 −7  3)
        ( 0  3  6  2) .
        (−2 −5  4 −2)

Solution. Here we see that the first column already has two zero entries. Using the previous theorem we can introduce another zero in this column by adding row 2 to row 4. Thus

    det(A) = | 0  1  2 −1|   | 0  1  2 −1|
             | 2  5 −7  3| = | 2  5 −7  3| .
             | 0  3  6  2|   | 0  3  6  2|
             |−2 −5  4 −2|   | 0  0 −3  1|

If we now expand down the first column we see that

    det(A) = −2 · |1  2 −1|
                  |3  6  2| .
                  |0 −3  1|

The 3 × 3 determinant above can be further simplified by subtracting 3 times row 1 from row 2. Thus

    det(A) = −2 · |1  2 −1|
                  |0  0  5| .
                  |0 −3  1|

Finally we notice that the above determinant can be brought to triangular form by swapping row 2 and row 3, which changes the sign of the determinant by the previous theorem. Thus

    det(A) = (−2) · (−1) · |1  2 −1|
                           |0 −3  1| = (−2) · (−1) · 1 · (−3) · 5 = −30 ,
                           |0  0  5|

by Theorem 6.12.

We are now able to prove the first important general result about determinants, allowing
us to decide whether a matrix is invertible or not by computing its determinant (as such it is
a generalisation of the 2 × 2 case treated in Theorem 6.2(b)).

Theorem 6.17. A matrix A is invertible if and only if det(A) ̸= 0.

Proof. Bring A to row echelon form U (which is then necessarily upper triangular) using
elementary row operations. In the process we only ever multiply a row by a non-zero scalar,
so Theorem 6.13 implies that det(A) = γ det(U ) for some γ ̸= 0. If A is invertible, then
det(U ) = 1 by Theorem 6.12, since U is upper triangular with 1’s on the diagonal, and hence
det(A) = γ det(U ) = γ ̸= 0. If A is not invertible then at least one diagonal entry of U is
zero, so det(U ) = 0 by Theorem 6.12, and hence det(A) = γ det(U ) = 0.
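For example, the matrix of Example 6.16 has determinant −30 ̸= 0, so it is invertible, whereas the matrix whose determinant was computed in Example 6.15 has determinant 0 and is therefore not invertible.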

Definition 6.18. A square matrix A is called singular if det(A) = 0. Otherwise it is said to


be nonsingular.

Corollary 6.19. A matrix is invertible if and only if it is nonsingular.



Our next result shows what effect transposing a matrix has on its determinant:
Theorem 6.20. If A is an n × n matrix, then det(A) = det(AT ).
Proof. The proof is by induction on n (that is, the size of A). The theorem is obvious for
n = 1. Suppose now that it has already been proved for k × k matrices for some integer k.
Our aim now is to show that the assertion of the theorem is true for (k + 1) × (k + 1) matrices
as well. Let A be a (k + 1) × (k + 1) matrix. Note that the (i, j)-cofactor of A equals the
(i, j)-cofactor of AT , because the cofactors involve k × k determinants only, for which we
assumed that the assertion of the theorem holds. Hence
the cofactor expansion of det(A) across the first row equals the cofactor expansion of det(AT ) down the first column, so det(A) = det(AT ).
Let’s summarise: the theorem is true for 1 × 1 matrices, and the truth of the theorem for
k × k matrices for some k implies the truth of the theorem for (k + 1) × (k + 1) matrices.
Thus, the theorem must be true for 2 × 2 matrices (choose k = 1); but since we now know
that it is true for 2 × 2 matrices, it must be true for 3 × 3 matrices as well (choose k = 2);
continuing with this process, we see that the theorem must be true for matrices of arbitrary
size.
By the previous theorem, each statement of the theorem on the behaviour of determinants
under row operations (Theorem 6.13) is also true if the word ‘row’ is replaced by ‘column’,
since a row operation on AT amounts to a column operation on A.
Theorem 6.21. Let A be a square matrix.
(a) If two columns of A are interchanged to produce B, then det(B) = − det(A).
(b) If one column of A is multiplied by α to produce B, then det(B) = α det(A).
(c) If a multiple of one column of A is added to another column to produce a matrix B
then det(B) = det(A).
Example 6.22. Find det(A) where
 
    A = ( 1  3  4 8)
        (−1  2  1 9)
        ( 2  5  7 0) .
        ( 3 −4 −1 5)

Solution. Adding column 1 to column 2 gives

    det(A) = | 1  3  4 8|   | 1  4  4 8|
             |−1  2  1 9| = |−1  1  1 9| .
             | 2  5  7 0|   | 2  7  7 0|
             | 3 −4 −1 5|   | 3 −1 −1 5|

Now subtracting column 3 from column 2, the determinant is seen to vanish by a cofactor expansion down column 2:

    det(A) = | 1 0  4 8|
             |−1 0  1 9| = 0 .
             | 2 0  7 0|
             | 3 0 −1 5|

Our next aim is to prove that determinants are multiplicative, that is, det(AB) = det(A) det(B)
for any two square matrices A and B of the same size. We start by establishing a baby-version
of this result, which, at the same time, proves the theorem on the behaviour of determinants
under row operations stated earlier (see Theorem 6.13).

Theorem 6.23. If A is an n × n matrix and E an elementary n × n matrix, then

det(EA) = det(E) det(A)

with

−1
 if E is of type I (interchanging two rows)
det(E) = α if E is of type II (multiplying a row by α) .

1 if E is of type III (adding a multiple of one row to another)

Proof. By induction on the size of A. The case where A is a 2 × 2 matrix follows from
Theorem 6.2(a). Suppose now that the theorem has been verified for determinants of k × k
matrices for some k with k ≥ 2. Let A be a (k + 1) × (k + 1) matrix and write B = EA.
Expand det(EA) across a row that is unaffected by the action of E on A, say, row i. Note
that Bij is obtained from Aij by the same type of elementary row operation that E performs
on A. But since these matrices are only k × k, our hypothesis implies that

det(Bij ) = r det(Aij ) ,

where r = −1, α, 1 depending on the nature of E.


Now by a cofactor expansion across row i
    det(EA) = det(B) = ∑_{j=1}^{k+1} aij (−1)^{i+j} det(Bij )
                     = ∑_{j=1}^{k+1} aij (−1)^{i+j} r det(Aij )
                     = r det(A) .

In particular, taking A = Ik+1 we see that det(E) = −1, α, 1 depending on the nature of E.
To summarise: the theorem is true for 2 × 2 matrices and the truth of the theorem for
k × k matrices for some k ≥ 2 implies the truth of the theorem for (k + 1) × (k + 1) matrices.
By the principle of induction the theorem is true for matrices of any size.

Using the previous theorem we are now able to prove the second important general result
of this chapter (and a generalisation of the 2 × 2 case treated in Theorem 6.2(a)):

Theorem 6.24. If A and B are square matrices of the same size, then

det(AB) = det(A) det(B) .

Proof. Case I: If A is not invertible, then neither is AB (for otherwise A(B(AB)−1 ) = I,


which by the corollary to the Invertible Matrix Theorem would force A to be invertible). Thus,
by Theorem 6.17,
det(AB) = 0 = 0 · det(B) = det(A) det(B) .

Case II: If A is invertible, then by the Invertible Matrix Theorem A is a product of elementary
matrices, that is, there exist elementary matrices E1 , . . . , Ek , such that

A = Ek Ek−1 · · · E1 .

For brevity, write |A| for det(A). Then, by the previous theorem,

    |AB| = |Ek · · · E1 B| = |Ek | |Ek−1 · · · E1 B| = · · · = |Ek | · · · |E1 | |B| = |Ek · · · E1 | |B| = |A| |B| .

Corollary 6.25. If A is an invertible matrix then


    det(A−1 ) = 1 / det(A) .

Proof. Since A is invertible, we have A−1 A = I. Taking determinants of both sides gives
det(A−1 A) = det(I) = 1. By Theorem 6.24 we know that det(A−1 A) = det(A−1 ) det(A),
and so in fact we have det(A−1 ) det(A) = 1. Moreover, det(A) ̸= 0 because A is invertible
(by Theorem 6.17), and so we can divide both sides of the preceding equation by det(A) to
obtain the required property
    det(A−1 ) = 1 / det(A) .
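For instance, for the matrix A of Example 5.40 a cofactor expansion down the first column gives det(A) = 1 · (5 · 8 − 3 · 3) − 2 · (2 · 8 − 3 · 0) = 31 − 32 = −1, so det(A−1 ) = 1/(−1) = −1, which can be checked directly from the inverse computed there.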
