Linear Algebra Notes
I’ll review some basics about sets and some of the number systems we’ll use. If you’re familiar with
these things, you can skim this section and refer to it later when you need to.
A set is informally described as a collection of objects. The objects are elements of the set, or
members of the set. So a set is kind of like a bag, and its elements are things in the bag.
However, it’s easy to get into trouble with that informal description. Trouble can be fun, so let’s see
how you can get into trouble here. Russell’s paradox is named after the British mathematician Bertrand
Russell. Here’s how it goes. Suppose X is the set whose members are sets which are not members of
themselves. (Thus, X is a bag, and the members of X — the things in the bag — are themselves bags.)
Now is X a member of X? It either is or it isn’t.
If X is a member of X, then X is a set which is not a member of itself, by the definition of X. So X is
both a member of itself and not a member of itself.
If X is not a member of X, then X is a set which is not a member of itself, so it should be a member
of X, by definition of X. Again, X is both a member of itself and not a member of itself.
You may have to think about this a bit, since the words can make your eyes glaze over!
You can see that both alternatives lead to a contradiction. Something is wrong, and it lies in the way
we created the set X. It turns out that, to deal with sets in a rigorous way, you have to be careful about
how sets can be created. The Zermelo-Fraenkel-Choice Axioms do this, and they are one of the most
commonly accepted foundations for mathematics. If you’re interested in learning more, you should take a
course or read a book in set theory or mathematical logic.
We will stay out of trouble by avoiding weird sets like the one in Russell’s paradox.
Let’s start by reviewing common set constructions and notations. If A and B are sets, then:
(a) x ∈ A means that x is an element (or member) of A.
We’ll often use schematic diagrams (sometimes called Venn diagrams) to picture arbitrary sets. The sets are drawn as rectangles or ovals or other closed shapes, and you think of the elements of the
sets as being stuff inside. For instance, a dot labelled “x” drawn inside the shape for A pictures a particular element x of A.
(b) A and B are equal if they have exactly the same elements. In that case, we write A = B.
(c) A ⊂ B means that A is a subset of B — that is, every element of A is also an element of B. It is
possible in this case that A = B; if we don’t want to allow A = B, we’ll write A ⊊ B and say that A is a
proper subset of B.
(d) A ∩ B is the intersection of A and B — that is, all the elements which A and B have in common.
(e) A ∪ B is the union of A and B — that is, all the elements which are in A or are in B (or are in
both).
(f) A − B is the complement of B relative to A — that is, all the elements of A which are not elements
of B.
If there is some “big set” S and everything under discussion is contained in S, you can denote the
complement of a set A relative to S (sometimes called the absolute complement) by A̅ (“A-bar”, an A with a bar over it). In other words,
A̅ is short for S − A when the “big set” S is understood.
(g) The empty set is the set with no elements. It is denoted ∅ or { }. (But “{∅}” is not the empty
set — if the empty set is like an empty bag, then “{∅}” is like a bag containing an empty bag.)
(h) If S and T are sets, their (Cartesian) product consists of all ordered pairs (s, t), where s ∈ S
and t ∈ T . “Ordered” means, for example, that (2, 8) and (8, 2) are different ordered pairs.
For example, if S = {a, b} and T = {1, 2, 3}, then
S × T = {(a, 1), (a, 2), (a, 3), (b, 1), (b, 2), (b, 3)}.
The sets which make up standard number systems have special symbols.
(a) Z is the set of integers (positive, negative, and 0).
(b) Q is the set of rational numbers, which can be represented as quotients of integers. (Thus, Q includes
Z, since (for instance) 3 = 3/1.) The use of the letter “Q” is apparently due to the Italian mathematician
Giuseppe Peano and stands for the Italian word for “quotient”.
(c) R is the set of real numbers, which are often represented as decimals (finite or infinite). R includes
Q (and so it includes Z).
(d) C is the set of complex numbers: numbers that can be written in the form a + bi, where a and b
are real numbers and i = √−1. The old phrase “imaginary numbers” is not that good — there’s nothing
“imaginary” about C! — and so is less often used. C includes R — for instance, the real number 13 can be
written as 13 + 0 · i — so C also includes Q and Z. Using our subset notation, we could write Z ⊂ C, for
instance.
The convention is to use boldface capital letters (like “Z” or “Q”) or “blackboard bold” capital letters
(like Z, Q, R, C) for “number systems”. You may see other letters (like the quaternions H) used as well.
I’ll use blackboard bold, since it is easier to distinguish blackboard bold letters from ordinary capital letters
than boldface. It’s also the way you write these symbols on blackboards or in handwriting.
I’ll use these number systems informally based on your prior experience with them; actually constructing
them rigorously is pretty involved and belongs in other courses (though we can use matrices to construct
the complex numbers from the real numbers).
You can specify a finite set by listing its elements, placed between braces (“curly brackets”). Here’s a
set consisting of 5 elements:
{π, −117, 32.83, 19/17, the pepperoni pizza in my refrigerator}.
A set does not have duplicate elements, so don’t write things like “{1, 2, 2, 3}”. The order in which
you list the elements doesn’t matter: {a, b, c} is the same set as {c, a, b}. (It’s like a bag of stuff, as we
noted at the start.)
Sets can have other sets as elements:
{1, 2, {3, 4}}.
You can picture this set as a bag. Inside the bag are 1, 2, and another bag which contains 3 and 4. The
number of elements in this set is 3; in other words, in determining the number of elements in a big set,
you don’t peek inside any little sets inside the big set.
You can sometimes specify infinite sets by listing elements:
{1, 2, 3, 4, . . .}.
Most people would assume that you mean the elements to continue 5, 6, 7, and so on. Of course, when
you do this you’re assuming that the “pattern” the elements follow is clear; the “. . .” means “continue in
the same way”. But there’s some ambiguity here; perhaps the elements of the set above are actually
{1, 2, 3, 4, 10, 20, 30, 40, 100, 200, 300, 400, . . .}.
If there’s any chance of confusion, it’s better to use set constructor notation to make the “pattern”
the elements follow explicit. Set constructor notation has the form
T = {x ∈ S | P (x)}.
The braces “{” and “}” indicate that we’re building a set, and its name is T . (You may give a set a
name to make it easier to talk about, but the name isn’t required.) The first part “x ∈ S” tells us that the
elements of T come from some “big set” S. And “P (x)” is some property that a typical element x must
satisfy to belong to T .
For example, this is the set of positive integers:
{x ∈ Z | x > 0}.
And this is the set of positive real numbers:
{x ∈ R | x > 0}.
You can see how the specification of the “big set” from which the elements come (x ∈ Z or x ∈ R) can
be important.
Sometimes it’s convenient to relax the rules for set constructor notation a bit. Here’s the set of even
integers:
{x ∈ Z | x = 2y for some y ∈ Z}.
It says that even integers are integers which are twice another integer. We can save writing (and a
variable) by writing the set this way:
{2y | y ∈ Z}.
This says even integers are numbers which are twice another integer. Since twice an integer is automat-
ically an integer, the word “numbers” must mean “integers”.
We’ll often use this kind of shortcut in writing sets. Here’s another example. It’s the set of all points
on the graph of y = x2 in the x-y-plane:
{(x, x2 ) | x ∈ R}.
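If you like experimenting on a computer, set constructor notation translates almost directly into Python set comprehensions. Here is a small illustrative sketch (not part of the mathematics itself); since Python sets are finite, the variable is restricted to a finite range.

# The set {2y | y in Z}, restricted to -10 <= y <= 10:
evens = {2 * y for y in range(-10, 11)}

# The set {(x, x^2) | x in R}, sampled at a few real values of x:
parabola_points = {(x, x**2) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]}

print(sorted(evens))
print(sorted(parabola_points))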
Example. Consider the following set of integers:
S = {3n + 1 | n ∈ Z}.
(a) Is 46 an element of S?
(b) Is 17 an element of S?
(a) 46 is an element of S, since 46 = 3 · 15 + 1 and 15 ∈ Z.
(b) 17 is not an element of S. Suppose on the contrary that 17 ∈ S. Then 17 = 3n + 1 for some n ∈ Z.
Subtracting 1 from both sides, I get 16 = 3n for some integer n. This is impossible, since 16 is not “evenly
divisible” by 3.
You may have seen the notations R2 and R3 in a multivariable calculus course. R2 is the set of ordered
pairs of real numbers, like (−5, 11) or (√3, π) — the “x-y plane”. Likewise, R3 is the set of ordered triples
of real numbers, like (9, 0, −13) or (11/17, sin 1, 32) — 3-dimensional space. Using set constructor notation,
you can write these sets like this:
R2 = {(x, y) | x, y ∈ R} and R3 = {(x, y, z) | x, y, z ∈ R}.
If we have an infinite but countable number of objects, we can continue to use natural numbers as
subscripts:
x1 , x2 , . . . , xn , . . . .
The “. . .” on the right indicates that the number of objects is infinite.
In fact, you can take “countable” to mean that the objects are either finite in number, or can be
arranged in a sequential list as above. If the objects can’t be arranged in a sequential list as above, the
objects are uncountable. Thus, if you have an uncountable number of objects, you can’t use natural
number subscripts to index them. For example, suppose the objects are the real numbers. You can’t arrange
the real numbers in a sequential list.
So how do we describe an uncountable collection of objects?
Handling this kind of situation is actually simple. We take an index set — say I. We don’t assume
anything about the size of I, or what its elements look like. We use the elements of I as subscripts for our
objects. So if we’re discussing a particular object indexed by I, we refer to “xi , where i ∈ I”. If we need
another object indexed by I, we say “xj , where j ∈ I”. If we need 4 such objects, we can use subscripts
with subscripts, this way:
xi1 , xi2 , xi3 , xi4 , where i1 , i2 , i3 , i4 ∈ I.
If we wanted 4 possibly different objects, we’d have to specify that i1 , i2 , i3 , and i4 are distinct (though
maybe in a given situation our collection of objects contains duplicates, so different i’s might still give the
same object).
In these cases, we’re not saying what i, j, i1 , i2 , i3 , or i4 are, beyond that they’re elements of I, and
that I is some set.
Arbitrary intersections and unions.
As examples of arbitrary collections of objects, we’ll discuss arbitrary unions and intersections of
sets. Begin with an arbitrary collection of sets {Si }i∈I . This means that each Si is a set, and there is one
such set for each element i in the index set I. The index set I might be finite, or if it is infinite, it might be
countable or uncountable.
The intersection of the sets {Si}i∈I is denoted ⋂_{i∈I} Si. By definition, an element x is in ⋂_{i∈I} Si if and
only if x is in Si for all i ∈ I. In other words,

x ∈ ⋂_{i∈I} Si   if and only if   x ∈ Si for all i ∈ I.

The big intersection symbol “⋂_{i∈I}” is like a summation symbol “Σ_{i=1}^n” that you’ve probably seen elsewhere.
For just two sets U and V, the definition would say x ∈ U ∩ V if and only if x ∈ U and x ∈ V. This
agrees with the definition of “intersection” we gave earlier.
In similar fashion, the union of the sets {Si}i∈I is denoted ⋃_{i∈I} Si. By definition, an element x is in
⋃_{i∈I} Si if and only if x is in Si for some i ∈ I. (“For some” means “for at least one”.) In other words,

x ∈ ⋃_{i∈I} Si   if and only if   x ∈ Si for some i ∈ I.
We won’t have to deal with arbitrary intersections and unions that often, so don’t worry if this seems
a bit abstract. When it comes up, you’ll see that the notation isn’t that difficult to handle.
You might have heard the term range used in connection with functions, but we’ll avoid it because it
was (unfortunately) used in two different ways. Sometimes, it meant what I’ve called the codomain, but it
was also used to refer to what is now called the image. The codomain is simply a set which contains all
possible outputs of the function (the image), but not every element of the codomain must be an output of the
function.
Remark. (a) When you write the definition of a function like “f (x) = x2 + 1”, you can use whatever
variable you wish for the input. It would be the same to write “f (t) = t2 + 1”. You could use a word for
the input to the function (as is often done in computer programming) and write
f (input) = input2 + 1.
You could even use words for the whole definition: “The function f is defined by taking an input,
squaring it, and adding 1, and returning the result as the output.” This is the way math was written before
people started using symbols — and you can see why people started using symbols!
In math, unlike in computer programming, it has been traditional to save writing and use single letters
to name variables, such as the variables used to define functions. There are also some loose conventions
about what letters you use for various purposes. For instance, x is often used as the input variable in a
function definition, and i, j, and k are often used as index variables in summations.
(b) The variables used to define a function are only “active” within the function definition. Once you
reach the end of the expression “f (x) = x2 + 1”, there is no variable named “x” hanging around — you can’t
ask at that point “Is x equal to 3?” In computer programming terms, you might say that x isn’t a global
variable; it’s in scope only within the definition “f (x) = x2 + 1”.
You might think about the last paragraph if you start getting confused when we discuss composites and
inverse functions below.
The older terms (which some people still use) are one-to-one instead of injective, onto instead of
surjective, and one-to-one correspondence for bijective.
Remark. You can restate the definition of injective this way: f is injective if a ≠ b implies f (a) ≠ f (b)
for a, b ∈ X. (This is the contrapositive of the original definition.) In this form, the definition says that
different inputs always give different outputs.
If different inputs can give the same output, the function is not injective.
The definition of surjective means that everything in the codomain is an output of f — that is, im f = Y .
(Schematically: picture f : X → Y with its image im f sitting inside Y. If im f = Y, then f is surjective; if im f is smaller than Y, then f is not surjective.)
To say that f is bijective means that f “pairs up” the elements of X and the elements of Y — one
element of X paired with exactly one element of Y .
This picture shows a bijection f from the set {1, 2, 3} to the set {a, b, c}.
It is defined by
f (1) = c, f (2) = a, f (3) = b.
It is injective because two different inputs always produce two different outputs. It is surjective because
every element of the codomain {a, b, c} is an output of f .
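For finite sets you can check injectivity and surjectivity directly from the definitions. Here is a small Python sketch using the bijection above; the variable names are just for illustration.

# Represent f : {1, 2, 3} -> {a, b, c} as a dictionary.
f = {1: 'c', 2: 'a', 3: 'b'}
codomain = {'a', 'b', 'c'}

image = set(f.values())
injective = len(image) == len(f)    # different inputs give different outputs
surjective = image == codomain      # every element of the codomain is an output
bijective = injective and surjective

print(injective, surjective, bijective)   # True True True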
Example. (a) f : R → R is defined by f (x) = ex . Prove that f is injective but not surjective.
(b) f : R → R is defined by
f (x) = x + 1 if x ≤ 0, and f (x) = x if x > 0.
Prove that f is surjective but not injective.
(c) f : R → R is defined by f (x) = x3 + 13. Prove that f is bijective.
(a) Suppose f (a) = f (b). Then ea = eb , so ln ea = ln eb , and hence a = b. Thus, f is injective.
f is not surjective, because there is no x ∈ R such that f (x) = −5. This would imply ex = −5, but ex
is positive for all x.
(b) Let y ∈ R. If y ≤ 0, then y − 1 ≤ −1. Hence, f (y − 1) = (y − 1) + 1 = y. If y > 0, then f (y) = y. In
both cases, I’ve found an input to f which produces the output y. So f is surjective.
f is not injective: f (0) = 0 + 1 = 1, and f (1) = 1, so f (0) = f (1) but 0 ≠ 1.
The graph of f consists of the line y = x + 1 for x ≤ 0 together with the line y = x for x > 0.
In the case of functions R → R, you can interpret surjectivity this way: Every horizontal line hits the
graph at least once. You can interpret injectivity this way: No horizontal line hits the graph more than
once. Look at the graph and see how it shows that f is surjective, but not injective.
While this is visually appealing, this is a special case — do not remember these ideas as the definitions
of injective and surjective. For instance, they would not apply if you had a function R2 → R3 .
(c) Suppose f (a) = f (b). Then
a3 + 13 = b3 + 13
a3 = b3
(a3 )1/3 = (b3 )1/3
a = b
Hence, f is injective.
Suppose y ∈ R. I need x ∈ R such that f (x) = y. I will work backwards to “guess” what x should be,
then verify my guess.
f (x) = y means x3 + 13 = y, so x3 = y − 13, and x = (y − 13)1/3 . That’s my “guess” for x; I still have
to show it works. (Why doesn’t what I did prove it? Because I worked backwards, so I have to check that
all the steps I took are reversible.) Here’s the check:
f ((y − 13)1/3 ) = ((y − 13)1/3 )3 + 13 = (y − 13) + 13 = y.
Hence, f is surjective. Since f is both injective and surjective, it is bijective.
Example. f : R2 → R2 is defined by
f (x, y) = (x3 + 2, x + ey ).
Prove that f is injective but not surjective.
To show f is injective, suppose f (a, b) = f (c, d). Then
(a3 + 2, a + eb ) = (c3 + 2, c + ed ).
The first components give
a3 + 2 = c3 + 2
a3 = c3
a = c
Then the second components give
a + eb = c + ed
eb = ed
ln eb = ln ed
b = d
Hence, (a, b) = (c, d), and f is injective.
To show f is not surjective, I’ll show that there is no (x, y) such that f (x, y) = (10, 0). Suppose on the
contrary that f (x, y) = (10, 0). The first component gives x3 + 2 = 10, so x3 = 8 and x = 2. Then the
second component gives
x + ey = 0
2 + ey = 0
ey = −2
This is a contradiction, since ey > 0 for all y. So there is no input (x, y) such that f (x, y) = (10, 0), and
hence f is not surjective.
The thought process I used in choosing (10, 0) was this. I saw that the expression ey was restricted
in the values it can take: It’s always positive. So I tried to get a contradiction by forcing it to equal a
negative number. I could do this by forcing x in x + ey to be positive. I could force x > 0 by setting the
first component x3 + 2 to a value so that solving for x would produce a positive number — 10 happens to
work.
Definition. Let X, Y , and Z be sets, and let f : X → Y and g : Y → Z be functions. The composite of
f and g is the function g ◦ f : X → Z defined by
(g ◦ f )(x) = g(f (x)) for all x ∈ X.
Note that “g ◦ f ” does not mean multiplication. I will often write “g(f (x))” for the composite, to ensure
that there’s no confusion. Also, “g ◦ f ” is read from right to left, so it means “do f first, then g”.
Example. (a) Suppose f : R → R is defined by f (x) = x2 and g : R → R is defined by g(x) = x + 2. Find
g(f (x)), f (g(x)), g(g(x)), and f (f (3)).
(b) Suppose f : R → R2 is given by f (x) = (x + 3, ex ) and g : R2 → R is given by g(s, t) = s − t. Find
g(f (x)) and f (g(s, t)).
(c) Suppose f : R2 → R2 is given by f (x, y) = (x + 3y, 2x − y) and g(s, t) = (st, s + t). Find g(f (x, y)) and
f (g(s, t)).
(a)
g(f (x)) = g(x2 ) = x2 + 2.
f (g(x)) = f (x + 2) = (x + 2)2 .
g(g(x)) = g(x + 2) = (x + 2) + 2 = x + 4.
f (f (3)) = f (32 ) = f (9) = 92 = 81.
Note that g(f (x)) ≠ f (g(x)).
(b)
g(f (x)) = g(x + 3, ex ) = x + 3 − ex .
f (g(s, t)) = f (s − t) = ((s − t) + 3, es−t ).
(c)
g(f (x, y)) = g(x + 3y, 2x − y) = ((x + 3y)(2x − y), (x + 3y) + (2x − y)) = ((x + 3y)(2x − y), 3x + 2y).
f (g(s, t)) = f (st, s + t) = (st + 3(s + t), 2st − (s + t)).
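Composites are easy to experiment with in code. Here is a Python sketch of part (c); compose is a helper name introduced just for illustration, and “g ◦ f” means “do f first, then g”, as above.

def f(x, y):
    return (x + 3*y, 2*x - y)

def g(s, t):
    return (s*t, s + t)

def compose(outer, inner):
    # "outer o inner": apply inner first, then outer
    return lambda *args: outer(*inner(*args))

g_of_f = compose(g, f)
print(g_of_f(1, 2))    # f(1, 2) = (7, 0), then g(7, 0) = (0, 7)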
Definition. Let A and B be sets, and let f : A → B and g : B → A be functions. f and g are inverses if
g(f (a)) = a for all a ∈ A, and f (g(b)) = b for all b ∈ B.
The inverse of f is denoted f −1 . (Note that “f −1 ” is not a “−1 power”, in the sense of a reciprocal.)
Using this notation, I can write the equations in the inverse definition as
f −1 (f (a)) = a for all a ∈ A, and f (f −1 (b)) = b for all b ∈ B.
The identity function on a set X is the function idX defined by idX (x) = x for all x ∈ X. We may
write “id” instead of “idX ” if it’s clear what set is intended.
Using this definition and the notation for the composite of functions, we can write the two equations
which say that f and g are inverses as
g ◦ f = idA and f ◦ g = idB .
Example. Define f : R → R by f (x) = x3 + 7 and g : R → R by g(x) = (x − 7)1/3 . Show that f and g are
inverses.
Let x ∈ R.
f (g(x)) = f ((x − 7)1/3 ) = [(x − 7)1/3 ]3 + 7 = x − 7 + 7 = x.
g(f (x)) = g(x3 + 7) = [(x3 + 7) − 7]1/3 = (x3 )1/3 = x.
Since f (g(x)) = x and g(f (x)) = x for all x ∈ R, it follows that f and g are inverses.
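A numerical spot check in Python (which is not a proof, but is reassuring) looks like this. The cube-root helper handles negative arguments explicitly, since a fractional power of a negative float in Python does not give the real cube root.

def f(x):
    return x**3 + 7

def g(x):
    # real cube root of (x - 7)
    y = x - 7
    return y**(1/3) if y >= 0 else -((-y)**(1/3))

for x in [-2.0, 0.0, 1.5, 10.0]:
    print(x, g(f(x)), f(g(x)))    # both columns should be (approximately) x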
Example. Let f : R → R be given by f (x) = 10 − x1/5 . Find f −1 and verify that it’s the inverse by
checking the inverse definition.
I guess f −1 by working backwards. Suppose f −1 (y) = x. Then f (x) = y, so 10 − x1/5 = y. Solving for x
gives x1/5 = 10 − y, so x = (10 − y)5 . Thus, my guess is f −1 (x) = (10 − x)5 . Now check the inverse definition:
f (f −1 (x)) = f ((10 − x)5 ) = 10 − ((10 − x)5 )1/5 = 10 − (10 − x) = x,
f −1 (f (x)) = (10 − (10 − x1/5 ))5 = (x1/5 )5 = x.
Hence, f −1 (x) = (10 − x)5 .
Not every function has an inverse — in fact, we’ll see below that having an inverse is equivalent to being
bijective.
Example. Define f : R → R by f (x) = x2 + 10. Prove that f does not have an inverse.
If you have some experience with proof writing, you may know that a negative statement (“does not
have an inverse”) is often proved by contradiction: You assume the negation of the statement to be proved
and try to get a contradiction. If the statement to be proved is a negative statement, its negation will be a
positive statement.
So suppose that f has an inverse f −1 . Then f −1 (f (x)) = x for all x. In particular, f −1 (f (2)) = 2 and
f −1 (f (−2)) = −2. But f (2) = 14 and f (−2) = 14, so I get
f −1 (14) = 2 and f −1 (14) = −2.
This is a contradiction, because f −1 is a function, so it can’t produce two different outputs (2 and −2)
for the same input (14). Hence, f does not have an inverse.
Note: Instead of 2 and −2, you could choose 5 and −5 or 17 and −17, and so on. Lots of pairs of
numbers would work. You could also get a contradiction using the equation f (f −1 (x)) = x: Take x = 6 and
see what happens!
Example. Let f : R2 → R2 be given by f (x, y) = (x1/3 − 4, −x + y). Find f −1 and verify that it’s the
inverse by checking the inverse definition.
I’ll guess f −1 by working backwards, then verify that my guess works. Suppose f −1 (a, b) = (x, y). Then
a = x1/3 − 4, b = −x + y.
The first equation gives x1/3 = a + 4, so x = (a + 4)3 . Plugging this into the second equation gives
b = −(a + 4)3 + y, so y = (a + 4)3 + b.
Thus, my guess is
f −1 (a, b) = (x, y) = ((a + 4)3 , (a + 4)3 + b).
I check the inverse definition:
f (f −1 (a, b)) = f ((a + 4)3 , (a + 4)3 + b) = ([(a + 4)3 ]1/3 − 4, −(a + 4)3 + (a + 4)3 + b) =
(a + 4 − 4, b) = (a, b).
f −1 (f (x, y)) = f −1 (x1/3 −4, −x+y) = ([(x1/3 −4)+4]3, [(x1/3 −4)+4]3 +(−x+y)) = ((x1/3 )3 , (x1/3 )3 −x+y) =
(x, x − x + y) = (x, y).
The inverse definition checks, so f −1 (a, b) = (x, y) = ((a + 4)3 , (a + 4)3 + b).
Note: While the names of the input variables to a function are arbitrary, I used a and b as the input
variables to f −1 rather than x and y to avoid confusion — I was already using x and y as the input variables
for f .
The next result says that having an inverse is the same as being bijective.
Theorem. Let X and Y be sets, and let f : X → Y be a function. f is bijective if and only if it has an
inverse — that is, f −1 exists.
Proof. To prove a statement of the form A if and only if B, I must do two things: First, I assume that A
is true and prove that B is true; next, I assume that B is true and prove that A is true.
First, suppose that f −1 exists, so
f −1 (f (x)) = x for all x ∈ X, and f (f −1 (y)) = y for all y ∈ Y.
To show f is injective, suppose f (a) = f (b) for some a, b ∈ X. Applying f −1 to both sides gives
a = f −1 (f (a)) = f −1 (f (b)) = b.
Therefore, f is injective.
To show f is surjective, suppose y ∈ Y . I must find an element of X which f maps to y. This is easy,
since f −1 (y) ∈ X, and
f (f −1 (y)) = y.
Therefore, f is surjective. Hence, f is bijective.
Next, suppose f is bijective. I must show that f −1 exists. Now f −1 should be a function from Y to X,
so I need to start with y ∈ Y and define where f −1 maps it. Since f is bijective, it is surjective. Therefore,
f (x) = y for some x ∈ X. I’d like to define f −1 (y) = x.
There’s a possible problem. In general, it’s possible that f (x1 ) = y and f (x2 ) = y for x1 , x2 ∈ X. In
that case, how would I define f −1 (y)?
Fortunately, f is injective. So if f (x1 ) = y and f (x2 ) = y, then f (x1 ) = f (x2 ), and so x1 = x2 by
injectivity. In other words, there is one and only one x ∈ X such that f (x) = y, and it’s safe for me to define
f −1 (y) = x.
(Notice how I used both parts of the definition of bijective — surjectivity and injectivity — to define
f −1 .)
All I have to do is to check the inverse definition. First, if x ∈ X, then by definition f −1 (f (x)) is the
element of X which f takes to f (x) — but that element is x. Hence, f −1 (f (x)) = x.
Next, if y ∈ Y , then by definition f −1 (y) is the element x ∈ X such that f (x) = y. So
f (f −1 (y)) = f (x) = y.
This checks the inverse definition, so f −1 is indeed the inverse of f .
Definition. A ring is a set R with two operations, addition (written a + b) and multiplication (written a · b), which satisfy the following axioms.
1. Addition is associative: If a, b, c ∈ R, then
a + (b + c) = (a + b) + c.
2. There is an identity for addition, denoted 0: For all a ∈ R,
a + 0 = a and 0 + a = a.
3. Every element of R has an additive inverse. That is, if a ∈ R, there is an element −a ∈ R which
satisfies
a + (−a) = 0 and (−a) + a = 0.
4. Addition is commutative: If a, b ∈ R, then
a + b = b + a.
5. Multiplication is associative: If a, b, c ∈ R, then
a · (b · c) = (a · b) · c.
6. Multiplication distributes over addition: If a, b, c ∈ R, then
a · (b + c) = a · b + a · c and (a + b) · c = a · c + b · c.
It’s common to drop the “·” in “a · b” and just write “ab”. I’ll do this except where the “·” is needed
for clarity.
As a convenience, we can define subtraction using additive inverses. If R is a ring and a, b ∈ R, then
a − b is defined to be a + (−b). That is, subtraction is defined as adding the additive inverse.
You might notice that we now have three of the usual four arithmetic operations: Addition, subtraction,
and multiplication. We don’t necessarily have a “division operation” in a ring; we’ll discuss this later.
If you’ve never seen axioms for a mathematical structure laid out like this, you might wonder: What
am I supposed to do? Do I memorize these? Actually, if you look at the axioms, they say things that are
“obvious” from your experience. For example, Axiom 4 says addition is commutative. So as an example for
real numbers,
117 + 33 = 33 + 117.
You can see that, as abstract as they look, these axioms are not that big a deal. But when you do
mathematics carefully, you have to be precise about what the rules are. You will not have much to do in
this course with writing proofs from these axioms, since that belongs in an abstract algebra course. A good
rule of thumb might be to try to understand by example what an axiom says. And if it seems “obvious” or
“familiar” based on your experience, don’t worry about it. Where you should pay special attention is when
things don’t work in the way you expect.
If you look at the axioms carefully, you might notice that some familiar properties of multiplication are
missing. We will single them out next.
Definition. A ring R is commutative if the multiplication is commutative. That is, for all a, b ∈ R,
ab = ba.
Note: The word “commutative” in the phrase “commutative ring” always refers to multiplication —
since addition is always assumed to be commutative, by Axiom 4.
Definition. A ring R is a ring with identity if there is an identity for multiplication. That is, there is an
element 1 ∈ R such that
1 · a = a and a · 1 = a for all a ∈ R.
Note: The word “identity” in the phrase “ring with identity” always refers to an identity for multipli-
cation — since there is always an identity for addition (called “0”), by Axiom 2.
A commutative ring which has an identity element is called a commutative ring with identity.
In a ring with identity, you usually also assume that 1 ≠ 0. (Nothing stated so far requires this, so you
have to take it as an axiom.) In fact, you can show that if 1 = 0 in a ring R, then R consists of 0 alone —
which means that it’s not a very interesting ring!
Example. The number systems Z, Q, R, and C, with their usual addition and multiplication, are all rings.
Each of these is a commutative ring with identity. In fact, all of them except Z are fields. I’ll discuss
fields below.
By the way, it’s conventional to use a capital letter with the vertical or diagonal stroke “doubled” (as
in Z or R) to stand for number systems. It is how you would write them by hand. If you’re typing them,
you usually use a special font; a common one is called Blackboard Bold.
You might wonder why I singled out the commutativity and identity axioms, and didn’t just make
them part of the definition of a ring. (Actually, many people add the identity axiom to the definition of
a ring automatically.) In fact, there are situations in mathematics where you deal with rings which aren’t
commutative, or (less often) lack an identity element. We’ll see, for instance, that matrix multiplication is
usually not commutative.
The idea is to write proofs using exactly the properties you need. In that way, the things that you prove
can be used in a wider variety of situations. Suppose I had included commutativity of multiplication in the
definition of a ring. Then if I proved something about rings, you would not know whether it applied to
noncommutative rings without carefully checking the proof to tell whether commutativity was used or not.
If you really need a ring to be commutative in order to prove something, it is better to state that assumption
explicitly, so everyone knows not to assume your result holds for noncommutative rings.
The next example (or collection of examples) of rings may not be familiar to you. These rings are the
integers mod n. For these rings, n will denote an integer. Actually, n can be any integer if I modify the
discussion a little, but to keep things simple, I’ll take n ≥ 2.
The integers mod n is the set
Zn = {0, 1, 2, . . . , n − 1}.
Zn becomes a commutative ring with identity under the operations of addition mod n and multipli-
cation mod n. I won’t prove this; I’ll just show you how to work with these operations, which is sufficient
for a linear algebra course. You’ll see a rigorous treatment of Zn in abstract algebra.
(a) To add x and y mod n, add them as integers to get x + y. Then divide x + y by n and take the
remainder — call it r. Then x + y = r.
(b) To multiply x and y mod n, multiply them as integers to get xy. Then divide xy by n and take the
remainder — call it r. Then xy = r.
Since modular arithmetic may be unfamiliar to you, let’s do an extended example. Suppose n = 6, so
the ring is Z6 .
4+5 = 9 (Add them as integers . . . )
= 3 (Divide 9 by 6 and take the remainder, which is 3)
Hence, 4 + 5 = 3 in Z6 .
You can picture arithmetic mod 6 this way:
(Picture the numbers 0, 1, 2, 3, 4, 5 arranged clockwise around a circle.)
You count around the circle clockwise, but when you get to where “6” would be, you’re back to 0. To
see how 4 + 5 works, start at 0. Count 4 numbers clockwise to get to 4, then from there, count 5 numbers
clockwise. You’ll find yourself at 3.
Here is multiplication:
2 · 5 = 10 (Multiply them as integers . . . )
= 4 (Divide 10 by 6 and take the remainder, which is 4)
Hence, 2 · 5 = 4 in Z6 .
You can see that as you do computations, you might in the middle get numbers outside {0, 1, 2, 3, 4, 5}.
But when you divide by 6 and take the remainder, you’ll always wind up with a number in {0, 1, 2, 3, 4, 5}.
Try it with a big number:
80 = 6 · 13 + 2 = 2.
Using our circle picture, if you start at 0 and do 80 steps clockwise around the circle, you’ll find yourself
at 2. (Maybe you don’t have the patience to actually do this!) When we divide by 6 then “discard” the
multiples of 6, that is like the fact that you return to 0 on the circle after 6 steps.
Notice that if you start with a number that is divisible by 6, you get a remainder of 0:
84 = 6 · 14 + 0 = 0.
We see that in doing arithmetic mod 6, multiples of 6 are equal to 0. And in general, in doing arithmetic
mod n, multiples of n are equal to 0.
Other arithmetic operations work as you’d expect. For example,
3^4 = 81 = 6 · 13 + 3 = 3.
Hence, 3^4 = 3 in Z6 .
Negative numbers in Z6 are additive inverses. Thus, −2 = 4 in Z6 , because 4 + 2 = 0. To deal with
negative numbers in general, add a positive multiple of 6 to get a number in the set {0, 1, 2, 3, 4, 5}. For
example,
(−3) · 5 = −15 (Multiply them as integers . . . )
= −15 + 18 (Add 18, which is 3 · 6)
= 3
Hence, (−3) · 5 = 3 in Z6 .
The reason you can add 18 (or any multiple of 6) is that 18 divided by 6 leaves a remainder of 0. In
other words, “18 = 0” in Z6 , so adding 18 is like adding 0. In a similar way, you can always convert a
negative number mod n to a positive number in {0, 1, . . . n − 1} by adding multiples of n. For instance,
−14 = −14 + 18 = 4.
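If you want to check computations like these on a computer, Python’s % operator gives the remainder on division by n, and it already returns a number in {0, 1, . . . , n − 1} even for negative inputs. Here is a small sketch; the function names are just for illustration.

def add_mod(x, y, n):
    return (x + y) % n

def mul_mod(x, y, n):
    return (x * y) % n

print(add_mod(4, 5, 6))    # 3, since 4 + 5 = 9 = 6*1 + 3
print(mul_mod(2, 5, 6))    # 4, since 2 * 5 = 10 = 6*1 + 4
print(mul_mod(-3, 5, 6))   # 3, matching (-3)*5 = -15 = -15 + 18 = 3
print(-14 % 6)             # 4, matching -14 = -14 + 18 = 4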
Subtraction works by adding the additive inverse. For example, in Z6 , since −2 = 4,
1 − 2 = 1 + 4 = 5.
We haven’t discussed division yet, but maybe the last example tells you how to do it. Just as subtraction
is defined as adding the additive inverse, division should be defined as multiplying by the multiplicative
inverse. Let’s give the definition.
Definition. Let R be a ring with identity, and let x ∈ R. The multiplicative inverse of x is an element
x−1 ∈ R which satisfies
x · x−1 = 1 and x−1 · x = 1.
If we were dealing with real numbers, then 3−1 = 1/3, for instance. But going back to the Z6 example,
we don’t have fractions in Z6 . So what is (say) 5−1 in Z6 ? By definition, 5−1 is the element (if there is one)
in Z6 which satisfies
5 · 5−1 = 1.
(I could say 5−1 · 5 = 1, but multiplication is commutative in Z6 , so the order doesn’t matter.)
We just check cases. Remember that if I get a product that is 6 or bigger, I have to reduce mod 6 by
dividing and taking the remainder.
5·0 =0
5·1 =5
5 · 2 = 10 = 4
5 · 3 = 15 = 3
5 · 4 = 20 = 2
5 · 5 = 25 = 1
I got 25 = 1 by dividing 25 by the modulus 6 — it goes in 4 times, with a remainder of 1.
Thus, according to the definition, 5−1 = 5. In other words, 5 is its own multiplicative inverse. This isn’t
unheard of: You know that in the real numbers, 1 is its own multiplicative inverse.
This also means that if you want to divide by 5 in Z6 , you should multiply by 5.
What about 4−1 in Z6 ? Unfortunately, if you take cases as I did with 5, you’ll see that for every number
n in Z6 , you do not have 4 · n = 1. Here’s a proof by contradiction which avoids taking cases. Suppose
4n = 1. Multiply both sides by 3:
4n = 1
3 · 4n = 3 · 1
12n = 3
0=3
I made the last step using the fact that 12n is a multiple of 6 (since 12 = 6 · 2), and multiples of 6 are
equal to 0 mod 6. Since “0 = 3” is a contradiction, 4 · n = 1 is impossible. So 4−1 is undefined in Z6 .
It happens to be true that in Z6 , the elements 0, 2, 3, and 4 do not have multiplicative inverses; 1 and
5 do.
And in Z10 , the elements 0, 2, 4, 5, 6, and 8 do not have multiplicative inverses; 1, 3, 7, and 9 do.
Do you see a pattern?
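You can also let a computer list which elements have inverses. Here is a brute-force Python sketch (the function name units is chosen just for illustration); comparing the output with the factorizations of 6 and 10 suggests the pattern.

def units(n):
    # elements x of Z_n for which some y in Z_n satisfies x*y = 1
    return [x for x in range(n) if any((x * y) % n == 1 for y in range(n))]

print(units(6))    # [1, 5]
print(units(10))   # [1, 3, 7, 9]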
You probably don’t need much practice working with familiar number systems like the real numbers R,
so we’ll give some examples which involve arithmetic in Zn .
Example. Compute each of the following.
(a) 22 in Z5 .
(b) −21 in Z4 .
(c) 5 + 11 in Z12 .
(d) 13 · 5 in Z17 .
(e) 11 − 14 in Z20 .
(f) 4^3 in Z11 .
(g) 25! = 1 · 2 · 3 · 4 · · · · · 25 in Z23 .
It’s understood that for a Zn problem your final answer should be a number in {0, 1, . . . , n − 1}. You can
simplify as you do each step, or simplify at the end (divide by n and take the remainder).
(a) 22 = 4 · 5 + 2 = 2.
(b) −21 = −21 + 24 = 3.
Notice that 24 is a multiple of 4, so it’s equal to 0 in Z4 . You can also do this by dividing by 4 if you
do it carefully:
−21 = 4 · (−6) + 3 = 3.
(c) 5 + 11 = 16 = 12 + 4 = 4.
(d) 13 · 5 = 65 = 17 · 3 + 14 = 14.
(e) 11 − 14 = −3 = −3 + 20 = 17.
Notice that I added 20, a multiple of the modulus, to get a number in {0, 1, . . . , 19}.
(f) 4^3 = 64 = 11 · 5 + 9 = 9.
(g) 1 · 2 · 3 · 4 · · · · · 25 includes all the numbers from 1 to 25; in particular, it includes 23. So the product is
a multiple of the modulus 23, and
25! = 1 · 2 · 3 · 4 · · · · · 25 = 0.
(b) Suppose 6n = 1 for some n in Z10 . Then
6n = 1
5 · 6n = 5 · 1
30n = 5
0=5
The last step follows from the fact that 30n is a multiple of 10, so it equals 0 mod 10. Since “0 = 5” is
a contradiction, 6n = 1 is impossible, and 6 does not have a multiplicative inverse.
For example, in Z11 ,
7 · 8 = 56 = 55 + 1 = 11 · 5 + 1 = 1.
Hence, 7 and 8 are multiplicative inverses of each other in Z11 .
To find 8−1 in Z13 , take multiples of 8 and reduce mod 13, stopping when you reach 1:
8 · 1 = 8, 8 · 2 = 16 = 3, 8 · 3 = 24 = 11, 8 · 4 = 32 = 6, 8 · 5 = 40 = 1.
Thus, 8 · 5 = 1, so 8−1 = 5 in Z13 .
Alternatively, take multiples of 13 and add 1, stopping when you get a number divisible by 8:
13 · 1 + 1 = 14 Not divisible by 8
13 · 2 + 1 = 27 Not divisible by 8
13 · 3 + 1 = 40 Divisible by 8
Then 40/8 = 5, so 8−1 = 5.
Even this approach is too tedious to use with large numbers. The systematic way to find inverses is to
use the Extended Euclidean Algorithm.
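We won’t develop the Extended Euclidean Algorithm here, but as a preview, here is a short Python sketch of it. It finds integers a and b with a·x + b·n = gcd(x, n); when the gcd is 1, reducing a mod n gives x−1 in Zn. The function names are illustrative.

def extended_gcd(x, n):
    # returns (g, a, b) with a*x + b*n = g = gcd(x, n)
    old_r, r = x, n
    old_a, a = 1, 0
    old_b, b = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_a, a = a, old_a - q * a
        old_b, b = b, old_b - q * b
    return old_r, old_a, old_b

def inverse_mod(x, n):
    g, a, _ = extended_gcd(x, n)
    if g != 1:
        raise ValueError("no inverse: gcd(x, n) is not 1")
    return a % n

print(inverse_mod(8, 13))   # 5
print(inverse_mod(5, 6))    # 5
print(inverse_mod(7, 11))   # 8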
We saw that in a commutative ring with identity, an element x might not have a multiplicative inverse
x−1 . That in turn would prevent you from “dividing” by x. From the point of view of linear algebra, this is
inconvenient. Hence, we single out rings which are “nice” in that every nonzero element has a multiplicative
inverse.
Definition. A field F is a commutative ring with identity in which 1 ≠ 0 and every nonzero element has a
multiplicative inverse.
By convention, you don’t write “1/x” instead of “x−1 ” unless the ring happens to be a ring with “real”
fractions (like Q, R, or C). You don’t write fractions in (say) Z7 .
If an element x has a multiplicative inverse, you can divide by x by multiplying by x−1 . Thus, in a field,
you can divide by any nonzero element. (You’ll learn in abstract algebra why it doesn’t make sense to divide
by 0.)
The rationals Q, the reals R, and the complex numbers C are fields. Many of the examples will use
these number systems.
The ring of integers Z is not a field. For example, 2 is a nonzero integer, but it does not have a
multiplicative inverse which is an integer. (1/2 is not an integer — it’s a rational number.)
Q, R, and C are all infinite fields — that is, they all have infinitely many elements. But (for example)
Z5 is a field.
For applications, it’s important to consider finite fields like Z5 . Before I give some examples, I need
some definitions.
Definition. Let R be a commutative ring with identity. The characteristic of R is the smallest positive
integer n such that n · 1 = 0.
Notation: char R = n.
If there is no positive integer n such that n · 1 = 0, then char R = 0.
In fact, if char R = n, then n · x = 0 for all x ∈ R.
Z, Q, R, and C are all rings of characteristic 0. On the other hand, char Zn = n.
Definition. An integer n > 1 is prime if its only positive divisors are 1 and n.
The first few prime numbers are
2, 3, 5, 7, 11, . . . .
An integer n > 1 which is not prime is composite. The first few composite numbers are
4, 6, 8, 9, . . . .
Since the characteristic of Zn is n, the first theorem implies the following result:
Corollary. Zn is a field if and only if n is prime.
The Corollary tells us that Z2 , Z13 , and Z61 are fields, since 2, 13, and 61 are prime.
On the other hand, Z6 is not a field, since 6 isn’t prime (because 6 = 2 · 3). In fact, we saw it directly
when we showed that 4 does not have a multiplicative inverse in Z6 . Note that Z6 is a commutative ring
with identity.
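Here is a brute-force Python check of the Corollary for small moduli (a sketch, not a proof; the function name is illustrative):

def is_field(n):
    # Z_n is a field exactly when every nonzero element has an inverse
    return all(any((x * y) % n == 1 for y in range(n)) for x in range(1, n))

for n in [2, 3, 5, 6, 10, 13]:
    print(n, is_field(n))
# True for the primes 2, 3, 5, 13; False for 6 and 10.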
For simplicity, the fields of prime characteristic that I use in this course will almost always be finite.
But what would an infinite field of prime characteristic look like?
As an example, start with Z2 = {0, 1}. Form the field of rational functions Z2 (x). Thus, elements
of Z2 (x) have the form p(x)/q(x), where p(x) and q(x) are polynomials with coefficients in Z2 . Here are some
examples of elements of Z2 (x):
1/x, (x2 + x + 1)/(x100 + 1), 1, x7 + x3 + 1.
You can find multiplicative inverses of nonzero elements by taking reciprocals; for instance,
((x2 + x + 1)/(x100 + 1))−1 = (x100 + 1)/(x2 + x + 1).
I won’t go through and check all the axioms, but in fact, Z2 (x) is a field. Moreover, since 2 · 1 = 0 in
Z2 (x), it’s a field of characteristic 2. It has an infinite number of elements; for example, it contains
1, x, x2 , x3 , ....
What about fields of characteristic p other than Z2 , Z3 , and so on? As noted above, these are called
Galois fields. For instance, there is a Galois field with 5^3 = 125 elements. To keep the computations simple,
we will rarely use them in this course. But here’s an example of a Galois field with 2^2 = 4 elements, so you
can see what it looks like.
GF (4) is the Galois field with 4 elements, and here are its addition and multiplication tables:
Addition:            Multiplication:

+ | 0 1 a b          · | 0 1 a b
--+--------          --+--------
0 | 0 1 a b          0 | 0 0 0 0
1 | 1 0 b a          1 | 0 1 a b
a | a b 0 1          a | 0 a b 1
b | b a 1 0          b | 0 b 1 a
Notice that
1 + 1 = 0, a + a = 0, b + b = 0.
You can check by examining the multiplication table that multiplication is commutative, that 1 is the
multiplicative identity, and that the nonzero elements (1, a, and b) all have multiplicative inverses. For
instance, a−1 = b, because a · b = 1.
Since we’ve already seen a lot of weird things with these new number systems, we might as well see
another one.
Example. Find the roots of x2 + 5x + 6 in Z10 .
Make a table:

x            | 0 1 2 3 4 5 6 7 8 9
x2 + 5x + 6  | 6 2 0 0 2 6 2 0 0 2
For instance, plugging x = 4 into x2 + 5x + 6 gives
4^2 + 5 · 4 + 6 = 42 = 40 + 2 = 2.
From the table, the roots are x = 2, 3, 7, and 8. So a quadratic polynomial can have more than two roots
when the coefficients come from a ring like Z10 !
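A quick Python check of the table, by brute-force search over Z10:

roots = [x for x in range(10) if (x*x + 5*x + 6) % 10 == 0]
print(roots)    # [2, 3, 7, 8]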
A matrix is a rectangular array of numbers. Here is an example:

[  1   −3   0    5    5 ]
[  0  −17   9   −5   −4 ]
[ −1   −2   4   −3    3 ]

(A matrix can also have entries which are fractions, like 17/3 or 5/2, or decimals, like −3.14.)
In this case, the numbers are elements of Q (or R). In general, the entries will be elements of some
commutative ring or field.
In this section, I’ll explain operations with matrices by example. I’ll discuss and prove some of the
properties of these operations later on.
Dimensions of matrices. An m × n matrix is a matrix with m rows and n columns. Sometimes this is
expressed by saying that the dimensions of the matrix are m × n.
a 2 × 3 matrix:
[ 1 2 3 ]
[ 4 5 6 ]

a 4 × 2 matrix:
[ 0 0 ]
[ 0 0 ]
[ 0 0 ]
[ 0 0 ]

a 1 × 1 matrix:
[ π ]
A 1 × n matrix is called an n-dimensional row vector. For example, here’s a 3-dimensional row
vector:
[ 4   17 − √5   42 ]
Likewise, an n × 1 matrix is called an n-dimensional column vector. Here’s a 3-dimensional column
vector:
[ 0        ]
[ 17 − e2  ]
[ ∛617     ]
An n × n matrix is called a square matrix. For example, here is a 2 × 2 square matrix:
[ 1    x  ]
[ z2   32 ]
Notice that by convention the number of rows is listed first and the number of columns second. This
convention also applies to referring to specific elements of matrices. Consider the following matrix:
[ 11      −0.4   33      √2     ]
[ a + b    x2    sin y   j − k  ]
[ −8       0     13      114.71 ]
The (2, 4)th element is the element in row 2 and column 4, which is j − k. (Note that the first row is
row 1, not row 0, and similarly for columns.) The (3, 3)th element is the element in row 3 and column 3,
which is 13.
Equality of matrices. Two matrices are equal if they have the same dimensions and the corresponding
entries are equal. For example, suppose
[ 7   0   3 ]   [ a   b   3 ]
[ 5  −4  10 ] = [ 5  −4   c ]
Then if I match corresponding elements, I see that a = 7, b = 0, and c = 10.
Definition. If R is a commutative ring, then M (n, R) is the set of n × n matrices with entries in R.
Note: Some people use the notation Mn (R).
For example, M (2, R) is the set of 2 × 2 matrices with real entries. M (3, Z5 ) is the set of 3 × 3 matrices
with entries in Z5 .
Adding and subtracting matrices. For these examples, I’ll assume that the matrices have entries in R.
You can add (or subtract) matrices by adding (or subtracting) corresponding entries.
[ 1  −1  0 ]   [ 7  1   3 ]   [ 1 + 7   −1 + 1   0 + 3    ]   [ 8   0  3 ]
[ 2  −4  6 ] + [ 0  3  −6 ] = [ 2 + 0   −4 + 3   6 + (−6) ] = [ 2  −1  0 ]
(A + B) + C = A + (B + C).
This means that if you are adding several matrices, you can group them any way you wish:
( [ 1   −2 ] + [ 0  2 ] ) + [  4  3 ]  =  [  1   0 ] + [  4  3 ]  =  [ 5  3 ]
( [ 5  −11 ]   [ 9  8 ] )   [ −7  6 ]     [ 14  −3 ]   [ −7  6 ]     [ 7  3 ]

[ 1   −2 ] + ( [ 0  2 ] + [  4  3 ] )  =  [ 1   −2 ] + [ 4   5 ]  =  [ 5  3 ]
[ 5  −11 ]   ( [ 9  8 ]   [ −7  6 ] )     [ 5  −11 ]   [ 2  14 ]     [ 7  3 ]
Here’s an example of subtraction:
[ 1  −3 ]   [ 6  3 ]   [ −5  −6 ]
[ 2   0 ] − [ 2  π ] = [  0  −π ]
[ 4  √2 ]   [ 0  0 ]   [  4  √2 ]
Matrix addition is commutative: if A and B are matrices with the same dimensions, then
A + B = B + A.
Note that in the second example, there were some negative numbers in the middle of the computation,
but the final answer was expressed entirely in terms of elements of Z5 = {0, 1, 2, 3, 4}.
Definition. A zero matrix 0 is a matrix all of whose entries are 0.
[ 0 0 0 ]
[ 0 0 0 ]    [ 0 0 ]
[ 0 0 0 ]    [ 0 0 ]
[ 0 0 0 ]
There is an m × n zero matrix for every pair of positive dimensions m and n.
If you add the m × n zero matrix to another m × n matrix A, you get A:
[ 31   97 ]   [ 0  0 ]   [ 31   97 ]
[ 24  −53 ] + [ 0  0 ] = [ 24  −53 ]
In symbols, if 0 is a zero matrix and A is a matrix of the same size, then
A + 0 = A and 0 +A = A.
A zero matrix is said to be an identity element for matrix addition.
Note: At some point, I may just write “0” instead of “0” (with the boldface) for a zero matrix, and rely
on context to tell it apart from the number 0.
Multiplying matrices by numbers. You can multiply a matrix by a number by multiplying each entry
by the number. Here is an example with real numbers:
    [ 3  −3 ]   [ 7 · 3   7 · (−3) ]   [ 21  −21 ]
7 · [ 4  −1 ] = [ 7 · 4   7 · (−1) ] = [ 28   −7 ]
    [ 0   2 ]   [ 7 · 0   7 · 2    ]   [  0   14 ]
Things work in the same way over Zn , but all the arithmetic is done in Zn . Here is an example over Z5 :
    [ 2  1 ]   [ 3 · 2   3 · 1 ]   [ 1  3 ]
3 · [ 4  1 ] = [ 3 · 4   3 · 1 ] = [ 2  3 ]
Notice that, as usual with Z5 , I simplified my final answer so that all the entries of the matrix were in
the set {0, 1, 2, 3, 4}.
Unlike the operations I’ve discussed so far, matrix multiplication (multiplying two matrices) does not
work the way you might expect: You don’t just multiply corresponding elements of the matrices, the way
you add corresponding elements to add matrices.
To explain matrix multiplication, I’ll remind you first of how you take the dot product of two vectors;
you probably saw this in a multivariable calculus course. (If you’re seeing this for the first time, don’t worry
— it’s easy!) Here’s an example of a dot product of two 3-dimensional vectors of real numbers:
           [  2 ]
[ −3 7 9 ] [  0 ] = (−3) · 2 + 7 · 0 + 9 · 10 = −6 + 0 + 90 = 84.
           [ 10 ]
Note that the vectors must be the same size, and that the product is a number. This is actually an
example of matrix multiplication, and we’ll see that the result should technically be a 1 × 1 matrix. But for
dot products, we will write
84 instead of [ 84 ] .
If you’ve seen dot products in multivariable calculus, you might have seen this written this way:
(−3, 7, 9) · (2, 0, 10) = 84.
But for what I’ll do next, I want to distinguish between the row (first) vector and the column (second)
vector.
Multiplying matrices. To compute the product AB of two matrices, take the dot products of the rows
of A with the columns of B. In this example, assume all the matrices have real entries.
[  2  1  4 ]  [ 1   6 ]     [ (2 + 5 + 8 = 15)    ·  ]
[ −1  0  3 ]  [ 5   1 ]  =  [        ·            ·  ]
              [ 2  −2 ]

[  2  1  4 ]  [ 1   6 ]     [ 15   (12 + 1 − 8 = 5) ]
[ −1  0  3 ]  [ 5   1 ]  =  [  ·           ·        ]
              [ 2  −2 ]

[  2  1  4 ]  [ 1   6 ]     [ 15                  5 ]
[ −1  0  3 ]  [ 5   1 ]  =  [ (−1 + 0 + 6 = 5)    · ]
              [ 2  −2 ]

[  2  1  4 ]  [ 1   6 ]     [ 15            5          ]
[ −1  0  3 ]  [ 5   1 ]  =  [  5   (−6 + 0 − 6 = −12)  ]
              [ 2  −2 ]

[  2  1  4 ]  [ 1   6 ]     [ 15    5  ]
[ −1  0  3 ]  [ 5   1 ]  =  [  5  −12  ]
              [ 2  −2 ]
You can see that you take the dot products of the rows of the first matrix with the columns of the second
matrix to produce the 4 elements of the product.
In order for the multiplication to work, the matrices must have compatible dimensions: The number of
columns in A should equal the number of rows of B. Thus, if A is an m × n matrix and B is an n × p matrix,
AB will be an m × p matrix. For example, this won’t work:
[ −2   1   0  8 ]   [ −5  −5 ]
[  1  11  13  9 ]   [  2  13 ]      (Won’t work!)
                    [  1   7 ]
Do you see why? The rows of the first matrix have 4 entries, but the columns of the second matrix have
3 entries. You can’t take the dot products, because the entries won’t match up.
Here are two more examples, again using matrices with real entries:
            [ 2   0 ]
[ −2 3 0 ]  [ 9  −4 ] = [ 23  −12 ].
            [ 1   5 ]

[ 4  −2 ] [ 1   1 ]   [ 4  10 ]
[ 1   1 ] [ 0  −3 ] = [ 1  −2 ]
Here is an example with matrices in M (2, Z3 ). All the arithmetic is done in Z3 .
[ 2  1 ] [ 1  2 ]   [ 4  5 ]   [ 1  2 ]
[ 0  2 ] [ 2  1 ] = [ 4  2 ] = [ 1  2 ]
Notice that I simplify the final result so that all the entries are in Z3 = {0, 1, 2}.
If you multiply a matrix by a zero matrix, you get a zero matrix:
[ 0  0 ] [ 9  −3   43   ]   [ 0  0  0 ]
[ 0  0 ] [ 5   0  −1.2  ] = [ 0  0  0 ]
In symbols, if 0 is a zero matrix and A is a matrix compatible with it for multiplication, then
A·0= 0 and 0 · A = 0.
Matrix multiplication takes a little practice, but it isn’t hard. The big principle to take away is that you
take the dot products of the rows of the first matrix with the columns of the second matrix. This picture of
matrix multiplication will be very important for our work with matrices.
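Here is a short Python sketch of matrix multiplication written exactly this way: the (i, j)th entry of AB is the dot product of row i of A with column j of B. Matrices are represented as lists of rows; the function name is just for illustration.

def mat_mul(A, B):
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "number of columns of A must equal number of rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[2, 1, 4],
     [-1, 0, 3]]
B = [[1, 6],
     [5, 1],
     [2, -2]]
print(mat_mul(A, B))    # [[15, 5], [5, -12]]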
I should say something about a point I mentioned earlier: Why not multiply matrices like this?
[ 4  −3 ] [ 2  6 ]          [ 4 · 2    (−3) · 6 ]
[ 5   2 ] [ 0  4 ]  gives   [ 5 · 0     2 · 4   ]  ?
You could define a matrix multiplication this way, but unfortunately, it would not be useful for applica-
tions (such as solving systems of linear equations). Also, when we study the relationship between matrices and
linear transformations, we’ll see that the matrix multiplication we defined using dot products corresponds
to the composite of transformations. So even though the “corresponding elements” definition seems simpler,
it doesn’t match up well with the way matrices are used.
Here’s a preview of how matrices are related to systems of linear equations.
Example. Write the system of equations which correspond to the matrix equation
[  2  3   0 ]  [ a ]     [ 13 ]
[ −1  5  17 ]  [ b ]  =  [  0 ]
               [ c ]
Multiply out the left side:
[ 2a + 3b       ]   [ 13 ]
[ −a + 5b + 17c ] = [  0 ]
Two matrices are equal if their corresponding entries are equal, so equate corresponding entries:
2a + 3b = 13
−a + 5b + 17c = 0
Example. Write the following system of linear equations as a matrix multiplication equation:
x + 4y − 6z = 3
−2x + 10z = 3
x + y + z = 5
Take the 9 coefficients of the variables x, y, and z on the left side and make them into a 3 × 3 matrix. Put
the variables into a 3-dimensional column vector. Make the 3 numbers on the right side into a 3-dimensional
column vector. Here’s what we get:
[  1  4  −6 ]  [ x ]     [ 3 ]
[ −2  0  10 ]  [ y ]  =  [ 3 ]
[  1  1   1 ]  [ z ]     [ 5 ]
Work out the multiplication on the left side of this equation and you’ll see how this represents the
original system.
This is really important! It allows us to represent a system of equations as a matrix equation. Eventually,
we’ll figure out how to solve the original system by working with the matrices.
Identity matrices. There are special matrices which serve as identities for multiplication: The n × n
identity matrix is the square matrices with 1’s down the main diagonal — the diagonal running from
northwest to southeast — and 0’s everywhere else. For example, the 3 × 3 identity matrix is
    [ 1  0  0 ]
I = [ 0  1  0 ]
    [ 0  0  1 ]
If I is the n × n identity and A is a matrix which is compatible for multiplication with I, then
AI = A and IA = A.
For example,
[ 3     −7    10     ]   [ 1  0  0 ]   [ 3     −7    10     ]
[ 1/2    π    −1.723 ]   [ 0  1  0 ] = [ 1/2    π    −1.723 ]
[ 0     19    712    ]   [ 0  0  1 ]   [ 0     19    712    ]
Matrix multiplication obeys some of the algebraic laws you’re familiar with. For example, matrix multi-
plication is associative: If A, B, and C are matrices and their dimensions are compatible for multiplication,
then
(AB)C = A(BC).
However, matrix multiplication is not commutative in general. That is, it need not be true that AB = BA
for matrices A and B.
One trivial way to get a counterexample is to let A be 3 × 5 and let B be 5 × 3. Then AB is 3 × 3 while
BA is 5 × 5. Since AB and BA have different dimensions, they can’t be equal.
However, it’s easy to come up with counterexamples even when AB and BA have the same dimensions.
For example, consider the following matrices in M (2, R):
A = [ 1  −1 ]   and   B = [ 1  0 ]
    [ 2   3 ]             [ 4  1 ]

Then

AB = [ −3  −1 ]   while   BA = [ 1  −1 ]
     [ 14   3 ]                [ 6  −1 ]

so AB ≠ BA.
Many of the properties of matrix arithmetic are things you’d expect — for example, that matrix addition
is commutative, or that matrix multiplication is associative. You should pay particular attention when things
don’t work the way you’d expect, and this is such a case. It is very significant that matrix multiplication is
not always commutative.
Transposes. If A is a matrix, the transpose AT of A is obtained by swapping the rows and columns of A.
For example,
[ 1  2  3 ]T   [ 1  4 ]
[ 4  5  6 ]  = [ 2  5 ]
               [ 3  6 ]
Notice that the transpose of an m × n matrix is an n × m matrix.
Example. Consider the following matrices with real entries:
A = [ 1  0   1  2 ] ,   B = [ c  c ] ,   C = [ 0   1 ]
    [ 2  1  −1  0 ]         [ 1  1 ]         [ 2  −1 ]
                                             [ 0   0 ]
                                             [ 1   1 ]
(a) Compute CB + AT .
(b) Compute AC − 2B.
(a)
           [ 0   1 ]            [ 1   2 ]   [ 1        1      ]   [ 1   2 ]   [ 2        3     ]
CB + AT =  [ 2  −1 ] [ c  c ] + [ 0   1 ] = [ 2c − 1   2c − 1 ] + [ 0   1 ] = [ 2c − 1   2c    ]
           [ 0   0 ] [ 1  1 ]   [ 1  −1 ]   [ 0        0      ]   [ 1  −1 ]   [ 1        −1    ]
           [ 1   1 ]            [ 2   0 ]   [ c + 1    c + 1  ]   [ 2   0 ]   [ c + 3    c + 1 ]
(b)
                          [ 0   1 ]
AC − 2B = [ 1  0   1  2 ] [ 2  −1 ] − 2 [ c  c ] = [ 2  3 ] − [ 2c  2c ] = [ 2 − 2c   3 − 2c ]
          [ 2  1  −1  0 ] [ 0   0 ]     [ 1  1 ]   [ 2  1 ]   [ 2    2 ]   [ 0        −1     ]
                          [ 1   1 ]
The inverse of a matrix. The inverse of an n × n matrix A is a matrix A−1 which satisfies
AA−1 = I and A−1 A = I,
where I is the n × n identity matrix.
There is no such thing as matrix division in general, because some matrices do not have inverses. But
if A has an inverse, you can simulate division by multiplying by A−1 . This is often useful in solving matrix
equations.
An n × n matrix which does not have an inverse is called singular.
We’ll discuss matrix inverses and how you find them in detail later. For now, here’s a formula that we’ll
use frequently.
Proposition. Consider the following matrix with entries in a commutative ring with identity R:
A = [ a  b ]
    [ c  d ]
If ad − bc is invertible in R, then A has an inverse, and
A−1 = (ad − bc)−1 [  d  −b ]
                  [ −c   a ]
Note: The number ad − bc is called the determinant of the matrix. We’ll discuss determinants later.
Remember that not every element of a commutative ring with identity has an inverse. For example,
4−1 is undefined in Z6 . If we’re dealing with real numbers, then ad − bc has a multiplicative inverse if and
only if it’s nonzero.
Proof. To show that the formula gives the inverse of A, I have to check that AA−1 = I and A−1 A = I:
AA−1 = [ a  b ] · (ad − bc)−1 [  d  −b ] = (ad − bc)−1 [ ad − bc      0     ] = [ 1  0 ] ,
       [ c  d ]               [ −c   a ]               [    0     ad − bc  ]   [ 0  1 ]

A−1 A = (ad − bc)−1 [  d  −b ] [ a  b ] = (ad − bc)−1 [ ad − bc      0     ] = [ 1  0 ] .
                    [ −c   a ] [ c  d ]               [    0     ad − bc  ]   [ 0  1 ]
This proves that the formula gives the inverse of a 2 × 2 matrix.
Here’s an example of this formula for a matrix with real entries. Notice that 3 · (−1) − 5 · (−2) = 7. So
[  3   5 ]−1           [ −1  −5 ]
[ −2  −1 ]    = (1/7) · [  2   3 ] .
(If I have a fraction outside a matrix, I may choose not to multiply it into the matrix to make the result
look nicer. Generally, if there is an integer outside a matrix, I will multiply it in.)
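The same formula works over Zn as long as the determinant has an inverse mod n. Here is a Python sketch; it uses pow(d, -1, n) (available in Python 3.8 and later) to compute a multiplicative inverse mod n, and the function name is just for illustration.

def inverse_2x2_mod(A, n):
    (a, b), (c, d) = A
    det_inv = pow((a*d - b*c) % n, -1, n)    # raises ValueError if no inverse exists
    return [[( det_inv * d) % n, (-det_inv * b) % n],
            [(-det_inv * c) % n, ( det_inv * a) % n]]

A = [[2, 1],
     [0, 2]]
Ainv = inverse_2x2_mod(A, 3)
print(Ainv)    # [[2, 2], [0, 2]]; multiplying A by Ainv and reducing mod 3 gives the identity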
Recall that a matrix which does not have an inverse is called singular. Suppose we have a real matrix
A = [ a  b ]
    [ c  d ]
The formula above will produce an inverse if ad−bc is invertible, which for real numbers means ad−bc ≠
0. So the matrix is singular if ad − bc = 0.
Example. For what values of x is the following real matrix singular?
[ x − 1     3   ]
[   4     x − 5 ]
ad − bc = (x − 1)(x − 5) − (3)(4) = 0.
Solve for x:
(x2 − 6x + 5) − 12 = 0
x2 − 6x − 7 = 0
(x − 7)(x + 1) = 0
This gives x = 7 or x = −1. The matrix is singular for x = 7 and for x = −1.
Here is an example of a matrix in M (2, Z6 ) which does not have an inverse:
[ 5  1 ]
[ 3  1 ]
However, its determinant is 5 · 1 − 1 · 3 = 2, which is nonzero. The point is that 2 is not invertible in
Z6 , even though it’s nonzero.
(You should multiply any numbers outside the matrix into the matrix, and simplify all the numbers in
the final answer in Z5 .)
That is, when I multiply the inverse and the original matrix, I get the identity matrix. Check for yourself
that it also works if I multiply them in the opposite order.
We’ll discuss solving systems of linear equations later, but here’s an example of this which puts a lot of
the ideas we’ve discussed together.
Example. (Solving a system of equations) Solve the following system over R for x and y using the
inverse of a matrix.
x + 3y = 7
2x − 2y = −2
In matrix form, the system is
[ 1   3 ]  [ x ]     [  7 ]
[ 2  −2 ]  [ y ]  =  [ −2 ]
The coefficient matrix has determinant 1 · (−2) − 3 · 2 = −8, so by the formula above its inverse is
−(1/8) [ −2  −3 ]
       [ −2   1 ]
Multiply both sides by the inverse matrix:
−(1/8) [ −2  −3 ] [ 1   3 ] [ x ]  =  −(1/8) [ −2  −3 ] [  7 ]
       [ −2   1 ] [ 2  −2 ] [ y ]            [ −2   1 ] [ −2 ]
On the left, the square matrix and its inverse cancel, since they multiply to I. (Do you see how that is
like “dividing by the matrix”?) On the right,
−(1/8) [ −2  −3 ] [  7 ]     [ 1 ]
       [ −2   1 ] [ −2 ]  =  [ 2 ]
Therefore,
[ x ]   [ 1 ]
[ y ] = [ 2 ]
The solution is x = 1, y = 2.
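Here is the same computation as a Python sketch, using the 2 × 2 inverse formula with real entries (the function name is illustrative):

def solve_2x2(A, rhs):
    (a, b), (c, d) = A
    det = a * d - b * c
    inv = [[ d / det, -b / det],
           [-c / det,  a / det]]
    return [inv[0][0] * rhs[0] + inv[0][1] * rhs[1],
            inv[1][0] * rhs[0] + inv[1][1] * rhs[1]]

print(solve_2x2([[1, 3], [2, -2]], [7, -2]))    # [1.0, 2.0]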
Definition. Let A and B be m × n matrices. Their sum A + B and difference A − B are the m × n matrices
whose (i, j)th entries are
(A + B)ij = Aij + Bij and (A − B)ij = Aij − Bij .
This definition says that if two matrices have the same dimensions, you can add or subtract them by
adding or subtracting corresponding entries.
Definition. The m × n zero matrix 0 is the m × n matrix whose (i, j)th entry is given by 0ij = 0.
Proposition. Let A, B, and C be m × n matrices, and let 0 denote the m × n zero matrix. Then:
(a) (Associativity of Addition)
(A + B) + C = A + (B + C).
(b) (Commutativity of Addition)
A + B = B + A.
(Do you understand what this says? ((A + B) + C)ij is the (i, j)th entry of (A + B) + C, while
(A + (B + C))ij is the (i, j)th entry of A + (B + C).)
Since this is the first proof of this kind that I’ve done, I’ll show the justification for each step.
(A + B)ij = (B + A)ij .
(c) To prove that A + 0 = A, I have to show that their corresponding entries are equal:
(A + 0)ij = Aij .
By definition of matrix addition and the zero matrix,
0 + A = A + 0 = A.
Definition. If A is a matrix and k is a number, then kA is the matrix with the same dimensions as A whose
(i, j)th entry is
(kA)ij = k · Aij .
(It’s considered ugly to write a number on the right side of a matrix if you want to multiply. For the
record, I’ll define Ak to be the same as kA.)
This definition says that to multiply a matrix by a number, multiply each entry by the number.
Definition. If A is a matrix, then −A is the matrix having the same dimensions as A, and whose entries
are given by
(−A)ij = −Aij .
Proposition. Let A and B be matrices with the same dimensions, and let k be a number. Then:
(a) k(A + B) = kA + kB and k(A − B) = kA − kB.
(b) 0 · A = 0.
(c) 1 · A = A.
(d) (−1) · A = −A.
(e) A − B = A + (−B).
Note that in (b), the 0 on the left is the number 0, while the 0 on the right is the zero matrix.
Proof. I’ll prove (a) and (c) by way of example and leave the proofs of the other parts to you.
First, I want to show that k(A + B) = kA + kB. I have to show that corresponding entries are equal,
which means
(k(A + B))ij = (kA + kB)ij .
I apply the definitions of matrix addition and multiplication of a matrix by a number:
(k(A + B))ij = k(A + B)ij = k(Aij + Bij ) = kAij + kBij = (kA)ij + (kB)ij = (kA + kB)ij .
Therefore, k(A + B) = kA + kB.
The proof for subtraction is similar. On one hand,
(k(A − B))ij = k(A − B)ij = k(Aij − Bij ) = kAij − kBij .
On the other hand,
(kA − kB)ij = (kA)ij − (kB)ij = kAij − kBij .
Therefore, (k(A − B))ij = (kA − kB)ij , so k(A − B) = kA − kB.
For (c), I want to show that 1 · A = A. As usual, this means I must show that they have the same
(i, j)th entries:
(1 · A)ij = Aij .
I use the definition of multiplying a matrix by a number:
(1 · A)ij = 1 · Aij = Aij .
Therefore, 1 · A = A.
Next, let’s look at matrix multiplication in terms of entries. Suppose A is an m × n matrix and B is an
n × p matrix. To get the (i, j)th entry of the product AB, take row i of A, whose entries are Ai1 , Ai2 , . . . , Ain ,
and column j of B, whose entries are B1j , B2j , . . . , Bnj .
Corresponding elements are multiplied, and then the products are summed:

Ai1 B1j + Ai2 B2j + · · · + Ain Bnj .

Look at the pattern of the subscripts in this sum. You can see that the “inner” matching subscripts are
going from 1 to n, while the “outer” “i” and “j” don’t change. Hence, I can write the (i, j)th entry of the
product in summation form as

Σ_{k=1}^n Aik Bkj .

That is,

(AB)ij = Σ_{k=1}^n Aik Bkj .
It’s often useful to have a symbol which you can use to compare two quantities i and j — specifically,
a symbol which equals 1 when i = j and equals 0 when i 6= j.
Definition. The Kronecker delta is defined by
δij = 1 if i = j, and δij = 0 if i ≠ j.
For example,
δ12 = 0, δ17,17 = 1, δ84 = 0.
Lemma. Σ_{j=1}^n δij aj = ai .
Proof. To see what’s happening, write out the sum:
Σ_{j=1}^n δij aj = δi1 a1 + δi2 a2 + δi3 a3 + · · · + δin an .
By definition, each δ with unequal subscripts is 0. The only δ that is not 0 is the one with equal
subscripts. Since i is fixed, the δ that is not 0 is δii , which equals 1. Thus,
Σ_{j=1}^n δij aj = δii ai = 1 · ai = ai .
Definition. The n × n identity matrix In is the n × n matrix whose (i, j)th entry is given by (In )ij = δij .
Sometimes we need to interchange the order of a double summation. For example, consider the double
sum Σ_{i=1}^2 Σ_{j=0}^3 (ai + bj ). Writing out the terms, I get

(a1 + b0 ) + (a1 + b1 ) + (a1 + b2 ) + (a1 + b3 )+
(a2 + b0 ) + (a2 + b1 ) + (a2 + b2 ) + (a2 + b3 ).

You can see I “cycled” through the inner (“j”) index from 0 to 3 first, while holding i = 1. Then I
changed i to 2, and cycled through the j index again. To interchange the order of summation means that I
will get the same sum if I cycle through i first, then j. That is,
Σ_{i=1}^2 Σ_{j=0}^3 (ai + bj ) = Σ_{j=0}^3 Σ_{i=1}^2 (ai + bj ).
Here is how Σ_{j=0}^3 Σ_{i=1}^2 (ai + bj ) looks if I write out the terms:
(a1 + b0 ) + (a2 + b0 )+
(a1 + b1 ) + (a2 + b1 )+
(a1 + b2 ) + (a2 + b2 )+
(a1 + b3 ) + (a2 + b3 )
You can see that I get exactly the same terms, just in a different order. Therefore, the sums are the
same.
In general, if aij is some expression involving i and j, then
Σ_{i=1}^{m} Σ_{j=1}^{n} aij = Σ_{j=1}^{n} Σ_{i=1}^{m} aij .
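You can confirm the interchange of summation numerically. A tiny Python sketch (mine), using the same ranges as the example above:

    a = [None, 3, 5]     # a[1], a[2]  (index 0 unused)
    b = [2, 4, 6, 8]     # b[0], ..., b[3]

    s1 = sum(a[i] + b[j] for i in range(1, 3) for j in range(0, 4))   # i outer, j inner
    s2 = sum(a[i] + b[j] for j in range(0, 4) for i in range(1, 3))   # j outer, i inner
    print(s1, s2, s1 == s2)   # same terms in a different order, so the sums are equal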
Proposition.
(a) (Associativity of Matrix Multiplication) If A, B, and C are matrices which are compatible for
multiplication, then
(AB)C = A(BC).
(b) (Distributivity of Multiplication over Addition) If A, B, C, D, E, and F are matrices
compatible for addition and multiplication, then
A(B + C) = AB + AC and (D + E)F = DF + EF.
(c) If j and k are numbers and A and B are matrices which are compatible for multiplication, then
(jA)(kB) = (jk)(AB).
(d) (Multiplication by the Identity) If A is an m × n matrix, then
AIn = A and Im A = A.
The “compatible for addition” and “compatible for multiplication” assumptions mean that the matrices
should have dimensions which make the operations in the equations legal — but otherwise, there are no
restrictions on what the dimensions can be.
Proof. I’ll prove (a) and part of (d) by way of example, and leave the proofs of the other parts to you.
Before starting, I should say that this proof is rather technical, but try to follow along as best you can.
I’ll use i, j, k, and l as subscripts.
Suppose that A is an m × n matrix, B is an n × p matrix, and C is a p × q matrix. I want to prove that
(AB)C = A(BC). I have to show that corresponding entries are equal, i.e.
((AB)C)il = (A(BC))il .
((AB)C)il = Σ_{k=1}^{p} (AB)ik Ckl = Σ_{k=1}^{p} ( Σ_{j=1}^{n} Aij Bjk ) Ckl ,
while
(A(BC))il = Σ_{j=1}^{n} Aij (BC)jl = Σ_{j=1}^{n} Aij ( Σ_{k=1}^{p} Bjk Ckl ).
If you stare at those two terrible double sums for a while, you can see that they involve the same A, B,
and C terms, and they involve the same summations — but in different orders. I’m allowed to convert one
into the other by interchanging the order of summation, and using the distributive law:
Σ_{k=1}^{p} ( Σ_{j=1}^{n} Aij Bjk ) Ckl = Σ_{k=1}^{p} Σ_{j=1}^{n} (Aij Bjk Ckl ) = Σ_{j=1}^{n} Σ_{k=1}^{p} (Aij Bjk Ckl ) = Σ_{j=1}^{n} Aij ( Σ_{k=1}^{p} Bjk Ckl ).
Hence, ((AB)C)il = (A(BC))il for all i and l, so (AB)C = A(BC).
For part of (d), I’ll show that Im A = A when A is an m × n matrix. By the definition of matrix multiplication and of the identity matrix,
(Im A)ij = Σ_{k=1}^{m} (Im )ik Akj = Σ_{k=1}^{m} δik Akj .
Using the lemma I proved on the Kronecker delta, I get
Σ_{k=1}^{m} δik Akj = Aij .
Hence, (Im A)ij = Aij for all i and j, so Im A = A.
Definition. If A is an m × n matrix, the transpose AT of A is the n × m matrix whose (i, j)th entry is given by
(AT )ij = Aji .
That is, the rows of AT are the columns of A (and vice versa).
Proposition. Let A and B be matrices with the same dimensions, and let k be a number. Then:
(a) (AT )T = A.
(b) (A + B)T = AT + B T .
(c) (kA)T = kAT .
Proof. I’ll prove (b) by way of example and leave the proofs of the other parts for you.
I want to show that (A + B)T = AT + B T . I have to show the corresponding entries are equal:
((A + B)T )ij = (AT + B T )ij .
Now
((A + B)T )ij = (A + B)ji = Aji + Bji = (AT )ij + (B T )ij = (AT + B T )ij .
Therefore, (A + B)T = AT + B T .
Proposition. Suppose A and B are matrices which are compatible for multiplication. Then
(AB)T = B T AT .
Proof. Let (AT )ij denote the (i, j)th entry of AT , and likewise for B and AB. Then
[(AB)T ]ji = (AB)ij = Σ_{k=1}^{n} Aik Bkj = Σ_{k=1}^{n} (AT )ki (B T )jk = Σ_{k=1}^{n} (B T )jk (AT )ki .
The product on the right is the (j, i)th entry of B T AT , while [(AB)T ]ji is the (j, i)th entry of (AB)T .
Therefore, (AB)T = B T AT , since their corresponding entries are equal.
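If you want to see the transpose rules in action, here is a short Python sketch (an ad hoc check, not a proof) that verifies (A + B)T = AT + B T and (AB)T = B T AT on random matrices; transpose, add, and matmul are helper names I'm defining just for this check.

    import random

    def transpose(A):
        # (A^T)[i][j] = A[j][i]
        return [list(row) for row in zip(*A)]

    def add(A, B):
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

    def matmul(A, B):
        return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

    A = [[random.randint(-5, 5) for _ in range(3)] for _ in range(2)]   # 2 x 3
    B = [[random.randint(-5, 5) for _ in range(3)] for _ in range(2)]   # 2 x 3
    C = [[random.randint(-5, 5) for _ in range(4)] for _ in range(3)]   # 3 x 4

    print(transpose(add(A, B)) == add(transpose(A), transpose(B)))        # True
    print(transpose(matmul(A, C)) == matmul(transpose(C), transpose(A)))  # True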
Definition.
(a) A matrix X is symmetric if X T = X.
(b) A matrix X is skew symmetric if X T = −X.
Both definitions imply that X is a square matrix.
Using the definition of transpose, I can express these definitions in terms of elements.
X is symmetric if
Xij = Xji for all i, j.
X is skew symmetric if
Xij = −Xji for all i, j.
Visually, a symmetric matrix is symmetric across its main diagonal (the diagonal running from north-
west to southeast). For example, this real matrix is symmetric:
0 √2 −9
2 3 4 .
−9 4 5
In a skew symmetric matrix, entries which are symmetrically located across the main diagonal are negatives of one another. The
entries on the main diagonal must be 0, since each diagonal entry must equal its own negative.
The next result is pretty easy, but it illustrates how you can use the definitions of symmetry and skew
symmetry in writing proofs. In these proofs, in contrast to earlier proofs, I don’t need to write out the
entries of the matrices, since I can use properties I’ve already proved.
Proposition.
(a) The sum of symmetric matrices is symmetric.
(b) The sum of skew symmetric matrices is skew symmetric.
Proof. (a) Let A and B be symmetric. I must show that A + B is symmetric. Now
(A + B)T = AT + B T = A + B.
The first equality follows from a property I proved for transposes. The second equality follows from the
fact that A is symmetric (so AT = A) and B is symmetric (so B T = B).
Since (A + B)T = A + B, it follows that A + B is symmetric.
(b) Let A and B be skew symmetric, so AT = −A and B T = −B. I must show that A+B is skew symmetric.
Now
(A + B)T = AT + B T = −A + (−B) = −(A + B).
Therefore, A + B is skew symmetric.
Warning: Row reduction uses row operations. There are similar operations for columns which can be used
in other situations (like computing determinants), but not here.
There are three kinds of row operations. (Actually, there is some redundancy here — you can get away
with two of them.) In the examples below, assume that we’re using matrices with real entries (but see the
notes under (b)).
(a) You may swap two rows. Notation: ri ↔ rj means “swap row i and row j”.
(b) You may multiply (or divide) a row by a number that has a multiplicative inverse.
If your number system is a field, a number has a multiplicative inverse if and only if it’s nonzero. This
is the case for most of our work in linear algebra.
If your number system is more general (e.g. a commutative ring with identity), it requires more
care to ensure that a number has a multiplicative inverse.
(c) You may add a multiple of one row to another row, replacing that row with the result. Notation: r2 → r2 + 2r1 means “add 2 times row 1 to row 2 (and replace row 2 with the result)”.
Notice in “r2 → r2 + 2r1 ” row 2 is the row that changes; row 1 is unchanged. You figure the “2r1 ” on
scratch paper, but you don’t actually change row 1.
You can do the arithmetic for these operations in your head, but use scratch paper if you need to. Here’s
the arithmetic for the last operation:
r2 = −2 4 2
2r1 = 2 −2 10
r2 + 2r1 = 0 2 12
Since the “multiple” can be negative, you may also subtract a multiple of a row from another row.
For example, I’ll subtract 4 times row 1 from row 2. Notation: r2 → r2 − 4r1 .
1 2 3 1 2 3
4 5 6 → 0 −3 −6
7 8 9 7 8 9
Notice that row 1 was not affected by this operation. Likewise, if you do r17 → r17 − 56r31 , row 17
changes and row 31 does not.
Operation (c) is probably the one that you will use the most.
You may wonder: Why are these operations allowed, but not others? We’ll see that these row operations
imitate operations we can perform when we’re solving a system of linear equations.
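Since the three row operations come up constantly, here is a small Python sketch of them (my own helper functions, acting in place on a matrix stored as a list of rows). Rows are numbered from 0 in the code, while the notes number them from 1.

    def swap(M, i, j):
        # Operation (a): ri <-> rj.
        M[i], M[j] = M[j], M[i]

    def scale(M, i, c):
        # Operation (b): ri -> c * ri (c must have a multiplicative inverse).
        M[i] = [c * x for x in M[i]]

    def add_multiple(M, i, j, c):
        # Operation (c): ri -> ri + c * rj.  Row j is not changed.
        M[i] = [x + c * y for x, y in zip(M[i], M[j])]

    M = [[1, -1, 5],
         [-2, 4, 2]]
    add_multiple(M, 1, 0, 2)   # r2 -> r2 + 2 r1 (rows 0 and 1 in the code)
    print(M)                   # [[1, -1, 5], [0, 2, 12]] -- matches the scratch work above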
Example. In each case, tell whether the operation is a valid single row operation on a matrix with real
numbers. If it is, say what it does (in words).
(a) r5 ↔ r3 .
(b) r6 → r6 − 7.
(c) r3 → r3 + πr17 .
(d) r6 → 5r6 + 11r2 .
(e) r3 → r3 + r4 and r4 → r4 + r3 .
(a) This is a valid row operation. It swaps row 5 and row 3.
(b) This isn’t a valid row operation. You can’t add or subtract a number from the elements in a row.
(c) This adds π times row 17 to row 3 (and replaces row 3 with the result). Row 17 is not changed.
(d) This isn’t a valid row operation, though you could accomplish it using two row operations: First, multiply
row 6 by 5; next, add 11 times row 2 to the new row 6.
(e) This is not a valid row operation. It’s actually two row operations, not one. The only row operation that
changes two rows at once is swapping two rows.
Matrices can be used to represent systems of linear equations. Row operations are intended to mimic
the algebraic operations you use to solve a system. Row-reduced echelon form corresponds to the “solved
form” of a system.
A matrix is in row reduced echelon form if the following conditions are satisfied:
(a) The first nonzero element in each row (if any) is a “1” (a leading entry).
(b) Each leading entry is the only nonzero element in its column.
(c) All the all-zero rows (if any) are at the bottom of the matrix.
(d) The leading entries form a “stairstep pattern” from northwest to southeast:
0 1 6 0 0 2 ...
0 0 0 1 0 −1 . . .
0 0 0 0 1 4 ...
0 0 0 0 0 0 ...
. . .
In this matrix, the leading entries are in positions (1, 2), (2, 4), (3, 5), . . . .
Here are some matrices in row reduced echelon form. This is the 3 × 3 identity matrix:
1 0 0
0 1 0
0 0 1
The leading entries are in the (1, 1), (2, 2), and (3, 3) positions.
This matrix is in row reduced echelon form; its leading entries are in the (1, 1) and (2, 3) positions.
1 5 0 2
0 0 1 3
0 0 0 0
A leading entry must be the only nonzero number in its column. In this case, the “5” in the (1, 2)
position does not violate the definition, because it is not in the same column as a leading entry. Likewise for
the “2” and “3” in the fourth column.
This matrix has more rows than columns. It is in row reduced echelon form.
1 0
0 1
0 0
0 0
This row reduced echelon matrix has leading entries in the (1, 1), (2, 2), and (3, 4) positions.
1 0 6 0 2
0 1 1 0 −3
0 0 0 1 4
0 0 0 0 0
The nonzero numbers in the third and fifth columns don’t violate the definition, because they aren’t in
the same column as a leading entry.
A zero matrix is in row-reduced echelon form, though it won’t normally come up during a row reduc-
tion.
0 0 0 0
0 0 0 0
0 0 0 0
Note that conditions (a), (b), and (d) of the definition are vacuously satisfied, since there are no leading
entries.
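Here is a rough Python sketch (my own way of making the definition concrete, not anything official) of a function that checks the four conditions for row reduced echelon form:

    def is_rref(M):
        leading_cols = []
        seen_zero_row = False
        for row in M:
            nonzero = [j for j, x in enumerate(row) if x != 0]
            if not nonzero:
                seen_zero_row = True              # zero rows are fine, but...
                continue
            if seen_zero_row:
                return False                      # (c) ...they must all be at the bottom
            lead = nonzero[0]
            if row[lead] != 1:
                return False                      # (a) first nonzero entry must be a 1
            if leading_cols and lead <= leading_cols[-1]:
                return False                      # (d) leading entries step to the right
            leading_cols.append(lead)
        for i, col in enumerate(leading_cols):    # (b) a leading entry is alone in its column
            if any(M[r][col] != 0 for r in range(len(M)) if r != i):
                return False
        return True

    print(is_rref([[1, 5, 0, 2], [0, 0, 1, 3], [0, 0, 0, 0]]))   # True
    print(is_rref([[1, 0, 0], [0, 7, 0], [0, 0, 1]]))            # False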
Just as you may have wondered why only certain operations are allowed as row operations, you might
wonder what row reduced echelon form is for. “Why do we want a matrix to look this way?” As with the
question about row operations, the rationale for row reduced echelon form involves solving systems of linear
equations. If you want, you can jump ahead to the section on solving systems of linear equations and see
how these questions are answered. In the rest of this section, I’ll focus on the process of row reducing a
matrix and leave the reasons for later.
Example. The following real number matrices are not in row reduced echelon form. In each case, explain
why.
1 0 0
(a) 0 7 0
0 0 1
1 0 −3
(b) 0 1 5 .
0 0 1
0 0 0 0
(c) 0 1 0 −1 .
0 0 1 9
1 37 2 −1
(d) 0 1 −3 0 .
0 0 0 0
0 1 7 10
(e) 1 0 4 −5 .
0 0 0 0
(a) The first nonzero element in row 2 is a “7”, rather than a “1”.
(b) The leading entry in row 3 is not the only nonzero element in its column.
(c) The all-zero row is not at the bottom of the matrix.
(d) The leading entry in row 2 is not the only nonzero element in its column.
(e) The leading entries do not form a “stairstep pattern” from northwest to southeast.
Row reduction is the process of using row operations to transform a matrix into a row reduced echelon
matrix. As the algorithm proceeds, you move in stairstep fashion from “northwest” to “southeast” through
different positions in the matrix. In the description below, when I say that the current position is (i, j),
I mean that your current location is in row i and column j. The current position refers to a location,
not the element at that location (which I’ll sometimes call the current element or current entry). The
current row means the row of the matrix containing the current position and the current column means
the column of the matrix containing the current position.
Some notes:
1. There are many ways to arrange the algorithm. For instance, another approach gives the LU -
decomposition of a matrix.
2. Trying to learn to row reduce by following the steps below is pretty tedious, and most people will
want to learn by doing examples. The steps are there so that, as you’re learning to do this, you have some
idea of what to do if you get stuck. Skim the steps first, then move on to the examples; go back to the steps
if you get stuck.
3. As you gain experience, you may notice shortcuts you can take which don’t follow the steps below.
But you can get very confused if you focus on shortcuts before you’ve really absorbed the sense of the
algorithm. I think it’s better to learn to use a correct algorithm “by the book” first, the test being whether
you can reliably and accurately row reduce a matrix. Then you can consider using shortcuts.
4. The algorithm is set up so that if you stop in the middle of a row reduction — maybe you want to
take a break to have lunch — and forget where you were, you can restart the algorithm from the very start.
The algorithm will quickly take you through the steps you already did without redoing the computations and
leave you where you left off.
5. There’s no point in doing row reductions by hand forever, and for larger matrices (as would occur
in real world applications) it’s impractical. At some point, you’ll use a computer. However, I think it’s
important to do enough examples by hand that you understand the algorithm.
We’ll assume that we’re using a number system which is a field. Remember that this means that every
nonzero number has a multiplicative inverse — or equivalently, that you can divide by any nonzero number.
Step 1. Start with the current position at (1, 1).
Step 2. Test the element at the current position. If it’s nonzero, go to Step 2(a); if it’s 0, go to Step 2(b).
Step 2(a). If the element at the current position is nonzero, then:
(i) Divide all the elements in the current row by the current element. This makes the current element
1.
(ii) Add or subtract multiples of the current row from the other rows in the matrix so that all the
elements in the current column (except for the current element) are 0.
(iii) Move the current position to the next row and the next column — that is, move one step down
and one step to the right. If doing either of these things would take you out of the matrix, then stop: The
matrix is in row-reduced echelon form. Otherwise, return to the beginning of Step 2.
Step 2(b). If the element at the current position is 0, then look at the elements in the current column
below the current element. There are two possibilities.
(i) If all the elements below the current element are 0, then move the current position to the next
column (in the same row) — that is, move one step right. If doing this would take you out of the matrix,
then stop: The matrix is in row-reduced echelon form. Otherwise, return to the beginning of Step 2.
(ii) If some element below the current element is nonzero, then swap the current row and the row
containing the nonzero element. Then return to the beginning of Step 2.
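Here is a Python sketch of the algorithm just described, using exact rational arithmetic (Python's Fraction) so there is no roundoff. It follows Steps 1 and 2 fairly literally; it's an illustration I wrote to match the steps, not a production routine.

    from fractions import Fraction

    def rref(M):
        A = [[Fraction(x) for x in row] for row in M]    # work on an exact copy
        rows, cols = len(A), len(A[0])
        i = j = 0                                        # Step 1: current position
        while i < rows and j < cols:                     # Step 2
            if A[i][j] != 0:                             # Step 2(a)
                pivot = A[i][j]
                A[i] = [x / pivot for x in A[i]]         # (i) divide the current row
                for r in range(rows):                    # (ii) clear the rest of the column
                    if r != i and A[r][j] != 0:
                        c = A[r][j]
                        A[r] = [x - c * y for x, y in zip(A[r], A[i])]
                i += 1                                   # (iii) move down and to the right
                j += 1
            else:                                        # Step 2(b)
                for r in range(i + 1, rows):
                    if A[r][j] != 0:                     # (ii) swap a nonzero entry up
                        A[i], A[r] = A[r], A[i]
                        break
                else:
                    j += 1                               # (i) all zeros below: move right
        return A

    print(rref([[0, 0, 1, -1, -2],
                [2, -4, -2, 4, 18],
                [-1, 2, 3, -5, -16]]))
    # [[1, -2, 0, 0, 4], [0, 0, 1, 0, 1], [0, 0, 0, 1, 3]]  (as Fractions)

The last test matrix is the one worked by hand in an example below, so you can compare.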
Example. For each of the following real number matrices, assume that the current position is (1, 1). What
is the next step in the row-reduction algorithm?
(a)
2 −4 0 8
3 0 0 1
−1 0 1 1
(b)
1 −2 0 4
3 0 0 1
−1 0 1 1
(c)
1 −2 0 4
0 6 0 −11
0 −2 1 5
(d)
0 2 1 −1
0 8 −11 4
7 5 0 −2
(e)
0 1 0 3
0 1 5 17
0 2 −3 0
(a) The element in the current position is 2, which is nonzero. So I divide the first row by 2 to make the
element 1:
2 −4 0 8
3 0 0 1
−1 0 1 1

→ (r1 → (1/2) r1)

1 −2 0 4
3 0 0 1
−1 0 1 1
(b) The element in the current position is 1. There are nonzero elements below it in the first column, and I
want to turn those into zeros. To turn the 3 in row 2, column 1 into a 0, I need to subtract 3 times row 1
from row 2:
1 −2 0 4
3 0 0 1
−1 0 1 1

→ (r2 → r2 − 3r1)

1 −2 0 4
0 6 0 −11
−1 0 1 1
(c) The element in the current position is 1, and it’s the only nonzero element in its column. So I move the
current position down one row and to the right one column. No row operation is performed, and the matrix
doesn’t change.
1 −2 0 4
0 6 0 −11
0 −2 1 5
(d) The element in the current position is 0. I look below it and see a nonzero element in the same column
in row 3. So I swap row 1 and row 3; the current position remains the same, and I return to the start of
Step 2.
0 2 1 −1
0 8 −11 4
7 5 0 −2

→ (r1 ↔ r3)

7 5 0 −2
0 8 −11 4
0 2 1 −1
(e) The element in the current position is 0. There are no nonzero elements below it in the same column. I
don’t perform any row operations; I just move the current position to the next column (in the same row).
The matrix does not change.
0 1 0 3
0 1 5 17
0 2 −3 0
1. Why does the algorithm terminate? (Could you ever get stuck and never finish?)
2. When the algorithm does terminate, why is the final matrix in row-reduced echelon form?
The first question is easy to answer. As you execute the algorithm, the current position moves through
the matrix in “stairstep” fashion:
The cases in Step 2 cover all the possibilities, and in each case, you perform a finite number of row
operations (no larger than the number of rows in the matrix, plus one) before you move the current position.
Since you’re always moving the current position to the right (Step 2(b)(i)) or to the right and down (Step
2(a)(iii)), and since the matrix has only finitely many rows and columns, you must eventually reach the edge
of the matrix and the algorithm will terminate.
As for the second question, I’ll give a very informal argument using the matrix with the “stairstep”
path pictured above.
First, if you moved the current position down and to the right, the previous current element was a 1,
and every other element in its column must be 0. In the matrix with the “stairstep” path I gave above, this
means that each spot where a curved arrow starts must be a 1, and all the other elements in the column
with a 1 must be 0. Hence, the matrix must look like this:
1 ∗ ∗ 0 0 ∗ 0 ∗
0 ∗ ∗ 1 0 ∗ 0 ∗
0 ∗ ∗ 0 1 ∗ 0 ∗
0 ∗ ∗ 0 0 ∗ 1 ∗
0 ∗ ∗ 0 0 ∗ 0 ∗
0 ∗ ∗ 0 0 ∗ 0 ∗
(The ∗’s stand for elements which I don’t know.)
Next, notice that if you moved the current position to the right (but not down), then the previous
current element and everything below it must have been 0. In terms of the picture, every spot where a right
arrow starts must be a 0, and all the elements below it must be 0. Now I know that the matrix looks like
this:
1 ∗ ∗ 0 0 ∗ 0 ∗
0 0 0 1 0 ∗ 0 ∗
0 0 0 0 1 ∗ 0 ∗
0 0 0 0 0 0 1 ∗
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
In either case, you don’t move your current position until the column you’re in is “fixed” — that is,
your column and everything to the left of it is in row reduced echelon form. When the algorithm terminates,
the whole matrix is in row reduced echelon form.
Notice that this matrix is in row-reduced echelon form.
Row reduction is a key algorithm in linear algebra, and you should work through enough examples so
that you understand how it works. So here we go with some full examples — try to do as much of the work
by yourself as possible. I will start by giving lots of details, but gradually give fewer so that this doesn’t
become too long and tedious.
Example. Row reduce the following matrix over R to row-reduced echelon form.
0 0 1 −1 −2
2 −4 −2 4 18
−1 2 3 −5 −16
I will follow the steps in the algorithm I described earlier. Since this is our first example, I’ll explain
the steps and the reasons for them in a lot of detail. This will make the solution a bit long!
We start out in the upper-left hand corner (row 1, column 1), where the entry is 0. According to the
algorithm, I look below that entry to see if I can find a nonzero entry in the same column. The entry in row
2, column 1 is 2, which is nonzero. So I swap row 1 and row 2:
0 0 1 −1 −2
2 −4 −2 4 18
−1 2 3 −5 −16

→ (r1 ↔ r2)

2 −4 −2 4 18
0 0 1 −1 −2
−1 2 3 −5 −16
Note that the current position hasn’t changed: It is still row 1, column 1. But the entry there is now
2, which is nonzero. The algorithm says that if the entry is nonzero but not equal to 1, then divide the row
by the entry. So I divide row 1 by 2:
2 −4 −2 4 18
0 0 1 −1 −2
−1 2 3 −5 −16

→ (r1 → (1/2) r1)

1 −2 −1 2 9
0 0 1 −1 −2
−1 2 3 −5 −16
Now the element in the current position is 1 — it is what I want for the leading entry in each row.
The algorithm then tells me to look below and above the current entry in its column. Wherever I see a
nonzero entry above or below, I add or subtract a multiple of the row to make the nonzero entry 0. Most of
the row operations you do will be operations of this kind.
There are no entries above the 1 in row 1, column 1. Below it there is 0 in row 2 and −1 in row 3. The
0 in row 2 is already what I want. So I just have to fix the −1 in row 3. What multiple of 1 do you add to
−1 to get 0? Obviously, you just add (the “multiple” is 1). So I add row 1 to row 3:
1 −2 −1 2 9
0 0 1 −1 −2
−1 2 3 −5 −16

→ (r3 → r3 + r1)

1 −2 −1 2 9
0 0 1 −1 −2
0 0 2 −3 −7

Column 1 is now done, so the current position moves to row 2, column 2. The entry there is 0, and so is the entry below it, so I move one step right, to row 2, column 3. The entry there is 1, and I use it to clear out the rest of column 3:

→ (r1 → r1 + r2, then r3 → r3 − 2r2)

1 −2 0 1 7
0 0 1 −1 −2
0 0 0 −1 −3

The current position moves to row 3, column 4. The entry there is −1, so I multiply row 3 by −1:

→ (r3 → −r3)

1 −2 0 1 7
0 0 1 −1 −2
0 0 0 1 3

Finally, I use the 1 I just created in row 3, column 4 to “zero out” the two entries above it in column 4:

→ (r1 → r1 − r3, then r2 → r2 + r3)

1 −2 0 0 4
0 0 1 0 1
0 0 0 1 3

The matrix is now in row reduced echelon form.
As I noted at the start, the solution above is pretty long, but that’s only because I explained every step.
After the first few examples, I will just do the steps without explanation. And eventually, I’ll just give the
starting matrix and the row-reduced echelon matrix without showing the steps.
As you do row reduction problems, refer to the algorithm and examples if you forget what steps you
should do. With practice, you’ll find that you know what to do, and your biggest problem may be avoiding
arithmetic mistakes.
At times, you may see shortcuts you can take which don’t follow the algorithm exactly. I think you
should avoid shortcuts at the start, or you will not learn the algorithm properly. The algorithm is guaranteed
to work; shortcuts can lead you to go in circles. When you can consistently get the right answer, you can
take shortcuts where appropriate.
Example. Row reduce the following matrix to row-reduced echelon form over Z3 :
2 2 0 1 2
2 1 2 1 0
1 0 2 1 0
I start with the current position at row 1, column 1. The element there is 2, which is nonzero. If I were
working with real numbers, I would divide row 1 by 2. Since I’m working over Z3 , I have to multiply row 1
by 2−1 — multiplying by the inverse of 2 is the equivalent in this case to dividing by 2. Now
2 · 2 = 1 in Z3 ,
so 2−1 = 2, and I multiply row 1 by 2:

2 2 0 1 2
2 1 2 1 0
1 0 2 1 0

→ (r1 → 2r1)

1 1 0 2 1
2 1 2 1 0
1 0 2 1 0

Now the element at the current position is 1, which is what I want for the leading entry in row 1. I use
it to clear out the numbers below it in the first column.
Since there is a 2 in row 2 and 2 + 1 = 0, I add row 1 to row 2. Since there is a 1 in row 3 and 1 + 2 ·1 = 0,
I add 2 times row 1 to row 3. Here’s what happens:
1 1 0 2 1
2 1 2 1 0
1 0 2 1 0

→ (r3 → r3 + 2r1)

1 1 0 2 1
2 1 2 1 0
0 2 2 2 2

→ (r2 → r2 + r1)

1 1 0 2 1
0 2 2 0 1
0 2 2 2 2
You can see that column 1 is now fixed: The leading entry 1 in row 1 is the only nonzero element in
the column. I move the current position down one row and to the right one column. The current position
is now at row 2, column 2. The entry there is 2, which is nonzero. As I did with the first row, I turn
the 2 into a 1 by multiplying row 2 by 2:
1 1 0 2 1
0 2 2 0 1
0 2 2 2 2

→ (r2 → 2r2)

1 1 0 2 1
0 1 1 0 2
0 2 2 2 2
Now I use the leading entry 1 in row 2, column 2 to clear out the second column. I need to add 2 times
row 2 to row 1, and I need to add row 2 to row 3. Here’s the work:
1 1 0 2 1
0 1 1 0 2
0 2 2 2 2

→ (r1 → r1 + 2r2)

1 0 2 2 2
0 1 1 0 2
0 2 2 2 2

→ (r3 → r3 + r2)

1 0 2 2 2
0 1 1 0 2
0 0 0 2 1
Column 2 is now fixed: The leading entry in row 2 is 1, and it is the only nonzero element in column 2.
I move the current position down one row and to the right one column. The current position is now at row
3, column 3 — but the element there is 0. I can’t make this into 1 by multiplying by the inverse, because 0
has no multiplicative inverse.
In this situation, the next thing you try is to look below the current position. If there is a nonzero
element in the same column in a lower row, you swap that row with the current row to get a nonzero element
into the current position. Unfortunately, we’re in the bottom row, so there are no rows below the current
position.
Note: You do not swap either of the rows above (that is, row 1 or row 2) with row 3. While that would
get a nonzero entry into the current position, swapping in those ways will mess up either the first or second
column.
The algorithm says that in this situation, we should move the current position one column right (staying
in the same row). The current position is now row 3, column 4. Fortunately, the element there is 2, which
is nonzero. As before, I can turn it into a 1 by multiplying row 3 by 2:
1 0 2 2 2
0 1 1 0 2
0 0 0 2 1

→ (r3 → 2r3)

1 0 2 2 2
0 1 1 0 2
0 0 0 1 2
Finally, I use the 1 in row 3, column 4 to clear out column 4. The only nonzero element above the
leading entry is the 2 in row 1. So I add row 3 to row 1:
1 0 2 2 2
0 1 1 0 2
0 0 0 1 2

→ (r1 → r1 + r3)

1 0 2 0 1
0 1 1 0 2
0 0 0 1 2
While row reducing matrices over Zn requires a little more thought in doing the arithmetic, you can see
(for the examples we’ll do) that the arithmetic is pretty simple. You don’t have to worry about computations
with fractions or decimals, for instance.
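Over a finite field Zp the same algorithm works, with division replaced by multiplication by the inverse mod p. A Python sketch (mine; pow(a, -1, p) computes the inverse of a mod p, which exists for nonzero a when p is prime):

    def rref_mod_p(M, p):
        A = [[x % p for x in row] for row in M]
        rows, cols = len(A), len(A[0])
        i = j = 0
        while i < rows and j < cols:
            if A[i][j] != 0:
                inv = pow(A[i][j], -1, p)        # multiply by the inverse instead of dividing
                A[i] = [(inv * x) % p for x in A[i]]
                for r in range(rows):
                    if r != i and A[r][j] != 0:
                        c = A[r][j]
                        A[r] = [(x - c * y) % p for x, y in zip(A[r], A[i])]
                i += 1
                j += 1
            else:
                for r in range(i + 1, rows):
                    if A[r][j] != 0:
                        A[i], A[r] = A[r], A[i]
                        break
                else:
                    j += 1
        return A

    print(rref_mod_p([[2, 2, 0, 1, 2],
                      [2, 1, 2, 1, 0],
                      [1, 0, 2, 1, 0]], 3))
    # [[1, 0, 2, 0, 1], [0, 1, 1, 0, 2], [0, 0, 0, 1, 2]] -- the Z3 example above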
Here are some additional examples. I’m not explaining the individual steps, though they are labelled.
If you’re convinced by now that you could do the arithmetic for the row operations yourself, you might want
to go through the examples by just thinking about what steps to do, and taking the arithmetic that is shown
for granted. Or perhaps do a few of the row operations by hand for yourself to practice.
Example. Row reduce the following matrix over R to row reduced echelon form:
1 1 −4 2
3 4 −10 7
−4 −1 22 5
I could have asked to row reduce over Q, the rational numbers, since all of the entries in the matrices
will be rational numbers.
Here’s the row reduction; check at least a few of the steps yourself.
1 1 −4 2
3 4 −10 7
−4 −1 22 5

→ (r3 → r3 + 4r1, then r2 → r2 − 3r1)

1 1 −4 2
0 1 2 1
0 3 6 13

→ (r1 → r1 − r2, then r3 → r3 − 3r2)

1 0 −6 1
0 1 2 1
0 0 0 10

→ (r3 → (1/10) r3)

1 0 −6 1
0 1 2 1
0 0 0 1

→ (r1 → r1 − r3, then r2 → r2 − r3)

1 0 −6 0
0 1 2 0
0 0 0 1
The next example is over Z5 . Remember that Z5 = {0, 1, 2, 3, 4} and you do all the arithmetic mod 5.
So, for example, 3 + 4 = 2 and 4 · 2 = 3. Be careful as you do problems to pay attention to the number
system!
Some calculators and math software have programs that do row reduction, but assume that the row
reduction is done with real numbers. If you use such a program on an example like the next one, you could
very well get the wrong answer (even if the answer looks like it could be right). Be careful if you’re using
such a program to check your answers. There is software which does row reduction in Zn , but you may have
to do some searching to find programs like that.
Example. Row reduce the following matrix over Z5 to row reduced echelon form:
1 0 1 4
3 1 0 1
2 4 4 1
Here is some Z5 arithmetic that will come up:
1 + 4 = 0, 2 + 3 = 0,
2 · 3 = 1, 4 · 4 = 1.
Here’s the row reduction:
1 0 1 4
3 1 0 1
2 4 4 1

→ (r3 → r3 + 3r1, then r2 → r2 + 2r1)

1 0 1 4
0 1 2 4
0 4 2 3

→ (r3 → r3 + r2)

1 0 1 4
0 1 2 4
0 0 4 2

→ (r3 → 4r3)

1 0 1 4
0 1 2 4
0 0 1 3

→ (r1 → r1 + 4r3, then r2 → r2 + 3r3)

1 0 0 1
0 1 0 3
0 0 1 3
As an example, here’s the scratch work for the first step:
r3 = 2 4 4 1
3r1 = 3 0 3 2
r3 + 3r1 = 0 4 2 3
You may need to do this kind of scratch work as you do row reductions, particularly when you’re doing
arithmetic in Zn .
Time for a bit of honesty! You may have noticed that the examples (in earlier sections as well as this
one) have tended to have “nice numbers”. It’s intentional: The examples are constructed so that the numbers
are “nice”. If the numbers and the computations were ugly, it would make it harder to see the principles,
and it wouldn’t really add anything.
As a small example, here’s a row reduction with rational numbers:
1 −6 4 −2
−1 −5 0 4
2 7 −3 1

→ (r3 → r3 − 2r1, then r2 → r2 + r1)

1 −6 4 −2
0 −11 4 2
0 19 −11 5

→ (r2 → −(1/11) r2)

1 −6 4 −2
0 1 −4/11 −2/11
0 19 −11 5

→ (r1 → r1 + 6r2)

1 0 20/11 −34/11
0 1 −4/11 −2/11
0 19 −11 5

→ (r3 → r3 − 19r2)

1 0 20/11 −34/11
0 1 −4/11 −2/11
0 0 −45/11 93/11

→ (r3 → −(11/45) r3)

1 0 20/11 −34/11
0 1 −4/11 −2/11
0 0 1 −31/15

→ (r1 → r1 − (20/11) r3, then r2 → r2 + (4/11) r3)

1 0 0 2/3
0 1 0 −14/15
0 0 1 −31/15
You can see that the computations are getting messy even though the original matrix had entries which
were integers. Imagine having to add and reduce all those fractions! Then imagine what it would look like
if the starting matrix was
a matrix with entries that are messy fractions like 53/2, 11/31, 8/3, 17/9, 3/4, and 213/7, or a matrix with entries like −7.193, 81.4, −217.911, 31.1745, π, 71.3, 1001.7, −0.00013, and √2.
The ugly computations would add nothing to your understanding of the idea of row reduction.
It’s true that matrices that come up in the real world often don’t have nice numbers. They might also
be very large: Matrices with hundreds of rows or columns (or more). In those cases, computers are used to
do the computations, and then other considerations (like accuracy or round-off) come up. You will see these
topics in courses in numerical analysis, for instance.
Row reduction is used in many places in linear algebra. Many of those uses involve solving systems of
linear equations, which we will look at next.
A system of linear equations in the variables x1 , . . . , xn is a collection of equations of the form
a11 x1 + a12 x2 + · · · + a1n xn = b1 , . . . , am1 x1 + am2 x2 + · · · + amn xn = bm .
All the a’s and b’s are assumed to come from some number system.
For example, here are some systems of linear equations with coefficients in R:
5x − 2y = 7
−2x + y = 3

x + y − z = 3
−x − y + 5z = −7

x + y = 13
2x + y = 10
x − y = 5
(Systems are often written with a curly brace around them to make them stand out; if there’s no chance of
confusion, the brace isn’t necessary.)
A solution to a system of linear equations is a set of values for the variables which makes all the
equations true. For example,
x = 13, y = 29 is a solution to

5x − 2y = 7
−2x + y = 3

since

5 · 13 − 2 · 29 = 65 − 58 = 7
−2 · 13 + 29 = −26 + 29 = 3
Note that we consider the pair (x, y) = (13, 29) a single solution to the system. In fact, it’s the only
solution to this system of equations.
Likewise, you can check that (x, y, z) = (−1, 3, −1) and (x, y, z) = (2, 0, −1) are solutions to

x + y − z = 3
−x − y + 5z = −7
They aren’t the only solutions: In fact, this system has an infinite number of solutions.
This system has no solutions:
x + y = 13
2x + y = 10
x−y = 5
Here’s one way to see this. Add the first and third equations to get 2x = 18, which means x = 9. Plug
x = 9 into the second equation 2x + y = 10 to get 18 + y = 10, or y = −8. But now if you plug x = 9,
y = −8 into the first equation x + y = 13, you get 9 + (−8) = 13, which is a contradiction.
What happened in these examples is typical of what happens in general: A system of linear equations
can have exactly one solution, many solutions, or no solutions.
Let’s connect row operations with operations on the equations of a system. Consider this system over R:

x + 2y = −4
2x + 3y = 5

The row operation that swaps two rows of a matrix corresponds to swapping the equations. Here we
swap equations 1 and 2:

x + 2y = −4
2x + 3y = 5

→ (r1 ↔ r2)

2x + 3y = 5
x + 2y = −4
It may seem pointless to do this with equations, but we saw that swapping rows was sometimes needed
in reducing a matrix to row reduced echelon form. We’ll see the connection shortly.
The row operation that multiplies (or divides) a row by a number corresponds to multiplying (or
dividing) an equation by a number. For example, I can divide the second equation by 2:
x + 2y = −4
2x + 3y = 5

→ (r2 → r2 /2)

x + 2y = −4
x + (3/2)y = 5/2
The row operation that adds a multiple of one row to another corresponds to adding a multiple of one
equation to another. For example, I can add −2 times the first equation to the second equation (that is,
subtract 2 times the first equation from the second equation):
x + 2y = −4
2x + 3y = 5

→ (r2 → r2 − 2r1)

x + 2y = −4
−y = 13
You can see that row operations certainly correspond to “legal” operations that you can perform on
systems of equations. You can even see in the last example how these operations might be used to solve a
system.
To make the connection with row reducing matrices, suppose we take our system and just write down
the coefficients:
x + 2y = −4
2x + 3y = 5

→

1 2 −4
2 3 5
This matrix is called the augmented matrix or auxiliary matrix for the system.
Watch what happens when I row reduce the matrix to row-reduced echelon form!
1 2 −4
2 3 5

→ (r2 → r2 − 2r1)

1 2 −4
0 −1 13

→ (r2 → −r2)

1 2 −4
0 1 −13

→ (r1 → r1 − 2r2)

1 0 22
0 1 −13
Since row operations operate on rows, they will “preserve” the columns for the x-numbers, the y-
numbers, and the constants: The coefficients of x in the equations will always be in the first column,
regardless of what operations I perform, and so on for the y-coefficients and the constants. So at the end, I
can put the variables back to get equations, reversing what I did at the start:
The columns correspond to x, y, and the constants:

1 0 22           1·x + 0·y = 22
0 1 −13    →     0·x + 1·y = −13
That is, x = 22 and y = −13 — I solved the system!
Geometrically, the graphs of x + 2y = −4 and 2x + 3y = 5 are lines. Solving simultaneously amounts to
finding where the lines intersect — in this case, in the point (22, −13).
(I will often write solutions to systems in the form (x, y) = (22, −13), or even (22, −13). In the second
form, it’s understood that the variables occur in the same order in which they occurred in the equations.)
You can see several things from this example. First, we noted that row operations imitate the operations
you might perform on whole equations to solve a system. This explains why row operations are what they
are.
Second, you can see that having row reduced echelon form as the target of a row reduction corresponds
to producing equations with “all the variables solved for”. Think particularly of the requirements that
leading entries (the 1’s) must be the only nonzero elements in their columns, and the “stairstep” pattern of
the leading entries.
Third, by working with the coefficients of the variables and the constants, we save ourselves the trouble
of writing out the equations. We “get lazy” and only write exactly what we need to write to do the problem:
The matrix preserves the structure of the equations for us. (This idea of “detaching coefficients” is useful in
other areas of math.)
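As a sketch of “detaching coefficients” in code (my own illustration, and it assumes a square system with exactly one solution), here is a small Gauss–Jordan routine that row reduces the augmented matrix and reads the solution off the constants column:

    from fractions import Fraction

    def solve(aug):
        # aug is the augmented matrix [A | b] of a square system with a unique solution.
        A = [[Fraction(x) for x in row] for row in aug]
        n = len(A)
        for i in range(n):
            p = next(r for r in range(i, n) if A[r][i] != 0)   # find a usable pivot row
            A[i], A[p] = A[p], A[i]
            A[i] = [x / A[i][i] for x in A[i]]                 # make the leading entry 1
            for r in range(n):
                if r != i and A[r][i] != 0:
                    c = A[r][i]
                    A[r] = [x - c * y for x, y in zip(A[r], A[i])]
        return [row[-1] for row in A]                          # the constants column

    # x + 2y = -4, 2x + 3y = 5
    print(solve([[1, 2, -4],
                 [2, 3, 5]]))    # [Fraction(22, 1), Fraction(-13, 1)], i.e. (22, -13)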
Let’s see some additional examples.
Example. Use row reduction to solve the following system of linear equations over R:
x + y + z = 5
x + 3y + z = 9
4x − y + z = −2
Write down the matrix for the system and row reduce it to row reduced echelon form:
1 1 1 5
1 3 1 9
4 −1 1 −2

→ (r3 → r3 − 4r1, then r2 → r2 − r1)

1 1 1 5
0 2 0 4
0 −5 −3 −22

→ (r2 → (1/2) r2, then r1 → r1 − r2)

1 0 1 3
0 1 0 2
0 −5 −3 −22

→ (r3 → r3 + 5r2, then r3 → −(1/3) r3)

1 0 1 3
0 1 0 2
0 0 1 4

→ (r1 → r1 − r3)

1 0 0 −1
0 1 0 2
0 0 1 4
The last matrix represents the equations x = −1, y = 2, z = 4. So the solution is (−1, 2, 4).
In this case, the three equations represent planes in space, and the point (−1, 2, 4) is the intersection of
the three planes.
Everything works in similar fashion for systems of linear equations over a finite field. Here’s an example
over Z3 .
Example. Solve the following system of linear equations over Z3 :
x + 2y = 2
2x + y + z = 1
2x + z = 2
Here is some Z3 arithmetic that will come up: 1 + 2 = 0, 2 + 2 = 1, 2 · 2 = 1.
The last equation tells you that when you would think of dividing by 2, you multiply by 2 instead.
Write down the corresponding matrix and row reduce it to row reduced echelon form:
1 2 0 2
2 1 1 1
2 0 1 2

→ (r3 → r3 + r1, then r2 → r2 + r1)

1 2 0 2
0 0 1 0
0 2 1 1

→ (r2 ↔ r3, then r2 → 2r2)

1 2 0 2
0 1 2 2
0 0 1 0

→ (r1 → r1 + r2, then r1 → r1 + r3)

1 0 0 1
0 1 2 2
0 0 1 0

→ (r2 → r2 + r3)

1 0 0 1
0 1 0 2
0 0 1 0
The last matrix represents the equations x = 1, y = 2, z = 0. Hence, the solution is (1, 2, 0).
It’s possible for a system of linear equations to have no solutions. Such a system is said to be incon-
sistent. You can tell a system of linear equations is inconsistent if at any point one of the equations gives
a contradiction, such as “0 = 13” or “0 = −3/2”.
Example. Solve the following system of equations over R:
3x + y + 3z = 2
x + 2z = −3
2x + y + z = 4
Write down the augmented matrix for the system and row reduce it to row reduced echelon form. The final matrix represents the equations

x + 2z = 0
y − 3z = 0
0 = 1

The last equation, “0 = 1”, is a contradiction, so the system has no solutions: it is inconsistent.
Example. Solve the following system of linear equations over R:

a + 2b + d = 3
−a − 2b + c − 2e = −2
−2a − 4b + c − d − 2e = −5

Write down the augmented matrix and row reduce it to row reduced echelon form. The final matrix represents the equations

a + 2b + d = 3
c + d − 2e = 1
Notice that the variables are not solved for as particular numbers. This means that the system will have
more than one solution. In this case, we’ll write the solution in parametric form.
The variables a and c correspond to leading entries in the row reduced echelon matrix. b, d, and e are
called free variables. Solve for the leading entry variables a and c in terms of the free variables b, d, and e:
a = −2b − d + 3
c = −d + 2e + 1
Next, assign parameters to each of the free variables b, d, and e. Often variables like s, t, u, and v are
used as parameters. Try to pick variables “on the other side of the alphabet” from the original variables to
avoid confusion. I’ll use
b = s, d = t, e = u.
Plugging these into the equations for a and c gives
a = −2s − t + 3, c = −t + 2u + 1.
a = −2s − t + 3
b = s
c = −t + 2u + 1
d = t
e = u
(I lined up the variables so you can see the structure of the solution. You don’t need to do this in
problems for now, but it will be useful when we discuss how to find the basis for the null space of a matrix.)
Each assignment of numbers to the parameters s, t, and u produces a solution. For example, if s = 1,
t = 0, and u = 2,
a = (−2) · 1 − 0 + 3 = 1, c = −0 + 2 · 2 + 1 = 5.
The solution (in this case) is (a, b, c, d, e) = (1, 1, 5, 0, 2). Since you can assign any real number to each
of s, t, and u, there are infinitely many solutions.
Geometrically, the 3 original equations represented sets in 5-dimensional space R5 . The solution to the
system represents the intersection of those 3 sets. Since the solution has 3 parameters, the intersection is
some kind of 3-dimensional set. Since the sets are in 5-dimensional space, it would not be easy to draw a
picture!
Example. Solve the following system over Z5 :
w + x + y + 2z = 1
2x + 2y + z = 0
2w + 2y + z = 1
Write down the matrix for the system and row reduce:
1 1 1 2 1
0 2 2 1 0
2 0 2 1 1

→ (r3 → r3 − 2r1, then r2 → 3r2)

1 1 1 2 1
0 1 1 3 0
0 3 0 2 4

→ (r1 → r1 − r2, then r3 → r3 − 3r2)

1 0 0 4 1
0 1 1 3 0
0 0 2 3 4

→ (r3 → 3r3, then r2 → r2 − r3)

1 0 0 4 1
0 1 0 4 3
0 0 1 4 2
Putting the variables back, the corresponding equations are
w + 4z = 1, x + 4z = 3, y + 4z = 2.
Solve for the leading entry variables w, x, and y in terms of the free variable z. Note that 4z + z = 0,
so I can solve by adding z to each of the 3 equations:
w = z + 1, x = z + 3, y = z + 2.
To write the solution in parametric form, assign a parameter to the “non-leading entry” variable z.
Thus, set z = t:
w = t + 1, x = t + 3, y = t + 2, z = t.
Remember that the number system is Z5 = {0, 1, 2, 3, 4}. So there are 5 possible values for the parameter
t, and each value of t gives a solution. For instance, if t = 1, I get
w = 2, x = 4, y = 3, z = 1.
Remember that if A is a square (n × n) matrix, the inverse A−1 of A is an n × n matrix such that
A A−1 = I and A−1 A = I.
Here I is the n × n identity matrix. Note that not every square matrix has an inverse.
Row reduction provides a systematic way to find the inverse of a matrix. The algorithm gives another
application of row reduction, so I’ll describe the algorithm and give an example. We’ll justify the algorithm
later.
To invert a matrix, adjoin a copy of the identity matrix to the original matrix. “Adjoin” means to place
a copy of the identity matrix of the same size as the original matrix on the right of the original matrix to
form a larger matrix (the augmented matrix).
Next, row reduce the augmented matrix. When the block corresponding the original matrix becomes
the identity, the block corresponding to the identity will have become the inverse.
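Here is a Python sketch (mine, using exact Fraction arithmetic and assuming the input matrix really is invertible) of the adjoin-and-row-reduce procedure: place I to the right of A, row reduce, and read the inverse out of the right-hand block.

    from fractions import Fraction

    def inverse(A):
        n = len(A)
        # Adjoin the n x n identity on the right: [A | I].
        M = [[Fraction(x) for x in A[i]] + [Fraction(int(i == j)) for j in range(n)]
             for i in range(n)]
        for i in range(n):
            p = next(r for r in range(i, n) if M[r][i] != 0)   # assumes A is invertible
            M[i], M[p] = M[p], M[i]
            M[i] = [x / M[i][i] for x in M[i]]
            for r in range(n):
                if r != i and M[r][i] != 0:
                    c = M[r][i]
                    M[r] = [x - c * y for x, y in zip(M[r], M[i])]
        return [row[n:] for row in M]                          # right-hand block is the inverse

    print(inverse([[2, -1],
                   [5, -3]]))   # [[3, -1], [5, -2]]  (as Fractions) -- see the example below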
Example. Invert the following matrix over R:
2 −1
.
5 −3
Place a copy of the 2 × 2 identity matrix on the right of the original matrix, then row reduce the
augmented matrix:
2 −1 1 0
5 −3 0 1

→ (r1 → (1/2) r1)

1 −1/2 1/2 0
5 −3 0 1

→ (r2 → r2 − 5r1)

1 −1/2 1/2 0
0 −1/2 −5/2 1

→ (r2 → −2r2)

1 −1/2 1/2 0
0 1 5 −2

→ (r1 → r1 + (1/2) r2)

1 0 3 −1
0 1 5 −2
You can see that the original matrix in the left-hand 2 × 2 block has turned into the 2 × 2 identity
matrix. The identity matrix that was in the right-hand 2 × 2 block has at the same time turned into the
inverse.
Therefore, the inverse is
3 −1
5 −2
You can check that

[ 2 −1 ] [ 3 −1 ]   [ 1 0 ]          [ 3 −1 ] [ 2 −1 ]   [ 1 0 ]
[ 5 −3 ] [ 5 −2 ] = [ 0 1 ]   and    [ 5 −2 ] [ 5 −3 ] = [ 0 1 ]
Of course, there’s a formula for inverting a 2 × 2 matrix. But the procedure above works with (square)
matrices of any size. To explain why this algorithm works, I’ll need to examine the relationship between row
operations and inverses more closely.
You might have discussed lines and planes in 3 dimensions in a calculus course; if so, you probably
saw problems like the next example. The rule of thumb is: You can often find the intersection of things by
solving equations simultaneously.
Example. Find the line of intersection of the planes in R3 :
x + 2y − z = 4 and x − y − z = 3.
Think of the equations as a system of linear equations over R. Write down the coefficient matrix and
row reduce:
1 2 −1 4
1 −1 −1 3

→ (r2 → r2 − r1)

1 2 −1 4
0 −3 0 −1

→ (r2 → −(1/3) r2)

1 2 −1 4
0 1 0 1/3

→ (r1 → r1 − 2r2)

1 0 −1 10/3
0 1 0 1/3
The last matrix gives the equations
x − z = 10/3 ,   y = 1/3 .

Thus, x = z + 10/3 and y = 1/3. Letting z = t, the parametric solution is

x = t + 10/3 ,   y = 1/3 ,   z = t.
These are the parametric equations of the line of intersection.
The solution of simultaneous linear equations dates back 2000 years to the Jiuzhang Suanshu, a collection
of problems compiled in China. The systematic algebraic process of eliminating variables seems to be due to
Isaac Newton. Matrix notation developed in the second half of the 1800’s; what we call Gaussian elimination,
applied to matrices, was developed in the first half of the 1900’s by various mathematicians. Grcar [1] contains
a nice historical account.
[1] Joseph Grcar, Mathematics of Gaussian elimination, Notices of the American Mathematical Society,
58(6)(2011), 782–792.
A system of linear equations can be written in matrix form as
Ax = b,
where A is the matrix of coefficients, x is the column vector of variables, and b is the column vector of constants.
(One reason for using matrix notation is that it saves writing!) If A has an inverse A−1 , I can multiply
both sides by A−1 :
A−1 Ax = A−1 b
I · x = A−1 b
x = A−1 b
I’ve solved for the vector x of variables.
Not every matrix has an inverse — an obvious example is the zero matrix, but here’s a nonzero non-
invertible matrix over the real numbers:
1 1
0 0

Suppose this matrix had an inverse

a b
c d

Then

[ 1 1 ] [ a b ]   [ 1 0 ]        [ a + c b + d ]   [ 1 0 ]
[ 0 0 ] [ c d ] = [ 0 1 ] , so   [ 0     0     ] = [ 0 1 ]
But equating entries in row 2, column 2 gives the contradiction 0 = 1. Hence, the original matrix does
not have an inverse.
If we want to know whether a matrix has an inverse, we could try to do what we did in this example —
set up equations and see if we can solve them. But you can see that it could be pretty tedious if the matrix
was large or the entries were messy. And we saw earlier that you can solve systems of linear equations using
row reduction.
In this section, we’ll see how you can use row reduction to determine whether a matrix has an inverse
— and, if it does, how to find the inverse. We’ll begin by explaining the connection between elementary row
operations and matrices.
Definition. An elementary matrix is a matrix which represents an elementary row operation. “Repre-
sents” means that multiplying on the left by the elementary matrix performs the row operation.
Here are the elementary matrices that represent our three types of row operations. In the pictures
below, the elements that are not shown are the same as those in the identity matrix. In particular, all of the
elements that are not on the main diagonal are 0, and all the main diagonal entries — except those shown
— are 1.
Multiplying by this matrix swaps rows i and j:
[Picture: the identity matrix, except that rows i and j have been interchanged — there are 1’s in the (i, j) and (j, i) positions and 0’s in the (i, i) and (j, j) positions.]
The “i” and “j” on the borders of the matrix label rows and columns so you can see where the elements
are.
This is the same as the identity matrix, except that rows i and j have been swapped. In fact, you obtain
this matrix by applying the row operation (“swap rows i and j”) to the identity matrix. This is true for our
other elementary matrices.
Multiplying by this matrix multiplies row i by the number c:
[Picture: the identity matrix, except that the (i, i) entry is c.]
This is the same as the identity matrix, except that row i has been multiplied by c. Note that this is
only a valid operation if the number c has a multiplicative inverse. For instance, if we’re working over the
real numbers, c can be any nonzero number.
Multiplying by this matrix replaces row i with row i plus c times row j.
[Picture: the identity matrix, except that there is an extra entry c in the (i, j) position.]
To get this matrix, apply the operation “add c times row j to row i” to the identity matrix.
While we could give formal proofs that these matrices do what we want — we would have to write
formulas for elements of the matrices, then use the definition of matrix multiplication — I don’t think the
proofs would be very enlightening. It’s good, however, to visualize the multiplications for yourself to see why
these matrices work: Take rows of the elementary matrix and picture them multiplying rows of the original
matrix. For example, consider the elementary matrix that swaps row i and row j.
[Picture: the elementary matrix that swaps rows i and j — the identity matrix with rows i and j interchanged.]
When you multiply the original matrix on the left by a given row of this matrix, you get the corresponding row of the product.
So multiplying the original matrix by the first row of this matrix gives the first row of the product, and so on.
Let’s look at what happens when you multiply the original matrix by row i of this matrix.
[Picture: row i of the elementary matrix — all 0’s except for a 1 in the jth position — multiplying the original matrix, whose rows are row 1 through row m.]
Row i has 0’s everywhere except for a 1 in the j th position. So when it multiplies the original matrix,
all the rows of the original matrix get multiplied by 0, except for the j th row, which is multiplied by 1. The
net result is the j th row of the original matrix. Thus, the ith row of the product is the j th row of the original
matrix.
If you picture this process one row at a time, you’ll see that the original matrix is replaced with the
same matrix with the i and j rows swapped.
Let’s try some examples.
This elementary matrix should swap rows 2 and 3 in a 3 × 3 matrix:
1 0 0
0 0 1
0 1 0
Notice that it’s the identity matrix with rows 2 and 3 swapped.
Multiply a 3 × 3 matrix by it on the left:
[ 1 0 0 ] [ a b c ]   [ a b c ]
[ 0 0 1 ] [ d e f ] = [ g h i ]
[ 0 1 0 ] [ g h i ]   [ d e f ]
This elementary matrix should multiply row 2 of a 2 × 2 matrix by 13:
1 0
0 13
Notice that it’s the identity matrix with row 2 multiplied by 13. (We’ll assume that we’re in a number
system where 13 is invertible.)
Multiply a 2 × 2 matrix by it on the left:
[ 1 0  ] [ a b ]   [ a   b   ]
[ 0 13 ] [ c d ] = [ 13c 13d ]

Row 2 of the original matrix was multiplied by 13.
This elementary matrix should add 5 times row 1 to row 3:
1 0 0
0 1 0
5 0 1
Notice that it’s the identity matrix with 5 times row 1 added to row 3.
Multiply a 3 × 3 matrix by it on the left:
[ 1 0 0 ] [ a b c ]   [ a      b      c      ]
[ 0 1 0 ] [ d e f ] = [ d      e      f      ]
[ 5 0 1 ] [ g h i ]   [ g + 5a h + 5b i + 5c ]
You can see that 5 times row 1 was added to row 3.
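A quick Python sketch (my own) that builds the three kinds of elementary matrices by applying the corresponding row operation to the identity, then checks that multiplying on the left really performs the operation:

    def identity(n):
        return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

    def matmul(A, B):
        return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

    def swap_matrix(n, i, j):
        E = identity(n); E[i], E[j] = E[j], E[i]; return E     # identity with rows i, j swapped

    def scale_matrix(n, i, c):
        E = identity(n); E[i][i] = c; return E                 # identity with c in position (i, i)

    def add_matrix(n, i, j, c):
        E = identity(n); E[i][j] = c; return E                 # identity with c in position (i, j)

    M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]      # rows are numbered from 0 in the code
    print(matmul(swap_matrix(3, 1, 2), M))     # [[1, 2, 3], [7, 8, 9], [4, 5, 6]]
    print(matmul(add_matrix(3, 2, 0, 5), M))   # [[1, 2, 3], [4, 5, 6], [12, 18, 24]]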
The inverses of these matrices are, not surprisingly, the elementary matrices which represent the inverse
row operations. The inverse of a row operation is the row operation which undoes the original row operation.
Let’s look at the three operations in turn.
The inverse of swapping rows i and j is swapping rows i and j — to undo swapping two things, you
swap the two things back! So the inverse of the “swap rows i and j” elementary matrix is the same matrix:
[Picture: the same swap matrix — the identity with rows i and j interchanged.]
The inverse of multiplying row i by c is dividing row i by c. To account for the fact that a number
system might not have “division”, I’ll say “multiplying row i by c−1 ”. Just take the original “multiply row
i by c” elementary matrix above and replace c with c−1 :
[Picture: the identity matrix, except that the (i, i) entry is c−1 .]
The inverse of adding c times row j to row i is subtracting c times row j from row i. To write down the
inverse, I just replace c with −c in the matrix for “row i goes to row i plus c times row j”:
[Picture: the identity matrix, except that there is a −c in the (i, j) position.]
Example. In each case, tell what row operation is performed by multiplying on the left by the elementary
matrix. Then find the inverse of the elementary matrix, and tell what row operation is performed by
multiplying on the left by the inverse. (All the matrices are real number matrices.)
1 0 2
(a) 0 1 0
0 0 1
0 1 0
(b) 1 0 0
0 0 1
1 0 0
(c) 0 17 0
0 0 1
(a) Multiplying on the left by

1 0 2
0 1 0
0 0 1

adds 2 times row 3 to row 1. The inverse is

1 0 −2
0 1 0
0 0 1

which subtracts 2 times row 3 from row 1.
(b) Multiplying on the left by

0 1 0
1 0 0
0 0 1

swaps rows 1 and 2. This matrix is its own inverse, since swapping rows 1 and 2 a second time undoes the swap.

(c) Multiplying on the left by

1 0 0
0 17 0
0 0 1

multiplies row 2 by 17. The inverse is

1 0 0
0 1/17 0
0 0 1

which divides row 2 by 17.
Definition. Matrices A and B are row equivalent if A can be transformed to B by a finite sequence of
elementary row operations.
Remark. Since row operations may be performed by multiplying by elementary matrices, A and B are row
equivalent if and only if there are elementary matrices E1 , . . . , En such that
E1 · · · En A = B.
Proposition. Row equivalence is an equivalence relation.
Proof. Let’s recall what it means (in this situation) to be an equivalence relation. I have to show three
things:
(a) (Reflexivity) Every matrix row reduces to itself.
(b) (Symmetry) If A row reduces to B, then B row reduces to A.
(c) (Transitivity) If A row reduces to B and B row reduces to C, then A row reduces to C.
(a) is obvious, since I can row reduce a matrix to itself by performing the identity row operation.
For (b), suppose A row reduces to B. Then there are elementary matrices E1 , . . . En such that
E1 · · · En A = B.
Hence,
A = En−1 · · · E1−1 B.
Since the inverse of an elementary matrix is an elementary matrix, each Ei−1 is an elementary matrix.
This equation gives a sequence of row operations which row reduces B to A.
To prove (c), suppose A row reduces to B and B row reduces to C. Then there are elementary matrices
E1 , . . . , Em and F1 , . . . , Fn such that
E1 · · · Em A = B and F1 · · · Fn B = C.
Hence,
F1 · · · Fn E1 · · · Em A = C.
This equation gives a sequence of row operations which row reduces A to C.
Therefore, row equivalence is an equivalence relation.
Definition. Let A be an n × n matrix. A is invertible if there is an n × n matrix B such that AB = I and BA = I.
In this case, B is the inverse of A (or A is the inverse of B), and we write A−1 for B (or B −1 for A).
Notation. If A is a square matrix, then An = A · A · · · A (n factors) if n > 0, A0 = I, and
An = A−1 · A−1 · · · A−1 (|n| factors) if n < 0.
Note that An for n < 0 only makes sense if A is invertible.
The usual rules for powers hold:
(a) Am An = Am+n .
(b) (Am )n = Amn .
The proofs involve induction and taking cases. I don’t think they’re that enlightening, so I will skip
them.
Example. Consider the real matrix
A =
3 1
2 0
Compute A2 and A−2 .
A2 = A · A = [ 3 1 ] [ 3 1 ] = [ 11 3 ]
             [ 2 0 ] [ 2 0 ]   [ 6  2 ]

Using the formula for the inverse of a 2 × 2 matrix,

A−1 = 1/(3 · 0 − 2 · 1) · [ 0 −1 ] = [ 0 0.5  ]
                          [ −2 3 ]   [ 1 −1.5 ]

Therefore,

A−2 = A−1 · A−1 = [ 0 0.5  ] [ 0 0.5  ] = [ 0.5  −0.75 ]
                  [ 1 −1.5 ] [ 1 −1.5 ]   [ −1.5 2.75  ]
Remember that matrix multiplication isn’t commutative, so AB is not necessarily equal to BA. So
what is (AB)2 ? Since X 2 is shorthand for X · X, and in this case X = AB, the best we can say is that
(AB)2 = ABAB.
Example. Give a specific example of two 2 × 2 real matrices A and B for which
(AB)2 ≠ A2 B 2 .
How should I construct this counterexample? On the one hand, I want to avoid matrices which are “too
special”, because I might accidentally get a case where the equation holds. (The example in this problem
just shows that the equation “(AB)2 = A2 B 2 ” isn’t always true; this doesn’t mean that it’s never true.) For
instance, I should not take A to be the identity matrix or the zero matrix (in which case the equation would
actually be true).
On the other hand, I’d like to keep the matrices simple — first, so that I don’t struggle to do the
computation, and second, so that a reader can easily see that the computation works. For instance, this
would be a bad idea: a matrix A whose entries are messy numbers like −1.378, π, √2, 1013, and 171.
When you’re trying to construct a counterexample, try to keep these ideas in mind. In the end, you
may have to make a few trials to get a suitable counterexample — you will not know whether something
works without trying.
I will take
A =
1 0
1 1

and

B =
0 1
1 0
Following the ideas above, I tried to make matrices which were simple without being too special.
Then
A2 =
1 0
2 1

and

B2 =
1 0
0 1

so

A2 B 2 =
1 0
2 1
On the other hand,
AB =
0 1
1 1

so

(AB)2 =
1 1
1 2
So for these two matrices, A2 B 2 ≠ (AB)2 .
Proposition.
(a) If A and B are invertible n × n matrices, then AB is invertible, and its inverse is given by
(AB)−1 = B −1 A−1 .
(b) If A is an invertible n × n matrix, then AT is invertible, and its inverse is given by
(AT )−1 = (A−1 )T .
Proof. (a) Remember that AB is not necessarily equal to BA, since matrix multiplication is not necessarily
commutative.
I have
(B −1 A−1 )(AB) = B −1 IB = B −1 B = I,
(AB)(B −1 A−1 ) = AIA−1 = AA−1 = I.
Since B −1 A−1 gives the identity when multiplied by AB, it means that B −1 A−1 must be the inverse of
AB — that is, (AB)−1 = B −1 A−1 .
(b) I have
AT (A−1 )T = (A−1 A)T = I T = I and (A−1 )T AT = (A A−1 )T = I T = I.
Since (A−1 )T gives the identity when multiplied by AT , it means that (A−1 )T must be the inverse of
AT — that is, (AT )−1 = (A−1 )T .
Remark. Look over the proofs of the two parts of the last proposition and be sure you understand why the
computations proved the things that were to be proved. The idea is that the inverse of a matrix is defined
by a property, not by appearance. By analogy, it is like the difference between the set of mathematicians (a
set defined by a property) and the set of people with purple hair (a set defined by appearance).
A matrix C is the inverse of a matrix D if it has the property that multiplying C by D (in both orders)
gives the identity I. So to check whether a matrix C really is the inverse of D, you multiply C by D (in
both orders) and see whether you get I.
Example. Suppose that A and B are n × n invertible matrices. Simplify the following expression:
(AB)−2 A (BA)2 A.
(AB)−2 A(BA)2 A = [(AB)2 ]−1 A(BA)2 A
= (ABAB)−1 A(BABA)A
= B −1 A−1 B −1 A−1 ABABAA
= B −1 A−1 B −1 BABAA       (A−1 A = I)
= B −1 A−1 ABAA             (B −1 B = I)
= B −1 BAA                  (A−1 A = I)
= A2                        (B −1 B = I)
Example. (Solving a matrix equation) Solve the following matrix equation for X, assuming that A and
B are invertible:
A2 X B A = A B.
A2 XBA = AB
A−2 A2 XBA = A−2 AB
XBA = A−1 B
XBAA−1 = A−1 BA−1
XB = A−1 BA−1
XBB −1 = A−1 BA−1 B −1
X = A−1 BA−1 B −1
Notice that I can multiply both sides of a matrix equation by the same thing, but I must multiply on
the same side of both sides. So when I multiplied by A−2 , I had to put A−2 on the left side of both sides of
the equation.
Once again, the reason I have to be careful is that in general, M N ≠ N M — matrix multiplication is
not commutative.
Example. Give a specific example of two invertible 2 × 2 real matrices A and B for which
(A + B)−1 ≠ A−1 + B −1 .
One way to get a counterexample is to choose A and B so that A + B isn’t invertible. For instance,
A =
1 0
0 1

and

B =
−1 0
0 −1

But

A + B =
0 0
0 0
The zero matrix is not invertible, because 0 · C = 0 for any matrix C — so for no matrix C can 0 · C = I.
But this feels like “cheating”, because the left side (A + B)−1 of the equation isn’t defined. Okay —
can we find two matrices A and B for which (A + B)−1 ≠ A−1 + B −1 and both sides of the equation are
defined? Thus, we need A, B, and A + B to all be invertible.
I’ll use
A =
1 0
1 1

and

B =
0 1
1 0
Then
A−1 =
1 0
−1 1

and

B −1 =
0 1
1 0
(You can find the inverses using the formula for the inverse of a 2 × 2 matrix which I gave when I
discussed matrix arithmetic. You can also find the inverses using row reduction.)
Thus,
A−1 + B −1 =
1 1
0 1
On the other hand,
A + B =
1 1
2 1

so

(A + B)−1 =
−1 1
2 −1
Thus, (A + B)−1 ≠ A−1 + B −1 even though both sides of the equation are defined.
The next result connects several of the ideas we’ve looked at: Row reduction, elementary matrices,
invertibility, and solving systems of equations.
Theorem. Let A be an n × n matrix. The following are equivalent:
(a) A is row equivalent to I.
(b) A is a product of elementary matrices.
(c) A is invertible.
(d) The only solution to the following system is the vector x = 0:
Ax = 0.
(e) For any n-dimensional vector b, the following system has a unique solution:
Ax = b.
Proof. When you are trying to prove several statements are equivalent, you must prove that if you assume
any one of the statements, you can prove any of the others. I can do this here by proving that (a) implies
(b), (b) implies (c), (c) implies (d), (d) implies (e), and (e) implies (a).
(a) ⇒ (b): Let E1 , . . . , Ep be elementary matrices which row reduce A to I:
E1 · · · Ep A = I.
Then
A = Ep−1 · · · E1−1 .
Since the inverse of an elementary matrix is an elementary matrix, A is a product of elementary matrices.
(b) ⇒ (c): Write A as a product of elementary matrices:
A = F1 · · · Fq .
Now
F1 · · · Fq · Fq−1 · · · F1−1 = I,
Fq−1 · · · F1−1 · F1 · · · Fq = I.
Hence,
A−1 = Fq−1 · · · F1−1 .
(c) ⇒ (d): Suppose A is invertible. The system Ax = 0 has at least one solution, namely x = 0.
Moreover, if y is any other solution, then
Ay = 0, so A−1 Ay = A−1 0, or y = 0.
Hence, x = 0 is the only solution to Ax = 0, which proves (d).
(d) ⇒ (e): Suppose the only solution to Ax = 0 is x = 0. Row reduce the augmented matrix [A | 0] to row reduced echelon form. Since x = 0 is the only solution, there are no free variables, so every one of the first n columns must contain a leading entry.
Ignoring the last column (which never changes), this means there is a sequence of row operations E1 ,
. . . , En which reduces A to the identity I — that is, A is row equivalent to I. (I’ve actually proved (d) ⇒
(a) at this point.)
Let b = ⟨b1 , . . . , bn ⟩ be an arbitrary n-dimensional vector. Applying the same row operations to the augmented matrix [A | b] gives

E1 · · · En [A | b] = [I | b′ ]

for some vector b′ = ⟨b′1 , . . . , b′n ⟩. The last matrix represents the equations x1 = b′1 , . . . , xn = b′n , so the system Ax = b has at least one solution, namely x = b′ .

To see that the solution is unique, suppose y and z are both solutions, so Ay = b and Az = b. Then

A(y − z) = Ay − Az = b − b = 0.

Since the only solution to Ax = 0 is 0, I get y − z = 0, so y = z. Hence, the solution is unique, and this proves (e).
If A is invertible, the theorem implies that A can be written as a product of elementary matrices. To
do this, row reduce A to the identity, keeping track of the row operations you’re using. Write each row
operation as an elementary matrix, and express the row reduction as a matrix multiplication. Finally, solve
the resulting equation for A.
Example. (Writing an invertible matrix as a product of elementary matrices) Express the following
real matrix as a product of elementary matrices:
$$A = \begin{bmatrix} 2 & -4 \\ -2 & 5 \end{bmatrix}.$$
First, row reduce A to the identity, keeping track of the row operations:
$$\begin{bmatrix} 2 & -4 \\ -2 & 5 \end{bmatrix} \;\overset{r_1 \to \frac{1}{2} r_1}{\to}\; \begin{bmatrix} 1 & -2 \\ -2 & 5 \end{bmatrix} \;\overset{r_2 \to r_2 + 2r_1}{\to}\; \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} \;\overset{r_1 \to r_1 + 2r_2}{\to}\; \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
Next, represent each row operation as an elementary matrix:
$$r_1 \to \tfrac{1}{2} r_1 \;\text{corresponds to}\; \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & 1 \end{bmatrix}, \quad r_2 \to r_2 + 2r_1 \;\text{corresponds to}\; \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}, \quad r_1 \to r_1 + 2r_2 \;\text{corresponds to}\; \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}.$$
Using the elementary matrices, write the row reduction as a matrix multiplication. A must be multiplied
on the left by the elementary matrices in the order in which the operations were performed:
$$\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & 1 \end{bmatrix} \cdot A = I.$$
Solve the last equation for A, being careful to get the inverses in the right order:
$$A = \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}.$$
You can check your answer by multiplying the matrices on the right.
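Here is a NumPy sketch of that check (not part of the original notes); E1, E2, E3 are names I'm using for the elementary matrices of the three row operations above.

```python
import numpy as np

# Verify the factorization A = E1^{-1} E2^{-1} E3^{-1} numerically.
E1 = np.array([[0.5, 0.0], [0.0, 1.0]])   # r1 -> (1/2) r1
E2 = np.array([[1.0, 0.0], [2.0, 1.0]])   # r2 -> r2 + 2 r1
E3 = np.array([[1.0, 2.0], [0.0, 1.0]])   # r1 -> r1 + 2 r2

product = np.linalg.inv(E1) @ np.linalg.inv(E2) @ np.linalg.inv(E3)
print(product)   # should reproduce the matrix A from the example
```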
For two n × n matrices A and B to be inverses, I must have
AB = I and BA = I.
Since in general AB need not equal BA, it seems as though I must check that both equations hold in
order to show that A and B are inverses. It turns out that, thanks to the earlier theorem, we only need to
check that AB = I.
Corollary. If A and B are n × n matrices and AB = I, then A = B −1 and BA = I.
Proof. Suppose A and B are n × n matrices and AB = I. The system Bx = 0 certainly has x = 0 as a
solution. I’ll show it’s the only solution.
Suppose y is another solution, so
By = 0.
Multiply both sides by A and simplify:
ABy = A · 0
Iy = 0
y=0
Thus, 0 is a solution, and it’s the only solution.
Thus, B satisfies condition (d) of the Theorem. Since the five conditions are equivalent, B also satisfies
condition (c), so B is invertible. Let B −1 be the inverse of B. Then
\begin{align*}
AB &= I \\
ABB^{-1} &= IB^{-1} \\
AI &= B^{-1} \\
A &= B^{-1}
\end{align*}
Then BA = BB^{-1} = I, which completes the proof.
Next, here is an algorithm for computing the inverse of a matrix using row reduction. Suppose A is invertible, and let E1 , . . . , Ep be elementary matrices which row reduce A to I:
$$E_1 \cdots E_p A = I.$$
But this equation says that E1 · · · Ep is the inverse of A, since multiplying A by E1 · · · Ep gives the
identity. Thus,
A−1 = E1 · · · Ep = E1 · · · Ep · I.
We can interpret the last expression E1 · · · Ep · I as applying the row operations for E1 , . . . Ep to the
identity matrix I. And the same row operations row reduce A to I. So form an augmented matrix by placing
the identity matrix next to A:
$$[\, A \mid I \,]$$
Row reduce the augmented matrix. The left-hand block A will row reduce to the identity; at the same
time, the right-hand block I will be transformed into A−1 .
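Here is a sketch of this augmented-matrix algorithm in Python/NumPy (not part of the original notes; the function name and the partial-pivoting detail are my own choices, and real entries and invertibility are assumed). It is applied to the matrix of the worked example below.

```python
import numpy as np

def inverse_by_row_reduction(A):
    """Row reduce [A | I]; the right-hand block becomes A^{-1} (A assumed invertible, real)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    aug = np.hstack([A, np.eye(n)])                       # the augmented matrix [A | I]
    for col in range(n):
        pivot = np.argmax(np.abs(aug[col:, col])) + col   # pick a usable pivot row
        aug[[col, pivot]] = aug[[pivot, col]]             # swap it into place
        aug[col] /= aug[col, col]                         # scale the pivot row to get a 1
        for row in range(n):
            if row != col:                                # clear the rest of the column
                aug[row] -= aug[row, col] * aug[col]
    return aug[:, n:]                                     # right-hand block is A^{-1}

A = [[1, 2, -1], [-1, -1, 3], [0, 1, 1]]
print(inverse_by_row_reduction(A))   # compare with the worked example below
```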
Example. Invert the following matrix over R:
$$\begin{bmatrix} 1 & 2 & -1 \\ -1 & -1 & 3 \\ 0 & 1 & 1 \end{bmatrix}.$$
Form the augmented matrix by putting the 3 × 3 identity matrix on the right of the original matrix:
$$\left[\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ -1 & -1 & 3 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{array}\right].$$
Next, row reduce the augmented matrix. The row operations are entirely determined by the block
on the left, which is the original matrix. The row operations turn the left block into the identity, while
simultaneously turning the identity in the right block into the inverse.
$$\left[\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ -1 & -1 & 3 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{array}\right] \overset{r_2 \to r_2 + r_1}{\to} \left[\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & 2 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{array}\right] \overset{r_3 \to r_3 - r_2}{\to} \left[\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & 2 & 1 & 1 & 0 \\ 0 & 0 & -1 & -1 & -1 & 1 \end{array}\right]$$
$$\overset{r_1 \to r_1 - 2r_2}{\to} \left[\begin{array}{ccc|ccc} 1 & 0 & -5 & -1 & -2 & 0 \\ 0 & 1 & 2 & 1 & 1 & 0 \\ 0 & 0 & -1 & -1 & -1 & 1 \end{array}\right] \overset{r_3 \to -r_3}{\to} \left[\begin{array}{ccc|ccc} 1 & 0 & -5 & -1 & -2 & 0 \\ 0 & 1 & 2 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & -1 \end{array}\right]$$
$$\overset{r_1 \to r_1 + 5r_3}{\to} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 4 & 3 & -5 \\ 0 & 1 & 2 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & -1 \end{array}\right] \overset{r_2 \to r_2 - 2r_3}{\to} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 4 & 3 & -5 \\ 0 & 1 & 0 & -1 & -1 & 2 \\ 0 & 0 & 1 & 1 & 1 & -1 \end{array}\right].$$
Thus,
$$\begin{bmatrix} 1 & 2 & -1 \\ -1 & -1 & 3 \\ 0 & 1 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 4 & 3 & -5 \\ -1 & -1 & 2 \\ 1 & 1 & -1 \end{bmatrix}.$$
In the future, I won’t draw the vertical bar between the two blocks; you can draw it if it helps you keep
the computation organized.
Example. (Inverting a matrix over Zp ) Find the inverse of the following matrix over Z3 :
$$\begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ 2 & 1 & 1 \end{bmatrix}.$$
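Here is a sketch (not part of the original notes) of how the same augmented-matrix algorithm can be carried out over Z_p by computer: over a finite field, “dividing by the pivot” means multiplying by the modular inverse of the pivot. The function name is my own; `pow(x, -1, p)` needs Python 3.8 or later.

```python
def inverse_mod_p(A, p):
    """Row reduce [A | I] over Z_p; returns A^{-1} (A assumed invertible over Z_p)."""
    n = len(A)
    aug = [[A[i][j] % p for j in range(n)] + [int(i == j) for j in range(n)]
           for i in range(n)]                                 # augmented matrix [A | I]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] % p != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]           # swap a nonzero pivot up
        inv = pow(aug[col][col], -1, p)                       # modular inverse of the pivot
        aug[col] = [(x * inv) % p for x in aug[col]]          # scale the pivot row to get a 1
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [(aug[r][j] - factor * aug[col][j]) % p for j in range(2 * n)]
    return [row[n:] for row in aug]                           # right-hand block is A^{-1}

A = [[1, 0, 2],
     [1, 1, 1],
     [2, 1, 1]]
print(inverse_mod_p(A, 3))   # [[0, 2, 1], [1, 0, 1], [2, 2, 1]]
```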
The theorem also tells us about the number of solutions to a system of linear equations.
Proposition. Let F be a field, and let Ax = b be a system of linear equations over F . Then:
(a) If F is infinite, then the system has either no solutions, exactly one solution, or infinitely many
solutions.
(b) If F is a finite field with $p^n$ elements, where p is prime and n ≥ 1, then the system has either no
solutions, exactly one solution, or at least $p^n$ solutions.
Proof. Suppose the system has more than one solution. I must show that there are infinitely many solutions
if F is infinite, or at least $p^n$ solutions if F is a finite field with $p^n$ elements.
Since there is more than one solution, there are at least two different solutions. So let x1 and x2 be two
different solutions to Ax = b:
Ax1 = b and Ax2 = b.
Subtracting the equations gives
$$A(x_1 - x_2) = Ax_1 - Ax_2 = b - b = 0.$$
Now let t ∈ F and consider the vector x1 + t(x1 − x2). Since
$$A(x_1 + t(x_1 - x_2)) = Ax_1 + tA(x_1 - x_2) = b + t \cdot 0 = b,$$
x1 + t(x1 − x2) is a solution to Ax = b. Moreover, the only way two solutions of the form
x1 + t(x1 − x2) can be the same is if they have the same t. For if x1 + t1(x1 − x2) = x1 + t2(x1 − x2), then (t1 − t2)(x1 − x2) = 0, and since x1 ≠ x2 this forces t1 = t2. So different values of t give different solutions: infinitely many if F is infinite, and at least $p^n$ if F has $p^n$ elements.
$$D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow ax + y \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix} = a \cdot D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow x \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix} + D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow y \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}.$$
The determinant of a matrix A is denoted det A or |A|.
You still want a formula or a recipe for computing determinants — right? Well, in some of the examples
below, we’ll see how you can compute determinants just using the axioms, or using row reduction. Be
patient, and you’ll feel better shortly!
Let’s see how the axioms look in particular cases.
The first axiom (linearity) is probably the hardest to understand. It allows you to add or subtract, or
move constants in and out, in a single row, assuming that all the other rows stay the same. It is easier to
show you what this means than to describe it in words.
For example, here are two determinants being combined into one. The two third rows are added, and
the other two rows (which must be the same in both matrices) are unchanged:
$$D\begin{bmatrix} a & b & c \\ d & e & f \\ x_1 & x_2 & x_3 \end{bmatrix} + D\begin{bmatrix} a & b & c \\ d & e & f \\ y_1 & y_2 & y_3 \end{bmatrix} = D\begin{bmatrix} a & b & c \\ d & e & f \\ x_1 + y_1 & x_2 + y_2 & x_3 + y_3 \end{bmatrix}.$$
You can also take a single determinant apart into two determinants. In this example, we have subtraction
instead of addition. All the action takes place in row 2; the first and third rows are the same in all of the
matrices.
$$D\begin{bmatrix} a & b & c \\ x_1 - y_1 & x_2 - y_2 & x_3 - y_3 \\ d & e & f \end{bmatrix} = D\begin{bmatrix} a & b & c \\ x_1 & x_2 & x_3 \\ d & e & f \end{bmatrix} - D\begin{bmatrix} a & b & c \\ y_1 & y_2 & y_3 \\ d & e & f \end{bmatrix}.$$
Linearity also allows you to factor a constant out of a single row, in this case row 1:
$$D\begin{bmatrix} kx_1 & kx_2 & kx_3 \\ a & b & c \\ d & e & f \end{bmatrix} = k \cdot D\begin{bmatrix} x_1 & x_2 & x_3 \\ a & b & c \\ d & e & f \end{bmatrix}.$$
You can do the opposite: Multiply a constant outside the determinant into a single row:
$$5 \cdot D\begin{bmatrix} a & b \\ c & d \end{bmatrix} = D\begin{bmatrix} 5a & 5b \\ c & d \end{bmatrix}.$$
So the axioms describe functions which start with a matrix and produce a number. In fact, the three axioms above are enough to be able to compute
determinants (though not very efficiently). Here’s an example.
Suppose I have a determinant function D for 2 × 2 real matrices — so D satisfies the three axioms
above. Using only the axioms, I’ll compute
$$D\begin{bmatrix} 1 & -1 \\ 3 & 2 \end{bmatrix}.$$
First, I’ll break up the second row into a sum of a multiple of the first row and another vector:
$$(3, 2) = (3, -3) + (0, 5) = 3 \cdot (1, -1) + (0, 5).$$
Then I can use linearity to break the determinant up into two pieces.
$$D\begin{bmatrix} 1 & -1 \\ 3 & 2 \end{bmatrix} = D\begin{bmatrix} 1 & -1 \\ 3+0 & -3+5 \end{bmatrix} = D\begin{bmatrix} 1 & -1 \\ 3 & -3 \end{bmatrix} + D\begin{bmatrix} 1 & -1 \\ 0 & 5 \end{bmatrix} = 3 \cdot D\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} + 5 \cdot D\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} = 3 \cdot 0 + 5 \cdot D\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} = 5 \cdot D\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}.$$
Notice that in the second equality the first row stays the same, while the new second rows are (3, −3)
and (0, 5).
Notice that linearity also allows me to factor 3 and 5 out of the second rows for the third equality.
The alternating axiom says that a matrix with two equal rows has determinant 0, and that gave me the
fourth equality.
Now I do a similar trick with the first row:
$$5 \cdot D\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} = 5 \cdot D\begin{bmatrix} 1+0 & 0+(-1) \\ 0 & 1 \end{bmatrix} = 5 \cdot D\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + 5 \cdot D\begin{bmatrix} 0 & -1 \\ 0 & 1 \end{bmatrix} = 5 \cdot 1 + (-1) \cdot 5 \cdot D\begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix} = 5 + (-5) \cdot 0 = 5.$$
The second equality used linearity applied to the first row: (1 + 0, 0 + −1) = (1, 0) + (0, −1). The third
equality used the fact that the determinant of the identity matrix is 1, and used linearity to factor −1 out
of the first row. The fourth equality used the fact that the determinant of a matrix with two equal rows
is 0.
Thus,
$$D\begin{bmatrix} 1 & -1 \\ 3 & 2 \end{bmatrix} = 5.$$
Notice that we computed a determinant using only the axioms for a determinant. We don’t have a
formula at the moment (though you may have seen a formula for 2 × 2 determinants before). It’s true that
the computation took a lot of steps, and this is not the best way to do this — but this example gives some
evidence that our axioms actually tell what determinants are.
The next two results are often useful in computations.
First, you might suspect that a matrix with an all-zero row has determinant 0, and it’s easy to prove
using linearity. Rather than give a formal proof, I’ll illustrate the idea with a particular example.
$$D\begin{bmatrix} 0 & 0 & 0 \\ 2 & 0 & 17 \\ 7 & -6 & 1 \end{bmatrix} = D\begin{bmatrix} 0+0 & 0+0 & 0+0 \\ 2 & 0 & 17 \\ 7 & -6 & 1 \end{bmatrix} = D\begin{bmatrix} 0 & 0 & 0 \\ 2 & 0 & 17 \\ 7 & -6 & 1 \end{bmatrix} + D\begin{bmatrix} 0 & 0 & 0 \\ 2 & 0 & 17 \\ 7 & -6 & 1 \end{bmatrix}.$$
The last equation says “(stuff) = (stuff) + (stuff)”. This means that (stuff) = 0. So
$$D\begin{bmatrix} 0 & 0 & 0 \\ 2 & 0 & 17 \\ 7 & -6 & 1 \end{bmatrix} = 0.$$
The same idea can be used to prove the result in general.
The next result tells us what happens to a determinant when we swap two rows of the matrix.
Lemma. If D : M(n, R) → R is a function which is linear in the rows (Axiom 1) and is 0 when a matrix
has equal rows (Axiom 2), then swapping two rows multiplies the value of D by −1:
$$D\begin{bmatrix} \vdots \\ \leftarrow r_i \rightarrow \\ \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \end{bmatrix} = -D\begin{bmatrix} \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \\ \leftarrow r_i \rightarrow \\ \vdots \end{bmatrix}.$$
Proof. The proof will use the first and second axioms repeatedly. The idea is to swap rows i and j by
adding or subtracting rows.
In the diagrams below, the gray rectangles represent rows which are unchanged and the same in all of
the matrices. All the “action” takes place in row i and row j.
(Schematic diagram: rows i and j are swapped by a sequence of row additions and subtractions, using Axiom 1 at each step and Axiom 2 to kill the determinants with two equal rows.)
Notice that in each addition or subtraction step (the steps that use Axiom 1), only one of row i or row
j changes at a time.
Remarks. (a) I’ll show later that it’s enough to assume (instead of Axiom 2) that D(A) = 0 vanishes
whenever two adjacent rows of A are equal. (This is a technical point which you can forget about until we
need it.)
(b) Suppose that D : M (n, r) → R is a function satisfying Axioms 1 and 3, and suppose that swapping two
rows multiplies the value of D by −1. Must D satisfy Axiom 2? In other words, is “swapping multiplies the
value by −1” equivalent to “equal rows means determinant 0”?
Assuming that swapping two rows multiplies the value of D by −1, I have
$$D(A) = D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow x \rightarrow \\ \vdots \\ \leftarrow x \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix} = -D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow x \rightarrow \\ \vdots \\ \leftarrow x \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix} = -D(A).$$
(I swapped the two equal x-rows, which is why the matrix didn’t change. But by assumption, this
useless swap multiplies D by −1.)
Hence, 2 · D(A) = 0.
If R is R, Q, C, or Zn for n prime and not equal to 2, then 2 · D(A) = 0 implies D(A) = 0. However, if
R = Z2 , then 2x = 0 for all x. Hence, 2 · D(A) = 0, no matter what D(A) is. I can’t conclude that D(A) = 0
in this case. Therefore, Axiom 2 need not hold. You can see, however, that it will hold if R is a field of
characteristic other than 2.
Fortunately, since I took “equal rows means determinant 0” as an axiom for determinants, and since the
lemma shows that this implies that “swapping rows multiplies the determinant by −1”, I know that both of
these properties will hold for determinant functions.
Example. (Computing determinants using the axioms) Suppose that D : M (3, R) → R is a determi-
nant function and
$$D\begin{bmatrix} \leftarrow a_1 \rightarrow \\ \leftarrow b \rightarrow \\ \leftarrow c \rightarrow \end{bmatrix} = 5 \quad\text{and}\quad D\begin{bmatrix} \leftarrow a_2 \rightarrow \\ \leftarrow b \rightarrow \\ \leftarrow c \rightarrow \end{bmatrix} = -3.$$
Compute
$$D\begin{bmatrix} \leftarrow c \rightarrow \\ \leftarrow b \rightarrow \\ \leftarrow a_1 + 4a_2 \rightarrow \end{bmatrix}.$$
$$D\begin{bmatrix} \leftarrow c \rightarrow \\ \leftarrow b \rightarrow \\ \leftarrow a_1 + 4a_2 \rightarrow \end{bmatrix} = -D\begin{bmatrix} \leftarrow a_1 + 4a_2 \rightarrow \\ \leftarrow b \rightarrow \\ \leftarrow c \rightarrow \end{bmatrix} = -\left( D\begin{bmatrix} \leftarrow a_1 \rightarrow \\ \leftarrow b \rightarrow \\ \leftarrow c \rightarrow \end{bmatrix} + 4\, D\begin{bmatrix} \leftarrow a_2 \rightarrow \\ \leftarrow b \rightarrow \\ \leftarrow c \rightarrow \end{bmatrix} \right) = -(5 + 4 \cdot (-3)) = 7.$$
Elementary row operations are used to reduce a matrix to row reduced echelon form, and as a conse-
quence, to solve systems of linear equations. We can use them to compute determinants with more ease than
using the axioms directly — and, even when we have some better algorithms (like expansion by cofactors),
row operations will be useful in simplifying computations. How are determinants affected by elementary row
operations?
Adding a multiple of a row to another row does not change the determinant. Suppose, for example, I’m
performing the operation ri → ri + a · rj . Let
$$A = \begin{bmatrix} \vdots \\ \leftarrow r_i \rightarrow \\ \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \end{bmatrix}.$$
Then
$$D\begin{bmatrix} \vdots \\ \leftarrow r_i + a \cdot r_j \rightarrow \\ \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \end{bmatrix} = D\begin{bmatrix} \vdots \\ \leftarrow r_i \rightarrow \\ \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \end{bmatrix} + a \cdot D\begin{bmatrix} \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \end{bmatrix} = D\begin{bmatrix} \vdots \\ \leftarrow r_i \rightarrow \\ \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \end{bmatrix} = D(A).$$
Therefore, this kind of row operation leaves the determinant unchanged.
The alternating property implies that swapping two rows multiplies the determinant by −1. For example,
$$D\begin{bmatrix} a & b \\ c & d \end{bmatrix} = -D\begin{bmatrix} c & d \\ a & b \end{bmatrix}.$$
Our third kind of row operation involves multiplying a row by a number (which must be invertible in
the ring from which the entries of the matrix come). So if I wanted to multiply the second row of a real
matrix by 19, I could do this:
$$D\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \frac{1}{19} \cdot 19 \cdot D\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \frac{1}{19} D\begin{bmatrix} a & b \\ 19c & 19d \end{bmatrix}.$$
Thus, multiplying the second row by 19 leaves a factor of $\frac{1}{19}$ outside.
However, when you’re using row operations to compute a determinant, you usually want to factor a
number out of a row, which you can do using the linearity axiom. Thus:
$$D\begin{bmatrix} 6a & 6b \\ c & d \end{bmatrix} = 6 \cdot D\begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$
For example, over R:
$$D\begin{bmatrix} 1 & 5 \\ -3 & 4 \end{bmatrix} \;\overset{r_2 \to r_2 + 3r_1}{=}\; D\begin{bmatrix} 1 & 5 \\ 0 & 19 \end{bmatrix} \;\overset{\text{(linearity)}}{=}\; 19 \cdot D\begin{bmatrix} 1 & 5 \\ 0 & 1 \end{bmatrix} \;\overset{r_1 \to r_1 - 5r_2}{=}\; 19 \cdot D\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 19 \cdot 1 = 19.$$
In Z5 , I have 1 = 2 · 3. So
$$D\begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix} = D\begin{bmatrix} 2 \cdot 1 & 2 \cdot 3 \\ 3 & 4 \end{bmatrix} \;\overset{\text{(linearity)}}{=}\; 2 \cdot D\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} \;\overset{r_2 \to r_2 + 2r_1}{=}\; 2 \cdot D\begin{bmatrix} 1 & 3 \\ 0 & 0 \end{bmatrix} = 2 \cdot 0 = 0.$$
I used the fact that a matrix with an all-zero row has determinant 0.
Your experience with row reducing matrices tells you that either the row reduced echelon form will be the
identity, or it will have an all-zero row at the bottom. In the second case, we’ve seen that the determinant is
0. In the first case, there may be constants multiplying the determinant, and the determinant of the identity
is 1 — and so, you know the value of the determinant by multiplying everything together.
Row reduction gives you a way of computing determinants that is a little more practical than applying
the axioms directly. It should also convince you that, starting with the three determinant axioms, we now
have something which takes a square matrix and produces a number.
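Here is one possible Python sketch of this idea (not part of the original notes; the function name is mine, and it uses the common variant of reducing only to upper-triangular form, so the determinant is the tracked sign times the product of the diagonal entries).

```python
import numpy as np

def det_by_row_reduction(A):
    """Determinant via row reduction to upper-triangular form (real entries assumed)."""
    A = np.asarray(A, dtype=float).copy()
    n = A.shape[0]
    sign = 1.0
    for col in range(n):
        pivot = np.argmax(np.abs(A[col:, col])) + col
        if A[pivot, col] == 0:
            return 0.0                      # no pivot available: determinant is 0
        if pivot != col:
            A[[col, pivot]] = A[[pivot, col]]
            sign = -sign                    # each row swap multiplies the determinant by -1
        for row in range(col + 1, n):       # adding a multiple of a row changes nothing
            A[row] -= (A[row, col] / A[col, col]) * A[col]
    return sign * np.prod(np.diag(A))       # det of a triangular matrix = product of diagonal

print(det_by_row_reduction([[1, 5], [-3, 4]]))   # 19.0, matching the example above
```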
There are still some questions we need to address. We need to prove that there really are functions
which satisfy the three axioms. (Just being able to compute in particular cases is not a proof.) This is called
an existence question. We will answer this question by producing an algorithm which gives a determinant
function for square matrices. It is called expansion by cofactors.
Could there be multiple determinant functions? Could more than one function on square matrices
satisfy the axioms? This seems unlikely given that we were able to start with numerical matrices and
compute specific numbers — but maybe a different approach might produce a different answer. This is
called a uniqueness question.
We will show that, in fact, there is only one determinant function — a function satisfying the three
axioms — on square matrices. It can be computed in various ways, but you’ll get the same answer in all
cases.
Along the way, we’ll find another approach which uses permutations to compute determinants. We’ll
also prove some important properties of determinants, such as the rule for products.
For 1 × 1 matrices, D[a] = a defines a determinant function: the alternating axiom holds vacuously (there are not two rows that could be equal), the determinant of the 1 × 1 identity is D[1] = 1, and D is linear in the single row, since
$$D[ax + y] = ax + y = a \cdot D[x] + D[y].$$
All three axioms have been verified, so D is a determinant function.
To save writing, we often write
$$\begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} \quad\text{for}\quad D\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}.$$
This is okay, since we’ll eventually find out that there is only one determinant function on n×n matrices.
You can also write “det” for the determinant function.
For example, on M (2, R),
$$\begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} = 4 - 6 = -2.$$
Let’s move on to the inductive step for the main result. Earlier I discussed the connection between
“swapping two rows multiplies the determinant by −1” and “when two rows are equal the determinant is 0”.
The next lemma is another piece of this picture. It says for a function which is linear in the rows, if “when
two adjacent rows are equal the determinant is 0”, then “swapping two rows multiplies the determinant by
−1”. (Axiom 2 does not require that the two equal rows be adjacent.) It’s a technical result which is used
in the proof of the main theorem and nowhere else, and the proof is rather technical as well. You could skip
it and refer to the result when it’s needed in the main theorem proof.
Lemma. Let f : M (n, r) → R be a function which is linear in each row and satisfies f (A) = 0 whenever
two adjacent rows are equal. Then swapping any two rows multiplies the value of f by −1.
Proof. First, I’ll show that swapping two adjacent rows multiplies the value of f by −1. I’ll show the
required manipulations in schematic form. The adjacent rows are those with the x’s and y’s; as usual, the
vertical dots represent the other rows, which are the same in all of the matrices.
In schematic form, showing only the two adjacent rows (the other rows are the same in every matrix):
$$f\begin{bmatrix} x \\ y \end{bmatrix} = f\begin{bmatrix} y + (x-y) \\ y \end{bmatrix} = f\begin{bmatrix} y \\ y \end{bmatrix} + f\begin{bmatrix} x-y \\ y \end{bmatrix} = f\begin{bmatrix} x-y \\ y \end{bmatrix} = f\begin{bmatrix} x-y \\ x + (y-x) \end{bmatrix} = f\begin{bmatrix} x-y \\ x \end{bmatrix} + f\begin{bmatrix} x-y \\ y-x \end{bmatrix} =$$
$$f\begin{bmatrix} x \\ x \end{bmatrix} - f\begin{bmatrix} y \\ x \end{bmatrix} - f\begin{bmatrix} x-y \\ x-y \end{bmatrix} = -f\begin{bmatrix} y \\ x \end{bmatrix}.$$
In the last step, the two values of f on matrices with equal adjacent rows are 0, leaving −f applied to the matrix with x and y swapped.
Next, I’ll show that you can swap any two rows by swapping adjacent rows an odd number of times.
Then since each swap multiplies f by −1, and since (−1)(odd number) = −1, it follows that swapping two
non-adjacent rows multiplies the value of f by −1.
To illustrate the idea, suppose the rows to be swapped are rows 1 and n. I’ll indicate how to do the
swaps by just displaying the row numbers. First, I swap row 1 with the adjacent row below it n − 1 times
to move it from the top of the matrix to the n-th (bottom) position:
$$(r_1, r_2, r_3, \ldots, r_{n-1}, r_n) \to (r_2, r_1, r_3, \ldots, r_{n-1}, r_n) \to (r_2, r_3, r_1, \ldots, r_{n-1}, r_n) \to \cdots \to (r_2, r_3, \ldots, r_1, r_n) \to (r_2, r_3, \ldots, r_n, r_1)$$
Next, I swap (the old) row n with the adjacent row above it n − 2 times to move it to the top of the
matrix:
$$(r_2, r_3, \ldots, r_{n-1}, r_n, r_1) \to (r_2, r_3, \ldots, r_n, r_{n-1}, r_1) \to \cdots \to (r_2, r_n, r_3, \ldots, r_{n-1}, r_1) \to (r_n, r_2, r_3, \ldots, r_{n-1}, r_1)$$
The original rows 1 and n have swapped places, and I needed (n − 1) + (n − 2) = 2n − 3 swaps of adjacent
rows to do this. Now 2n − 3 is an odd number, and $(-1)^{2n-3} = -1$.
In general, if you want to swap row i and row j, where i < j, following the procedure above requires
2j − 2i − 1 swaps of adjacent rows, an odd number. Once again, $(-1)^{2j-2i-1} = -1$.
Thus, swapping two non-adjacent rows multiplies the value of f by −1.
Definition. Let A ∈ M (n, r). Let A(i | j) be the (n − 1) × (n − 1) matrix obtained by deleting the i-th row
and j-th column of A. If D is a determinant function, then D[A(i | j)] is called the (i, j)th minor of A.
In the picture below, the i-th row and j-th column are shown in gray; they will be deleted, and the
remaining (n − 1) × (n − 1) matrix is A(i | j). Its determinant is the (i, j)th minor.
The (i, j)th cofactor of A is (−1)i+j times the (i, j)th minor, i.e. (−1)i+j · D[A(i | j)].
Find the (2, 3)th minor and the (2, 3)th cofactor.
To find the (2, 3)th minor, remove the 2nd row and the 3rd column (i.e. the row and column containing
the (2, 3)th element):
$$\begin{bmatrix} 1 & 2 & * \\ * & * & * \\ 7 & 8 & * \end{bmatrix}$$
The (2, 3)th minor is the determinant of what remains:
$$\begin{vmatrix} 1 & 2 \\ 7 & 8 \end{vmatrix} = 8 - 14 = -6.$$
To get the (2, 3)th cofactor, multiply this by (−1)2+3 = (−1)5 = −1. The (2, 3)th cofactor is (−1)·(−6) =
6.
Note: The easy way to remember whether to multiply by +1 or −1 is to make a checkerboard pattern
of +’s and −’s:
$$\begin{bmatrix} + & - & + \\ - & + & - \\ + & - & + \end{bmatrix}$$
Use the sign in the (i, j)th position. For example, there’s a minus sign in the (2, 3)th position, which
agrees with the sign I computed using (−1)i+j .
The main result says that we may use cofactors to extend a determinant function on (n − 1) × (n − 1)
matrices to a determinant function on n × n matrices.
Theorem. (Expansion by cofactors) Let R be a commutative ring with identity, and let C be a determinant
function on M(n − 1, R). Let A ∈ M(n, R). For any j ∈ {1, . . . , n}, define
$$D(A) = \sum_{i=1}^{n} (-1)^{i+j} A_{ij}\, C(A(i \mid j)).$$
Then D is a determinant function on M(n, R).
Notice that the summation is on i, which is the row index. The index j is fixed, and it indexes columns.
This means you’re moving down the j th column as you sum. Consequently, this is a cofactor expansion
by columns.
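Here is a small Python sketch of that sum (not part of the original notes; the function name is mine). It is a direct, recursive translation of the formula, not an efficient algorithm.

```python
def det_by_cofactors(A, j=0):
    """Determinant by cofactor expansion down column j (0-indexed)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        minor = [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]  # A(i | j)
        # (-1)**(i+j) * det(minor) is the (i, j) cofactor; multiply it by the entry A[i][j]
        total += (-1) ** (i + j) * A[i][j] * det_by_cofactors(minor)
    return total

print(det_by_cofactors([[1, 3, -5], [1, 0, -2], [6, 1, 1]], j=1))   # -42, as in the 3x3 example below
```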
Proof. Linearity: I have to show that D is linear in each row; that is,
$$D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow ax + y \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix} = a \cdot D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow x \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix} + D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow y \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}.$$
All the action is taking place in the k-th row — the one with the x’s and y’s — and the other rows are
the same in the three matrices.
Label the three matrices above:
$$P = \begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow ax + y \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}, \quad Q = \begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow x \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}, \quad R = \begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow y \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}.$$
I’m going to show that the ith term in the sum on the left equals the sum of the ith terms of the sums
on the right. To do this, I will consider two cases: i ≠ k and i = k.
First, consider a term in the cofactor sum where i ≠ k — that is, where the row that is deleted is not the
k th row. The ith row and j th column are deleted from each matrix, as shown in gray in the picture below.
Since C is a determinant function, I can apply linearity to the ax + y row, and I get the following equation:
$$C(P(i \mid j)) = a \cdot C(Q(i \mid j)) + C(R(i \mid j)).$$
The matrices P(i | j), Q(i | j), and R(i | j) are the same except in the k-th row (the one with the x’s and y’s), which is where linearity of C is applied.
Thus, the terms of the summation on the two sides agree for i ≠ k:
$$(-1)^{i+j} P_{ij}\, C(P(i \mid j)) = a \cdot (-1)^{i+j} Q_{ij}\, C(Q(i \mid j)) + (-1)^{i+j} R_{ij}\, C(R(i \mid j)),$$
since P_{ij} = Q_{ij} = R_{ij} when i ≠ k (row i is one of the unchanged rows).
Next, consider the case where i = k, so the row that is deleted is the k-th row. Now P, Q, and R only
differ in row k, which is the row being deleted, so P(k | j) = Q(k | j) = R(k | j), and hence the values of C on these minors are equal. The k-th row of P is ax + y, so P_{kj} = ax_j + y_j, while Q_{kj} = x_j and R_{kj} = y_j. Therefore,
$$(-1)^{k+j}(ax_j + y_j)\, C(P(k \mid j)) = (-1)^{k+j} a x_j\, C(Q(k \mid j)) + (-1)^{k+j} y_j\, C(R(k \mid j)).$$
Thus, the terms on the left and right are the same for all i, and D is linear.
Alternating: I have to show that if two rows of A are the same, then D(A) = 0.
First, I’ll show that if rows 1 and 2 are equal, then D(A) = 0.
Suppose then that rows 1 and 2 are equal. Here’s a typical term in the cofactor expansion: $(-1)^{i+j} A_{ij}\, C(A(i \mid j))$.
Suppose row i, the row that is deleted, is a row other than row 1 or row 2.
Then the matrix that results after deletion will have two equal rows, since row 1 and row 2 were equal.
Therefore, C (A(i | j)) = 0, and the term in the cofactor expansion is 0.
Thus, all the terms in the cofactor expansion are 0 except the first and second (i = 1 and i = 2). These
terms are
$$(-1)^{1+j} A_{1j}\, C(A(1 \mid j)) + (-1)^{2+j} A_{2j}\, C(A(2 \mid j)).$$
Now A1j = A2j , since the first and second rows are equal. And since row 1 and row 2 are equal, I get
the same matrix by deleting either row 1 or row 2:
That is, A(1 | j) = A(2 | j). Therefore, C (A(1 | j)) = C (A(2 | j)).
Thus, the only way in which the two terms above differ is in the signs (−1)1+j and (−1)2+j . But 1 + j
and 2 + j are consecutive integers, so one must be even and the other must be odd. Hence, (−1)1+j and
(−1)2+j are either +1 and −1 or −1 and +1. In either case, the terms cancel, and the sum of the two terms
is 0.
Hence, the cofactor expansion is equal to 0, and D(A) = 0, as I wished to prove.
You can give a similar argument if A has two adjacent rows equal other than rows 1 and 2 (so, for
instance, if rows 4 and 5 are equal). I will skip the details.
Thus, I know that D(A) = 0 if two adjacent rows of A are equal. Since I proved that D satisfies the
linearity axiom, the hypotheses of the previous technical lemma are satisfied, and I can apply it. It says that
swapping two rows multiplies the determinant by −1.
Now take the general case: Two rows of A are equal, but they aren’t necessarily adjacent.
I can swap the rows of A until the two equal rows are adjacent, and each swap multiplies the value of
the determinant by −1. Let’s say that k swaps are needed to get the two equal rows to be adjacent. That
is, after k row swaps, I get a matrix B which has adjacent equal rows. Then D(B) = 0 by the adjacent row
case above, so
D(A) = (−1)k D(B) = (−1)k · 0 = 0.
This completes the proof that D(A) = 0 if A has two equal rows, and the Alternating Axiom has been
verified.
The identity has determinant 1: Suppose A = I. Since the entries of the identity matrix are 0 except
on the main diagonal, I have Aij = 0 unless i = j. When i = j, I have Ajj = 1. Therefore, the cofactor
expansion of D(A) has only one nonzero term, which is
$$(-1)^{j+j} A_{jj}\, C(I(j \mid j)) = 1 \cdot C(I_{n-1}) = 1,$$
since I(j | j) is the (n − 1) × (n − 1) identity matrix and C is a determinant function. Hence, D(I) = 1, and the third axiom holds.
Example. (Expanding by cofactors) Compute the following determinant by expanding by cofactors of the second column:
$$\begin{vmatrix} 1 & 3 & -5 \\ 1 & 0 & -2 \\ 6 & 1 & 1 \end{vmatrix} = -(3)\begin{vmatrix} 1 & -2 \\ 6 & 1 \end{vmatrix} + (0)\begin{vmatrix} 1 & -5 \\ 6 & 1 \end{vmatrix} - (1)\begin{vmatrix} 1 & -5 \\ 1 & -2 \end{vmatrix} = -42.$$
This diagram shows where the terms in the cofactor expansion come from:
(Diagram: three copies of the matrix; in each one, the row and column through one of the second-column entries 3, 0, 1 are crossed out.)
For each element (3, 0, 1) in the second column, compute the cofactor for that element by crossing out
the row and column containing the element (cross-outs shown in gray), computing the determinant of the
2 × 2 matrix that is left, and multiplying by the sign (+ or −) that comes from the “checkerboard pattern”.
Then multiply the cofactor by the column element.
So for the first term, the element is 3, the sign is “−”, and after crossing out the first row and second
column, the 2 × 2 determinant that is left is
$$\begin{vmatrix} 1 & -2 \\ 6 & 1 \end{vmatrix}.$$
As usual, this is harder to describe in words than it is to actually do. Try a few computations yourself.
Finally, I computed the 2 × 2 determinants using the 2 × 2 determinant formula I derived earlier.
You can often simplify a cofactor expansion by doing row operations first. For instance, if you can
produce a row or a column with lots of zeros, you can expand by cofactors of that row or column.
Example. (Computing a determinant using row operations and cofactors) Compute the determi-
nant of the following matrix in M (3, Z3 ):
$$\begin{bmatrix} 1 & 2 & 1 \\ 2 & 2 & 1 \\ 2 & 0 & 1 \end{bmatrix}$$
I’ll do a couple of row operations first to make some zeros in the first column. Remember that adding
a multiple of a row to another row does not change the determinant.
$$\begin{bmatrix} 1 & 2 & 1 \\ 2 & 2 & 1 \\ 2 & 0 & 1 \end{bmatrix} \overset{r_2 \to r_2 + r_1}{\to} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 2 & 0 & 1 \end{bmatrix} \overset{r_3 \to r_3 + r_1}{\to} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 2 & 2 \end{bmatrix}.$$
Now I expand by cofactors of column 1. The two zeros make the computation easy; I’ll write out those
terms just so you can see the cofactors:
$$\begin{vmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 2 & 2 \end{vmatrix} = 1 \cdot \begin{vmatrix} 1 & 2 \\ 2 & 2 \end{vmatrix} - 0 \cdot \begin{vmatrix} 2 & 1 \\ 2 & 2 \end{vmatrix} + 0 \cdot \begin{vmatrix} 2 & 1 \\ 1 & 2 \end{vmatrix} = (2 - 4) - 0 + 0 = -2 = 1.$$
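As a quick machine check (not part of the original notes), one can reuse the `det_by_cofactors` sketch from earlier on the original integer matrix and reduce the result mod 3 at the end.

```python
# Check the Z_3 computation: compute the integer determinant, then reduce mod 3.
A = [[1, 2, 1],
     [2, 2, 1],
     [2, 0, 1]]
print(det_by_cofactors(A) % 3)   # 1, matching the row-operation computation above
```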
Theorem. Let R be a commutative ring with identity, and let D : M(n, R) → R be a function which is alternating and linear in each row. Then for every A ∈ M(n, R),
$$D(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}\, D(I).$$
Proof. This proof is a little complicated; if you wish to skip it for now, at least try to understand the
statement of the theorem (and see the discussion that follows).
Write the matrix in terms of its rows:
$$A = \begin{bmatrix} \leftarrow r_1 \rightarrow \\ \leftarrow r_2 \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}.$$
I can use linearity applied to row 1 to expand the determinant of A into a sum of determinants (here $e_j$ denotes the row vector with a 1 in position j and 0s elsewhere, so that $r_1 = \sum_j A_{1j} e_j$):
$$D(A) = D\begin{bmatrix} \leftarrow \sum_j A_{1j} e_j \rightarrow \\ \leftarrow r_2 \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix} = \sum_j A_{1j}\, D\begin{bmatrix} \leftarrow e_j \rightarrow \\ \leftarrow r_2 \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}.$$
Use linearity applied to row 2 to expand the determinant terms in the last D-sum:
$$D(A) = \sum_j \sum_k A_{1j} A_{2k}\, D\begin{bmatrix} \leftarrow e_j \rightarrow \\ \leftarrow e_k \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}.$$
Continue in this way for all n rows. I’ll switch notation and use j1 , j2 , . . . , jn as the summation indices:
$$D(A) = \sum_{j_1} \sum_{j_2} \cdots \sum_{j_n} A_{1 j_1} A_{2 j_2} \cdots A_{n j_n}\, D\begin{bmatrix} \leftarrow e_{j_1} \rightarrow \\ \leftarrow e_{j_2} \rightarrow \\ \vdots \\ \leftarrow e_{j_n} \rightarrow \end{bmatrix}.$$
If two of the j’s are equal, then the e’s are equal — e.g. if j3 = j7 , then ej3 = ej7 . But D is 0 on matrices
with equal rows. So terms with two of the j’s equal are 0. Hence, I only need to consider terms where all
the j’s are distinct numbers in the set {1, 2, . . . n}. This means that {j1 , j2 , . . . , jn } is a permutation of
{1, 2, . . . , n}. So I can just sum over all permutations σ ∈ Sn:
$$D(A) = \sum_{\sigma \in S_n} A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}\, D\begin{bmatrix} \leftarrow e_{\sigma(1)} \rightarrow \\ \leftarrow e_{\sigma(2)} \rightarrow \\ \vdots \\ \leftarrow e_{\sigma(n)} \rightarrow \end{bmatrix}.$$
Sorting the rows $e_{\sigma(1)}, \ldots, e_{\sigma(n)}$ into the identity matrix takes a sequence of row swaps, and each swap multiplies D by −1; the net factor is exactly sgn(σ). Therefore,
$$D(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}\, D(I).$$
Corollary. Let R be a commutative ring with identity, and let A ∈ M (n, R).
$$\det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, A_{1\sigma(1)} \cdots A_{n\sigma(n)}.$$
Proof. Since the determinant function det defined by expansion by cofactors is alternating and linear in
each row, it satisfies the conditions of the theorem. Since det I = 1, the formula in the theorem becomes
$$\det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, A_{1\sigma(1)} \cdots A_{n\sigma(n)}.$$
Look at one of the products $A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}$: it contains one entry from each row of A, and the entries come from columns σ(1), σ(2), . . . , σ(n). This is a permutation of {1, 2, . . . , n}, which means each of the numbers from 1 to n is
chosen exactly once. This means that we’re also choosing the entries so that one comes from each column.
We’re summing over all permutations of {1, 2, . . . n}, so we’re choosing entries for our products in all such
ways. Let’s see how this looks for small matrices.
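Here is a direct Python sketch of the permutation formula (not part of the original notes; the function name is mine, and the sign is computed by counting inversions, which has the same parity as the number of swaps used in the notes).

```python
from itertools import permutations

def det_by_permutations(A):
    """Determinant via the permutation formula: sum over sigma of sgn(sigma) * prod A[i][sigma(i)]."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        # sign of sigma: +1 for an even number of inversions, -1 for an odd number
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
        sign = -1 if inversions % 2 else 1
        term = sign
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

print(det_by_permutations([[2, 1, 4], [1, -1, 2], [5, 3, 1]]))   # 27, as in the 3x3 example below
```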
Consider a 2 × 2 matrix:
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
I have to choose 2 entries at a time, so that the 2 entries come from different rows and columns. I multiply
the 2 chosen entries to get one of the product terms. There are 2! = 2 ways to do this; they are ad and bc. Attaching the signs of the corresponding permutations gives the familiar 2 × 2 formula ad − bc.
Next, consider the 3 × 3 matrix
$$\begin{bmatrix} 2 & 1 & 4 \\ 1 & -1 & 2 \\ 5 & 3 & 1 \end{bmatrix}$$
I have to choose 3 entries from the matrix at a time, in such a way that there is one entry from each
row and each column. For each such choice, I take the product of the three elements and multiply by the
sign of the permutation of the elements, which I’ll describe below. Finally, I add up the results.
In order to do this systematically, focus on the first column. I can choose 2, 1, or 5 from column 1.
If I choose 2 from column 1, I can either choose −1 from column 2 and 1 from column 3, or 3 from
column 2 and 2 from column 3. (Remember that I can’t have two elements from the same row or column.)
$$\begin{bmatrix} 2 & * & * \\ * & -1 & * \\ * & * & 1 \end{bmatrix} \qquad \begin{bmatrix} 2 & * & * \\ * & * & 2 \\ * & 3 & * \end{bmatrix}$$
If I choose 1 from column 1, I can either choose 1 from column 2 and 1 from column 3, or 3 from column
2 and 4 from column 3.
$$\begin{bmatrix} * & 1 & * \\ 1 & * & * \\ * & * & 1 \end{bmatrix} \qquad \begin{bmatrix} * & * & 4 \\ 1 & * & * \\ * & 3 & * \end{bmatrix}$$
Finally, if I choose 5 from column 1, I can either choose 1 from column 2 and 2 from column 3, or −1
from column 2 and 4 from column 3.
$$\begin{bmatrix} * & 1 & * \\ * & * & 2 \\ 5 & * & * \end{bmatrix} \qquad \begin{bmatrix} * & * & 4 \\ * & -1 & * \\ 5 & * & * \end{bmatrix}$$
This gives me 6 products:
$$(2)(-1)(1), \quad (2)(3)(2), \quad (1)(1)(1), \quad (1)(3)(4), \quad (5)(1)(2), \quad (5)(-1)(4).$$
Next, I have to attach a sign to each product. To do this, I count the number of row swaps I need to
move the 1’s in the identity matrix into the same positions as the numbers in the product. I’ll illustrate with
two examples.
$$\begin{bmatrix} * & * & 4 \\ 1 & * & * \\ * & 3 & * \end{bmatrix}: \qquad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \overset{r_1 \leftrightarrow r_2}{\to} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \overset{r_1 \leftrightarrow r_3}{\to} \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$
It took 2 row swaps to move the 1’s into the same positions as 1, 3, and 4. Since 2 is even, the sign of
(1)(3)(4) is +1.
$$\begin{bmatrix} * & * & 4 \\ * & -1 & * \\ 5 & * & * \end{bmatrix}: \qquad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \overset{r_1 \leftrightarrow r_3}{\to} \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$
It took 1 row swap to move the 1’s into the same positions as 5, −1, and 4. Since 1 is odd, the sign of
(5)(−1)(4) is −1.
Continuing in this fashion, I get
$$\det\begin{bmatrix} 2 & 1 & 4 \\ 1 & -1 & 2 \\ 5 & 3 & 1 \end{bmatrix} = (2)(-1)(1) - (2)(3)(2) - (1)(1)(1) + (1)(3)(4) + (5)(1)(2) - (5)(-1)(4) = 27.$$
Notice how ugly the computation was! While the permutation formula can be used for computations,
it’s easier to use row or column operations or expansion by cofactors. The main point of the permutation
formula lies in the following Corollary. It says there is only one function on n × n matrices which satisfies
the three axioms for a determinant — the determinant function is unique. Row reduction, expansion by
cofactors, and the permutation formula give different ways of computing the same thing.
The permutation formula is connected to a trick for computing determinants of 3 × 3 matrices. You
may have seen this trick in other math courses, or in physics courses. I’ll illustrate with the matrix in the
last example.
Warning: This only works on determinants which are 3 × 3!
Begin by making copies of the first two columns of the matrix. Put the copies to the right of the original
matrix:
$$\begin{array}{ccccc} 2 & 1 & 4 & 2 & 1 \\ 1 & -1 & 2 & 1 & -1 \\ 5 & 3 & 1 & 5 & 3 \end{array}$$
Next, draw diagonal lines through the elements as shown below. Three lines down and to the right,
three lines up and to the right:
$$\begin{array}{ccccc} 2 & 1 & 4 & 2 & 1 \\ 1 & -1 & 2 & 1 & -1 \\ 5 & 3 & 1 & 5 & 3 \end{array}$$
(Picture: three diagonal lines drawn down and to the right, and three drawn up and to the right, through the array.)
Form products by multiplying the elements along each line. The products of the “down and right” lines
get plus signs, and the products of the “up and right” lines get minus signs:
$$(2)(-1)(1) + (1)(2)(5) + (4)(1)(3) - (4)(-1)(5) - (2)(2)(3) - (1)(1)(1) = 27.$$
You can see that we got the same terms as we got with the permutation formula, with the factors and
the terms in a different order.
Again, I emphasize that this trick only works on matrices which are 3 × 3! You can’t use it on matrices
of any other size. It’s not bad for 3 × 3 determinants you’re computing by hand, so feel free to use it if you
wish. Don’t try to use it on determinants which are 2 × 2, 4 × 4, and so on.
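For completeness, here is the trick written out as Python code (not part of the original notes; the function name is mine, and it is only valid for 3 × 3 matrices).

```python
def det3_by_sarrus(A):
    """Determinant of a 3x3 matrix by the diagonal-lines trick (rule of Sarrus)."""
    return (A[0][0]*A[1][1]*A[2][2] + A[0][1]*A[1][2]*A[2][0] + A[0][2]*A[1][0]*A[2][1]
            - A[0][2]*A[1][1]*A[2][0] - A[0][0]*A[1][2]*A[2][1] - A[0][1]*A[1][0]*A[2][2])

print(det3_by_sarrus([[2, 1, 4], [1, -1, 2], [5, 3, 1]]))   # 27
```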
Corollary. Let R be a commutative ring with identity. There is a unique determinant function | · | :
M (n, R) → R.
Proof. The permutation formula says
$$\det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, A_{1\sigma(1)} \cdots A_{n\sigma(n)}.$$
But the right side only depends on the entries of the matrix A. So D(A) is completely determined by
A, and there can be only one determinant function on n × n matrices.
We know that the determinant function defined by cofactor expansion satisfies the axioms for a deter-
minant function. Therefore, it is the only determinant function on n × n matrices.
This doesn’t mean that you can’t compute the determinant in different ways; in fact, the permutation
formula gives a different way of computing determinants than cofactor expansion. To say that there’s only
one determinant function means that any function satisfying the determinant axioms will give the same
answer as any other function satisfying the determinant axioms, for a given matrix.
Remark. Here’s another way to express the theorem. Suppose det denotes the determinant function on
M(n, R). If D : M(n, R) → R is alternating and linear in each row, then
$$D(A) = (\det A)\, D(I) \quad\text{for all } A \in M(n, R).$$
In other words, a function which satisfies the first two axioms for a determinant function is a multiple
of the “real” determinant function, the multiple being the value the function takes on the identity matrix.
In the case of the “real” determinant function, the third axiom says det I = 1, so the multiple is 1 and D is
the “real” determinant function.
Corollary. Let R be a commutative ring with identity, and let A ∈ M(n, R). Then $|A^T| = |A|$.
Proof. Using the permutation formula,
$$|A^T| = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} A^T_{i\sigma(i)} = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} A_{\sigma(i)i} = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{j=1}^{n} A_{j\sigma^{-1}(j)} = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma^{-1}) \prod_{j=1}^{n} A_{j\sigma^{-1}(j)} =$$
$$\sum_{\sigma^{-1} \in S_n} \operatorname{sgn}(\sigma^{-1}) \prod_{j=1}^{n} A_{j\sigma^{-1}(j)} = \sum_{\tau \in S_n} \operatorname{sgn}(\tau) \prod_{j=1}^{n} A_{j\tau(j)} = |A|.$$
In the fourth equality, I went from summing over σ in Sn to σ −1 in Sn . This is valid because permutations
are bijective functions, so they have inverse functions which are also permutations. So summing over all
permutations in Sn is the same as summing over all their inverses in Sn — you will get the same terms in
the sum, just in a different order.
I got the next-to-the-last equality by letting τ = σ −1 . This just makes it easier to recognize the
next-to-last expression as the permutation formula for |A|.
Remark. We’ve used row operations as an aid to computing determinants. Since the rows of A are the
columns of AT and vice versa, the Corollary implies that you can also use column operations to compute
determinants. The allowable operations are swapping two columns, multiplying a column by a number, and
adding a multiple of a column to another column. They have the same effects on the determinant as the
corresponding row operations.
This also means that you can compute determinants using cofactors of rows as well as columns.
In proving the uniqueness of determinant functions, we showed that if D is a function on n × n matrices
which is alternating and linear on the rows, then D(M ) = (det M )D(I). We will use this to prove the
product rule for determinants.
Theorem. Let R be a commutative ring with identity, and let A, B ∈ M (n, R). Then |AB| = |A||B|.
Proof. Fix B, and define
D(A) = |AB|.
I will show that D is alternating and linear, then apply a result I derived in showing uniqueness of
determinant functions.
Let ri denote the i-th row of A. Then
$$D(A) = \begin{vmatrix} \leftarrow r_1 B \rightarrow \\ \leftarrow r_2 B \rightarrow \\ \vdots \\ \leftarrow r_n B \rightarrow \end{vmatrix}.$$
Now | · | is alternating, so interchanging two rows in the determinant above multiplies D(A) by −1.
Hence, D is alternating.
Next, I’ll show that D is linear:
$$D\begin{bmatrix} \vdots \\ \leftarrow kx + y \rightarrow \\ \vdots \end{bmatrix} = \begin{vmatrix} \vdots \\ \leftarrow (kx + y)B \rightarrow \\ \vdots \end{vmatrix} = k \cdot \begin{vmatrix} \vdots \\ \leftarrow xB \rightarrow \\ \vdots \end{vmatrix} + \begin{vmatrix} \vdots \\ \leftarrow yB \rightarrow \\ \vdots \end{vmatrix} = k \cdot D\begin{bmatrix} \vdots \\ \leftarrow x \rightarrow \\ \vdots \end{bmatrix} + D\begin{bmatrix} \vdots \\ \leftarrow y \rightarrow \\ \vdots \end{bmatrix}.$$
This proves that D is linear in each row.
Since D is a function on M (n, R) which is alternating and linear in the rows, the result I mentioned
earlier shows
D(A) = |A|D(I).
But D(A) = |AB| and D(I) = |IB| = |B|, so we get
$$|AB| = |A|\,|B|.$$
In other words, the determinant of a product is the product of the determinants. A similar result holds
for powers.
Corollary. Let R be a commutative ring with identity, and let A ∈ M (n, R). Then for every m ≥ 0,
|Am | = |A|m .
Proof. This follows from the previous result using induction. The result is obvious for m = 0 and m = 1
(note that A0 = I, the identity matrix), and the case m = 2 follows from the previous result if we take
B = A.
Suppose the result is true for m, so |Am | = |A|m . We need to show that the result holds for m + 1. We
have
$$|A^{m+1}| = |A^m A| = |A^m|\,|A| = |A|^m\,|A| = |A|^{m+1}.$$
We used the case m = 2 to get the second equality, and the induction assumption was used to get the
third equality. This proves the result for m + 1, so it holds for all m ≥ 0 by induction.
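Here is a quick numerical illustration of the product and power rules (not part of the original notes); the random integer matrices are my own choice, and floating-point comparison is done with a tolerance.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(3, 3)).astype(float)
B = rng.integers(-3, 4, size=(3, 3)).astype(float)

# |AB| = |A||B| and |A^3| = |A|^3, up to floating-point roundoff
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))           # True
print(np.isclose(np.linalg.det(np.linalg.matrix_power(A, 3)), np.linalg.det(A) ** 3))  # True
```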
While the determinant of a product is the product of the determinants, the determinant of a sum is not
necessarily the sum of the determinants.
Example. Give a specific example of 2 × 2 real matrices A and B for which det(A + B) ≠ det A + det B.
Take A = I and B = −I (the 2 × 2 identity and its negative). Then
$$\det\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 1 \quad\text{and}\quad \det\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} = 1.$$
But
$$\det\left( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} \right) = \det\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = 0.$$
The rule for products gives us an easy criterion for the invertibility of a matrix. First, I’ll prove the
result in the special case where the entries of the matrix are elements of a field.
Theorem. Let F be a field, and let A ∈ M (n, F ).
A is invertible if and only if |A| ≠ 0.
Proof. If A is invertible, then
|A||A−1 | = |AA−1 | = |I| = 1.
This equation implies that |A| ≠ 0 (since |A| = 0 would yield “0 = 1”).
Conversely, suppose that |A| ≠ 0. Suppose that A row reduces to the row reduced echelon matrix R,
and consider the effect of elementary row operations on |A|. Swapping two rows multiplies the determinant
by −1. Adding a multiple of a row to another row leaves the determinant unchanged. And multiplying a
row by a nonzero number multiplies the determinant by that nonzero number. Clearly, no row operation
will make the determinant 0 if it was nonzero to begin with. Since |A| ≠ 0, it follows that |R| ≠ 0.
Since R is a row reduced echelon matrix with nonzero determinant, it can’t have any all-zero rows. An
n × n row reduced echelon matrix with no all-zero rows must be the identity, so R = I. Since A row reduces
to the identity, A is invertible.
Corollary. Let F be a field, and let A ∈ M (n, F ). If A is invertible, then
|A−1 | = |A|−1 .
Example. Suppose A, B, and C are square matrices of the same size, with |A| = 18, |B| = 5, and |C| = 3. Compute $|A^T B^2 C^{-1}|$.
We have $|A^T| = |A| = 18$ and $|C^{-1}| = \dfrac{1}{|C|} = \dfrac{1}{3}$. Using the product rule for determinants,
$$|A^T B^2 C^{-1}| = |A^T|\,|B|^2\,|C^{-1}| = 18 \cdot 5^2 \cdot \frac{1}{3} = 150.$$
Definition. Let R be a commutative ring with identity. Matrices A, B ∈ M (n, R) are similar if there is an
invertible matrix P ∈ M (n, R) such that P AP −1 = B.
Similar matrices come up in many places, for instance in changing bases for vector spaces.
Corollary. Let R be a commutative ring with identity. Similar matrices in M (n, R) have equal determinants.
Proof. Suppose A and B are similar, so P AP −1 = B for some invertible matrix P . Then
$$|B| = |PAP^{-1}| = |P|\,|A|\,|P^{-1}| = |P|\,|P^{-1}|\,|A| = |PP^{-1}|\,|A| = |I|\,|A| = |A|.$$
In the third equality, I used the fact that |P −1 | and |A| are numbers — elements of the ring R — and
multiplication in R is commutative. That allows me to commute |P −1 | and |A|.
Definition. Let R be a commutative ring with identity, and let A ∈ M (n, R). The adjugate adj A is the
matrix whose i-j-th entry is
$$(\operatorname{adj} A)_{ij} = (-1)^{i+j} |A(j \mid i)|.$$
In other words, adj A is the transpose of the matrix of cofactors.
Remark. In the past, adj A was referred to as the adjoint, or the classical adjoint. But the term “adjoint”
is now used to refer to something else: The conjugate transpose, which we’ll see when we discuss the
spectral theorem. So the term “adjugate” has come to replace it for the matrix defined above. One
advantage of the word “adjugate” is that you can use the same abbreviation “adj” as was used for “adjoint”!
Example. Compute the adjugate of
$$A = \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 1 \\ 1 & -1 & 2 \end{bmatrix}.$$
First, I’ll compute the cofactors. The first line shows the cofactors of the first row, the second line the
cofactors of the second row, and the third line the cofactors of the third row.
$$+\begin{vmatrix} 1 & 1 \\ -1 & 2 \end{vmatrix} = 3, \quad -\begin{vmatrix} 0 & 1 \\ 1 & 2 \end{vmatrix} = 1, \quad +\begin{vmatrix} 0 & 1 \\ 1 & -1 \end{vmatrix} = -1,$$
$$-\begin{vmatrix} 0 & 3 \\ -1 & 2 \end{vmatrix} = -3, \quad +\begin{vmatrix} 1 & 3 \\ 1 & 2 \end{vmatrix} = -1, \quad -\begin{vmatrix} 1 & 0 \\ 1 & -1 \end{vmatrix} = 1,$$
$$+\begin{vmatrix} 0 & 3 \\ 1 & 1 \end{vmatrix} = -3, \quad -\begin{vmatrix} 1 & 3 \\ 0 & 1 \end{vmatrix} = -1, \quad +\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1.$$
The adjugate is the transpose of the matrix of cofactors:
$$\operatorname{adj} A = \begin{bmatrix} 3 & -3 & -3 \\ 1 & -1 & -1 \\ -1 & 1 & 1 \end{bmatrix}.$$
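Here is a short Python sketch of the adjugate computation (not part of the original notes; it reuses the `det_by_cofactors` sketch from earlier, and the function name is mine).

```python
def adjugate(A):
    """Adjugate: transpose of the matrix of cofactors."""
    n = len(A)
    cof = [[(-1) ** (i + j) *
            det_by_cofactors([row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i])
            for j in range(n)] for i in range(n)]          # cof[i][j] = (i, j) cofactor
    return [[cof[j][i] for j in range(n)] for i in range(n)]   # transpose of the cofactor matrix

A = [[1, 0, 3], [0, 1, 1], [1, -1, 2]]
print(adjugate(A))   # [[3, -3, -3], [1, -1, -1], [-1, 1, 1]]
```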
The next result shows that adjugates and tranposes can be interchanged: The adjugate of the transpose
equals the transpose of the adjugate.
Proposition. Let R be a commutative ring with identity, and let A ∈ M(n, R). Then
$$\operatorname{adj}(A^T) = (\operatorname{adj} A)^T.$$
Proof. Consider the (i, j)th elements of the matrices on the two sides of the equation.
Example. Let’s illustrate with the real matrix $\begin{bmatrix} 1 & 2 & 4 \\ 1 & -1 & 0 \\ 2 & 3 & -2 \end{bmatrix}$. Expanding by cofactors of the 3rd row gives the determinant:
$$\begin{vmatrix} 1 & 2 & 4 \\ 1 & -1 & 0 \\ 2 & 3 & -2 \end{vmatrix} = (2)\begin{vmatrix} 2 & 4 \\ -1 & 0 \end{vmatrix} - (3)\begin{vmatrix} 1 & 4 \\ 1 & 0 \end{vmatrix} + (-2)\begin{vmatrix} 1 & 2 \\ 1 & -1 \end{vmatrix} = 26.$$
Now suppose I make a mistake: I multiply the cofactors of the 3rd row by elements of the 1st row (which
are 1, 2, 4). Here’s what I get:
$$(1)\begin{vmatrix} 2 & 4 \\ -1 & 0 \end{vmatrix} - (2)\begin{vmatrix} 1 & 4 \\ 1 & 0 \end{vmatrix} + (4)\begin{vmatrix} 1 & 2 \\ 1 & -1 \end{vmatrix} = 4 + 8 - 12 = 0.$$
Or suppose I multiply the cofactors of the 3rd row by elements of the 2nd row (which are 1, −1, 0).
Here’s what I get:
$$(1)\begin{vmatrix} 2 & 4 \\ -1 & 0 \end{vmatrix} - (-1)\begin{vmatrix} 1 & 4 \\ 1 & 0 \end{vmatrix} + (0)\begin{vmatrix} 1 & 2 \\ 1 & -1 \end{vmatrix} = 4 - 4 + 0 = 0.$$
These examples suggest that if I try to do a cofactor expansion by using the cofactors of one row
multiplied by the elements from another row, I get 0. It turns out that this is true in general, and is the key
step in the next proof.
Theorem. Let R be a commutative ring with identity, and let A ∈ M (n, R). Then
|A| · I = A · adj A.
Proof. This proof is a little tricky, so you may want to skip it for now.
We expand |A| by cofactors of row i:
$$|A| = \sum_{j} (-1)^{i+j} A_{ij} |A(i \mid j)|.$$
First, suppose k 6= i. Construct a new matrix B by replacing row k of A with row i of A. Thus, the
elements of B are the same as those of A, except that B’s row k duplicates A’s row i.
In symbols,
$$B_{lj} = \begin{cases} A_{lj} & \text{if } l \ne k \\ A_{ij} & \text{if } l = k \end{cases}$$
Expanding |B| by cofactors of row k gives
$$|B| = \sum_{j=1}^{n} (-1)^{k+j} B_{kj} |B(k \mid j)| = \sum_{j=1}^{n} (-1)^{k+j} A_{ij} |A(k \mid j)|.$$
Why is |B(k | j)| = |A(k | j)|? To compute |B(k | j)|, you delete row k and column j from B. To
compute |A(k | j)|, you delete row k and column j from A. But A and B only differ in row k, which is being
deleted in both cases. Hence, |B(k | j)| = |A(k | j)|.
(Picture: A and B with row k and column j shaded for deletion; the deleted row k is the only row in which A and B differ.)
On the other hand, B has two equal rows — its row i and row k are both equal to row i of A — so the
determinant of B is 0. Hence,
$$\sum_{j=1}^{n} (-1)^{k+j} A_{ij} |A(k \mid j)| = 0.$$
This is the point we illustrated prior to stating the theorem: if you do a cofactor expansion by using
the cofactors of one row multiplied by the elements from another row, you get 0. The last equation is what
we get for k 6= i. In case k = i, we just get the cofactor expansion for |A|:
$$\sum_{j=1}^{n} (-1)^{i+j} A_{ij} |A(i \mid j)| = |A|.$$
j=1
We can combine the two equations into one using the Kronecker delta function:
$$\sum_{j} (-1)^{k+j} A_{ij} |A(k \mid j)| = \delta_{ik} |A| \quad\text{for all } i, k.$$
Remember that δik = 1 if i = k, and δik = 0 if i 6= k. These are the two cases above.
Interpret this equation as a matrix equation, where the two sides represent the (i, k)-th entries of their
respective matrices. What are the respective matrices? Since δik is the (i, k)-th entry of the identity matrix,
the right side is the (i, k)-th entry of |A| · I.
The left side is the (i, k)-th entry of A · adj A, because
$$(A \cdot \operatorname{adj} A)_{ik} = \sum_{j} A_{ij} (\operatorname{adj} A)_{jk} = \sum_{j} A_{ij} (-1)^{j+k} |A(k \mid j)|.$$
Therefore,
$$|A| \cdot I = A \cdot \operatorname{adj} A.$$
I can use the theorem to obtain an important corollary. I already know that a matrix over a field is
invertible if and only if its determinant is nonzero. The next result explains what happens over a commutative
ring with identity, and also provides a formula for the inverse of a matrix.
Corollary. Let R be a commutative ring with identity. A matrix A ∈ M (n, R) is invertible if and only if
|A| is invertible in R, in which case
A−1 = |A|−1 adj A.
Proof. First, suppose A is invertible. Then AA−1 = I, so
$$|A|\,|A^{-1}| = |AA^{-1}| = |I| = 1.$$
Hence, |A| is invertible in R (its inverse is |A−1|).
Conversely, suppose |A| is invertible in R. Multiplying both sides of |A| · I = A · adj A by |A|−1 gives
$$I = A \cdot |A|^{-1} \operatorname{adj} A.$$
Hence, A is invertible, and A−1 = |A|−1 adj A.
Corollary. Let R be a commutative ring with identity, and let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in M(2, R)$. If ad − bc is invertible in R, then
$$A^{-1} = (ad - bc)^{-1} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$
Proof.
$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc \quad\text{and}\quad \operatorname{adj}\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$
Hence, the result follows from the adjugate formula.
To see the difference between the general case of a commutative ring with identity and a field, consider
the following matrices over Z6 :
$$\begin{bmatrix} 5 & 3 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}$$
In the first case,
$$\det\begin{bmatrix} 5 & 3 \\ 1 & 1 \end{bmatrix} = 2.$$
2 is not invertible in Z6 — do you know how to prove it? Hence, even though the determinant is nonzero,
the matrix is not invertible.
$$\det\begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} = 5.$$
5 is invertible in Z6 — in fact, 5 · 5 = 1. Hence, the second matrix is invertible. You can find the inverse
using the formula in the last corollary.
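Here is a small Python sketch of that check over Z_n (not part of the original notes; the function name is mine, and `pow(x, -1, n)` needs Python 3.8 or later). It applies the 2 × 2 adjugate formula when the determinant is invertible mod n.

```python
from math import gcd

def inverse_2x2_mod_n(M, n):
    """2x2 inverse over Z_n via the adjugate formula, when det is invertible mod n."""
    a, b = M[0]
    c, d = M[1]
    det = (a * d - b * c) % n
    if gcd(det, n) != 1:
        return None                      # det is not invertible in Z_n: no inverse exists
    det_inv = pow(det, -1, n)            # modular inverse of the determinant
    adj = [[d, -b], [-c, a]]             # adjugate of a 2x2 matrix
    return [[(det_inv * adj[i][j]) % n for j in range(2)] for i in range(2)]

print(inverse_2x2_mod_n([[5, 3], [1, 1]], 6))   # None: det = 2 is not invertible in Z_6
print(inverse_2x2_mod_n([[2, 1], [1, 3]], 6))   # [[3, 1], [1, 4]]
```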
The adjugate formula can be used to find the inverse of a matrix. It’s not very good for big matrices
from a computational point of view: The usual row reduction algorithm uses fewer steps. However, it’s not
too bad for small matrices — say 3 × 3 or smaller.
Example. Compute the inverse of the following real matrix using the adjugate formula.
$$A = \begin{bmatrix} 1 & -2 & -2 \\ 3 & -2 & 0 \\ 1 & 1 & 1 \end{bmatrix}.$$
First, I’ll compute the cofactors. The first line shows the cofactors of the first row, the second line the
cofactors of the second row, and the third line the cofactors of the third row. I’m showing the “checkerboard”
pattern of pluses and minuses as well.
$$+\begin{vmatrix} -2 & 0 \\ 1 & 1 \end{vmatrix} = -2, \quad -\begin{vmatrix} 3 & 0 \\ 1 & 1 \end{vmatrix} = -3, \quad +\begin{vmatrix} 3 & -2 \\ 1 & 1 \end{vmatrix} = 5,$$
$$-\begin{vmatrix} -2 & -2 \\ 1 & 1 \end{vmatrix} = 0, \quad +\begin{vmatrix} 1 & -2 \\ 1 & 1 \end{vmatrix} = 3, \quad -\begin{vmatrix} 1 & -2 \\ 1 & 1 \end{vmatrix} = -3,$$
$$+\begin{vmatrix} -2 & -2 \\ -2 & 0 \end{vmatrix} = -4, \quad -\begin{vmatrix} 1 & -2 \\ 3 & 0 \end{vmatrix} = -6, \quad +\begin{vmatrix} 1 & -2 \\ 3 & -2 \end{vmatrix} = 4.$$
The adjugate is the transpose of the matrix of cofactors:
$$\operatorname{adj} A = \begin{bmatrix} -2 & 0 & -4 \\ -3 & 3 & -6 \\ 5 & -3 & 4 \end{bmatrix}.$$
Expanding by cofactors of the first row, the determinant is
$$|A| = (1)(-2) + (-2)(-3) + (-2)(5) = -6,$$
so
$$A^{-1} = \frac{1}{|A|} \operatorname{adj} A = -\frac{1}{6} \begin{bmatrix} -2 & 0 & -4 \\ -3 & 3 & -6 \\ 5 & -3 & 4 \end{bmatrix}.$$
Another consequence of the formula |A| · I = A · adj A is Cramer’s rule, which gives a formula for the
solution of a system of linear equations.
Corollary. (Cramer’s rule) If A is an invertible n × n matrix, the unique solution to Ax = y is given by
$$x_i = \frac{|B_i|}{|A|},$$
where $B_i$ is the matrix obtained from A by replacing its i-th column with y.
Proof. The solution is x = A−1 y = |A|−1 (adj A) y, so
$$x_i = |A|^{-1} \sum_{j} (\operatorname{adj} A)_{ij}\, y_j = |A|^{-1} \sum_{j} (-1)^{i+j} |A(j \mid i)|\, y_j.$$
But the last sum is a cofactor expansion of A along column i, where instead of the elements of A’s
column i I’m using the components of y. This is exactly |Bi |.
Example. Use Cramer’s Rule to solve the following system over R:
2x + y + z = 1
x + y − z = 5
3x − y + 2z = −2
I replace the successive columns of the coefficient matrix with (1, 5, −2), in each case computing the
determinant of the resulting matrix and dividing by the determinant of the coefficient matrix:
$$x = \frac{\begin{vmatrix} 1 & 1 & 1 \\ 5 & 1 & -1 \\ -2 & -1 & 2 \end{vmatrix}}{\begin{vmatrix} 2 & 1 & 1 \\ 1 & 1 & -1 \\ 3 & -1 & 2 \end{vmatrix}} = \frac{-10}{-7} = \frac{10}{7}, \qquad y = \frac{\begin{vmatrix} 2 & 1 & 1 \\ 1 & 5 & -1 \\ 3 & -2 & 2 \end{vmatrix}}{\begin{vmatrix} 2 & 1 & 1 \\ 1 & 1 & -1 \\ 3 & -1 & 2 \end{vmatrix}} = \frac{-6}{-7} = \frac{6}{7}, \qquad z = \frac{\begin{vmatrix} 2 & 1 & 1 \\ 1 & 1 & 5 \\ 3 & -1 & -2 \end{vmatrix}}{\begin{vmatrix} 2 & 1 & 1 \\ 1 & 1 & -1 \\ 3 & -1 & 2 \end{vmatrix}} = \frac{19}{-7} = -\frac{19}{7}.$$
This looks pretty simple, doesn’t it? But notice that you need to compute four 3 × 3 determinants to do
this (and I didn’t write out the work for those computations!). It becomes more expensive to solve systems
this way as the matrices get larger.
As with the adjugate formula for the inverse of a matrix, Cramer’s rule is not computationally efficient:
It’s better to use row reduction to solve large systems. Cramer’s rule is not too bad for solving systems of
two linear equations in two variables; for anything larger, you’re probably better off using row reduction.
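If you want to experiment, here is a short NumPy sketch of Cramer's rule applied to the system above (not part of the original notes; the function name is mine, and the determinants are computed numerically).

```python
import numpy as np

def cramer_solve(A, y):
    """Solve Ax = y by Cramer's rule: x_i = det(B_i)/det(A), where B_i is A with column i replaced by y."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.linalg.det(A)
    x = []
    for i in range(A.shape[0]):
        Bi = A.copy()
        Bi[:, i] = y                      # replace column i with the right-hand side
        x.append(np.linalg.det(Bi) / d)
    return x

A = [[2, 1, 1], [1, 1, -1], [3, -1, 2]]
y = [1, 5, -2]
print(cramer_solve(A, y))   # approximately [10/7, 6/7, -19/7]
```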
You may have seen vectors before — in physics or engineering courses, or in multivariable calculus. In
those courses, you tend to see particular kinds of vectors, and it could lead you to think that those particular
kinds of vectors are the only kinds of vectors. We’ll discuss vectors by giving axioms for vectors. When
you define a mathematical object using a set of axioms, you are describing how it behaves. Why is this
important?
One common intuitive description of vectors is that they’re things which “have a magnitude and a
direction”. This isn’t a bad description for certain kinds of vectors, but it has some shortcomings. Consider
the following example: Take a book and hold it out in front of you with the cover facing toward you.
(Pictures: the book after rotating “90 degrees away, then 90 degrees left”, and after rotating “90 degrees left, then 90 degrees away” — the two final positions are different.)
First, rotate the book 90◦ away from you, then (without returning the book to its original position)
rotate the book 90◦ to your left. The first three pictures illustrate the result.
Next, return the book to its original position facing you. Rotate the book 90◦ to your left, then
(without returning the book to its original position) rotate the book 90◦ away from you. The next three
pictures illustrate the result.
In other words, we’re doing two rotations — 90◦ away from you and 90◦ to your left — one after the
other, in the two possible orders. Note that the final positions are different.
It’s certainly reasonable to say that rotations by 90◦ away from you or to your left are things with
“magnitudes” and “directions”. And it seems reasonable to “add” such rotations by doing one followed by
the other. However, we saw that when we performed the “addition” in different orders, we got different
results. Symbolically,
A + B 6= B + A.
The addition fails to be commutative. But it happens that we really do want vector addition to be
commutative, for all of the “vectors” that come up in practice.
It is not enough to tell what a thing “looks like” (“magnitude and direction”); we need to say how the
thing behaves. This example also showed that words like “magnitude” and “direction” are ambiguous.
Other descriptions of vectors — as “arrow”, or as “lists of numbers” — also describe particular kinds
of vectors. And as with “magnitude and direction”, they’re incomplete: They don’t tell how the “vectors”
in question behave.
Our axioms for a vector space describe how vectors should behave — and if they behave right, we don’t
care what they look like! It’s okay to think of “magnitude and direction” or “arrow” or “list of numbers”,
as long as you remember that these are only particular kinds of vectors.
Let’s see the axioms for a vector space.
Definition. A vector space V over a field F is a set V equipped with two operations. The first is called
(vector) addition; it takes vectors u and v and produces another vector u + v.
The second operation is called scalar multiplication; it takes an element a ∈ F and a vector u ∈ V
and produces a vector au ∈ V .
These operations satisfy the following axioms:
1. Vector addition is associative: If u, v, w ∈ V , then
(u + v) + w = u + (v + w).
2. Vector addition is commutative: If u, v ∈ V , then
u + v = v + u.
3. There is a zero vector 0 ∈ V which satisfies
0 + u = u = u + 0 for all u ∈ V.
Note: Some people prefer to write something like “~0” for the zero vector to distinguish it from the
number 0 in the field F . I’ll be a little lazy and just write “0” and rely on you to determine whether it’s the
zero vector or the number zero from the context.
4. For every vector u ∈ V , there is a vector −u ∈ V which satisfies
u + (−u) = 0 = (−u) + u.
5. If a, b ∈ F and x ∈ V , then
a(bx) = (ab)x.
6. If a, b ∈ F and x ∈ V , then
(a + b)x = ax + bx.
7. If a ∈ F and x, y ∈ V , then
a(x + y) = ax + ay.
8. If x ∈ V , then
1 · x = x.
The elements of V are called vectors; the elements of F are called scalars. As usual, the use of words
like “multiplication” does not imply that the operations involved look like ordinary “multiplication”.
Note that Axiom (4) allows us to define subtraction of vectors this way:
x − y = x + (−y).
An easy (and trivial) vector space (over any field F ) is the zero vector space V = {0}. It consists of
a zero vector (which is required by Axiom 3) and nothing else. The scalar multiplication is a · 0 = 0 for any
a ∈ F . You can easily check that all the axioms hold.
The most important example of a vector space over a field F is given by the “standard” vector space F n .
In fact, every (finite-dimensional) vector space over F is isomorphic to F n for some nonnegative integer n.
We’ll discuss isomorphisms later; let’s give the definition of F n .
If F is a field and n ≥ 1, then F n denotes the set
F n = {(a1 , . . . , an ) | a1 , . . . , an ∈ F }.
If you know about (Cartesian) products, you can see that F n is the product of n copies of F .
We can also define F 0 to be the zero vector space {0}.
If v ∈ F n and v = (v1 , v2 , . . . vn ), I’ll often refer to v1 , v2 , . . . vn as the components of v.
Proposition. F n becomes a vector space over F with the following operations:
$$(u_1, \ldots, u_n) + (v_1, \ldots, v_n) = (u_1 + v_1, \ldots, u_n + v_n) \quad\text{and}\quad a \cdot (u_1, \ldots, u_n) = (a u_1, \ldots, a u_n) \;\text{for } a \in F.$$
Proof. I’ll check commutativity of addition as a sample, and leave the other axioms to you. Let u = (u1 , . . . , un ) and v = (v1 , . . . , vn ) be elements of F n . Thus, ui , vi ∈ F for i = 1, . . . , n.
Then
$$u + v = (u_1 + v_1, \ldots, u_n + v_n) = (v_1 + u_1, \ldots, v_n + u_n) = (v_1, \ldots, v_n) + (u_1, \ldots, u_n) = v + u.$$
The second equality used the fact that ui + vi = vi + ui for each i, because the u’s and v’s are elements
of the field F , and addition in F is commutative.
The zero vector is 0 = (0, 0, . . . 0) (n components). I’ll use “0” to denote the zero vector as well as the
number 0 in the field F ; the context should make it clear which of the two is intended. For instance, if v is
a vector and I write “0 + v”, the “0” must be the zero vector, since adding the number 0 to the vector v is
not defined.
If u = (u1 , u2 , . . . un ) ∈ F n , then
$$0 + u = (0 + u_1, 0 + u_2, \ldots, 0 + u_n) = (u_1, u_2, \ldots, u_n) = u.$$
Since I already showed that addition of vectors is commutative, it follows that u + 0 = u as well. This
verifies Axiom 3.
If u = (u1 , u2 , . . . un ) ∈ F n , then I’ll define −u = (−u1 , −u2 , . . . , −un ). Then
$$u + (-u) = (u_1 - u_1, u_2 - u_2, \ldots, u_n - u_n) = (0, 0, \ldots, 0) = 0,$$
and commutativity gives (−u) + u = 0 as well, which verifies Axiom 4.
You can see that checking the axioms amounts to writing out the vectors in component form, applying
the definitions of vector addition and scalar multiplication, and using the axioms for a field.
While all vector spaces “look like” F n (at least if “n” is allowed to be infinite — the fancy word is
“isomorphism”), you should not assume that a given vector space is F n , unless you’re explicitly told that it
is. We’ll see examples (like C[0, 1] below) where it’s not easy to see why a given vector space “looks like”
F n.
In discussing matrices, we’ve referred to a matrix with a single row as a row vector and a matrix with
a single column as a column vector.
$$[\,1\;\;2\;\;3\,] \;\text{(a row vector)} \qquad \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \;\text{(a column vector)}$$
The elements of F n are just ordered n-tuples, not matrices. In certain situations, we may “identify” a
vector in F n with a row vector or a column vector in the obvious way:
$$(1, 2, 3) \;\leftrightarrow\; [\,1\;\;2\;\;3\,] \;\leftrightarrow\; \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$
(Very often, we’ll use a column vector, for reasons we’ll see later.) I will mention this identification
explicitly if we’re doing this, but for right now just think of (1, 2, 3) as an ordered triple, not a matrix.
You may have seen examples of F n -vector spaces before.
For instance, R3 consists of 3-dimensional vectors with real components, like
$$(3, -2, \pi) \quad\text{or}\quad \left( \tfrac{1}{2},\, 0,\, -1.234 \right).$$
You’re probably familiar with addition and scalar multiplication for these vectors:
(1, −2, 4) + (4, 5, 2) = (1 + 4, −2 + 5, 4 + 2) = (5, 3, 6).
7 · (−2, 0, 3) = (7 · (−2), 7 · 0, 7 · 3) = (−14, 0, 21).
Note: Some people write (3, −2, π) as “h3, −2, πi”, using angle brackets to distinguish vectors from
points.
Recall that Z3 is the field {0, 1, 2}, where the operations are addition and multiplication mod 3. Thus,
$Z_3^2$ consists of 2-dimensional vectors with components in Z3 . Since each of the two components can be any
element in {0, 1, 2}, there are 3 · 3 = 9 such vectors:
(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2).
Here are examples of vector addition and scalar multiplication in $Z_3^2$:
(1, 2) + (1, 1) = (1 + 1, 2 + 1) = (2, 0).
2 · (2, 1) = (2 · 2, 2 · 1) = (1, 2).
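Here is a tiny Python sketch of this componentwise arithmetic mod 3 (not part of the original notes; the function names are mine).

```python
p = 3   # working in Z_3

def vec_add(u, v):
    """Componentwise addition mod p."""
    return tuple((ui + vi) % p for ui, vi in zip(u, v))

def scalar_mul(a, u):
    """Multiply every component by the scalar a, mod p."""
    return tuple((a * ui) % p for ui in u)

print(vec_add((1, 2), (1, 1)))   # (2, 0), as above
print(scalar_mul(2, (2, 1)))     # (1, 2), as above
```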
We can picture elements of $F^n$ as points in n-dimensional space. Let’s look at $R^2$, since it’s easy to
draw the pictures. $R^2$ is the x-y plane, and elements of $R^2$ are points in the plane:
(Picture: the x-y plane with the points (2, 3), (2, −2), (−3, 0), and (−1, −4) marked.)
In the picture above, each grid square is 1 × 1. The vectors (2, 3), (2, −2), (−3, 0), and (−1, −4) are
shown.
Vectors in Rn are often drawn as arrows going from the origin (0, 0, . . . , 0) to the corresponding point.
Here’s how the vectors in the previous picture look when represented with arrows:
y
(2,3)
(-3,0)
x
(2,-2)
(-1,-4)
If the x-component is negative, the arrow goes to the left; if the y-component is negative, the arrow
goes down.
When you represent vectors in $R^n$ as arrows, the arrows do not have to start at the origin. For instance,
in $R^2$ the vector (3, 2) can be represented by any arrow which goes 3 units in the x-direction and 2 units in
the y-direction, from the start of the arrow to the end of the arrow. All of the arrows in the picture below
represent the vector (3, 2):
(Picture: several parallel arrows of the same length and direction, each representing (3, 2).)
As long as the length and direction of the arrow don’t change as it is moved around, it represents the
same vector.
Representing vectors in Rn as arrows gives us a way of picturing vector addition, vector subtraction,
and scalar multiplication.
To add vectors a and b represented as arrows, move one of the arrows — say b — so that it starts at
the end of the vector a. As you move b, keep its length and direction the same:
(Picture: b is moved so it starts at the end of a; the arrow from the start of a to the end of b is a + b.)
As we noted earlier, if you don’t change an arrow’s length or direction, it represents the same vector.
So the new vector is still b. The sum a + b is represented by the arrow that goes from the start of a to the
end of b.
You can also leave b alone and move a so it starts at the end of b. Then b + a is the arrow going from
the start of b to the end of a. Notice that it’s the same as the arrow a + b, which reflects the commutativity
of vector addition: a + b = b + a.
(Picture: the parallelogram with sides a and b; the arrows a + b and b + a are the same diagonal.)
This picture also shows that you can think of a + b (or b + a) as the arrow given by the diagonal of the
parallelogram whose sides are a and b.
You can add more than two vectors in the same way. Move the vectors to make a chain, so that the
next vector’s arrow starts at the end of the previous vector’s arrow. The sum is the arrow that goes from
the start of the first arrow to the end of the last arrow:
[Figure: the arrows a, b, c, d chained head to tail; the sum a + b + c + d goes from the start of a to the end of d.]

To subtract a vector b from a vector a — that is, to do a − b — draw the arrow from the end of b to
the end of a. This assumes that the arrows for a and b start at the same point:

[Figure: arrows a and b with a common starting point, and the arrow a − b drawn from the end of b to the end of a.]
To see that this picture is correct, interpret it as an addition picture, where we’re adding a − b to b.
The sum (a − b) + b = a should be the arrow from the start of b to the end of a − b, which it is.
When a vector a is multiplied by a real number k to get ka, the arrow representing the vector is scaled
by a factor of |k|. In addition, if k is negative, the arrow is "flipped" 180°, so it points in the opposite
direction to the arrow for a.

[Figure: a vector a, together with 2a (twice as long, same direction) and −3a (three times as long, opposite direction).]
In the picture above, the vector 2a is twice as long as a and points in the same direction as a. The
vector −3a is 3 times as long as a, but points in the opposite direction.
As an example, suppose u, v, and w are the vectors shown below, and we want to construct u − 2v + 3w.

[Figure: the vectors u, v, and w.]

I start by constructing 2v, an arrow twice as long as v in the same direction as v. I place it so it starts
at the same place as u. Then the arrow that goes from the end of 2v to the end of u is u − 2v.

[Figure: the arrows u, 2v, u − 2v, 3w, and u − 2v + 3w.]
Next, I construct 3w, an arrow 3 times as long as w in the same direction as w. I move 3w so it starts
at the end of u − 2v. Then the arrow from the start of u − 2v to the end of 3w is u − 2v + 3w.
While we can draw pictures of vectors when the field of scalars is the real numbers R, pictures don’t
work quite as well with other fields. As an example, suppose the field is Z5 = {0, 1, 2, 3, 4}. Remember that
the operations in Z5 are addition mod 5 and multiplication mod 5. So, for instance,
4 + 3 = 2 and 2 · 4 = 3.
We saw that R^2 is just the x-y plane. What about Z_5^2? It consists of pairs (a, b) where a and b are
elements of Z_5. Since there are 5 choices for a and 5 choices for b, there are 5 · 5 = 25 elements in Z_5^2. We
can picture it as a 5 × 5 grid of dots:

[Figure: a 5 × 5 grid of dots, with coordinates 0 through 4 on each axis; the dot at (3, 2) is circled.]
The dot corresponding to the vector (3, 2) is circled as an example.
Picturing vectors as arrows seems to work until we try to do vector arithmetic. For example, suppose
v = (3, 4) in Z_5^2. We can represent it with an arrow from the origin to the point (3, 4).
Suppose we multiply v by 2. You can check that 2v = (1, 3).
Here's a picture showing v = (3, 4) and 2v = (1, 3):

[Figure: two 5 × 5 grids; the left shows the arrow for v = (3, 4), the right shows the arrow for 2v = (1, 3).]
In R^2, we'd expect 2v to have the same direction as v and twice the length. You can see that it doesn't
work that way in Z_5^2.
What about vector addition in Z_5^2? Suppose we add (2, 1) and (2, 4):
(2, 1) + (2, 4) = (2 + 2, 1 + 4) = (4, 0).
If we represent the vectors as arrows and try to add the arrows as we did in R^2, we encounter problems.
First, when I move (2, 4) so that it starts at the end of (2, 1), the end of (2, 4) sticks outside of the 5 × 5 grid
which represents Z_5^2.

[Figure: the arrow for (2, 4) moved to the end of the arrow for (2, 1); its tip lands outside the 5 × 5 grid.]

If I ignore this problem and I draw the arrow from the start of (2, 1) to the end of (2, 4), the diagonal
arrow which should represent the sum looks very different from the actual sum arrow (4, 0) (the horizontal
arrow in the picture) — and as with (2, 4), the end of the sum arrow sticks outside the grid which represents
Z_5^2.
You can see that thinking of vectors as arrows has limitations. It's okay for vectors in R^n.
What about thinking of vectors as "lists of numbers"? That seemed to work in the examples above in
R^n and in Z_5^2. In general, this works for the F^n vector spaces for finite n, but those aren't the only vector
spaces.
Here are some examples of vector spaces which are not F^n's, at least for finite n.
The set R[x] of polynomials with real coefficients is a vector space over R, using the standard operations
on polynomials: you add polynomials and multiply them by numbers in the usual ways. Can we think of a
polynomial as a "list of numbers"? We can, if we list its coefficients, starting with the constant term; the
list may need to be infinite. For example,
x^3 + 2x^2 + 10x − 6 ↔ (−6, 10, 2, 1, 0, 0, . . .),   and   −12x^2 + 40 ↔ (40, 0, −12, 0, 0, . . .).
We have to begin with the lowest degree coefficient and work our way up, because polynomials can have
arbitrarily large degree. So a polynomial whose highest power term was 3x^100 might have nonzero numbers
from the zeroth slot up to the "3" in the 101st slot, followed by an infinite number of zeros.
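If you want to compute with this "coefficient list" picture, the operations are just componentwise. The following Python sketch is my own illustration (not from the notes); it stores a polynomial as a list of coefficients starting with the constant term:

    # Polynomials represented as lists of coefficients, constant term first.

    def poly_add(p, q):
        """Add two polynomials given as coefficient lists."""
        n = max(len(p), len(q))
        p = p + [0] * (n - len(p))
        q = q + [0] * (n - len(q))
        return [a + b for a, b in zip(p, q)]

    def poly_scale(c, p):
        """Multiply a polynomial (coefficient list) by the number c."""
        return [c * a for a in p]

    # x^3 + 2x^2 + 10x - 6  and  -12x^2 + 40:
    p = [-6, 10, 2, 1]
    q = [40, 0, -12]
    print(poly_add(p, q))      # [34, 10, -10, 1]  <->  x^3 - 10x^2 + 10x + 34
    print(poly_scale(3, q))    # [120, 0, -36]     <->  -36x^2 + 120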
Not bad! It’s hard to see how we could think of these as “arrows”, but at least we have something like
our earlier examples.
However, sometimes you can’t represent an element of a vector space as a “list of numbers”, even if you
allow an “infinite list”.
Let C([0, 1]) denote the continuous real-valued functions defined on the interval 0 ≤ x ≤ 1. You add
functions pointwise:
(f + g)(x) = f (x) + g(x) for f, g ∈ C([0, 1]).
From calculus, you know that the sum of continuous functions is a continuous function. For instance, if
f(x) = e^x and g(x) = sin(x^3 + 1), then (f + g)(x) = e^x + sin(x^3 + 1), which is again a continuous function
on [0, 1].

Proposition. Let V be a vector space over a field F. Then:
(a) 0 · x = 0 for all x ∈ V.
Note: The "0" on the left is the zero scalar in F; the "0" on the right is the zero vector in V.
(b) k · 0 = 0 for all k ∈ F .
Note: On both the left and right, “0” denotes the zero vector in V .
(c) (−1) · x = −x for all x ∈ V .
(d) −(−x) = x for all x ∈ V .
Proof. (a) As I noted above, the “0” on the left is the zero in F , whereas the “0” on the right is the zero
vector in V . We use a little trick, writing 0 as 0 + 0:
0 · x = (0 + 0) · x = 0 · x + 0 · x.
The first step used the definition of the number zero: “Zero plus anything gives the anything”, so take
the “anything” to be the number 0 itself. The second step used distributivity.
Next, I’ll subtract 0 · x from both sides. Just this once, I’ll show all the steps using the axioms. Start
with the equation above, and add −(0 · x) to both sides:
0·x = 0·x+0·x
0 · x + [−(0 · x)] = (0 · x + 0 · x) + [−(0 · x)]
0 = (0 · x + 0 · x) + [−(0 · x)] (Axiom (4))
0 = 0 · x + (0 · x + [−(0 · x)]) (Axiom (1))
0 = 0·x+0 (Axiom (4))
0=0·x (Axiom (3))
Normally, I would just say: “Subtracting 0 · x from both sides, I get 0 = 0 · x.” It’s important to go
through a few simple proofs based on axioms to ensure that you really can do them. But the result isn’t very
surprising: You’d expect “zero times anything to equal zero”. In the future, I won’t usually do elementary
proofs like this one in such detail.
(b) Note that “0” on both the left and right denotes the zero vector, not the number 0 in F . I use the same
idea as in the proof of (a):
k · 0 = k · (0 + 0) = k · 0 + k · 0.
The first step used the definition of the zero vector, and the second used distributivity. Now subtract
k · 0 from both sides to get 0 = k · 0.
(c) (The "−1" on the left is the scalar −1; the "−x" on the right is the "negative" of x ∈ V.) Using
distributivity and part (a),
x + (−1) · x = 1 · x + (−1) · x = (1 + (−1)) · x = 0 · x = 0,
so (−1) · x is the additive inverse of x; that is, (−1) · x = −x.
(d)
−(−x) = (−1) · [(−1) · x] = [(−1) · (−1)]x = 1 · x = x.
Definition. Let V be a vector space over a field F, and let W ⊂ V, W ≠ ∅. W is a subspace of V if:
(a) If u, v ∈ W, then u + v ∈ W.
(b) If k ∈ F and u ∈ W, then ku ∈ W.
In other words, W is closed under addition of vectors and under scalar multiplication.
If we draw the vectors as arrows, we can picture the axioms in this way:

[Figure: u, v, and u + v all lying inside W; u and ku all lying inside W.]
Remember that not all vectors can be drawn as arrows, so in general these pictures are just aids to your
intuition.
A subspace W of a vector space V is itself a vector space, using the vector addition and scalar multi-
plication operations from V . If you go through the axioms for a vector space, you’ll see that they all hold
in W because they hold in V , and W is contained in V . Thus, the subspace axioms simply ensure that the
vector addition and scalar multiplication operations from V “don’t take you outside of W ” when applied to
vectors in W .
Remark. If W is a subspace, then axiom (a) says that the sum of two vectors in W is in W. You can show
using induction that if x1, . . . , xn ∈ W, then x1 + · · · + xn ∈ W for any n ≥ 1.
For example, in R^2 the subspaces are {0}, R^2, and lines passing through the origin.

[Figure: two lines A and B through the origin in the x-y plane.]

In R^3, the subspaces are {0}, R^3, and lines or planes passing through the origin.
Subspaces of other vector spaces may not look much like lines or planes, though. For example, the following
subset of Z_5^2 is a subspace; it consists of all the scalar multiples of the vector (2, 3):
S = {(0, 0), (2, 3), (4, 1), (1, 4), (3, 2)}.

[Figure: the five points of S on the 5 × 5 grid picturing Z_5^2.]
While 4 of the points lie on a “line”, the “line” is not a line through the origin. The origin is in S, but
it doesn’t lie on the same “line” as the other points.
Or consider the vector space C(R) over R, consisting of continuous functions R → R. There is a subspace
consisting of all multiples of e^x — so things like 2e^x, −πe^x, 1.79132e^x, 0 · e^x, and so on. Here's a picture
which shows the graph of some of the elements of this subspace:

[Figure: graphs of several multiples of e^x.]
Of course, there are actually an infinite number of “graphs” (functions) in this subspace — I’ve only
drawn a few. You can see our subspace is pretty far from “a line through the origin”, even though it consists
of all multiples of a single vector.
In what follows, we’ll look at properties of subspaces, and discuss how to check whether a set is or is
not a subspace.
First, every vector space contains at least two “obvious” subspaces, as described in the next result.
Proposition. If V is a vector space over a field F, then {0} and V are subspaces of V.
Proof. I’ll do the proof for {0} by way of example. First, I have to take two vectors in {0} and show
that their sum is in {0}. But {0} contains only the zero vector 0, so my “two” vectors are 0 and 0 — and
0 + 0 = 0, which is in {0}.
Next, I have to take k ∈ F and a vector in {0} — which, as I just saw, must be 0 — and show that
their product is in {0}. But k · 0 = 0 ∈ {0}. This verifies the second axiom, and so {0} is a subspace.
Obviously, the very uninteresting vector space consisting of just a zero vector (i.e. V = {0}) has only
the one subspace V = {0}.
If a vector space V is nonzero and one-dimensional — roughly speaking, if V “looks like” a line — then
{0} and V are the only subspaces, and they are distinct. In this case, V consists of all multiples kv of any
nonzero vector v ∈ V by all scalars k ∈ F .
Beyond those cases, a vector space V always has subspaces other than {0} and V . For example, if
V ≠ {0}, take a nonzero vector x ∈ V and consider the set of all multiples kx of x by scalars k ∈ F. You can
check that this is a subspace — the “line” passing through x.
If you want to show that a subset of a vector space is a subspace, you can combine the verifications for
the two subspace axioms into a single verification.
Proposition. Let V be a vector space over a field F, and let W be a nonempty subset of V.
W is a subspace of V if and only if u, v ∈ W and k ∈ F implies ku + v ∈ W.
Proof. Suppose W is a subspace of V , and let u, v ∈ W and k ∈ F . Since W is closed under scalar
multiplication, ku ∈ W . Since W is closed under vector addition, ku + v ∈ W .
Conversely, suppose u, v ∈ W and k ∈ F implies ku + v ∈ W. First, take any u ∈ W (possible because
W is nonempty) and apply the assumption with v = u and k = −1: this gives (−1)u + u = 0 ∈ W. Next, take
k = 1: our assumption says that if u, v ∈ W, then u + v ∈ W. This proves that W is closed under vector
addition. Finally, take v = 0, which we just showed is in W: the assumption then says that if u ∈ W and
k ∈ F, then ku = ku + 0 ∈ W. This proves that W is closed under scalar multiplication. Hence, W is a
subspace.
Note that the two axioms for a subspace are independent: Both can be true, both can be false, or one
can be true and the other false. Hence, some of our examples will ask that you check each axiom separately,
proving that it holds if it’s true and disproving it by a counterexample if it’s false.
Lemma. Let W be a subspace of a vector space V .
(a) The zero vector is in W .
(b) If w ∈ W , then −w ∈ W .
Note: These are not part of the axioms for a subspace: They are properties a subspace must have. So
if you are checking the axioms for a subspace, you don’t need to check these properties. But on the other
hand, if a subset does not have one of these properties (e.g. the subset doesn’t contain the zero vector), then
it can’t be a subspace.
Proof. (a) Take any vector w ∈ W (which you can do because W is nonempty), and take 0 ∈ F . Since W
is closed under scalar multiplication, 0 · w ∈ W . But 0 · w = 0, so 0 ∈ W .
(b) Since w ∈ W and −1 ∈ F , (−1) · w = −w is in W .
Example. Consider the real vector space R2 , the usual x-y plane.
(a) Show that the following sets are subspaces of R^2:
W1 = {(x, 0) | x ∈ R}   and   W2 = {(0, y) | y ∈ R}.
(These are just the x and y-axes.)
(b) Show that the union W1 ∪ W2 is not a subspace.
(a) I'll check that W1 is a subspace. (The proof for W2 is similar.) First, I have to show that two elements
of W1 add to an element of W1. An element of W1 is a pair with the second component 0. So (x1, 0), (x2, 0)
are two arbitrary elements of W1. Add them:
(x1, 0) + (x2, 0) = (x1 + x2, 0).
The second component of the sum is 0, so the sum is in W1. Next, if k ∈ R, then k · (x1, 0) = (kx1, 0), whose
second component is also 0, so it's in W1. Hence, W1 is closed under addition and scalar multiplication, and
it is a subspace.
(b) Note that (3, 0) ∈ W1 and (0, 17) ∈ W2, and
(3, 0) + (0, 17) = (3, 17).
Now (3, 17) ∉ W1 because its second component isn't 0. And (3, 17) ∉ W2 because its first component isn't
0. Since (3, 17) isn't in either W1 or W2, it's not in their union. Hence, W1 ∪ W2 is not closed under addition,
so it is not a subspace.
Pictorially, it’s easy to see: (3, 17) doesn’t lie in either the x-axis (W1 ) or the y-axis (W2 ):
[Figure: the point (3, 17) in the plane, together with (3, 0) on the x-axis and (0, 17) on the y-axis.]

Example. Determine whether the following subset of R^3 is a subspace:
W = {(x, y, 1) | x, y ∈ R}.
If you’re trying to decide whether a set is a subspace, it’s always good to check whether it contains
the zero vector before you start checking the axioms. In this case, the set consists of 3-dimensional vectors
whose third components are equal to 1. Obviously, the zero vector (0, 0, 0) doesn’t satisfy this condition.
Since W doesn’t contain the zero vector, it’s not a subspace of R3 .
Example. Consider the following subset of the vector space R2 :
W = {(x, sin x) | x ∈ R} .
Check each axiom for a subspace (i.e. closure under addition and closure under scalar multiplication).
If the axiom holds, prove it; if the axiom doesn’t hold, give a specific counterexample.
Notice that this problem is open-ended, in that you aren’t told at the start whether a given axiom holds
or not. So you have to decide whether you’re going to try to prove that the axiom holds, or whether you’re
going to try to find a counterexample. In these kinds of situations, look at the statement of the problem —
in this case, the definition of W . See if your mathematical experience causes you to lean one way or another
— if so, try that approach first.
If you can’t make up your mind, pick either “prove” or “disprove” and get started! Usually, if you pick
the wrong approach you’ll know it pretty quickly — in fact, getting stuck taking the wrong approach may
give you an idea of how to make the right approach work.
Suppose I start by trying to prove that the set is closed under sums. I take two vectors in W — say
(x, sin x) and (y, sin y). I add them:
(x, sin x) + (y, sin y) = (x + y, sin x + sin y).
The last vector isn’t in the right form — it would be if sin x + sin y was equal to sin(x + y). Based
on your knowledge of trigonometry, you should know that doesn’t sound right. You might reason that if a
simple identity like “sin x + sin y = sin(x + y)” was true, you probably would have learned about it!
I now suspect that the sum axiom doesn’t hold. I need a specific counterexample — that is, two vectors
in W whose sum is not in W .
To choose things for a counterexample, you should try to choose things which are not too “special”
or your “counterexample” might accidentally satisfy the axiom, which is not what you want. At the same
time, you should avoid things which are too “ugly”, because it makes the counterexample less convincing
if a computer is needed (for instance) to compute the numbers. You may need a few tries to find a good
counterexample. Remember that the things in your counterexample should involve specific numbers, not
"variables".
Returning to our problem, I need two vectors in W whose sum isn't in W. I'll use (π/2, sin(π/2)) and
(π, sin π). Note that
(π/2, sin(π/2)) = (π/2, 1) ∈ W   and   (π, sin π) = (π, 0) ∈ W.
On the other hand,
(π/2, sin(π/2)) + (π, sin π) = (π/2, 1) + (π, 0) = (3π/2, 1).
But (3π/2, 1) ∉ W because sin(3π/2) = −1 ≠ 1.
How did I choose the two vectors? I decided to use a multiple of π in the first component, because the
sine of a multiple of π (in the second component) comes out to “nice numbers”. If I had used (say) (1, sin 1),
I’d have needed a computer to tell me that sin 1 ≈ 0.841470984808 . . ., and a counterexample would have
looked kind of ugly. In addition, an approximation of this kind really isn’t a proof.
How did I know to use π/2 and π? Actually, I didn't know till I did the work that these numbers would
produce a counterexample — you often can't know without trying whether the numbers you've chosen will
work.
Thus, W is not closed under vector addition, and so it is not a subspace. If that was the question, I'd
be done, but I was asked to check each axiom. It is possible for one of the axioms to hold even if the other one
does not. So I'll consider scalar multiplication.
I'll give a counterexample to show that the scalar multiplication axiom doesn't hold either. I need a vector in
W; I'll use (π/2, sin(π/2)) again. I also need a real number; I'll use 2. Now
2 · (π/2, sin(π/2)) = (π, 2 sin(π/2)) = (π, 2).
But (π, 2) ∉ W, because sin π = 0 ≠ 2. Hence, W is not closed under scalar multiplication.
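Here is a quick numerical spot-check of both counterexamples (this snippet is my addition, not part of the notes): a pair (x, y) lies in W exactly when y = sin x.

    import math

    # W = {(x, sin x)}: a pair (x, y) is in W exactly when y == sin(x).
    def in_W(point, tol=1e-12):
        x, y = point
        return abs(y - math.sin(x)) < tol

    u = (math.pi / 2, math.sin(math.pi / 2))   # in W
    v = (math.pi, math.sin(math.pi))           # in W

    s = (u[0] + v[0], u[1] + v[1])             # the sum (3*pi/2, 1)
    m = (2 * u[0], 2 * u[1])                   # the scalar multiple (pi, 2)

    print(in_W(u), in_W(v))   # True True
    print(in_W(s))            # False: sin(3*pi/2) = -1, not 1
    print(in_W(m))            # False: sin(pi) = 0, not 2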
Example. Let A and B be matrices of the same size over a field F, chosen so that the products Av and Bv
are defined for v ∈ F^n, and let
W = {v ∈ F^n | Av = Bv}.
Show that W is a subspace of F^n.
I'll organize the proof of closure under addition as a "fill-in" proof: write the hypotheses (u ∈ W and
v ∈ W) at the top, write the conclusion (u + v ∈ W) at the bottom, and then fill in the steps that connect
them.

u ∈ W        v ∈ W
      ⋮
u + v ∈ W
Next, use the definition of W to translate each of the statements: u ∈ W means Au = Bu, so put
“Au = Bu” below “u ∈ W ”. Likewise, v ∈ W means Av = Bv, so put “Av = Bv” below “v ∈ W ”. On the
other hand, u + v ∈ W means A(u + v) = B(u + v), but since “u + v ∈ W ” is what we want to conclude,
put “A(u + v) = B(u + v)” above “u + v ∈ W ”.
u ∈ W                  v ∈ W
Au = Bu                Av = Bv
          ⋮
A(u + v) = B(u + v)
u + v ∈ W
At this point, you can either work downwards from Au = Bu and Av = Bv, or upwards from A(u + v) =
B(u + v). But if you work upwards from A(u + v) = B(u + v), you must ensure that the algebra you do is
reversible — that it works downwards as well.
I’ll work downwards from Au = Bu and Av = Bv. What algebra could I do which would get me closer
to A(u + v) = B(u + v)? Since the target involves addition, it’s natural to add the equations:
u ∈ W                  v ∈ W
Au = Bu                Av = Bv
Au + Av = Bu + Bv              (add the equations)
A(u + v) = B(u + v)
u + v ∈ W
At this point, I’m almost done. To finish, I have to explain how to go from Au + Av = Bu + Bv to
A(u + v) = B(u + v). You can see that I just need to factor A out of the left side and factor B out of the
right side:
u ∈ W                  v ∈ W
Au = Bu                Av = Bv
Au + Av = Bu + Bv
A(u + v) = B(u + v)            (factor A out of the left side and B out of the right side)
u + v ∈ W
The proof is complete! If you were just writing the proof for yourself, the sketch above might be good
enough. If you were writing this proof more formally — say for an assignment or a paper — you might add
some explanatory words.
For instance, you might say: "I need to show that W is closed under addition. Let u ∈ W and let
v ∈ W. By definition of W, this means that Au = Bu and Av = Bv. Adding the equations, I get
Au + Av = Bu + Bv. Factoring A out of the left side and B out of the right side, I get A(u + v) = B(u + v).
By definition of W, this means that u + v ∈ W. Hence, W is closed under addition."
By the way, be careful not to write things like “A(u + v) + B(u + v) ∈ W ” — do you see why this doesn’t
make sense? “A(u + v) = B(u + v)” is an equation that u + v satisfies. You can’t write “∈ W ”, since an
equation can’t be an element of W . Elements of W are vectors. You say “u + v ∈ W ”, as in the last line.
Here’s a sketch of a similar “fill-in” proof for closure under scalar multiplication:
u ∈ W        k ∈ F
Au = Bu
kAu = kBu
A(ku) = B(ku)
ku ∈ W
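A small numpy spot-check of both closure properties (my addition; the matrices A and B below are made up so that W = {v : Av = Bv} is nontrivial):

    import numpy as np

    # Hypothetical matrices chosen so that A - B has a nontrivial null space.
    A = np.array([[2.0, 1.0], [3.0, 4.0]])
    B = np.array([[1.0, 2.0], [1.0, 6.0]])

    def in_W(v):
        """True if Av = Bv (up to floating-point tolerance)."""
        return np.allclose(A @ v, B @ v)

    u = np.array([1.0, 1.0])
    v = np.array([2.0, 2.0])
    k = 5.0

    print(in_W(u), in_W(v))      # True True
    print(in_W(u + v))           # True: closed under addition
    print(in_W(k * u))           # True: closed under scalar multiplication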
Example. Let V = {f ∈ R[x] | f(2) = 1}, the set of polynomials which give 1 when you plug in 2. The zero
polynomial gives 0, not 1, when you plug in 2, so V does not contain the zero vector, and hence V is not a
subspace of R[x].
Alternatively, the constant polynomial f(x) = 1 is an element of V — it gives 1 when you plug in 2 —
but 2 · f(x) is not. So V is not closed under scalar multiplication.
See if you can give an example which shows that V is not closed under vector addition.
Proposition. If A is an m × n matrix over the field F , consider the set V of n-dimensional vectors x which
satisfy
Ax = 0.
Then V is a subspace of F n .
Proof. Suppose x, y ∈ V, so Ax = 0 and Ay = 0. Then
A(x + y) = Ax + Ay = 0 + 0 = 0.
Hence, x + y ∈ V.
Suppose x ∈ V and k ∈ F . Then Ax = 0, so
A(kx) = k(Ax) = k · 0 = 0.
Therefore, kx ∈ V .
Thus, V is a subspace.
The subspace defined in the last proposition is called the null space of A.
As a specific example of the last proposition, consider the following system of linear equations over R:
            [w]
[1 1 0 1]   [x]   [0]
[0 0 1 3]   [y] = [0] .
            [z]
You can show by row reduction that the general solution can be written as
w = −s − t, x = s, y = −3t, z = t.
Thus,

[w]   [−s − t]
[x]   [  s   ]
[y] = [ −3t  ] .
[z]   [  t   ]
The Proposition says that the set of all vectors of this form constitute a subspace of R4 .
For example, if you add two vectors of this form, you get another vector of this form:
[−s − t]   [−s′ − t′]   [−(s + s′) − (t + t′)]
[  s   ]   [   s′   ]   [       s + s′       ]
[ −3t  ] + [  −3t′  ] = [     −3(t + t′)     ] .
[  t   ]   [   t′   ]   [       t + t′       ]
You can check directly that the set is also closed under scalar multiplication.
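If you want to double-check the computation, sympy can produce the null space directly; the basis vectors it returns correspond to the parameters s and t above. (This snippet is my addition.)

    from sympy import Matrix

    A = Matrix([[1, 1, 0, 1],
                [0, 0, 1, 3]])

    # Basis vectors for the null space {x : Ax = 0}.
    for v in A.nullspace():
        print(v.T)
    # Matrix([[-1, 1, 0, 0]])   <- the s-direction: s*(-1, 1, 0, 0)
    # Matrix([[-1, 0, -3, 1]])  <- the t-direction: t*(-1, 0, -3, 1)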
In terms of systems of linear equations, a vector (x1 , . . . , xn ) ∈ F n is in the null space of the matrix
A = (aij ) if it’s a solution to the system
a11 x1 + · · · + a1n xn = 0
a21 x1 + · · · + a2n xn = 0
⋮
am1 x1 + · · · + amn xn = 0
In this situation, we say that the vectors (x1 , . . . , xn ) make up the solution space of the system. Since
the solution space of the system is another name for the null space of A, the solution space is a subspace of
F n.
We’ll study the null space of a matrix in more detail later.
Example. C(R) denotes the real vector space of continuous functions R → R. Consider the following subset
of C(R):

S = { f ∈ C(R) | f(x) = ∫_0^x e^t f(t) dt for all x } .
Prove that S is a subspace of C(R).
Let f, g ∈ S. Then
f(x) = ∫_0^x e^t f(t) dt   and   g(x) = ∫_0^x e^t g(t) dt.
Adding the two equations and using the fact that "the integral of a sum is the sum of the integrals",
we have
f(x) + g(x) = ∫_0^x e^t f(t) dt + ∫_0^x e^t g(t) dt = ∫_0^x e^t [f(t) + g(t)] dt.
This proves that f (x) + g(x) ∈ S, so S is closed under addition.
Let f ∈ S, so
f(x) = ∫_0^x e^t f(t) dt.
Let c ∈ R. Using the fact that constants can be moved into integrals, we have
c · f(x) = c · ∫_0^x e^t f(t) dt = ∫_0^x e^t [c · f(t)] dt.
This proves that c · f (x) ∈ S, so S is closed under scalar multiplication. Thus, S is a subspace of C(R).
Intersections of subspaces.
We’ve seen that the union of subspaces is not necessarily a subspace. For intersections, the situation
is different: The intersection of any number of subspaces is a subspace. The only significant issue with the
proof is that we will deal with an arbitrary collection of sets — possibly infinite, and possibly uncountable.
Except for taking care with the notation, the proof is fairly easy.
Theorem. Let V be a vector space over a field F, and let {U_i}_{i∈I} be a collection of subspaces of V. Then the
intersection ⋂_{i∈I} U_i is a subspace of V.
Proof. We have to show that ⋂_{i∈I} U_i is closed under vector addition and under scalar multiplication.
Suppose x, y ∈ ⋂_{i∈I} U_i. For x and y to be in the intersection of the U_i, they must be in each U_i for
all i ∈ I. So pick a particular i ∈ I; we have x, y ∈ U_i. Now U_i is a subspace, so it's closed under vector
addition. Hence, x + y ∈ U_i. Since this is true for all i ∈ I, I have x + y ∈ ⋂_{i∈I} U_i.
Thus, ⋂_{i∈I} U_i is closed under vector addition.
Next, suppose k ∈ F and x ∈ ⋂_{i∈I} U_i. For x to be in the intersection of the U_i, it must be in each U_i
for all i ∈ I. So pick a particular i ∈ I; we have x ∈ U_i. Now U_i is a subspace, so it's closed under scalar
multiplication. Hence, kx ∈ U_i. Since this is true for all i ∈ I, I have kx ∈ ⋂_{i∈I} U_i.
Thus, ⋂_{i∈I} U_i is closed under scalar multiplication.
Hence, ⋂_{i∈I} U_i is a subspace of V.
You can see that the proof was pretty easy, the two parts being pretty similar. The key idea is that
something is in the intersection of a bunch of sets if and only if it’s in each of the sets. How many sets there
are in the bunch doesn’t matter. If you’re still feeling a little uncomfortable, try writing out the proof for
the case of two subspaces: If U and V are subspaces of a vector space W over a field F , then U ∩ V is a
subspace of W . The notation is easier for two subspaces, but the idea of the proof is the same as the idea
for the proof above.
Example. Consider the set of three vectors {(1, 3, 2), (−5, 1, 0), (−4, 4, 2)} in R^3. Is this set a subspace of R^3?
Now (1, 3, 2) + (−5, 1, 0) = (−4, 4, 2), which is in the set. But (−5, 1, 0) + (−4, 4, 2) = (−9, 5, 2) is not
in the set, so the set is not closed under vector addition. And 2 · (1, 3, 2) = (2, 6, 4) is not in the set, so the
set is not closed under scalar multiplication. The set is not a subspace of R^3.
Suppose we try to fix this by finding a subspace of R3 which contains the three vectors. Well, that’s
easy: We could take R3 ! This would work for any set of vectors in R3 , but for that reason, it isn’t a very
interesting answer.
It might be better to ask for the “smallest” subspace of R3 which contains the three vectors. This set
will be called the span of the given set of vectors. We might approach this by asking: What vectors do
we need to add to the set of three vectors to make it a subspace? Since subspaces are closed under vector
addition, we’ll need to throw in all the vectors which you get by adding the three vectors. You can check
that we’d need to throw in two more vectors, so we now have 5. But subspaces are also closed under scalar
multiplication, so we need to throw in multiples of these 5 vectors by every element of R. Now we have an
infinite number of vectors. Oops! We need to ensure that the set is closed under addition, so we need to
go back and throw in all sums of the infinitely many vectors we have so far. Then, since we threw in some
additional vectors, we need to take scalar multiples of those, and . . . .
Our approach is a reasonable one, but we clearly need a more systematic way of carrying it out. We
will see how to do it in this section.
Building subspaces involves forming sums and scalar multiples. Rather than thinking of doing these
two operations separately, we’ll define a concept which does both at once.
Definition. If v1 , v2 , . . . , vn are vectors in a vector space V over a field F , a linear combination of the
v’s is a vector
k1 v1 + k2 v2 + · · · + kn vn for k1 , k2 , . . . kn ∈ F.
Notice that if you take two vectors v1 and v2 and take the scalars to be k1 = 1 and k2 = 1, you get
v1 + v2 , which is vector addition. And if you just take one vector v1 , then k1 v1 represents a scalar multiple
of that vector. Thus, the idea of a linear combination contains the operations of vector addition and scalar
multiplication, but allows us to do many such operations all at once.
Let’s see a numerical example. Take u = (1, 2) and v = (−3, 7) in R2 . Here is a linear combination of u
and v:
2u − 5v = 2 · (1, 2) − 5 · (−3, 7) = (17, −31).
(√2 − 17)u + (π^2/4)v is also a linear combination of u and v. And u and v are themselves linear combinations
of u and v, as is the zero vector:
u = 1 · u + 0 · v,   v = 0 · u + 1 · v,   0 = 0 · u + 0 · v.
On the other hand, there are vectors in R2 which are not linear combinations of p = (1, −2) and
q = (−2, 4). Do you see how this pair is different from the first?
Definition. If S is a set of vectors in a vector space V, the span ⟨S⟩ of S is the set of all linear combinations
of vectors in S.
If S is a subset of a subspace W, then S spans W (or S is a spanning set for W, or S generates W)
if W = ⟨S⟩.
If S is a finite set of vectors, we could just describe ⟨S⟩ as the set of all linear combinations of the
elements of S. If S is infinite, a particular linear combination of vectors from S involves finitely many
vectors from S — we "grab" finitely many vectors from S at a time — but we're doing this in all possible
ways.
Note that S ⊂ ⟨S⟩. For if s is a vector in S, then 1 · s = s is a linear combination of a vector from S,
so it is in ⟨S⟩.
For example, take two nonparallel vectors u and v in R^2, drawn as arrows starting at the origin:

[Figure: the vectors u and v.]
It turns out that the span of u and v is all of R2 ; that is, if w ∈ R2 , then w can be written as a linear
combination of u and v. To see why this is reasonable, take a vector w and “project” w onto the lines
containing the vectors u and v:
[Figure: w projected onto the lines containing u and v.]
I can scale u up by multiplying by an appropriate number to get a vector au which is the projection of
w on the line of u. Likewise, I can scale v up by multiplying by an appropriate number to get a vector bv
which is the projection of w on the line of v.
[Figure: the projections au and bv, with w = au + bv as the diagonal of the parallelogram they form.]
As you can see in the picture, w is the diagonal of the parallelogram whose sides are au and bv, so
w = au + bv.
Try this with some other vectors in place of w. Of course, a picture isn’t a proof, but this example
should help you get an idea of what the span of a set of vectors means geometrically.
Theorem. If S is a subset of a vector space V, the span ⟨S⟩ of S is a subspace of V which contains S.
Proof. I noted above that S ⊂ ⟨S⟩, so I just have to show that ⟨S⟩ is closed under addition and scalar
multiplication. Two typical elements of the span look like
j1 u1 + j2 u2 + · · · + jn un   and   k1 v1 + k2 v2 + · · · + km vm .
Here the j’s and k’s are scalars and the u’s and v’s are elements of S.
Take two elements of the span and add them:
(j1 u1 + j2 u2 + · · · + jn un ) + (k1 v1 + k2 v2 + · · · + km vm ) = j1 u1 + j2 u2 + · · · + jn un + k1 v1 + k2 v2 + · · · + km vm .
This sum is an element of the span, because it’s a sum of vectors in S, each multiplied by a scalar —
that is, a linear combination of elements of S. Thus, the span is closed under taking sums.
Take an element of the span and multiply it by a scalar c:
c · (k1 v1 + k2 v2 + · · · + km vm ) = (ck1 )v1 + (ck2 )v2 + · · · + (ckm )vm .
This is an element of the span, because it’s a linear combination of elements of S. Thus, the span is
closed under scalar multiplication.
Therefore, the span is a subspace.
Example. Prove that the span of (3, 1, 0) and (2, 1, 0) in R3 is
V = {(a, b, 0) | a, b ∈ R} .
To show that two sets are equal, you need to show that each is contained in the other. To do this, take
a typical element of the first set and show that it’s in the second set. Then take a typical element of the
second set and show that it’s in the first set.
Let W be the span of (3, 1, 0) and (2, 1, 0) in R3 . A typical element of W is a linear combination of the
two vectors:
x · (3, 1, 0) + y · (2, 1, 0) = (3x + 2y, x + y, 0).
Since the sum is a vector of the form (a, b, 0) for a, b ∈ R, it is in V . This proves that W ⊂ V .
Now let (a, b, 0) ∈ V. I have to show that this vector is a linear combination of (3, 1, 0) and (2, 1, 0).
This means that I have to find real numbers x and y such that
x · (3, 1, 0) + y · (2, 1, 0) = (a, b, 0),   that is,   3x + 2y = a   and   x + y = b.
The solution is
x = a − 2b, y = −a + 3b.
In other words,
(a − 2b) · (3, 1, 0) + (−a + 3b) · (2, 1, 0) = (a, b, 0).
Since (a, b, 0) is a linear combination of (3, 1, 0) and (2, 1, 0), it follows that (a, b, 0) ∈ W . This proves
that V ⊂ W .
Since W ⊂ V and V ⊂ W , I have W = V .
Example. Let
S = {(1, 1, 0), (0, 1, 1)} ⊂ R3 .
(a) Prove or disprove: (3, −1, −4) is in the span of S.
(b) Prove or disprove: (5, −2, 6) is in the span of S.
(a) The vector (3, −1, −4) is in the span of (1, 1, 0) and (0, 1, 1) if it can be written as a linear combination of
(1, 1, 0) and (0, 1, 1). So I try to find numbers a and b such that
a · (1, 1, 0) + b · (0, 1, 1) = (3, −1, −4).
I'll convert this to a matrix equation by writing the vectors as column vectors:

    [1]       [0]   [ 3]
a · [1] + b · [1] = [−1] .
    [0]       [1]   [−4]

By using column vectors (rather than row vectors), we get a familiar kind of matrix equation:

[1 0]         [ 3]
[1 1] [a]  =  [−1] .
[0 1] [b]     [−4]

Row reducing the augmented matrix for this system gives the solution a = 3, b = −4; you can check that
3 · (1, 1, 0) + (−4) · (0, 1, 1) = (3, −1, −4). Hence, (3, −1, −4) is in the span of S.
(b) Likewise, (5, −2, 6) is in the span of S if there are numbers a and b such that
a · (1, 1, 0) + b · (0, 1, 1) = (5, −2, 6).
Setting up the augmented matrix for this system and row reducing produces a row of the form [0 0 | 1].
The last matrix says “0 = 1”, a contradiction. The system is inconsistent, so there are no such numbers
a and b. Therefore, (5, −2, 6) is not in the span of S.
Thus, to determine whether the vector b ∈ F^n is in the span of v1, v2, . . . , vm in F^n, form the augmented
matrix

[ ↑   ↑        ↑   ↑ ]
[ v1  v2  ···  vm  b ] .
[ ↓   ↓        ↓   ↓ ]
If the system has a solution, b is in the span, and coefficients of a linear combination of the v’s which
add up to b are given by a solution to the system. If the system has no solutions, then b is not in the span
of the v’s.
(In a general vector space where vectors may not be "numbers in slots", you have to go back to the
definition of spanning set.)
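The augmented-matrix test is easy to automate. Here's a sympy sketch (my own addition) that checks the two vectors from the example above against S = {(1, 1, 0), (0, 1, 1)}:

    from sympy import Matrix, linsolve, symbols

    v1 = Matrix([1, 1, 0])
    v2 = Matrix([0, 1, 1])
    a, b = symbols('a b')

    def in_span(target):
        """Solve a*v1 + b*v2 = target; an empty solution set means 'not in the span'."""
        system = Matrix.hstack(v1, v2, Matrix(target))
        return linsolve(system, [a, b])

    print(in_span([3, -1, -4]))   # {(3, -4)}  -> in the span
    print(in_span([5, -2, 6]))    # EmptySet   -> not in the span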
Example. Consider the following set of vectors in R^3:
{(1, 0, −1), (2, 1, 3), (1, 1, −4)}.
Show that the span of this set is all of R^3.
Given any (x, y, z) ∈ R^3, I need numbers a, b, and c such that
a · (1, 0, −1) + b · (2, 1, 3) + c · (1, 1, −4) = (x, y, z).
Writing this as a system of linear equations and solving (say, by row reduction), I find
a = (1/8)(7x − 11y − z),   b = (1/8)(x + 3y + z),   c = (1/8)(−x + 5y − z).
This shows that, given any vector (x, y, z), I can find a linear combination of the original three vectors
which equals (x, y, z).
Thus, the span of the original set of three vectors is all of R3 .
Example. Let
S = {(1, 2, 1), (1, 4, 2)} in Z_5^3.

(b) Determine whether (1, 1, 1) ∈ ⟨S⟩.
The last row of the row reduced echelon matrix says "0 = 1". This contradiction implies that the system
has no solutions. Therefore, (1, 1, 1) is not in the span of S.
In the next example, we go back to the definition of the span of a set of vectors, rather than writing
down a matrix equation.
Example. In the vector space C(R), let f(x) = sin x, g(x) = e^x, and h(x) = x^2. Show that h is not in the
span of f and g.
Suppose there are numbers a, b ∈ R such that
x^2 = a sin x + b e^x for all x ∈ R.
Set x = 0. We get
0^2 = a sin 0 + b e^0, so 0 = b.
The equation becomes
x^2 = a sin x.
Set x = π/2. We get
(π/2)^2 = a sin(π/2), so a = π^2/4.
The equation is now
x^2 = (π^2/4) sin x.
Finally, set x = 5π/2. We get
(5π/2)^2 = (π^2/4) sin(5π/2), or 25π^2/4 = π^2/4, so 25 = 1.
This contradiction shows there are no numbers a, b ∈ R which make the initial equation true. Hence, h
is not in the span of f and g.
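You can let a computer algebra system carry out the same "plug in points" argument. The sympy sketch below (my addition) solves for a and b using x = 0 and x = π/2, then shows that the would-be identity fails at x = 5π/2:

    from sympy import symbols, sin, exp, pi, solve, simplify

    a, b, x = symbols('a b x')
    expr = a * sin(x) + b * exp(x) - x**2   # we want this to be 0 for all x

    # Solve using the two sample points x = 0 and x = pi/2.
    sol = solve([expr.subs(x, 0), expr.subs(x, pi / 2)], [a, b])
    print(sol)   # {a: pi**2/4, b: 0}

    # Check the would-be identity at a third point, x = 5*pi/2.
    residual = expr.subs(sol).subs(x, 5 * pi / 2)
    print(simplify(residual))   # -6*pi**2  (nonzero, so no such a and b exist)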
Another definition of the span of a set
There is another way to define the span of a set of vectors. The construction is a little more subtle
than the one we gave earlier. It uses the intersection of a (possibly) infinite collection of subspaces, which
we know by an earlier result is a subspace.
Let V be a vector space over a field F, and let S be a set of vectors in V. Let
S̃ = ⋂ {W | S ⊂ W and W is a subspace of V }.
That is, to form S̃ we intersect all subspaces of V which contain S. Note that there is at least one
subspace that contains S, namely V itself. We will show that S̃ is just the span of S, as constructed earlier:
The set of all linear combinations of elements of S.
(By the way, "S̃" is just temporary notation I'm using for this discussion. After this, I'll just use ⟨S⟩
to denote the span of S.)
Theorem. S̃ = ⟨S⟩.
Proof. First, the span ⟨S⟩ is a subspace of V which contains S. So ⟨S⟩ is one of the subspaces being
intersected to construct S̃, and hence
S̃ = ⋂ {W | S ⊂ W and W is a subspace of V } ⊂ ⟨S⟩.
To complete the proof, I'll show that ⟨S⟩ ⊂ S̃. Take an element of ⟨S⟩, namely a linear combination of
elements of S:
a1 s1 + a2 s2 + · · · + an sn   where ai ∈ F and si ∈ S.
Let W be a subspace of V which contains S. Then s1, s2, . . . , sn ∈ W, and since W is a subspace,
a1 s1 + a2 s2 + · · · + an sn ∈ W.
Thus, a1 s1 + a2 s2 + · · · + an sn is contained in every subspace which contains S — that is, it is contained
in every subspace being intersected to construct S̃. Hence, it is contained in S̃. This proves that ⟨S⟩ ⊂ S̃.
Therefore, S̃ = ⟨S⟩.
If you want to construct an object in math using some “building blocks”, there are often two general
ways to do so. The first way is to start with the building blocks, and combine them to build the object
“from the inside out”. This direct approach is what we did in the definition of hSi as all linear combinations
of elements of S. It requires that you know how to put the blocks together to do the “building”.
Another approach is to take all objects of the “right kind” which contain the building blocks and find the
“smallest”. You find the “smallest” such object by intersecting all the objects of the “right kind” containing
the building blocks. In this approach, you will usually have to intersect an infinite number of sets. And you
will need to show that the intersection of sets of the “right kind” is still a set of the “right kind”. This is
the approach we used in intersecting all subspaces containing S.
Of course, if you’ve done things correctly, the two approaches give the same object at the end, as they
did here. Both approaches are useful in mathematics.
Definition. Let V be a vector space over a field F. A set S of vectors in V is linearly independent if for
every finite set of vectors v1, . . . , vn ∈ S and scalars a1, . . . , an ∈ F,
a1 v1 + · · · + an vn = 0 implies a1 = · · · = an = 0.
A set of vectors which is not linearly independent is linearly dependent. (I’ll usually say “independent”
and “dependent” for short.) Thus, a set of vectors S is dependent if there are vectors v1 , . . . , vn ∈ S and
numbers a1 , . . . , an ∈ F , not all of which are 0, such that
a1 v1 + · · · + an vn = 0.
In words, the definition says that if a linear combination of any finite set of vectors in S equals the zero
vector, then all the coefficients in the linear combination must be 0. I’ll refer to such a linear combination
as a trivial linear combination.
On the other hand, a linear combination of vectors is nontrivial if at least one of the coefficients
is nonzero. (“At least one” doesn’t mean “all” — a nontrivial linear combination can have some zero
coefficients, as long as at least one is nonzero.)
Thus, we can also say that a set of vectors is independent if there is no nontrivial linear combination
among finitely many of the vectors which is equal to 0. And a set of vectors is dependent if there is some
nontrivial linear combination among finitely many of the vectors which is equal to 0.
Let’s see a pictorial example of a dependent set. Consider the following vectors u, v, and w in R2 .
[Figure: three vectors u, v, and w in R^2, drawn as arrows from the origin.]

I'll show how to get a nontrivial linear combination of the vectors that is equal to the zero vector.
Project w onto the lines of u and v.

[Figure: w projected onto the lines of u and v; the projections are au and bv.]
The projections are multiples au of u and bv of v. Since w is the diagonal of the parallelogram whose
sides are au and bv, we have
w = au + bv, so au + bv − w = 0.
This is a nontrivial linear combination of u, v and w which is equal to the zero vector, so {u, v, w} is
dependent.
In fact, it’s true that any 3 vectors in R2 are dependent, and this pictorial example should make this
reasonable. More generally, if F is a field then any n vectors in F m are dependent if n > m. We’ll prove
this below.
Example. If F is a field, the standard basis vectors are
e1 = (1, 0, 0, . . . , 0)
e2 = (0, 1, 0, . . . , 0)
⋮
en = (0, 0, 0, . . . , 1)
Show that {e1, e2, . . . , en} is independent.
Suppose a1, a2, . . . , an ∈ F and a1 e1 + a2 e2 + · · · + an en = 0. Each term in the sum is
a1 e1 = (a1, 0, 0, . . . , 0)
a2 e2 = (0, a2, 0, . . . , 0)
⋮
an en = (0, 0, 0, . . . , an)
So
a1 e1 + a2 e2 + · · · + an en = (a1, a2, . . . , an).
Since by assumption a1 e1 + a2 e2 + · · · + an en = 0, I get
(a1, a2, . . . , an) = (0, 0, . . . , 0), and hence a1 = a2 = · · · = an = 0. Therefore, {e1, e2, . . . , en} is independent.
In this case, you can probably juggle numbers in your head to see that
This shows that the vectors are dependent. There are infinitely many pairs of numbers a and b that
work. In examples to follow, I’ll show how to find numbers systematically in cases where the arithmetic isn’t
so easy.
Example. Suppose u, v, w, and x are vectors in a vector space. Prove that the set {u − v, v − w, w − x, x− u}
is dependent.
Notice that in the four vectors in {u − v, v − w, w − x, x − u}, each of u, v, w, and x occurs once with a
plus sign and once with a minus sign. So
(u − v) + (v − w) + (w − x) + (x − u) = 0.
This is a dependence relation, so the set is dependent.
If you can’t see an “easy” linear combination of a set of vectors that equals 0, you may have to determine
independence or dependence by solving a system of equations.
Example. Consider the following sets of vectors in R3 . If the set is independent, prove it. If the set is
dependent, find a nontrivial linear combination of the vectors which is equal to 0.
(a) {(2, 0, −3), (1, 1, 1), (1, 7, 2)}.
(b) {(1, 2, −1), (4, 1, 3), (−10, 1, −11)}.
(a) Write a linear combination of the vectors and set it equal to 0:

    [ 2]       [1]       [1]   [0]
a · [ 0] + b · [1] + c · [7] = [0] .
    [−3]       [1]       [2]   [0]

Row reducing the corresponding system shows that the only solution is
a = 0, b = 0, c = 0.
Hence, the set is independent.

(b) Write a linear combination of the vectors and set it equal to 0:

    [ 1]       [4]       [−10]   [0]
a · [ 2] + b · [1] + c · [  1] = [0] .
    [−1]       [3]       [−11]   [0]

Row reducing the corresponding system gives the equations
a + 2c = 0,   b − 3c = 0.
Thus, a = −2c and b = 3c. I can get a nontrivial solution by setting c to any nonzero number. I'll use
c = 1. This gives a = −2 and b = 3. So

       [ 1]       [4]       [−10]   [0]
(−2) · [ 2] + 3 · [1] + 1 · [  1] = [0] .
       [−1]       [3]       [−11]   [0]
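The "solve a homogeneous system" step can be done mechanically: the coefficients of a dependence relation are exactly the nonzero vectors in the null space of the matrix whose columns are the given vectors. A sympy sketch (mine, not part of the notes):

    from sympy import Matrix

    # Columns are the vectors (1, 2, -1), (4, 1, 3), (-10, 1, -11) from part (b).
    A = Matrix([[ 1, 4, -10],
                [ 2, 1,   1],
                [-1, 3, -11]])

    coeffs = A.nullspace()[0]    # one basis vector for the null space
    print(coeffs.T)              # Matrix([[-2, 3, 1]]): the coefficients -2, 3, 1
    print((A * coeffs).T)        # Matrix([[0, 0, 0]]): the relation really gives 0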
Example. Consider the set {(4, 1, 2), (3, 3, 0), (0, 1, 1)} of vectors in Z_5^3.
If the set is independent, prove it. If the set is dependent, find a nontrivial linear combination of the
vectors which is equal to 0.
Write

    [4]       [3]       [0]   [0]
a · [1] + b · [3] + c · [1] = [0] .
    [2]       [0]       [1]   [0]

This gives the matrix equation

[4 3 0] [a]   [0]
[1 3 1] [b] = [0] .
[2 0 1] [c]   [0]
Row reduce the augmented matrix to solve the system, doing the arithmetic in Z_5 (the row operations used
are r1 → 4r1, r3 → r3 + 3r1, r2 → r2 + 4r1, r1 → r1 + 3r2, and r3 → r3 + 4r2):

[4 3 0 0]     [1 0 3 0]
[1 3 1 0]  →  [0 1 1 0]
[2 0 1 0]     [0 0 0 0]
This gives the equations
a + 3c = 0, b + c = 0.
Thus, a = 2c and b = 4c. Set c = 1. This gives a = 2 and b = 4. Hence, the set is dependent, and
    [4]       [3]       [0]   [0]
2 · [1] + 4 · [3] + 1 · [1] = [0] .
    [2]       [0]       [1]   [0]

Example. Consider the set {(1, 0, 1, 2), (1, 2, 2, 1), (0, 1, 2, 1)} of vectors in Z_3^4.
If the set is independent, prove it. If the set is dependent, find a nontrivial linear combination of the
vectors equal to 0.
Write

    [1]       [1]       [0]   [0]
    [0]       [2]       [1]   [0]
a · [1] + b · [2] + c · [2] = [0] .
    [2]       [1]       [1]   [0]

This gives the matrix equation

[1 1 0]       [0]
[0 2 1] [a]   [0]
[1 2 2] [b] = [0] .
[2 1 1] [c]   [0]
Row reduce the augmented matrix to solve the system, doing the arithmetic in Z_3:

[1 1 0 0]     [1 0 1 0]
[0 2 1 0]     [0 1 2 0]
[1 2 2 0]  →  [0 0 0 0]
[2 1 1 0]     [0 0 0 0]

This gives the equations a + c = 0 and b + 2c = 0, so a = 2c and b = c. Setting c = 1 gives a = 2 and
b = 1. Hence, the set is dependent, and

    [1]       [1]       [0]   [0]
    [0]       [2]       [1]   [0]
2 · [1] + 1 · [2] + 1 · [2] = [0] .
    [2]       [1]       [1]   [0]
It’s important to understand this general setup, and not just memorize the special case of vectors in F n ,
as shown in the last few examples. Remember that vectors don’t have to look like things like “(−3, 5, 7, 0)”
(“numbers in slots”). Consider the next example, for instance.
Example. R[x] is a vector space over the reals. Show that the set {1, x, x^2, . . .} is independent.
Suppose
a0 + a1 x + a2 x^2 + · · · + an x^n = 0.
That is,
a0 + a1 x + a2 x^2 + · · · + an x^n = 0 + 0 · x + 0 · x^2 + · · · + 0 · x^n.
Two polynomials are equal if and only if their corresponding coefficients are equal. Hence, a0 = a1 =
· · · = an = 0. Therefore, {1, x, x^2, . . .} is independent.
In some cases, you can tell by inspection that a set is dependent. I noted earlier that a set containing
the zero vector must be dependent. Here’s another easy case.
Proposition. Let F be a field. If n > m, then any set of n vectors in F^m is dependent.
Proof. Suppose v1, v2, . . . , vn are n vectors in F^m, and n > m. Write
a1 v1 + a2 v2 + · · · + an vn = 0.
In matrix form, this is

[ ↑   ↑        ↑  ] [a1]   [0]
[ v1  v2  ···  vn ] [a2] = [0] .
[ ↓   ↓        ↓  ] [ ⋮ ]  [⋮]
                    [an]   [0]

Note that the augmented matrix of this system has m rows and n + 1 columns, and n > m.
The row-reduced echelon form can have at most one leading coefficient in each row, so there are at most
m leading coefficients. These correspond to the main variables in the solution. Since there are n variables
and n > m, there must be some parameter variables. By setting any parameter variables equal to nonzero
numbers, I get a nontrivial solution for a1 , a2 , . . . an . This implies that {v1 , v2 , . . . vn } is dependent.
Proposition. Let F be a field, and let v1, v2, . . . , vn ∈ F^n. Then {v1, v2, . . . , vn} is independent if and only
if the matrix with the v's as columns is invertible.
Proof. Suppose first that {v1, v2, . . . , vn} is independent, and consider the system

[ ↑   ↑        ↑  ] [a1]   [0]
[ v1  v2  ···  vn ] [a2] = [0] .
[ ↓   ↓        ↓  ] [ ⋮ ]  [⋮]
                    [an]   [0]

In equation form, this says
a1 v1 + a2 v2 + · · · + an vn = 0.
By independence, a1 = a2 = · · · = an = 0. Thus, the system above has only the zero vector 0 as a
solution. An earlier theorem on invertibility shows that this means the matrix of v's is invertible.
Conversely, suppose the following matrix is invertible:

    [ ↑   ↑        ↑  ]
A = [ v1  v2  ···  vn ] .
    [ ↓   ↓        ↓  ]

Suppose that
a1 v1 + a2 v2 + · · · + an vn = 0.
Write this as a matrix equation and solve it:

[ ↑   ↑        ↑  ] [a1]   [0]
[ v1  v2  ···  vn ] [a2] = [0]
[ ↓   ↓        ↓  ] [ ⋮ ]  [⋮]
                    [an]   [0]

      [a1]   [0]
A ·   [a2] = [0]
      [ ⋮ ]  [⋮]
      [an]   [0]

          [a1]            [0]
A^{-1}A · [a2] = A^{-1} · [0]
          [ ⋮ ]           [⋮]
          [an]            [0]

[a1]   [0]
[a2] = [0]
[ ⋮ ]  [⋮]
[an]   [0]

Hence, a1 = a2 = · · · = an = 0, and {v1, v2, . . . , vn} is independent.
Note that this proposition requires that you have n vectors in F n — the number of vectors must match
the dimension of the space.
The result can also be stated in contrapositive form: The set of vectors is dependent if and only if the
matrix having the vectors as columns is not invertible. I’ll use this form in the next example.
The set is dependent when A is not invertible, and A is not invertible when its determinant is equal to
0. Now
det A = 30(x − 1)(x − 8).
Thus, det A = 0 for x = 1 and x = 8. For those values of x, the original set is dependent.
The next proposition says that an independent set can be thought of as a set without "redundancy", in
the sense that you can't build any one of the vectors out of the others.
Proposition. Let V be a vector space over a field F, and let S ⊂ V. S is dependent if and only if some
v ∈ S can be expressed as a linear combination of vectors in S other than v.
Proof. Suppose v ∈ S can be written as a linear combination of vectors in S other than v:
v = a1 v1 + · · · + an vn .
Then v − a1 v1 − · · · − an vn = 0 is a nontrivial linear combination of vectors in S which equals 0 (the
coefficient of v is 1), so S is dependent.
Conversely, suppose S is dependent, so there are vectors v1, v2, . . . , vn ∈ S and scalars a1, a2, . . . , an ∈ F,
not all 0, such that
a1 v1 + a2 v2 + · · · + an vn = 0.
Assume (relabeling the vectors if necessary) that a1 ≠ 0. Then
a1 v1 = −a2 v2 − · · · − an vn
a1^{-1} a1 v1 = a1^{-1} (−a2 v2 − · · · − an vn )
v1 = −a1^{-1} a2 v2 − · · · − a1^{-1} an vn
Thus, v1 is a linear combination of vectors in S other than v1.
Theorem. Let f1, f2, . . . , fn be real-valued functions which are differentiable at least n − 1 times, and
suppose that for some point c the Wronskian W(f1, f2, . . . , fn)(c) is nonzero. Then {f1, f2, . . . , fn} is
independent.
Proof. Suppose a1, a2, . . . , an are scalars such that
a1 f1(x) + a2 f2(x) + · · · + an fn(x) = 0 for all x.
I have to show all the a's are 0.
This equation is an identity in x, so I may differentiate it repeatedly to get n equations:
a1 f1(x) + a2 f2(x) + · · · + an fn(x) = 0
a1 f1'(x) + a2 f2'(x) + · · · + an fn'(x) = 0
⋮
a1 f1^(n−1)(x) + a2 f2^(n−1)(x) + · · · + an fn^(n−1)(x) = 0
Plug in x = c:
[ f1(c)         f2(c)         ···  fn(c)        ] [a1]   [0]
[ f1'(c)        f2'(c)        ···  fn'(c)       ] [a2]   [0]
[ f1''(c)       f2''(c)       ···  fn''(c)      ] [ ⋮ ] = [⋮] .
[   ⋮              ⋮                  ⋮         ]
[ f1^(n−1)(c)   f2^(n−1)(c)   ···  fn^(n−1)(c)  ] [an]   [0]

Let

    [ f1(c)         f2(c)         ···  fn(c)        ]
    [ f1'(c)        f2'(c)        ···  fn'(c)       ]
A = [ f1''(c)       f2''(c)       ···  fn''(c)      ] .
    [   ⋮              ⋮                  ⋮         ]
    [ f1^(n−1)(c)   f2^(n−1)(c)   ···  fn^(n−1)(c)  ]
The determinant of this matrix is the Wronskian W (f1 , f2 , . . . fn )(c), which by assumption is nonzero.
Since the determinant is nonzero, the matrix is invertible. So
      [a1]   [0]
A ·   [a2] = [0]
      [ ⋮ ]  [⋮]
      [an]   [0]

          [a1]            [0]
A^{-1}A · [a2] = A^{-1} · [0]
          [ ⋮ ]           [⋮]
          [an]            [0]

[a1]   [0]
[a2] = [0]
[ ⋮ ]  [⋮]
[an]   [0]

Hence, a1 = a2 = · · · = an = 0, and {f1, f2, . . . , fn} is independent.
Example. Show that the set of functions {x, x^3, x^5} is independent.
Compute the Wronskian:

                 | x    x^3    x^5   |
W(x, x^3, x^5) = | 1    3x^2   5x^4  | = 16x^6 .
                 | 0    6x     20x^3 |
I can find values of x for which the Wronskian is nonzero: for example, if x = 1, then W(x, x^3, x^5) =
16 ≠ 0. Hence, {x, x^3, x^5} is independent.
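sympy has a built-in wronskian function, so this computation is easy to check (snippet mine, not from the notes):

    from sympy import symbols, wronskian, simplify

    x = symbols('x')
    W = wronskian([x, x**3, x**5], x)
    print(simplify(W))       # 16*x**6
    print(W.subs(x, 1))      # 16, which is nonzero, so {x, x^3, x^5} is independent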
The next example shows that the converse of the last theorem is false: You can have a set of independent
functions whose Wronskian is always 0 (so there’s no point where the Wronskian is nonzero).
Example. C^1(R) denotes the vector space over R consisting of differentiable functions R → R. Let

f(x) = x^2,    g(x) = {  x^2   if x ≥ 0
                      { −x^2   if x < 0.

Show that {f, g} is independent in C^1(R), but W(f, g)(x) = 0 for all x ∈ R.
Note: You can check that g is differentiable at 0, and g'(0) = 0.
For independence, suppose that a, b ∈ R and af(x) + bg(x) = 0 for all x ∈ R. Plugging in x = 1, I get
af(1) + bg(1) = 0, i.e. a + b = 0.
Plugging in x = −1, I get
af(−1) + bg(−1) = 0, i.e. a − b = 0.
Adding the two equations gives 2a = 0, so a = 0, and then b = 0. Hence, {f, g} is independent.
For the Wronskian: if x ≥ 0, then g(x) = x^2 and g'(x) = 2x, so

W(f, g)(x) = | x^2   x^2 | = x^2 · 2x − x^2 · 2x = 0.
             | 2x    2x  |

If x < 0, then g(x) = −x^2 and g'(x) = −2x, so

W(f, g)(x) = | x^2   −x^2 | = x^2 · (−2x) − (−x^2) · 2x = 0.
             | 2x    −2x  |

Thus, W(f, g)(x) = 0 for all x ∈ R.
The standard basis vectors in R^2 are (1, 0) and (0, 1). They can be pictured as arrows of length 1 pointing
along the positive x-axis and the positive y-axis.

[Figure: the standard basis vectors (1, 0) and (0, 1) in R^2.]
The standard basis vectors in R3 are (1, 0, 0), (0, 1, 0), and (0, 0, 1). They can be pictured as arrows of
length 1 pointing along the positive x-axis, the positive y-axis, and the positive z-axis.
[Figure: the standard basis vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) in R^3, drawn along the coordinate axes.]
Since we’re calling this a “basis”, we’d better check that the name is justified!
Proposition. The standard basis is a basis for F n .
Proof. First, we show that the standard basis spans F^n. Let (a1, a2, a3, . . . , an) ∈ F^n. I must write this
vector as a linear combination of the standard basis vectors. It's easy:

  a1 · (1, 0, 0, . . . , 0)
+ a2 · (0, 1, 0, . . . , 0)
+ a3 · (0, 0, 1, . . . , 0)
    ⋮
+ an · (0, 0, 0, . . . , 1)
 = (a1, a2, a3, . . . , an)
For independence, suppose
a1 e1 + a2 e2 + · · · + an en = 0.
The computation we did to prove that the set spans shows that the left side is just (a1, a2, a3, . . . , an), so
(a1, a2, a3, . . . , an) = (0, 0, 0, . . . , 0).
Hence, a1 = a2 = · · · = an = 0, and the standard basis is independent. Therefore, it is a basis for F^n.
Example. Consider the vectors (1, 1, 0), (0, 1, 1), and (1, 0, 1) in R^3, and let A be the matrix with these
vectors as columns. You can check that A row reduces to the identity. Since A row reduces to the identity,
it is invertible, and there are a number of conditions which are equivalent to A being invertible.
First, since A is invertible the following system has a unique solution for every (a, b, c):
[1 0 1] [x]   [a]
[1 1 0] [y] = [b] .
[0 1 1] [z]   [c]
In other words, any vector (a, b, c) ∈ R3 can be written as a linear combination of the given vectors.
This proves that the given vectors span R3 .
Second, since A is invertible, the following system has only x = 0, y = 0, z = 0 as a solution:
[1 0 1] [x]   [0]
[1 1 0] [y] = [0] .
[0 1 1] [z]   [0]

In other words, the only linear combination of the given vectors which equals the zero vector is the trivial
one, so the given vectors are independent. Hence, they form a basis for R^3.
We'll generalize the computations we did in the last example.

Proposition. Let F be a field, let v1, v2, . . . , vn be vectors in F^n, and let A be the n × n matrix with the
v's as columns. The following statements are equivalent:
1. A is invertible.
2. det A ≠ 0.
3. The system Ax = 0 has only x = (x1, x2, . . . , xn) = (0, 0, . . . , 0) as a solution.
4. {v1, v2, . . . , vn} is a basis for F^n.
Proof. The equivalence of conditions 1, 2, and 3 comes from our earlier work on invertibility, so I'll show
that they are equivalent to condition 4.
First, suppose {v1, v2, . . . , vn} is a basis for F^n; in particular, the set is independent. Suppose
x = (x1, x2, . . . , xn) is a solution of the system Ax = 0. In equation form, this says
x1 v1 + x2 v2 + · · · + xn vn = (0, 0, . . . , 0).
Since {v1 , v2 , . . . , vn } is independent, I have x1 = x2 = · · · = xn = 0. This shows that the system above
has only the zero vector as a solution. An earlier result on invertibility shows that A must be invertible.
Conversely, suppose the following matrix is invertible:
    [ ↑   ↑        ↑  ]
A = [ v1  v2  ···  vn ] .
    [ ↓   ↓        ↓  ]

To show that {v1, v2, . . . , vn} is independent, suppose
x1 v1 + x2 v2 + · · · + xn vn = (0, 0, . . . , 0).
In matrix form, this is

[ ↑   ↑        ↑  ] [x1]   [0]
[ v1  v2  ···  vn ] [x2] = [0] .
[ ↓   ↓        ↓  ] [ ⋮ ]  [⋮]
                    [xn]   [0]
Since A is invertible, the only solution to this system is x1 = x2 = · · · = xn = 0. This shows that
{v1 , v2 , . . . , vn } is independent.
To show that {v1 , v2 , . . . , vn } spans, let (b1 , b2 , . . . bn ) ∈ F n . Consider the system
[ ↑   ↑        ↑  ] [x1]   [b1]
[ v1  v2  ···  vn ] [x2] = [b2] .
[ ↓   ↓        ↓  ] [ ⋮ ]  [ ⋮]
                    [xn]   [bn]

Since A is invertible, this system has a (unique) solution (x1, x2, . . . , xn). That is,
x1 v1 + x2 v2 + · · · + xn vn = (b1, b2, . . . , bn).
Hence, {v1, v2, . . . , vn} spans F^n. Since the set is independent and spans, it is a basis for F^n.
A basis for V is a spanning set for V , so every vector in V can be written as a linear combination of
basis elements. The next result says that such a linear combination is unique.
Proposition. Let B be a basis for a vector space V . Every v ∈ V can be written in exactly one way as
v = a1 v1 + a2 v2 + · · · + an vn , ai ∈ F, v1 , . . . vn ∈ B.
Proof. Let v ∈ V . Since B spans V , there are scalars a1 , a2 , . . . , an and vectors v1 , . . . vn ∈ B such that
v = a1 v1 + a2 v2 + · · · + an vn .
Suppose that there is another way to do this: There are scalars b1 , b2 , . . . , bm and vectors w1 , . . . wm ∈ B
such that
v = b1 w1 + b2 w2 + · · · + bm wm .
First, note that I can assume that the same set of vectors are involved in both linear combinations —
that is, the v’s and w’s are the same set of vectors. For if not, I can instead use the vectors in the union
S = {v1 , . . . vn } ∪ {w1 , . . . wm }.
I can rewrite both of the original linear combinations as linear combinations of vectors in S, using 0 as
the coefficient of any vector which doesn’t occur in a given combination. Then both linear combinations for
v use the same vectors.
I’ll assume this has been done and just assume that {v1 , v2 , . . . vn } is the set of vectors. Thus, I have
two linear combinations
v = a1 v1 + a2 v2 + · · · + an vn
v = b1 v1 + b2 v2 + · · · + bn vn
Then
a1 v1 + a2 v2 + · · · + an vn = b1 v1 + b2 v2 + · · · + bn vn .
Hence,
(a1 − b1 )v1 + (a2 − b2 )v2 + · · · + (an − bn )vn = 0.
Since {v1 , v2 , . . . , vn } is independent,
a1 − b1 = 0, a2 − b2 = 0, . . . , an − bn = 0.
Therefore,
a1 = b 1 , a2 = b 2 , . . . , an = b n .
That is, the two linear combinations were actually the same. This proves that there’s only one way to
write v as a linear combination of vectors in B.
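Finding the (unique) coefficients for a given vector relative to a basis of F^n amounts to solving a linear system. A sympy sketch (my own, using the basis {(1, 1, 0), (0, 1, 1), (1, 0, 1)} from the earlier R^3 example and a sample vector I picked):

    from sympy import Matrix

    # Basis vectors as the columns of A.
    A = Matrix([[1, 0, 1],
                [1, 1, 0],
                [0, 1, 1]])

    v = Matrix([3, 5, 4])

    coords = A.solve(v)      # the unique coefficients, since A is invertible
    print(coords.T)          # Matrix([[2, 3, 1]])
    print((A * coords).T)    # Matrix([[3, 5, 4]]): 2*(1,1,0) + 3*(0,1,1) + 1*(1,0,1) = (3,5,4)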
I want to show that two bases for a vector space must have the same number of elements. I need some
preliminary results, which are important in their own right.
Lemma. Let F be a field, and let A be an m × n matrix over F, where m < n. Then the homogeneous
system Ax = 0 has a nontrivial solution.
Proof. Write

    [ a11  a12  ···  a1n ]
    [ a21  a22  ···  a2n ]
A = [  ⋮    ⋮          ⋮ ] .
    [ am1  am2  ···  amn ]

The condition m < n means that the following system has more variables than equations:

a11 x1 + a12 x2 + · · · + a1n xn = 0
a21 x1 + a22 x2 + · · · + a2n xn = 0
⋮
am1 x1 + am2 x2 + · · · + amn xn = 0

If A row reduces to a row reduced echelon matrix R, then R can have at most m leading coefficients.
Therefore, some of the variables x1, x2, . . . , xn will be free variables (parameters); if I assign nonzero values
to the free variables (e.g. by setting all of them equal to 1), the resulting solution will be nontrivial.
Theorem. Let V be a vector space over a field F, and suppose {v1, v2, . . . , vn} is a basis for V.
(a) Any subset of V containing more than n vectors is dependent.
(b) No subset of V containing fewer than n vectors spans V.
Proof. (a) Suppose {w1, w2, . . . , wm} is a subset of V, and that m > n. I want to show that {w1, w2, . . . , wm}
is dependent.
Write each w as a linear combination of the v's:

w1 = a11 v1 + a12 v2 + · · · + a1n vn
w2 = a21 v1 + a22 v2 + · · · + a2n vn
⋮
wm = am1 v1 + am2 v2 + · · · + amn vn

Since m > n, the matrix of a's has more columns than rows. Therefore, the following system has a
nontrivial solution x1 = b1, x2 = b2, . . . , xm = bm:

a11 x1 + a21 x2 + · · · + am1 xm = 0
a12 x1 + a22 x2 + · · · + am2 xm = 0
⋮
a1n x1 + a2n x2 + · · · + amn xm = 0

That is, not all the b's are 0, but substituting x1 = b1, . . . , xm = bm makes all of these equations hold.
But then

[ ↑   ↑        ↑  ] [b1]   [ ↑   ↑        ↑  ] [ a11  a21  ···  am1 ] [b1]
[ w1  w2  ···  wm ] [b2] = [ v1  v2  ···  vn ] [ a12  a22  ···  am2 ] [b2] .
[ ↓   ↓        ↓  ] [ ⋮ ]  [ ↓   ↓        ↓  ] [  ⋮    ⋮         ⋮  ] [ ⋮ ]
                    [bm]                       [ a1n  a2n  ···  amn ] [bm]
Therefore,

[ ↑   ↑        ↑  ] [b1]   [0]
[ w1  w2  ···  wm ] [b2] = [0] .
[ ↓   ↓        ↓  ] [ ⋮ ]  [⋮]
                    [bm]   [0]
In equation form,
b1 w1 + b2 w2 + · · · + bm wm = 0.
This is a nontrivial linear combination of the w’s which adds up to 0, so the w’s are dependent.
(b) Suppose that {w1 , w2 , . . . , wm } is a set of vectors in V and m < n. I want to show that {w1 , w2 , . . . , wm }
does not span V .
Suppose on the contrary that the w’s span V . Then each v can be written as a linear combination of
the w’s:
v1 = a11 w1 + a12 w2 + · · · + a1m wm
v2 = a21 w1 + a22 w2 + · · · + a2m wm
⋮
vn = an1 w1 + an2 w2 + · · · + anm wm

In matrix form, this is

[ ↑   ↑        ↑  ]   [ ↑   ↑        ↑  ] [ a11  a21  ···  an1 ]
[ v1  v2  ···  vn ] = [ w1  w2  ···  wm ] [ a12  a22  ···  an2 ] .
[ ↓   ↓        ↓  ]   [ ↓   ↓        ↓  ] [  ⋮    ⋮         ⋮  ]
                                          [ a1m  a2m  ···  anm ]

Since n > m, the coefficient matrix of a's has more columns than rows. Hence, the following system has a
nontrivial solution x1 = b1, x2 = b2, . . . , xn = bn:

a11 x1 + a21 x2 + · · · + an1 xn = 0
a12 x1 + a22 x2 + · · · + an2 xn = 0
⋮
a1m x1 + a2m x2 + · · · + anm xn = 0

Thus,

[ a11  a21  ···  an1 ] [b1]   [0]
[ a12  a22  ···  an2 ] [b2]   [0]
[  ⋮    ⋮         ⋮  ] [ ⋮ ] = [⋮] .
[ a1m  a2m  ···  anm ] [bn]   [0]
Multiplying the v and w equation on the right by the b-vector, and using the fact that the matrix of a's
times the b-vector is the zero vector (the equation above), gives

[ ↑   ↑        ↑  ] [b1]   [0]
[ v1  v2  ···  vn ] [b2] = [0] .
[ ↓   ↓        ↓  ] [ ⋮ ]  [⋮]
                    [bn]   [0]
In equation form, this is
b1 v1 + b2 v2 + · · · + bn vn = 0.
Since not all the b’s are 0, this is a nontrivial linear combination of the v’s which adds up to 0 —
contradicting the independence of the v’s.
This contradiction means that the w’s can’t span after all.
Corollary. Let F be a field. Then any set of more than n vectors in F^n is dependent, and no set of fewer
than n vectors in F^n spans F^n.
Corollary. If {v1 , . . . , vn } is a basis for a vector space V , then every basis for V has n elements.
Proof. If {w1 , . . . , wm } is another basis for V , then m can’t be less than n or {w1 , . . . , wm } couldn’t span.
Likewise, m can’t be greater than n or {w1 , . . . , wm } couldn’t be independent. Therefore, m = n.
Example. Let R[x] denote the R-vector space of polynomials with coefficients in R.
Show that {1, x, x2 , x3 , . . .} is a basis for R[x].
First, a polynomial has the form an xn + an−1 xn−1 + · · · + a1 x + a0 . This is a linear combination of
elements of {1, x, x2 , x3 , . . .}. Hence, the set spans R[x].
To show that the set is independent, suppose there are numbers a0 , a1 , . . . an ∈ R such that
a0 + a1 x + · · · + an xn = 0.
This equation is an identity — it’s true for all x ∈ R. Setting x = 0, I get a0 = 0. Then plugging a0 = 0
back in gives
a1 x + · · · + an xn = 0.
Since this is an identity, I can differentiate both sides to obtain
a1 + 2a2 x + · · · + n an x^{n−1} = 0.
Setting x = 0 gives a1 = 0. Continuing in this way (differentiating and then setting x = 0), I get
a2 = 0, . . . , an = 0. Hence, the set is independent, and it is a basis for R[x].
The next result shows that, in principle, you can construct a basis by:
(a) Starting with an independent set and adding vectors, or
(b) Starting with a spanning set and removing vectors.
Part (a) means that if S is an independent set, then there is a basis T such that S ⊂ T . (If S was a
basis to begin with, then S = T .) Part (b) means that if S is a spanning set, then there is a basis T such
that T ⊂ S.
I’m only proving the result in the case where V has finite dimension n, but it is true for vector spaces
of any dimension.
Theorem. Let V be a vector space over a field F.
(a) If S is an independent subset of V, then there is a basis for V which contains S.
(b) If S is a spanning set for V, then there is a basis for V which is contained in S.
Proof.
(a) Let {v1 , . . . , vm } be independent. If this set spans V , it’s a basis, and I’m done. Otherwise, there is a
vector v ∈ V which is not in the span of {v1 , . . . , vm }.
I claim that {v, v1, . . . , vm} is independent. Suppose
av + a1 v1 + · · · + am vm = 0.
If a ≠ 0, I can solve this equation for v:
v = −(1/a)(a1 v1 + · · · + am vm ).
Since v has been expressed as a linear combination of the vk ’s, it’s in the span of the vk ’s, contrary to
assumption. Therefore, this case is ruled out.
8
The only other possibility is a = 0. Then a1 v1 + · · · + am vm = 0, so independence of the vk ’s implies
a1 = · · · = am = 0. Therefore, {v, v1 , . . . , vm } is independent.
I can continue adding vectors in this way until I get a set which is independent and spans — a basis.
The process must terminate, since no independent set in V can have more than n elements.
(b) Suppose {v1 , . . . , vm } spans V . I want to show that some subset of {v1 , . . . , vm } is a basis.
If {v1 , . . . , vm } is independent, it’s a basis, and I’m done. Otherwise, there is a nontrivial linear combi-
nation
a1 v1 + · · · + am vm = 0.
Assume without loss of generality that a1 ≠ 0. Then
v1 = −(1/a1)(a2 v2 + · · · + am vm ).
Since v1 is a linear combination of the other v's, I can remove it and still have a set which spans V; that
is, V = ⟨v2, . . . , vm⟩.
I continue throwing out vectors in this way until I reach a set which spans and is independent — a basis.
The process must terminate, because no set containing fewer than n vectors can span V .
It’s possible to carry out the “adding vectors” and “removing vectors” procedures in some specific cases.
The algorithms are related to those for finding bases for the row space and column space of a matrix,
which I’ll discuss later.
Suppose you know a basis should have n elements, and you have a set S with n elements (“the right
number”). To show S is a basis, you only need to check either that it is independent or that it spans — not
both. I’ll justify this statement, then show by example how you can use it. I need a preliminary result.
Proposition. Let V be a finite dimensional vector space over a field F , and let W be a subspace of V . If
dim W = dim V , then V = W .
Proof. Suppose dim W = dim V = n, but V 6= W . I’ll show that this leads to a contradiction.
Let {x1 , x2 , . . . , xn } be a basis for W . Suppose this is not a basis for V . Since it’s an independent set,
the previous result shows that I can add vectors y1 , y2 , . . . ym to make a basis for V :
{x1 , x2 , . . . , xn , y1 , y2 , . . . ym }.
But this is a basis for V with more than n elements, which is impossible.
Therefore, {x1, x2, . . . , xn} is also a basis for V. Let x ∈ V. Since {x1, x2, . . . , xn} spans V, I can write
x as a linear combination of the elements of {x1, x2, . . . , xn}:
x = a1 x1 + a2 x2 + · · · + an xn,   ai ∈ F.
Since x1, x2, . . . , xn ∈ W and W is a subspace, it follows that x ∈ W. Thus, V ⊂ W, and since W ⊂ V,
this gives V = W, contradicting the assumption that V ≠ W. This contradiction shows that V = W.

Corollary. Let V be an n-dimensional vector space over a field F, and let S be a set of n vectors in V.
(a) If S is independent, then S is a basis for V.
(b) If S spans V, then S is a basis for V.
Proof. (a) Suppose S is independent. Consider W , the span of S. Then S is independent and spans W ,
so S is a basis for W . Since S has n elements, dim W = n. But W ⊂ V and dim V = n. By the preceding
result, V = W .
Hence, S spans V , and S is a basis for V .
(b) Suppose S spans V . Suppose S is not independent. By an earlier result, I can remove some elements of
S to get a set T which is a basis for V . But now I have a basis T for V with fewer than n elements (since I
removed elements from S, which had n elements).
This is a contradiction, and hence S must be independent.
Example. (a) Determine whether {(1, 2, 1), (2, 1, 1), (2, 0, 1)} is a basis for the Z_3 vector space Z_3^3.
(b) Determine whether {(1, 1, 0), (1, 2, 2), (2, 2, 0)} is a basis for the Z_3 vector space Z_3^3.
(a) Form the matrix with the vectors as columns and row reduce:

[1 2 2]    [1 0 0]
[2 1 0] →  [0 1 0]
[1 1 1]    [0 0 1]
Since the matrix row reduces to the identity, it is invertible. Since the matrix is invertible, the vectors
are independent. Since we have 3 vectors in a 3-dimensional vector space, the Corollary says that the set is
a basis.
(b) Form the matrix with the vectors as columns and row reduce:

[1 1 2]    [1 0 2]
[1 2 2] →  [0 1 0]
[0 2 0]    [0 0 0]
The matrix did not row reduce to the identity, so it is not invertible. Since the matrix is not invertible,
the vectors aren’t independent. Hence, the vectors are not a basis.
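If you want to check computations like these with software (nothing in these notes depends on it), here is a short sketch in Python using the sympy library. A matrix over Z3 is invertible exactly when its determinant is nonzero mod 3, so it is enough to compute the two determinants and reduce mod 3.

from sympy import Matrix

# Columns are the candidate basis vectors from (a) and (b).
A = Matrix([[1, 2, 2],
            [2, 1, 0],
            [1, 1, 1]])
B = Matrix([[1, 1, 2],
            [1, 2, 2],
            [0, 2, 0]])

# Nonzero mod 3 means invertible over Z3.
print(A.det() % 3)   # 2, so the set in (a) is a basis
print(B.det() % 3)   # 0, so the set in (b) is not a basis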
Definition. Let A be an m × n matrix over a field F . The row space of A is the subspace of F n spanned by the rows of A.

For example, suppose a matrix has row vectors (1, 0), (0, 1), and (0, 0). The row space is the subspace of R2 spanned by these vectors. Since the first two vectors are the standard basis vectors for R2 , the row space is R2 .
Lemma. Let A be a matrix with entries in a field. If E is an elementary row operation, then E(A) has the
same row space as A.
Proof. If E is an operation of the form ri ↔ rj , then E(A) and A have the same rows (except for order),
so it’s clear that their row vectors have the same span. Hence, the matrices have the same row space.
If E is an operation of the form ri → ari where a 6= 0, then A and E(A) agree except in the i-th row.
We have
a1 r1 + · · · + ai ri + · · · + am rm = a1 r1 + · · · + ai a−1 (ari ) + · · · + am rm ,
Note that the vectors r1 , . . . , ari , . . . , rm are the rows of E(A). So this equation says any linear
combination of the rows r1 , . . . , rm of A is a linear combination of the rows of E(A). This means that the
row space of A is contained in the row space of E(A).
Going the other way, a linear combination of the rows r1 , . . . , ari , . . . , rm of E(A) looks like this:
b1 r1 + · · · + bi (ari ) + · · · + bm rm .
But this is a linear combination of the rows r1 , . . . , rm of A, so the row space of E(A) is contained in
the row space of A. Hence, A and E(A) have the same row space.
Finally, suppose E is a row operation of the form ri → ri + arj , where a ∈ F . Then

a1 r1 + · · · + ai ri + · · · + aj rj + · · · + am rm = a1 r1 + · · · + ai (ri + arj ) + · · · + (aj − aai )rj + · · · + am rm .

This shows that the row space of A is contained in the row space of E(A).

Conversely,

b1 r1 + · · · + bi (ri + arj ) + · · · + bj rj + · · · + bm rm = b1 r1 + · · · + bi ri + · · · + (bj + abi )rj + · · · + bm rm ,

which is a linear combination of the rows of A. Hence, the row space of E(A) is contained in the row space of A.
Since row operations preserve row space, row equivalent matrices have the same row space. In particular,
a matrix and its row reduced echelon form have the same row space.
The next proposition describes some of the components of a vector in the row space of a row-reduced
echelon matrix R. Such a vector is a linear combination of the nonzero rows of R.
Proposition. Let R = {rij } be a row reduced echelon matrix over a field with nonzero rows r1 , . . . , rp .
Suppose the leading entries of R occur at (1, j1 ), (2, j2 ), . . . , (p, jp ), where j1 < j2 < · · · < jp . If

v = a1 r1 + · · · + ap rp ,

then the jk -th component of v is vjk = ak .

Proof. The jk -th component of v is

vjk = a1 r1jk + a2 r2jk + · · · + ap rpjk .

But the only nonzero element in column jk is the leading entry rkjk = 1. Therefore, the only nonzero term in the sum is ak rkjk = ak .
This result looks a bit technical, but it becomes obvious if you consider an example. Here’s a row
reduced echelon matrix over R:
0 1 2 0 −1 0
0 0 0 1 2 0
.
0 0 0 0 0 1
0 0 0 0 0 0
Here’s a vector in the row space, a linear combination of the nonzero rows:

v = a · (0, 1, 2, 0, −1, 0) + b · (0, 0, 0, 1, 2, 0) + c · (0, 0, 0, 0, 0, 1) = (0, a, 2a, b, −a + 2b, c).
The leading entries occur in columns j1 = 2, j2 = 4, and j3 = 6. The 2nd , 4th , and 6th components of
the vector are
v2 = a, v4 = b, v6 = c.
You can see from the picture why this happens. The coefficients a, b, c multiply the leading entries. The
leading entries are all 1’s, and they’re the only nonzero elements in their columns. So in the components of
the vector corresponding to those columns, you get a, b, and c.
Corollary. The nonzero rows of a row reduced echelon matrix over a field are independent.
Proof. Suppose R is a row reduced echelon matrix with nonzero rows r1 , . . . , rp . Suppose the leading
entries of R occur at (1, j1 ), (2, j2 ), . . ., where j1 < j2 < · · ·. Suppose
0 = a1 r1 + · · · + ap rp .
The proposition implies that ak = vjk = 0 for all k. Therefore, {ri } are independent.
Corollary. The nonzero rows of a row reduced echelon matrix over a field form a basis for the row space of
the matrix.
Proof. The nonzero rows span the row space, and are independent, by the preceding corollary.
Algorithm. Let V be a finite-dimensional vector space, and let v1 , . . . , vm be vectors in V . Find a basis
for W = hv1 , . . . , vm i, the subspace spanned by the vi .
Let M be the matrix whose i-th row is vi . The row space of M is W . Let R be a row-reduced echelon
matrix which is row equivalent to M . Then R and M have the same row space W , and the nonzero rows of
R form a basis for W .
Example. Consider the vectors v1 = (1, 0, 1, 1), v2 = (−2, 1, 1, 0), and v3 = (7, −2, 1, 3) in R4 . Find a basis
for the subspace hv1 , v2 , v3 i spanned by the vectors.
Construct a matrix with the vi as its rows and row reduce:
1 0 1 1 1 0 1 1
−2 1 1 0 → 0 1 3 2
7 −2 1 3 0 0 0 0
The vectors (1, 0, 1, 1) and (0, 1, 3, 2) form a basis for hv1 , v2 , v3 i.
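Here is the same computation as a sympy sketch (not something the notes rely on): the nonzero rows of the row reduced echelon form give a basis for the row space.

from sympy import Matrix

M = Matrix([[1, 0, 1, 1],
            [-2, 1, 1, 0],
            [7, -2, 1, 3]])

R, pivots = M.rref()     # row reduced echelon form and the pivot columns
print(R)
# The nonzero rows of R, namely (1, 0, 1, 1) and (0, 1, 3, 2),
# form a basis for the row space of M.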
Example. Determine the dimension of the subspace of R3 spanned by (1, 2, −1), (1, 1, 1), and (2, −2, 1).
Form a matrix using the vectors as the rows and row reduce:
1 2 −1 1 0 0
1 1 1 → 0 1 0
2 −2 1 0 0 1
The subspace has dimension 3, since the row reduced echelon matrix has 3 nonzero rows.
Definition. The rank of a matrix over a field is the dimension of its row space.
Example. Find the rank of the following matrix over Z5 :
1 4 2 1
3 3 1 2.
0 1 0 4
Row reducing over Z5 (the operation r2 → r2 + 2r1 makes the second row equal to the third, and the rest is routine), the matrix reduces to

1 0 2 0
0 1 0 4
0 0 0 0

There are two nonzero rows, so the rank is 2.

Next, here is an observation about how matrix multiplication interacts with rows. If M is a matrix with rows r1 , . . . , rn , then

[ a1 a2 · · · an ] · M = a1 r1 + a2 r2 + · · · + an rn .
If instead of a single row vector on the left I have an entire matrix, here’s what I get:
Hence, the rows of the product are linear combinations of the rows r1 , r2 , . . . rn .
Proposition. Let M and N be matrices over a field F which are compatible for multiplication. Then
rank(M N ) ≤ rank N.
Proof. The preceding discussion shows that the rows of M N are linear combinations of the rows of N .
Therefore, the rows of M N are all contained in the row space of N .
The row space of N is a subspace, so it’s closed under taking linear combinations of vectors. Hence,
any linear combination of the rows of M N is in the row space of N . Therefore, the row space of M N is
contained in the row space of N .
From this, it follows that the dimension of the row space of M N is less than or equal to the dimension
of the row space of N — that is, rank(M N ) ≤ rank N .
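As a quick numerical sanity check of this inequality (a sketch using Python with numpy; the matrices below are just random examples of my own), you can compare ranks directly:

import numpy as np

rng = np.random.default_rng(0)
M = rng.integers(-3, 4, size=(4, 5))
N = rng.integers(-3, 4, size=(5, 6))

# The rank of a product never exceeds the rank of either factor.
print(np.linalg.matrix_rank(M @ N) <= np.linalg.matrix_rank(N))   # True
print(np.linalg.matrix_rank(M @ N) <= np.linalg.matrix_rank(M))   # True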
I already have one algorithm for testing whether a set of vectors in F n is independent. That algorithm
involves constructing a matrix with the vectors as the columns, then row reducing. The algorithm will also
produce a linear combination of the vectors which adds up to the zero vector if the set is dependent.
If all you care about is whether or not a set of vectors in F n is independent — i.e. you don’t care about
a possible dependence relation — the results on rank can be used to give an alternative algorithm. In this
approach, you construct a matrix with the given vectors as the rows.
Let M be the matrix whose i-th row is vi . Let R be a row reduced echelon matrix which is row equivalent
to M . If R has m nonzero rows, then {v1 , . . . , vm } is independent. Otherwise, the set is dependent.
If R has p nonzero rows, then R and M have rank p. (They have the same rank, because they have the
same row space.) Suppose p = m. Since {vi } spans, some subset of {vi } is a basis. However, a basis must
contain p = m elements. Therefore, {vi } must be independent.
Any independent subset of the row space must contain ≤ p elements. Hence, if m > p, {vi } must be
dependent.
Example. Determine whether the vectors v1 = (1, 0, 1, 1), v2 = (−2, 1, 1, 0), and v3 = (7, −2, 1, 3) in R4 are
independent.
Form a matrix with the vectors as the rows and row reduce:
1 0 1 1 1 0 1 1
−2 1 1 0 → 0 1 3 2
7 −2 1 3 0 0 0 0
The row reduced echelon matrix has only two nonzero rows. Hence, the vectors are dependent.
I already know that every matrix can be row reduced to a row reduced echelon matrix. The next result
completes the discussion by showing that the row reduced echelon form is unique.
Proposition. Every matrix over a field can be row reduced to a unique row reduced echelon matrix.
Proof. Suppose M row reduces to R, a row reduced echelon matrix with nonzero rows r1 , . . . , rp . Suppose
the leading coefficients of R occur at (1, j1 ), (2, j2 ), . . ., where j1 < j2 < · · ·.
Let W be the row space of R and let v = (v1 , . . . , vn ) ∈ W . Since r1 , . . . , rp span the row space W , we
have
v = a1 r1 + · · · + ap rp .
Claim: The first nonzero component of v must occur in column jk , for some k = 1, 2, . . ..
Suppose ak is the first ai which is nonzero. Since the ai ri terms before ak rk are zero, we have
v = ak rk + · · · + ap rp .
The first nonzero element of rk is a 1 at (k, jk ). The first nonzero element in rk+1 , . . . , rp lies to the
right of column jk . Thus, vj = 0 for j < jk , and vjk = ak . Evidently, this is the first nonzero component of
v. This proves the claim.
This establishes that if a row reduced echelon matrix R′ is row equivalent to M , its leading coefficients
must lie in the same columns as those of R. For the rows of R′ are elements of W , and the claim applies.
Next, I’ll show that the nonzero rows of R′ are the same as the nonzero rows of R.
Consider, for instance, the first nonzero rows of R and R′ . Their first nonzero components are 1’s lying
in column j1 . Moreover, both r1 and r1′ have zeros in columns j2 , j3 , . . . .
Suppose r1 ≠ r1′ . Then r1 − r1′ is a nonzero vector in W whose first nonzero component is not in column
j1 , j2 , . . . , which is a contradiction.
The same argument applies to show that rk = rk′ for all k. Therefore, R = R′ .
In my discussion of bases, I showed that every independent set is a subset of a basis. To put it another
way, you can add vectors to an independent set to get a basis.
Here’s how to find specific vectors to add to an independent set to get a basis.
Example. Extend the independent set {(2, −4, 1, 0, 8), (−1, 2, −1, −1, −4), (2, −4, 1, 1, 7)} to a basis of R5 .

Form the matrix with the vectors as rows and row reduce:

2 −4 1 0 8        1 −2 0 0 3
−1 2 −1 −1 −4  →  0 0 1 0 2
2 −4 1 1 7        0 0 0 1 −1
Since there are three nonzero rows and the original set had three vectors, the original set of vectors is
indeed independent.
By examining the row reduced echelon form, I see that the vectors (0, 1, 0, 0, 0) and (0, 0, 0, 0, 1) will not
be linear combinations of the others. Reason: A nonzero linear combination of the rows of the row reduced
echelon form must have a nonzero entry in at least one of the first, third, or fourth columns, since those are
the columns containing the leading entries.
In other words, I’m choosing standard basis vectors with 1’s in positions not occupied by leading
entries in the row reduced echelon form. Therefore, I can add (0, 1, 0, 0, 0) and (0, 0, 0, 0, 1) to the set and
get a new independent set:
{(2, −4, 1, 0, 8), (−1, 2, −1, −1, −4), (2, −4, 1, 1, 7), (0, 1, 0, 0, 0), (0, 0, 0, 0, 1)} .
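Here is a sketch of the same procedure in sympy (not part of the notes). The pivot columns of the row reduced echelon form tell you which positions are already occupied, and you add the standard basis vectors for the remaining positions.

from sympy import Matrix, eye

vectors = [(2, -4, 1, 0, 8), (-1, 2, -1, -1, -4), (2, -4, 1, 1, 7)]
M = Matrix(vectors)

R, pivots = M.rref()          # pivots == (0, 2, 3): columns 1, 3, 4
n = M.cols
# Add the standard basis vectors whose 1 sits in a non-pivot column.
extra = [eye(n).row(j) for j in range(n) if j not in pivots]
basis = [M.row(i) for i in range(M.rows)] + extra
print(extra)                  # e_2 and e_5, as in the example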
Definition. Let A be an m × n matrix over a field F . The column space of A is the subspace of F m spanned by the columns of A.

For example, suppose a matrix has columns (1, 0, 0) and (0, 1, 0). The column space is the subspace of R3 spanned by these vectors. Thus, the column space consists of all vectors of the form (a, b, 0), where a, b ∈ R.
We’ve seen how to find a basis for the row space of a matrix. We’ll now give an algorithm for finding a
basis for the column space.
First, here’s a reminder about matrix multiplication. If A is an m × n matrix and v ∈ F n , then you can
think of the multiplication Av as multiplying the columns of A by the components of v:
This means that if ci is the i-th column of A and v = (a1 , . . . , an ), the product Av is a linear combination
of the columns of A:
a1
↑ ↑ ↑ a2
c1 c2 · · · cn . = a1 c1 + a2 c2 + · · · + an cn .
..
↓ ↓ ↓
an
Proposition. Let A be a matrix, and let R be the row reduced echelon matrix which is row equivalent to
A. Suppose the leading entries of R occur in columns j1 , . . . , jp , where j1 < · · · < jp , and let ci denote the
i-th column of A. Then {cj1 , . . . , cjp } is independent.
Proof. Suppose that
aj1 cj1 + · · · + ajp cjp = 0, for ai ∈ F.
Form the vector v = (vi ), where

vi = ai if i ∈ {j1 , . . . , jp } and vi = 0 if i ∉ {j1 , . . . , jp }.
However, since R is in row reduced echelon form, c′jk is a vector with 1 in the k-th row and 0’s elsewhere.
Hence, {cj1 , . . . , cjp } is independent, and aj1 = · · · = ajp = 0.
The proof provides an algorithm for finding a basis for the column space of a matrix. Specifically,
row reduce the matrix A to a row reduced echelon matrix R. If the leading entries of R occur in columns
j1 , . . . , jp , then consider the columns cj1 , . . . , cjp of A. These columns form a basis for the column space of
A.
Example. Find a basis for the column space of the real matrix
1 −2 3 1 1
2 1 0 3 1
.
0 −5 6 −1 1
7 1 3 10 4
The leading entries occur in columns 1 and 2. Therefore, (1, 2, 0, 7) and (−2, 1, −5, 1) form a basis for
the column space of A.
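In sympy (a sketch; the notes do not rely on it), the pivot columns reported by rref let you pick out the right columns of the original matrix:

from sympy import Matrix

A = Matrix([[1, -2, 3, 1, 1],
            [2, 1, 0, 3, 1],
            [0, -5, 6, -1, 1],
            [7, 1, 3, 10, 4]])

_, pivots = A.rref()                 # pivots == (0, 1): columns 1 and 2
basis = [A.col(j) for j in pivots]   # columns of the *original* matrix A
print([v.T for v in basis])          # (1, 2, 0, 7) and (-2, 1, -5, 1)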
Note that if A and B are row equivalent, they don’t necessarily have the same column space. For
example,
1 2 1 → 1 2 1
.
1 2 1 r2 → r2 − r1 0 0 0
However, all the elements of the column space of the second matrix have their second component equal
to 0; this is obviously not true of elements of the column space of the first matrix.
Example. Find a basis for the column space of the following matrix over Z3 :
0 1 1 0
A = 1 2 1 0.
2 1 2 1
Row reduce over Z3 :

0 1 1 0      1 0 2 0
1 2 1 0   →  0 1 1 0
2 1 2 1      0 0 0 1

The leading entries occur in columns 1, 2, and 4, so the corresponding columns of A, namely (0, 1, 2), (1, 2, 1), and (0, 0, 1), form a basis for the column space.
I showed earlier that you can add vectors to an independent set to get a basis. The column space basis
algorithm shows how to remove vectors from a spanning set to get a basis.
Example. Find a subset of the following set of vectors which forms a basis for R3 .
1 −1 1 4
2 , 1 , 1 , −1
1 −1 1 2
The leading entries occur in columns 1, 2, and 4. Therefore, the corresponding columns of the original
matrix are independent, and form a basis for R3 :
1 −1 4
2 , 1 , −1 .
1 −1 2
Definition. Let A be a matrix. The column rank of A is the dimension of the column space of A.
This is really just a temporary definition, since we’ll show that the column rank is the same as the rank
we defined earlier (the dimension of the row space).
Theorem. Let A be a matrix over a field. Then the column rank of A equals the rank of A.

Proof. Let R be the row reduced echelon matrix which is row equivalent to A. Suppose the leading entries
of R occur in columns j1 , . . . , jp , where j1 < · · · < jp , and let ci denote the i-th column of A. By the
preceding lemma, {cj1 , . . . , cjp } is independent. There is one vector in this set for each leading entry, and
the number of leading entries equals the row rank. Therefore,

column rank(A) ≥ rank(A).

Now consider AT . This is A with the rows and columns swapped, so the column space of AT is the row space of A, and the row space of AT is the column space of A. Applying the inequality above to AT gives

rank(A) = column rank(AT ) ≥ rank(AT ) = column rank(A).

Therefore,

column rank(A) = rank(A).
Proposition. Let A, B, P and Q be matrices, where P and Q are invertible. Suppose A = P BQ. Then
rank A = rank B.
Proof. I showed earlier that rank(M N ) ≤ rank N . This was row rank; a similar proof shows that the column rank of M N is at most the column rank of M . Since row rank and column rank are the same, rank(M N ) ≤ rank M .

Now

rank A = rank(P BQ) ≤ rank(BQ) ≤ rank B.

But B = P −1 AQ−1 , so repeating the computation gives rank B ≤ rank A. Therefore, rank A = rank B.
Definition. The null space (or kernel) of a matrix A is the set of vectors x such that Ax = 0. The
dimension of the null space of A is called the nullity of A, and is denoted nullity(A).
The null space is the same as the solution space of the system of equations Ax = 0. I showed earlier
that if A is an m × n matrix, then the solution space is a subspace of F n . Thus, the null space of a matrix
is a subspace of F n .
(a)

 3 −1 1       1        0
          ·   2    =
−1  1 1      −1        0

That is, (1, 2, −1) is in the null space of the matrix on the left.
Algorithm. Let A be an m × n matrix over a field F . Find a basis for the null space of A, that is, for the solution space of the homogeneous system

Ax = 0.

First, row reduce A to row reduced echelon form.
In the row reduced echelon form, suppose that {xi1 , xi2 , . . . , xip } are the variables corresponding to the
leading entries, and suppose that {xj1 , xj2 , . . . , xjq } are the free variables. Note that p + q = n.
Put the solution in parametric form, writing the leading entry variables {xi1 , xi2 , . . . , xip } in terms of the free variables (parameters) {xj1 , xj2 , . . . , xjq }:

xi1 = fi1 (xj1 , . . . , xjq ), xi2 = fi2 (xj1 , . . . , xjq ), . . . , xip = fip (xj1 , . . . , xjq ).

Plug these expressions into the general solution vector x = (x1 , x2 , . . . , xn ): substitute fi1 (xj1 , . . . , xjq ) for xi1 , then fi2 (xj1 , . . . , xjq ) for xi2 , and so on, and leave the free-variable components xj1 , . . . , xjq alone. Schematically, the result looks like this:
x = (x1 , x2 , . . . , xn ) = xj1 · u1 + xj2 · u2 + · · · + xjq · uq .

Here each uℓ is a fixed vector in F n : its jℓ -th component is 1, its other free-variable components are 0, and its leading-entry components are the coefficients (the “∗’s”) that remain after factoring xj1 , xj2 , . . . , xjq out of the f -terms.
In the last expression, the vectors which are being multiplied by xj1 , xj2 , . . . , xjq form a basis for the
null space.
First, the vectors span the null space, because the equation above has expressed an arbitrary vector in
the null space as a linear combination of the vectors.
Second, the vectors are independent. Suppose the linear combination above is equal to the zero vector
(0, 0, . . . 0):
xj1 · u1 + xj2 · u2 + · · · + xjq · uq = (0, 0, . . . , 0).

Look at the components in positions j1 , j2 , . . . , jq (the free-variable positions). In position jℓ , the vector uℓ has a 1 and the other u’s have 0’s, so the jℓ -th component of the left side is just xjℓ .
We see that xj1 = xj2 = · · · = xjq = 0.
This description is probably hard to understand with all the subscripts flying around, but I think the
examples which follow will make it clear.
Before giving an example, here’s an important result that comes out of the algorithm.
Theorem. Let A be an m × n matrix over a field. Then

n = rank A + nullity A.
Proof. In the algorithm above, p, the number of leading entry variables, is the rank of A. And q, the number
of free variables, is the same as the number of vectors in the basis for the null space. That is, q = nullity(A).
Finally, I observed earlier that p + q = n. Thus, n = rank A + nullity A.
This theorem is a special case of the First Isomorphism Theorem, which you’d see in a course in
abstract algebra.
Example. Find the nullity and a basis for the null space of the real matrix
1 2 0 3
1 2 1 −2 .
2 4 1 1
Let’s follow the steps in the algorithm. First, row reduce the matrix to row-reduced echelon form:
1 2 0 3 1 2 0 3
1 2 1 −2 → 0 0 1 −5
2 4 1 1 0 0 0 0
I’ll use w, x, y, and z as my solution variables. Thinking of the last matrix as representing equations
for a homogeneous system, I have
w + 2x + 3z = 0, or w = −2x − 3z,
y − 5z = 0, or y = 5z.
I’ve expressed the leading entry variables in terms of the free variables. Now I substitute for w and y
in the general solution vector (w, x, y, z):
w −2x − 3z −2 −3
x x 1 0
= = x· +z· .
y 5z 0 5
z z 0 1
After substituting, I broke the resulting vector up into pieces corresponding to each of the free variables
x and z.
The equation above shows that every vector (w, x, y, z) in the null space can be written as a linear
combination of (−2, 1, 0, 0) and (−3, 0, 5, 1). Thus, these two vectors span the null space. They’re also
independent: Suppose
−2 −3 0
1 0 0
x· +z· = .
0 5 0
0 1 0
Then
−2x − 3z 0
x 0
= .
5z 0
z 0
Looking at the second and fourth components, you can see that x = 0 and z = 0.
Hence, {(−2, 1, 0, 0), (−3, 0, 5, 1)} is a basis for the null space. The nullity is 2.
Notice also that the rank is 2, the number of columns is 4, and 4 = 2 + 2, which confirms the preceding
theorem.
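For comparison, here is the same example as a sympy sketch (not part of the notes): nullspace() returns exactly a basis of this kind, one vector per free variable, and rank() lets you confirm that rank plus nullity equals the number of columns.

from sympy import Matrix

A = Matrix([[1, 2, 0, 3],
            [1, 2, 1, -2],
            [2, 4, 1, 1]])

for v in A.nullspace():          # basis for the null space
    print(v.T)                   # (-2, 1, 0, 0) and (-3, 0, 5, 1)

print(A.rank(), len(A.nullspace()), A.cols)   # 2, 2, 4 and 2 + 2 = 4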
Example. Consider the following matrix over Z3 :
1 1 0 2
A = 2 2 1 2.
1 1 1 0
Find bases for the row space, column space, and null space.
Row reduce the matrix:
1 1 0 2 1 1 0 2
2 2 1 2 → 0 0 1 1
1 1 1 0 0 0 0 0
{(1, 1, 0, 2), (0, 0, 1, 1)} is a basis for the row space.
The leading entries occur in columns 1 and 3. Taking the first and third columns of the original matrix,
I find that {(1, 2, 1), (0, 1, 1)} is a basis for the column space.
Using a, b, c, and d as variables, I find that the row reduced matrix gives the equations
a + b + 2d = 0, or a = 2b + d,
c + d = 0, or c = 2d.
Thus,
a 2b + d 2 1
b b 1 0
= = b · + d · .
c 2d 0 2
d d 0 1
Therefore, {(2, 1, 0, 0), (1, 0, 2, 1)} is a basis for the null space.
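Since sympy’s built-in routines work over the rationals rather than Z3, here is a small sketch that simply confirms the claimed null space vectors: multiplying by A and reducing mod 3 should give the zero vector.

from sympy import Matrix

A = Matrix([[1, 1, 0, 2],
            [2, 2, 1, 2],
            [1, 1, 1, 0]])

for v in (Matrix([2, 1, 0, 0]), Matrix([1, 0, 2, 1])):
    print((A * v).applyfunc(lambda x: x % 3).T)   # both print (0, 0, 0)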
Since you can think of f as taking a 2-dimensional vector as its input and producing a 3-dimensional
vector as its output, you could write
But I’ll suppress some of the angle brackets when there’s no danger of confusion.
Example. Define f : R2 → R3 by

f ((x, y)) = (x + 2y, x − y, −2x + 3y).

I’ll show that f is a linear transformation the hard way. First, I need two 2-dimensional vectors:
u = (u1 , u2 ), v = (v1 , v2 ).
I must show that
f (ku + v) = kf (u) + f (v).
I’ll compute the left side and the right side, then show that they’re equal. Here’s the left side:

f (ku + v) = f ((ku1 + v1 , ku2 + v2 )) = ((ku1 + v1 ) + 2(ku2 + v2 ), (ku1 + v1 ) − (ku2 + v2 ), −2(ku1 + v1 ) + 3(ku2 + v2 )).

Here’s the right side:
kf (u)+f (v) = kf ((u1 , u2 ))+f ((v1 , v2 )) = k(u1 +2u2 , u1 −u2 , −2u1 +3u2 )+(v1 +2v2 , v1 −v2 , −2v1 +3v2 ) =
(k(u1 + 2u2 ) + (v1 + 2v2 ), k(u1 − u2 ) + (v1 − v2 ), k(−2u1 + 3u2 ) + (−2v1 + 3v2 )).
Therefore, f (ku + v) = kf (u) + f (v), so f is a linear transformation.
This was a pretty disgusting computation, and it would be a shame to have to go through this every
time. I’ll come up with a better way of recognizing linear transformations shortly.
lim (h → 0) of |f (a + h) − f (a) − Df (a)(h)| / |h| = 0.
Since f produces outputs in Rm , you can think of f as being built out of m component functions.
Suppose that f = (f1 , f2 , . . . , fm ).
It turns out that the matrix of Df (a) (relative to the standard bases on Rn and Rm ) is the m × n matrix
whose (i, j)th entry is
Dj fi = ∂fi /∂xj .
This matrix is called the Jacobian matrix of f at a.
For example, suppose f : R2 → R2 is given by
f (x, y) = (x2 y 3 , x2 − y 5 ).
Then
2xy 3 3x2 y 2
Df (x, y) = .
2x −5y 4
The next lemma gives an easy way of constructing — or recognizing — linear transformations.
Theorem. Let F be a field, and let A be an n × m matrix over F . The function f : F m → F n given by
f (u) = A · u for u ∈ F m
is a linear transformation.
Proof. This is pretty easy given the rules for matrix arithmetic. Let u, v ∈ F m and let k ∈ F . Then

f (ku + v) = A · (ku + v) = k(A · u) + A · v = kf (u) + f (v).
Example. Define f : R2 → R3 by

f ((x, y)) = (x + 2y, x − y, −2x + 3y).
I’ll show that f is a linear transformation the easy way. Observe that
1 2
x
f ((x, y)) = 1 −1 .
y
−2 3
f is given by multiplication by a matrix of numbers, exactly as in the lemma. ((x, y) is taking the place
of u.) So the lemma implies that f is a linear transformation.
Lemma. Let F be a field, and let V and W be vector spaces over F . Suppose that f : V → W is a linear
transformation. Then:
a. f (~0) = ~0.
b. f (−v) = −f (v) for all v ∈ V .
Proof. (a) Put u = ~0, v = ~0, and k = 1 in the defining equation for a linear transformation. Then

f (~0) = f (1 · ~0 + ~0) = 1 · f (~0) + f (~0) = f (~0) + f (~0).

Subtracting f (~0) from both sides gives f (~0) = ~0.
(b) I know that −v = (−1) · v, so

f (−v) = f ((−1) · v + ~0) = (−1) · f (v) + f (~0) = −f (v).
The lemma gives a quick way of showing a function is not a linear transformation.
Example. Define g : R2 → R2 by
g(x, y) = (x + 1, y + 2).
Then
g(0, 0) = (1, 2) ≠ (0, 0).
Since g does not take the zero vector to the zero vector, it is not a linear transformation.
Be careful! If f (~0) = ~0, you can’t conclude that f is a linear transformation. For example, I showed
that the function f (x, y) = (x2 , y 2 , xy) is not a linear transformation from R2 to R3 . But f (0, 0) = (0, 0, 0),
so it does take the zero vector to the zero vector.
Next, I want to prove the result I mentioned earlier: Every linear transformation on a finite-dimensional
vector space can be represented by matrix multiplication. I’ll begin by reviewing some notation.
Definition. The standard basis vectors for F m are

e1 = (1, 0, . . . , 0), e2 = (0, 1, . . . , 0), . . . , em = (0, 0, . . . , 1).

Thus, ei is an m-dimensional vector with a 1 in the ith position and 0’s elsewhere. For instance, in R3 ,

e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1).

I showed earlier that {e1 , e2 , . . . , em } is a basis for F m . This implies the following result.
Lemma. Every vector u ∈ F m can be written uniquely as
u = u1 e1 + u2 e2 + · · · + um em ,
where u1 , u2 , . . . , um ∈ F .
Example. In R3 , any vector can be written this way: (a, b, c) = a · e1 + b · e2 + c · e3 .

Theorem. Let F be a field, and let f : F m → F n be a linear transformation. Then there is an n × m matrix A such that

f (u) = A · u for all u ∈ F m .
Proof. Regard the standard basis vectors e1 , e2 , . . . , em as m-dimensional column vectors. Then f (e1 ),
f (e2 ), . . . , f (em ) are n-dimensional column vectors, because f produces outputs in F n . Take these m
n-dimensional column vectors f (e1 ), f (e2 ), . . . , f (em ) and build a matrix:
↑ ↑ ↑
A = f (e1 ) f (e2 ) · · · f (em ) .
↓ ↓ ↓
I claim that A is the matrix I want. To see this, take a vector u ∈ Rm and write it in component form:
u1
u2
u = (u1 , u2 , . . . , um ) =
... .
um
u = u1 e1 + u2 e2 + · · · + um em .
Then I can use the fact that f is a linear transformation — so f of a sum is the sum of the f ’s, and constants (like the ui ’s) can be pulled out — to write

f (u) = f (u1 e1 + u2 e2 + · · · + um em ) = u1 f (e1 ) + u2 f (e2 ) + · · · + um f (em ).

On the other hand,

A · u = [ f (e1 ) f (e2 ) · · · f (em ) ] · (u1 , u2 , . . . , um ) = u1 f (e1 ) + u2 f (e2 ) + · · · + um f (em ).
To get the last equality, think about how matrix multiplication works.
Therefore, f (u) = A · u.
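Here is a sketch of the construction in the proof, using Python with numpy and the example map f ((x, y)) = (x + 2y, x − y, −2x + 3y) from earlier (the function name and variable names below are mine, not the notes’):

import numpy as np

def f(v):
    x, y = v
    return np.array([x + 2*y, x - y, -2*x + 3*y])

# Build A by feeding the standard basis vectors into f and using the
# outputs as the columns.
A = np.column_stack([f(e) for e in np.eye(2)])
print(A)                          # [[ 1  2], [ 1 -1], [-2  3]]

u = np.array([3.0, -5.0])
print(np.allclose(A @ u, f(u)))   # True: f(u) = A u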
Linear transformations and matrices are not quite identical. If a linear transformation is like a person,
then a matrix for the transformation is like a picture of the person — the point being that there can be many
different pictures of the same person. You get different “pictures” of a linear transformation by changing
coordinates — something I’ll discuss later.
Example. Define f : R2 → R3 by

f ((x, y)) = (x + 2y, x − y, −2x + 3y).

I already know that f is a linear transformation, and I found its matrix by inspection. Here’s how it would work using the theorem. I feed the standard basis vectors into f :

f ((1, 0)) = (1, 1, −2), f ((0, 1)) = (2, −1, 3).

Using these as the columns gives the same matrix as before:

1 2
1 −1
−2 3
You can combine linear transformations to obtain other linear transformations. First, I’ll consider sums
and scalar multiplication.
Definition. Let f, g : V → W be linear transformations of vector spaces over a field F , and let k ∈ F .
a. The sum f + g of f and g is the function V → W which is defined by

(f + g)(v) = f (v) + g(v) for v ∈ V.

b. The scalar multiple k · f is the function V → W which is defined by

(k · f )(v) = k · f (v) for v ∈ V.
Lemma. Let f, g : V → W be linear transformations of vector spaces over a field F , and let k ∈ F .
a. f + g is a linear transformation.
b. k · f is a linear transformation.
Proof. I’ll prove the first part by way of example and leave the proof of the second part to you.
Let u, v ∈ V and let k ∈ F . Then

(f + g)(ku + v) = f (ku + v) + g(ku + v) = kf (u) + f (v) + kg(u) + g(v) = k(f + g)(u) + (f + g)(v).

Hence, f + g is a linear transformation.
If f : X → Y and g : Y → Z, the composite g ◦ f : X → Z is defined by (g ◦ f )(x) = g(f (x)).

(Picture: f maps X into Y , g maps Y into Z, and the composite g ◦ f goes directly from X to Z.)
The ◦ between the g and f does not mean multiplication, but it’s so easy to confuse that I’ll usually
just write “g(f (x))” when I want to compose functions.
Note that things go from left to right in the picture, but that they go right to left in “g(f (x))”. The
effect is to do f first, then g.
Lemma. Let f : U → V and g : V → W be linear transformations of vector spaces over a field F . Then
g ◦ f : U → W is a linear transformation.
Proof. Let u1 , u2 ∈ U and let k ∈ F . Then

(g ◦ f )(ku1 + u2 ) = g(f (ku1 + u2 )) = g(kf (u1 ) + f (u2 )) = kg(f (u1 )) + g(f (u2 )) = k(g ◦ f )(u1 ) + (g ◦ f )(u2 ).
Suppose f and g are linear transformations

f : Rm −→ Rn , g : Rn −→ Rp .

f and g can be represented by matrices; I’ll use [f ] for the matrix of f and [g] for the matrix of g, so that f (u) = [f ] · u and g(v) = [g] · v.
The matrix for g ◦ f is [g] · [f ], the product of the matrices for f and g.
Example. Suppose
That is,
(g ◦ f )(x, y) = (3x + 9y, 2x − 7y, −3x − 24y).
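Since the original f and g are not shown here, the following sketch uses made-up matrices F and G just to illustrate the point: the matrix of the composite is the product of the matrices.

import numpy as np

F = np.array([[1, 2],
              [0, 1]])        # matrix of some f : R^2 -> R^2 (hypothetical)
G = np.array([[3, 1],
              [2, -1],
              [0, 4]])        # matrix of some g : R^2 -> R^3 (hypothetical)

u = np.array([5, -2])
# Applying f, then g, is the same as multiplying by the product G F.
print(np.allclose(G @ (F @ u), (G @ F) @ u))   # True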
Example. The idea of composing transformations can be extended to affine transformations. For the
sake of this example, you can think of an affine transformation as a linear transformation plus a translation
(a constant). This provides a powerful way of doing various geometric constructions.
For example, I wanted to write a program to generate self-similar fractals. There are many self-similar
fractals, but I was interested in those that can be constructed in the following way. Start with an initiator,
which is a collection of segments in the plane. For example, this initiator is a square:
Next, I need a generator. It’s a collection of segments which start at the point (0, 0) and end at the
point (1, 0). Here’s a generator shaped like a square hump:
(1/3,1/3) (2/3,1/3)
The construction proceeds in stages. Start with the initiator and replace each segment with a scaled
copy of the generator. (There is an issue here of which way you “flip” the generator when you copy it, but
I’ll ignore this for simplicity.) Here’s what you get by replacing the segments of the square with copies of
the square hump:
Now keep going. Take the current figure and replace all of its segments with copies of the generator.
And so on. Here’s what you get after around 4 steps:
Roughly, self-similarity means that if you enlarge a piece of the figure, the enlarged piece looks like
the original. If you imagine carrying out infinitely many steps of the construction above, you’d get a figure
which would look “the same” no matter how much you enlarged it — which is a very crude definition of a
fractal. If you’re interested in this stuff, you should look at Benoit Mandelbrot’s classic book ([1]) — it has
great pictures!
What does this have to do with transformations? The idea is that to replace a segment of the current
figure with a scaled copy of the generator, you need to stretch the generator, rotate it, then translate it.
Here’s a picture with a different initiator and generator:
Stretching by a factor of k amounts to multiplying by the matrix
k 0
.
0 k
By thinking of the operations as linear or affine transformations, it is very easy to write down the
formula.
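For instance, here is a sketch (my own, with made-up function and variable names) of the map that carries the unit segment from (0, 0) to (1, 0) onto the segment from a point p to a point q: scale by the length of q − p, rotate by its angle, then translate by p.

import numpy as np

def segment_map(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    d = q - p
    theta = np.arctan2(d[1], d[0])        # angle of the target segment
    s = np.hypot(d[0], d[1])              # its length
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    # Affine map: scale, rotate, then translate.
    return lambda x: s * (R @ np.asarray(x, float)) + p

T = segment_map((1, 1), (3, 2))
print(T((0, 0)), T((1, 0)))    # the endpoints land on (1, 1) and (3, 2)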
AB · (1, 0, . . . , 0)T = (1, 0, . . . , 0)T , . . . , AB · (0, 0, . . . , 1)T = (0, 0, . . . , 1)T .

That is, AB · ei = ei for each standard basis vector ei .
Put the vectors in the equations above into the columns of a matrix. They form the identity:
AB · I = I, so AB = I.
Example. Define f : R2 → R2 by f ((x, y)) = (x + 3y, x + 4y). Find the inverse transformation.

The matrix of f is
1 3
.
1 4
Therefore, the matrix of f −1 is
4 −3
.
−1 1
Hence, the inverse transformation is

f −1 ((x, y)) = (4x − 3y, −x + y).

As a check,

f −1 (f (x, y)) = f −1 (x + 3y, x + 4y) = (4(x + 3y) − 3(x + 4y), −(x + 3y) + (x + 4y)) = (x, y).
Example. Let R2 [x] denote the set of polynomials with real coefficients of degree 2 or less. Thus,
x2 + 3x + 2, −7x2 , 0, 42x − 5.
You can represent polynomials in R2 [x] by vectors. For example, here is how to represent x2 + 3x + 2
as a vector:
( 2 , 3 , 1 )
↑ ↑ ↑
2 + 3x + x2
I’m writing the coefficients with the powers increasing to make it easy to extend this to higher powers.
For example,
( 7 , −1 , 4 , 5 )
↑ ↑ ↑ ↑
7 − x + 4x2 + 5x3
Now let D denote differentiation with respect to x. If p is a polynomial of degree 2 or less, so is its derivative Dp, and differentiation satisfies D(kp + q) = k · Dp + Dq. This means that D is a linear transformation R2 [x] → R2 [x]. What is its matrix?
To find the matrix of a linear transformation (relative to the standard basis), apply the transformation
to the standard basis vectors. Use the results as the columns of your matrix.
In vector form, R2 [x] is just R3 , so the standard basis vectors are

(1, 0, 0) ↔ 1, (0, 1, 0) ↔ x, (0, 0, 1) ↔ x2 .

As I apply D, I’ll translate to polynomial notation to make it easier for you to follow:

D(1) = 0 ↔ (0, 0, 0), D(x) = 1 ↔ (1, 0, 0), D(x2 ) = 2x ↔ (0, 2, 0).

Using these results as the columns, the matrix of D is

0 1 0
0 0 2
0 0 0
Example. Construct a linear transformation which maps the unit square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 to the
parallelogram in R3 determined by the vectors (2, −3, 6) and (1, −1, 1).
The idea is to send vectors for the square’s sides — namely, (1, 0) and (0, 1) — to the target vectors
(2, −3, 6) and (1, −1, 1). If I do this with a linear transformation, the rest of the square will go with them. Thus, use the linear transformation

f ((x, y)) = x · (2, −3, 6) + y · (1, −1, 1),

whose matrix has the target vectors as its columns:

2 1
−3 −1
6 1
[1] Benoit Mandelbrot, The fractal geometry of nature. New York: W. H. Freeman and Company, 1983.
[ISBN 0-7167-1186-9]
Let V be a finite-dimensional vector space over a field F , and let B = {v1 , . . . , vn } be a basis for V . Every v ∈ V can be written uniquely in the form

v = a1 v1 + · · · + an vn , vi ∈ B, ai ∈ F.
(Let me remind you of why this is true. Since a basis spans, every v ∈ V can be written in this way.
On the other hand, if a1 v1 + · · · + an vn = a′1 v1 + · · · + a′n vn are two ways of writing a given vector, then
(a1 − a′1 )v1 + · · · (an − a′n )vn = 0, and by independence a1 − a′1 = 0, . . . , an − a′n = 0 — that is, a1 = a′1 ,
. . . , an = a′n . So the representation of a vector in this way is unique.)
Consider the situation where B is a finite ordered basis — that is, fix a numbering v1 , . . . , vn of the
elements of B. If v = a1 v1 + · · · + an vn , the ordered list of coefficients (a1 , . . . , an ) is uniquely associated
with v. The {ai } are the components of v with respect to the (ordered) basis B; I will use the notation
v = (a1 , . . . , an )B .
It is easy to confuse a vector with the representation of the vector in terms of its components relative
to a basis. This confusion arises because the most familiar representation of a vector is
as an ordinary n-tuple in Rn :
Rn = {(a1 , . . . , an ) | ai ∈ R}.
This amounts to identifying the elements of Rn with their representation relative to the standard basis
e1 = (1, 0, 0, . . . , 0),
e2 = (0, 1, 0, . . . , 0),
..
.
en = (0, 0, 0, . . . , 1).
Example. (a) Show that

B = {(1, 0, 1), (2, −1, 1), (3, 1, 0)}

is a basis for R3 .
These are three vectors in R3 , which has dimension 3. Hence, it suffices to check that they’re indepen-
dent. Form the matrix with the elements of B as its rows and row reduce:
1 0 1 1 0 0
2 −1 1 → 0 1 0
3 1 0 0 0 1
The vectors are independent. Three independent vectors in R3 must form a basis.
(b) Find the components of (15, −1, 2) relative to B.
I must find numbers a, b, and c such that
15 1 2 3
−1 = a · 0 + b · −1 + c · 1 .
2 1 1 0
This is equivalent to the matrix equation
1 2 3 a 15
0 −1 1 b = −1 .
1 1 0 c 2
Set up the matrix for the system and row reduce to solve:
1 2 3 15 1 0 0 −2
0 −1 1 −1 → 0 1 0 4
1 1 0 2 0 0 1 3
Therefore, a = −2, b = 4, and c = 3, so (15, −1, 2) = (−2, 4, 3)B .

(c) Find the vector whose components relative to B are (7, −2, 2)B .

The matrix equation from part (b) says

1 2 3
0 −1 1   vB = vstd .
1 1 0
In (b), I knew vstd and I wanted vB ; this time it’s the other way around. So I simply put (7, −2, 2)B
into the vB spot and multiply:
1 2 3 7 9
0 −1 1 −2 = 4 .
1 1 0 2 5
In general, if B is a basis for F n and M is the matrix whose columns are the elements of B (written in terms of the standard basis), then

M vB = vstd .
I’ll write [B → std] for M , and call it a translation matrix. Again, [B → std] translates vectors
written in terms of B to vectors written in terms of the standard basis.
The inverse of a square matrix M is a matrix M −1 such that M M −1 = M −1 M = I, where I is the
identity matrix. If I multiply the last equation on the left by M −1 , I get
M −1 M vB = M −1 vstd , or vB = M −1 vstd .
In other words, M −1 translates vectors from the standard basis to B:

[std → B] = [B → std]−1 .
In the example above, left multiplication by the following matrix translates vectors from B to the
standard basis:
1 2 3
[B → std] = 0 −1 1 .
1 1 0
The inverse of this matrix is

                              −1/4   3/4   5/4
[std → B] = [B → std]−1 =      1/4  −3/4  −1/4 .
                               1/4   1/4  −1/4
Left multiplication by this matrix translates vectors from the standard basis to B.
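Here is the whole translation setup for this example as a sympy sketch (not part of the notes):

from sympy import Matrix

B_to_std = Matrix([[1, 2, 3],
                   [0, -1, 1],
                   [1, 1, 0]])          # columns are the basis vectors of B

std_to_B = B_to_std.inv()
print(std_to_B)                          # the matrix of quarters above

print(B_to_std * Matrix([7, -2, 2]))     # (7, -2, 2)_B in standard terms: (9, 4, 5)
print(std_to_B * Matrix([15, -1, 2]))    # (15, -1, 2) in B terms: (-2, 4, 3)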
Example. (Translating vectors from one basis to another) The translation analogy is a useful one,
since it makes it easy to see how to set up arbitrary changes of basis.
For example, suppose
1 2 1
B′ = −1 , 0 , 1
2 1 1
Remember that the product [standard → B][B′ → standard] is read from right to left! Thus, the
composite operation [standard → B][B′ → standard] translates a B′ vector to a standard vector, and then
translates the resulting standard vector to a B vector. Moreover, I have matrices which perform each of the
right-hand operations.
This matrix translates vectors from B′ to the standard basis:
1 2 1
−1 0 1 .
2 1 1
Let f : V → W be a linear transformation, where V and W are finite-dimensional vector spaces over a field F . Let B = {v1 , . . . , vn } be an ordered basis for V and let C = {w1 , . . . , wm } be an ordered basis for W . For each j, write

f (vj ) = a1j w1 + a2j w2 + · · · + amj wm , aij ∈ F.

The numbers {aij } are uniquely determined by f . The m × n matrix A = (aij ) is the matrix of f
relative to the ordered bases B and C. I’ll use [f ]B,C to denote this matrix. Here’s how to find it.
• To find [f ]B,C , take an element vj in the basis B, apply f to vj , and express the result as a linear
combination of elements of C The coefficients in the linear combination make up the j th column of
[f ]B,C .
(Schematically, [f ]B,C = [ f (v1 ) f (v2 ) · · · f (vn ) ], with each column written in terms of C and the inputs vj running through B.)
I’ll use std to denote the standard basis for F n .
Then
6 −13
[f ]B,C = .
11 π
Read the description of [f ]B,C preceding this example and verify that [f ]B,C was constructed by following
the steps in the description.
Find [f ]std,std .
Apply f to the elements of the standard basis for R2 , and write the results in terms of the standard
basis for R3 :
f (1, 0) = (1, 3, −1) = 1 · (1, 0, 0) + 3 · (0, 1, 0) + (−1) · (0, 0, 1),
f (0, 1) = (2, 0, 5) = 2 · (1, 0, 0) + 0 · (0, 1, 0) + 5 · (0, 0, 1).
Take the coefficients in the linear combinations and use them to make the columns of the matrix:
1 2
[f ]std,std = 3 0 .
−1 5
Find [f ]std,B .
Apply f to the elements of the standard basis for R2 , and write the results in terms of B:
vj = (0, . . . , 0, 1, 0, . . . , 0), with the 1 in the j-th position.
Then A · vj picks out the j-th column of A:

A · vj = (a1j , a2j , . . . , amj ).
This is correct, since f (vj ) = Σi aij wi , and the representation of this vector in terms of the basis C = {w1 , . . . , wm } is

(a1j , a2j , . . . , amj ).
The matrix of a linear transformation is like a snapshot of a person — there are many pictures of a
person, but only one person. Likewise, a given linear transformation can be represented by matrices with
respect to many choices of bases for the domain and range.
In the last example, finding [f ]std,std turned out to be easy, whereas finding the matrix of f relative to
other bases is more difficult. Here’s how to use change of basis matrices to make things simpler.
Suppose you have bases B and C and you want [f ]B,C .
1. Find [f ]std,std . Usually, you can find this from the definition.
2. Find the change of basis matrices [B → std] and [C → std]. (Take the basis elements written in terms of
the standard bases and use them as the columns of the matrices.)
3. Find [std → C] = [C → std]−1 .
4. Then
[f ]B,C = [std → C][f ]std,std [B → std].
Do you see why this works? Reading from right to left, an input vector written in terms of B is translated
to the standard basis by [B → std]. Next, [f ]std,std takes the standard vector, applies f , and writes the output
as a standard vector. Finally, [std → C] takes the standard vector output and translates it to a C vector.
I’ll illustrate this in the next example.
Example. Define f : R2 → R3 by
f ((x, y)) = (x + y, 2x − y, x − y). In matrix form, f ((x, y)) is the product of the matrix

1  1
2 −1
1 −1

with the column vector (x, y).
The matrix above is the matrix of f relative to the standard bases of R2 and R3 .
Next, consider the following bases for R2 and R3 , respectively:
2 1
B= , ,
1 1
1 1 −1
C = 1,2, 0 .
1 1 −2
Hence, the inverse matrix translates vectors from the standard basis to C:
−1
1 1 −1 4 −1 −2
[std → C] = 1 2 0 = −2 1 1 .
1 1 −2 1 0 −1
Therefore,
4 −1 −2 1 1 7 7
2 1
[f ]B,C = −2 1 1 2 −1 = −2 −3 .
1 1
1 0 −1 1 −1 2 2
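The same computation as a sympy sketch (again, just a check, not part of the notes):

from sympy import Matrix

f_std    = Matrix([[1, 1], [2, -1], [1, -1]])           # [f]_{std,std}
B_to_std = Matrix([[2, 1], [1, 1]])                      # columns: the basis B
C_to_std = Matrix([[1, 1, -1], [1, 2, 0], [1, 1, -2]])   # columns: the basis C

f_BC = C_to_std.inv() * f_std * B_to_std
print(f_BC)     # Matrix([[7, 7], [-2, -3], [2, 2]])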
The numbers are simple enough that I could figure out the linear combination by inspection.
Apply T :
3 1 0 1 0
T =T 3· + (−5) · =3·T + (−5) · T =
−5 0 1 0 1
1 1 3 −5 −2
3· + (−5) · = + = .
1 −1 3 5 8
Therefore,
3 1 1 1 3 −1
= = .
−5 2 1 −1 −5 4 B
Hence,

     3       2  1     −1        2
T         =        ·        =
    −5      −1  2      4  B     9  B
Example. Suppose T : R2 → R2 is given by
T (x, y) = (4x − 2y, x + y).
Let
5 1
B= , .
3 1
(a) Find [T ]std,std .
4 −2
[T ]std,std = .
1 1
(b) Find [T ]B,B .
First,
5 1
[B → std] = .
3 1
Hence,
−1
−1 5 1 1 1 −1
[std → B] = [B → std] = = .
3 1 2 −3 5
Then
1 1 −1 4 −2 5 1 3 0
[T ]B,B = [std → B][T ]std,std [B → std] = = .
2 −3 5 1 1 3 1 −1 2
(c) Compute [T ((4, 3)B )]B .
This means: Apply T to the vector (4, 3)B and write the result in terms of B.
3 0 4 12
[T ((4, 3)B )]B = [T ]B,B · (4, 3)B = = .
−1 2 3 2 B
i is a special symbol, sometimes called the complex or imaginary unit. It behaves somewhat like a
variable, but it has a special multiplication property we’ll see below.
The word “imaginary” is traditional, but not very good: It might lead you to believe that there’s
something “fake” about complex numbers, as opposed to the real numbers. In fact, the complex numbers
are a number system like the real numbers or the integers, and there is nothing “fake” about them: I’ll
sketch a construction of the complex numbers using matrices below.
You can picture a complex number a+ bi as a point in the x-y-plane; we think of “a” as the x-coordinate
and bi (or b) as the y-coordinate.
i · i = −1.
This means that i2 = −1, so i is described as “the square root of −1”. You might object that −1 doesn’t
have a square root. What is true is that no real number is a square root of −1, but the complex numbers
are a different number system.
You can divide by (nonzero) complex numbers by “multiplying the top and bottom by the conjugate”:
(a + bi)/(c + di) = [(a + bi)/(c + di)] · [(c − di)/(c − di)] = [(ac + bd) + (−ad + bc)i]/(c2 + d2 ).
With these operations, the set of complex numbers forms a field.
At this point, you might be a bit suspicious — how can we just make up a bunch of symbols and
operations and call the result “the complex numbers”? Here’s a sketch which describes how you can “build”
the complex numbers using matrices. The complex number a + bi will correspond to the real 2 × 2 matrix
a b
.
−b a
Setting a = 0 and b = 1, we see that the complex number i should correspond to the matrix
0 1
.
−1 0
The operations will be matrix addition and multiplication. Let’s try them out:
a b c d a+c b+d
+ = ,
−b a −d c −b − d a + c
a b c d ac − bd ad + bc
= .
−b a −d c −ad − bc ac − bd
Notice that we’re getting the same expressions we gave above for addition and multiplication of complex
numbers. Thus, if you’re comfortable with matrices and matrix arithmetic, you can think of “a+bi” notation
as shorthand for the matrix equivalents above. In an abstract algebra course, you’ll probably see another
construction of the complex numbers using quotient rings. In linear algebra, we’ll just use the “a + bi”
form of complex numbers, as it is the simplest for computations.
To continue with our discussion of complex number arithmetic, I’ll note here that the conjugate of a complex number is obtained by flipping the sign of the imaginary part. The conjugate of a + bi is written with a bar over it, or sometimes as (a + bi)∗ . Thus,

______
a + bi = a − bi.
The norm of a complex number is

|a + bi| = √(a2 + b2 ).
Note that a complex number times its conjugate is the square of its norm:

(a + bi)(a − bi) = a2 + b2 = |a + bi|2 .
When a complex number is written in the form a + bi, it’s said to be in rectangular form. There is
another form for complex numbers that is useful: The polar form reiθ . In this form, r and θ have the same
meanings that they do in polar coordinates.
DeMoivre’s formula relates the polar and rectangular forms:

eiθ = cos θ + i sin θ.

This key result can be proven, for example, by expanding both sides in power series. Using this, we get

reiθ = r cos θ + (r sin θ)i.
Example. Convert 3 + 4i to polar form.
3 + 4i = |3 + 4i| · (3 + 4i)/|3 + 4i| = 5 · (3/5 + (4/5)i).

Let θ = sin−1 (4/5) (or cos−1 (3/5)). Then

3 + 4i = 5(cos θ + i sin θ).
In the examples that follow, we’ll use the polar and complex exponential forms of complex numbers to
simplify algebraic computations, derive trig identities, and compute integrals.
√
Example. (A trick with Demoivre’s formula) Find (1 + i 3)8 .
It would be tedious to try to multiply this out. Instead, I’ll try to write the expression in terms of
cos θ + i sin θ for a good choice of θ.
(1 + i√3)8 = [2 · (1/2 + i√3/2)]8 = 256 (cos(π/3) + i sin(π/3))8 = 256 (eπi/3 )8 = 256 e8πi/3 =

256 (cos(8π/3) + i sin(8π/3)) = 256 (cos(2π/3) + i sin(2π/3)) = 256 (−1/2 + i√3/2) = −128 + 128i√3.
Example. Let
2 −3 1
A = 1 −2 1 ∈ M (3, R).
1 −3 2
You can use row and column operations to simplify the computation of det(A − λI):
2−λ −3 1 2−λ −3 1
1 −2 − λ 1 = 0 1−λ −1 + λ
1 −3 2−λ 1 −3 2−λ
r2 → r2 − r3
2−λ −3 −2
= 0 1−λ 0 .
1 −3 −1 − λ
c3 → c3 + c2
(Adding a multiple of a row or a column to a row or column, respectively, does not change the determinant.) Now expand by cofactors of the second row:

det(A − λI) = (1 − λ) · [(2 − λ)(−1 − λ) − (−2)(1)] = (1 − λ)(λ2 − λ) = −λ(λ − 1)2 .

The eigenvalues are λ = 0 and λ = 1 (a double root).
Example. A matrix A ∈ Mn (F ) is upper triangular if Aij = 0 for i > j. Thus, the entries below the main
diagonal are zero. (Lower triangular matrices are defined in an analogous way.)
The eigenvalues of a triangular matrix
λ1 ∗ ··· ∗
0 λ2 ··· ∗
A=
... .. .. ..
. . .
0 0 · · · λn
are just the diagonal entries λ1 , . . . , λn . (You can prove this by induction on n.)
Remark. To find the eigenvalues of a matrix, you need to find the roots of the characteristic polynomial.
There are formulas for finding the roots of polynomials of degree ≤ 4. (For example, the quadratic
formula gives the roots of a quadratic equation ax2 + bx + c = 0.) However, Abel showed in the early part
of the 19-th century that the general quintic is not solvable by radicals. (For example, 2x5 − 5x4 + 5 is
not solvable by radicals over Q.) In the real world, the computation of eigenvalues often requires numerical
approximation.
If λ is an eigenvalue of A, then det(A − λI) = 0. Hence, the n × n matrix A − λI is not invertible. It
follows that A−λI must row reduce to a row reduced echelon matrix R with fewer than n leading coefficients.
Thus, the system Rx = 0 has at least one free variable, and hence has more than one solution. In particular,
Rx = 0 — and therefore, (A − λI)x = 0 — has at least one nonzero solution.
Definition. Let A ∈ Mn (F ), and let λ be an eigenvalue of A. An eigenvector (or a characteristic vector)
of A for λ is a nonzero vector v ∈ F n such that
Av = λv.
Equivalently,
(A − λI)v = 0.
Example. Let
2 −3 1
A = 1 −2 1 ∈ M (3, R).
1 −3 2
The eigenvalues are λ = 0, λ = 1 (double).
First, I’ll find an eigenvector for λ = 0.
2 −3 1
A − 0 · I = 1 −2 1 .
1 −3 2
Row reducing,

2 −3 1      1 0 −1
1 −2 1   →  0 1 −1
1 −3 2      0 0  0

This says
a − c = 0
b − c = 0
Therefore, a = c, b = c, and the eigenvector is
a c 1
b = c = c1.
c c 1
Notice that this is the usual algorithm for finding a basis for the solution space of a homogeneous system
(or the null space of a matrix).
I can set c to any nonzero number. For example, c = 1 gives the eigenvector (1, 1, 1). Notice that there
are infinitely many eigenvectors for this eigenvalue, but all of these eigenvectors are multiples of (1, 1, 1).
Likewise,
1 −3 1 1 −3 1
A − I = 1 −3 1 → 0 0 0
1 −3 1 0 0 0
Hence, the eigenvectors are
a 3b − c 3 −1
b = b = b1 + c 0 .
c c 0 1
Taking b = 1, c = 0 gives (3, 1, 0); taking b = 0, c = 1 gives (−1, 0, 1). This eigenvalue gives rise to two
independent eigenvectors.
Note, however, that a double root of the characteristic polynomial need not give rise to two independent
eigenvectors.
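Here is the whole computation as a sympy sketch (not part of the notes); eigenvects() returns each eigenvalue with its algebraic multiplicity and a list of independent eigenvectors.

from sympy import Matrix

A = Matrix([[2, -3, 1],
            [1, -2, 1],
            [1, -3, 2]])

for eigval, mult, vecs in A.eigenvects():
    print(eigval, mult, [v.T for v in vecs])
# 0 has one eigenvector, a multiple of (1, 1, 1); 1 has multiplicity 2 and
# two independent eigenvectors, multiples of (3, 1, 0) and (-1, 0, 1).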
Definition. Matrices A, B ∈ M (n, F ) are similar if there is an invertible matrix P ∈ M (n, F ) such that
P AP −1 = B.
Lemma. Similar matrices have the same characteristic polynomial (and hence the same eigenvalues).
Proof.
P AP −1 − λI = P AP −1 − λP IP −1 = P (A − λI)P −1 .
Therefore, the matrices A − λI and P AP −1 − λI are similar. Hence, they have the same determinant.
The determinant of A − λI is the characteristic polynomial of A and the determinant of P AP −1 − λI is the
characteristic polynomial of P AP −1 .
Definition. Let T : V → V be a linear transformation, where V is a finite-dimensional vector space. The
characteristic polynomial of T is the characteristic polynomial of a matrix of T relative to a basis B of
V.
The preceding lemma shows that this is independent of the choice of basis. For if B and C are bases for
V , then
[T ]C,C = [B → C][T ]B,B [C → B] = [B → C][T ]B,B [B → C]−1 .
Therefore, [T ]C,C and [T ]B,B are similar, so they have the same characteristic polynomial.
This shows that it makes sense to speak of the eigenvalues and eigenvectors of a linear transformation
T.
Definition. A matrix A ∈ Mn (F ) is diagonalizable if A has n independent eigenvectors — that is, if there
is a basis for F n consisting of eigenvectors of A.
Proposition. A ∈ Mn (F ) is diagonalizable if and only if it is similar to a diagonal matrix.
Proof. Let {v1 , . . . , vn } be n independent eigenvectors for A corresponding to eigenvalues {λ1 , . . . , λn }. Let
T be the linear transformation corresponding to A:
T (v) = Av.
Since Avi = λi vi for all i, the matrix of T relative to the basis B = {v1 , . . . , vn } is
λ1 0 · · · 0
0 λ2 · · · 0
[T ]B,B =
... .. .. .
. .
0 0 · · · λn
Now A is the matrix of T relative to the standard basis, so

A = [T ]std,std = [B → std] [T ]B,B [std → B].
The matrix [B → std] is obtained by building a matrix using the v1 , . . . , vn as the columns. Then
[std → B] = [B → std]−1 .
Hence,
λ1 0 · · · 0
0 λ2 · · · 0
.
.. .. .. = [B → std]−1 · A · [B → std].
. .
0 0 · · · λn
Conversely, if D is diagonal, P is invertible, and D = P −1 AP , the columns c1 , . . . , cn of P are indepen-
dent eigenvectors for A. In fact, if
λ1 0 · · · 0
0 λ2 · · · 0
D= ... .. .. ,
. .
0 0 · · · λn
then P D = AP says
λ1 0 ··· 0
↑ ↑ ↑ 0 λ2 ··· 0 ↑ ↑ ↑
c1 c2 · · · cn
... .. ..
= A c 1 c2 · · · cn .
↓ ↓ ↓ . . ↓ ↓ ↓
0 0 · · · λn
Hence, λi ci = Aci .
Example. Let

2 −3 1
1 −2 1
1 −3 2

be the matrix A from the earlier example. There, I showed that A has 3 independent eigenvectors (1, 1, 1), (3, 1, 0), (−1, 0, 1).
Therefore, A is diagonalizable.
To find a diagonalizing matrix, build a matrix using the eigenvectors as the columns:
1 3 −1
P = 1 1 0 .
1 0 1
You can check by finding P −1 and doing the multiplication that you get a diagonal matrix:
−1
1 3 −1 2 −3 1 1 3 −1
P −1 AP = 1 1 0 1 −2 1 1 1 0 =
1 0 1 1 −3 2 1 0 1
−1 3 −1 2 −3 1 1 3 −1 0 0 0
1 −2 1 1 −2 1 1 1 0 = 0 1 0.
1 −3 2 1 −3 2 1 0 1 0 0 1
Of course, I knew this was the answer! I should get a diagonal matrix with the eigenvalues on the main
diagonal, in the same order that I put the corresponding eigenvectors into P .
You can put the eigenvectors in as the columns of P in any order: A different order will give a diagonal
matrix with the eigenvalues on the main diagonal in a different order.
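Here is a sympy sketch of the diagonalization above (not part of the notes); diagonalize() packages the same computation.

from sympy import Matrix

A = Matrix([[2, -3, 1],
            [1, -2, 1],
            [1, -3, 2]])
P = Matrix([[1, 3, -1],
            [1, 1, 0],
            [1, 0, 1]])        # eigenvectors as columns

print(P.inv() * A * P)         # diag(0, 1, 1)

P2, D = A.diagonalize()        # sympy's own choice of P and D
print(D)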
Example. Let
4 −4 −5
A = 1 0 −3 ∈ M (3, R).
0 0 2
Find the eigenvalues and, for each eigenvalue, a complete set of eigenvectors. If A is diagonalizable, find
a matrix P such that P −1 AP is a diagonal matrix.
            4−x  −4   −5
|A − xI| =   1   −x   −3
             0    0  2−x

Expanding along the third row,

|A − xI| = (2 − x)[(4 − x)(−x) − (−4)(1)] = (2 − x)(x2 − 4x + 4) = −(x − 2)3 .
The eigenvalue is x = 2.
Now
2 −4 −5 1 −2 0
A − 2I = 1 −2 −3 → 0 0 1.
0 0 0 0 0 0
Thinking of this as the coefficient matrix of a homogeneous linear system with variables a, b, and c, I
obtain the equations
a − 2b = 0, c = 0.
Then a = 2b, so
a 2b 2
b = b = b · 1.
c 0 0
(2, 1, 0) is an eigenvector. Since there’s only one independent eigenvector — as opposed to 3 — the
matrix A is not diagonalizable.
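A sympy sketch confirms this (not part of the notes): the eigenvalue 2 has algebraic multiplicity 3 but only one independent eigenvector.

from sympy import Matrix

A = Matrix([[4, -4, -5],
            [1, 0, -3],
            [0, 0, 2]])

print(A.eigenvects())          # [(2, 3, [Matrix([[2], [1], [0]])])]
print(A.is_diagonalizable())   # False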
Now

        −4  3 −5      1 0 1/2
B − I = 12 −8 14  →   0 1 −1
        10 −7 12      0 0  0
Thinking of this as the coefficient matrix of a homogeneous linear system with variables a, b, and c, I
obtain the equations
a + (1/2)c = 0, b − c = 0.
Set c = 2. This gives a = −1 and b = 2. Thus, the only eigenvectors are the nonzero multiples of
(−1, 2, 2). Since there is only one independent eigenvector, B is not diagonalizable.
Theorem. Let A ∈ Mn (F ), and let v1 , . . . , vn be eigenvectors of A corresponding to distinct eigenvalues λ1 , . . . , λn . Then {v1 , . . . , vn } is independent.

Proof. Suppose to the contrary that {v1 , . . . , vn } is dependent. Let p be the smallest number such that the subset {v1 , . . . , vp } is dependent. Then there is a nontrivial linear relation

a1 v1 + · · · + ap vp = 0.

By minimality of p, ap ≠ 0 (otherwise {v1 , . . . , vp−1 } would already be dependent), so I can solve for vp :

vp = b1 v1 + · · · + bp−1 vp−1 , where bi = −ai /ap .

Apply A to both sides and use Avi = λi vi :

λp vp = b1 λ1 v1 + · · · + bp−1 λp−1 vp−1 .

On the other hand, multiplying the previous equation by λp gives

λp vp = b1 λp v1 + · · · + bp−1 λp vp−1 .

Subtracting,

0 = b1 (λp − λ1 )v1 + · · · + bp−1 (λp − λp−1 )vp−1 .
Since the eigenvalues are distinct, the terms λp − λi are nonzero. Hence, this is a linear relation in
v1 , . . . , vp−1 which contradicts minimality of p — unless b1 = · · · = bp−1 = 0.
In this case, vp = 0, which contradicts the fact that vp is an eigenvector. Therefore, the original set
must in fact be independent.
Example. Let A be an n × n real matrix. The complex eigenvalues of A always come in conjugate pairs
a + bi and a − bi.
Moreover, if v is an eigenvector for λ = a + bi, then the conjugate v ∗ is an eigenvector for λ∗ = a − bi.
For suppose Av = λv. Taking complex conjugates, I get
A∗ v ∗ = λ∗ v ∗ , Av ∗ = λ∗ v ∗ .
I knew that the second row (2, 2 − i) must be a multiple of the first row, because I know the system has
nontrivial solutions. So I don’t have to work out what multiple it is; I can just zero out the second row on
general principles.
This only works for 2 × 2 matrices, and only for those which are A − λI’s in eigenvector computations.
Next, there’s no point in going all the way to row reduced echelon form. I just need some nonzero vector
(a, b) such that
−1 − i −1 a 0
= .
0 0 b 0
That is, I want
(−1 − i)a + (−1)b = 0.
I can find an a and b that work by swapping −1 − i and −1, and negating one of them. For example,
take a = 1 (−1 negated) and b = −1 − i. This checks:
Notice that you get a diagonal matrix with the eigenvalues on the main diagonal, in the same order in
which you listed the eigenvectors.
Example. For the following matrix, find the eigenvalues over C, and for each eigenvalue, a complete set of
independent eigenvectors.
Find a diagonalizing matrix and the corresponding diagonal matrix.
−2 0 5
A= 0 2 0.
−5 0 4
            −2−x   0    5
|A − xI| =    0   2−x   0
             −5    0   4−x

Expanding along the second row,

|A − xI| = (2 − x)[(−2 − x)(4 − x) − (5)(−5)] = (2 − x)[(x + 2)(x − 4) + 25] = (2 − x)(x2 − 2x + 17).
Now
x2 − 2x + 17 = 0
(x − 1)2 + 16 = 0
(x − 1)2 = −16
x − 1 = ±4i
x = 1 ± 4i
The eigenvalues are x = 2 and x = 1 ± 4i.
For x = 2, I have
−4 0 5 1 0 0
A − 2I = 0 0 0 → 0 0 1.
−5 0 2 0 0 0
With variables a, b, and c, the corresponding homogeneous system is a = 0 and c = 0. This gives the
solution vector
a 0 0
b = b = b · 1.
c 0 0
Taking b = 1, I obtain the eigenvector (0, 1, 0).
For x = 1 + 4i, I have
−3 − 4i 0 5 −5 0 3 − 4i
0 1 − 4i 0 → 0 1 0
−5 0 3 − 4i −5 0 3 − 4i
I multiplied the first row by 3 − 4i, then divided it by 5. This made it the same as the third row.
I divided the second row by 1 − 4i.
(I knew that the first and third rows had to be multiples, since they’re clearly independent of the second
row. Thus, if they weren’t multiples, the three rows would be independent, the eigenvector matrix would be
invertible, and there would be no eigenvectors [which must be nonzero].)
Now I can wipe out the third row by subtracting the first:
−5 0 3 − 4i −5 0 3 − 4i
0 1 0 → 0 1 0 .
−5 0 3 − 4i 0 0 0
There will only be one parameter (c), so there will only be one independent eigenvector. To get one,
switch the “−5” and “3 − 4i” and negate the “−5” to get “5”. This gives a = 3 − 4i, b = 0, and c = 5. You
can see that these values for a and c work:

−5 · (3 − 4i) + (3 − 4i) · 5 = 0.
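Here is the complex case as a sympy sketch (not part of the notes); sympy is happy to work over C.

from sympy import Matrix

A = Matrix([[-2, 0, 5],
            [ 0, 2, 0],
            [-5, 0, 4]])

for eigval, mult, vecs in A.eigenvects():
    print(eigval, [v.T for v in vecs])
# eigenvalue 2 with an eigenvector that is a multiple of (0, 1, 0), and
# eigenvalues 1 + 4*I and 1 - 4*I with conjugate eigenvectors, multiples
# of (3 - 4*I, 0, 5) and (3 + 4*I, 0, 5).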
Now

T vi = Σj δij T vj , so Σj (Aji I − δij T ) vj = 0.

Hence,

0 = (adj B)ki Σj Bij vj = Σj (adj B)ki Bij vj .
This equation holds for all i and all k, so I’ll still get 0 if I sum on i. So I’ll sum on i, then interchange
the order of summation:

0 = Σi Σj (adj B)ki Bij vj ,

0 = Σj ( Σi (adj B)ki Bij ) vj .

Now Σi (adj B)ki Bij is the (k, j)-th entry of (adj B) · B = |B| I. Hence,

0 = Σj |B| δkj vj = |B| vk .
Corollary. The minimal polynomial divides the characteristic polynomial.
A constant coefficient linear homogeneous differential equation is an equation of the form

an y (n) + · · · + a2 y ′′ + a1 y ′ + a0 y = 0.
Here y is a function of x, and an , . . . , a0 are constants. Linear means the equation is a sum of the
derivatives of y, each multiplied by x stuff. (In this case, the x stuff is constant.) Homogeneous means that
the right side is 0 — there’s no term involving only x.
It’s convenient to let D = d/dx stand for the operation of differentiating with respect to x. (Note that D = d/dx is the operation of differentiation, whereas Dy = dy/dx is the derivative.) In this notation, D2 computes the second derivative, D3 computes the third derivative, and so on. The equation above becomes
(an Dn + · · · a2 D2 + a1 D + a0 )y = 0.
Example. The following equations are linear homogeneous equations with constant coefficients:
y ′′′ + 3y ′′ + 3y ′ + y = 0,
y ′′ − 5y ′ − 6y = 0,
(D − 1)(D − 2)(D − π)y = 0.
A solution to the equation is a function y = f (x) which satisfies the equation. Equivalently, if you think
of an Dn + · · · a2 D2 + a1 D + a0 as a linear transformation, it is an element of the kernel of the transformation.
The general solution is a linear combination of the elements of a basis for the kernel, with the
coefficients being arbitrary constants.
The form of the equation makes it reasonable that a solution should be a function whose derivatives are
constant multiples of itself. emx is such a function:
(d/dx) emx = m emx , (d2 /dx2 ) emx = m2 emx , . . . , (dn /dxn ) emx = mn emx .
Plug emx into
y (n) + bn−1 y (n−1) + · · · + b2 y ′′ + b1 y ′ + b0 y = 0.
The result:
mn emx + bn−1 mn−1 emx + · · · + b2 m2 emx + b1 memx + b0 emx = 0.
Factor out emx and cancel it. This leaves
mn + bn−1 mn−1 + · · · + b2 m2 + b1 m + b0 = 0.
Thus, emx is a solution to the original equation exactly when m is a root of this polynomial. The
polynomial is called the characteristic polynomial; as the derivation showed, it’s obtained by building a
polynomial using the coefficients of the original differential equation.
Example. Solve y ′′ − 5y ′ − 6y = 0.
The characteristic polynomial is m2 − 5m − 6; solving m2 − 5m − 6 = 0 yields (m − 6)(m + 1) = 0, so
m = 6 or m = −1. The general solution is
y = c1 e6x + c2 e−x .
You can check this by plugging back in. Here are the derivatives:

y ′ = 6c1 e6x − c2 e−x , y ′′ = 36c1 e6x + c2 e−x .

Therefore,

y ′′ − 5y ′ − 6y = (36 − 30 − 6)c1 e6x + (1 + 5 − 6)c2 e−x = 0.
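sympy’s dsolve will reproduce this kind of general solution if you want a check (a sketch, not part of the notes):

from sympy import Function, Eq, dsolve, symbols

x = symbols('x')
y = Function('y')

print(dsolve(Eq(y(x).diff(x, 2) - 5*y(x).diff(x) - 6*y(x), 0), y(x)))
# Eq(y(x), C1*exp(-x) + C2*exp(6*x))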
What happens if there are repeated roots? Look at the equation y ′′ − 2y ′ + y = 0. The characteristic
equation is x2 − 2x + 1 = 0, which has x = 1 as a double root. It is true that ex is a solution, but it would
be incorrect to write y = c1 ex + c2 ex .
The terms c1 ex and c2 ex are redundant — you could combine them to get y = (c1 + c2 )ex = c3 ex . To
put it another way, as functions, ex and ex are linearly dependent.
It is reasonable to suppose that for a second order equation you should have two different solutions. ex
is one; how can you find another?
The idea is to guess the form that such a solution might take. Guess:
y = f (x)ex ,

where f is a function to be determined. Plugging this into y ′′ − 2y ′ + y = 0, everything cancels except f ′′ (x)ex = 0, so f ′′ = 0 and f (x) = c1 + c2 x. This gives

y = c1 ex + c2 xex .
In fact, this is the general solution — notice the two arbitrary constants. The functions ex and xex are
independent solutions to the original equation.
In general, if m is a repeated root of multiplicity k in the characteristic polynomial, you get terms emx , xemx , . . . , xk−1 emx in the general solution.
Example. Solve d3 y/dx3 − 3 d2 y/dx2 + 3 dy/dx − y = 0.
The characteristic equation is m3 − 3m2 + 3m − 1 = 0, which has m = 1 as a root with multiplicity 3.
The general solution is
y = c1 ex + c2 xex + c3 x2 ex .
Example. Solve (D2 − 1)(D2 − D − 2)y = 0.

The characteristic equation is (m2 − 1)(m2 − m − 2) = 0, or (m − 1)(m + 1)2 (m − 2) = 0. The roots are 1, 2, and −1 (double). The general solution is

y = c1 ex + c2 e2x + c3 e−x + c4 xe−x .
Note: You can write the terms in the solution in any order you please. Nor does it matter which “c”
goes with which term, since they are arbitrary constants.
Example. (Linear systems) Suppose x and y are functions of t. Consider the system of differential
equations
dx/dt = x′ = x + 4y, dy/dt = y ′ = 2x + 3y.
I want to solve for x and y in terms of t.
Solve the second equation for x:
x = (1/2)(y ′ − 3y).
Differentiate:
x′ = (1/2)(y ′′ − 3y ′ ).
Plug the expressions for x and x′ into the first equation:
(1/2)(y ′′ − 3y ′ ) = (1/2)(y ′ − 3y) + 4y.
Simplify:
y ′′ − 4y ′ − 5y = 0.
The characteristic equation is m2 − 4m − 5 = 0, or (m − 5)(m + 1) = 0. The roots are m = 5 and
m = −1. Therefore,
y = c1 e5t + c2 e−t .
Now y ′ = 5c1 e5t − c2 e−t , so
x = (1/2)(y ′ − 3y) = (1/2)[(5c1 e5t − c2 e−t ) − 3(c1 e5t + c2 e−t )] = c1 e5t − 2c2 e−t .
There are other ways of solving linear systems, but for small systems brute force works reasonably well!
Now suppose the characteristic equation has a complex root a + bi. From basic algebra, complex roots
of real polynomials come in conjugate pairs: a + bi and a − bi. It’s reasonable to expect solutions
c1 e(a+bi)x and c2 e(a−bi)x .
However, these are complex solutions, and you should have real solutions to the original real differential
equation. I’ll use the complex exponential formula
eiθ = cos θ + i sin θ, θ ∈ R.
You can derive this formula by considering the Taylor series for eiθ , cos θ, and sin θ.
Now
c1 e(a+bi)x + c2 e(a−bi)x = c1 eax eibx + c2 eax e−ibx =
eax (c1 (cos bx + i sin bx) + c2 (cos bx − i sin bx)) = eax ((c1 + c2 ) cos bx + i(c1 − c2 ) sin bx) .
Let c3 = c1 + c2 and c4 = i(c1 − c2 ). Observe that c1 and c2 can be solved for in terms of c3 and c4 , so
no generality is lost with this substitution. Then
c1 e(a+bi)x + c2 e(a−bi)x = eax (c3 cos bx + c4 sin bx).
Each pair of conjugate complex roots a±bi in the characteristic equation generates a pair of independent
solutions of this form.
Example. Solve y ′′ + y = 0.
The characteristic equation m2 + 1 = 0 has roots ±i. The solution is
y = c1 cos x + c2 sin x.
Example. Solve (D^2 + 4)^2 y = 0.
The characteristic equation (m^2 + 4)^2 = 0 has repeated complex roots: m = ±2i (each double). The
solution is
y = c_1 cos 2x + c_2 sin 2x + c_3 x cos 2x + c_4 x sin 2x.
(What's the solution to (D^2 + 2D + 5)^2 y = 0?)
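(Not in the original notes: if you want to double-check a solution like this symbolically, sympy — an assumption, it isn't used in these notes — can apply the operator directly. Here I verify that x cos 2x really is annihilated by (D^2 + 4)^2.)

import sympy as sp

x = sp.symbols('x')
y = x * sp.cos(2*x)                      # one of the claimed solutions
op = lambda f: sp.diff(f, x, 2) + 4*f    # the operator D^2 + 4
print(sp.simplify(op(op(y))))            # 0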
This is a constant coefficient linear homogeneous system. Thus, the coefficients aij are constant,
and you can see that the equations are linear in the variables x1 , . . . , xn and their derivatives. The reason
for the term “homogeneous” will be clear when I’ve written the system in matrix form.
The primes on x′1 , . . . , x′n denote differentiation with respect to an independent variable t. The problem
is to solve for x1 , . . . , xn in terms of t.
Write the system in matrix form as
\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
Equivalently,
~x′ = A~x.
(A nonhomogeneous system would look like ~x′ = A~x + ~b.)
It’s possible to solve such a system if you know the eigenvalues (and possibly the eigenvectors) for the
coefficient matrix
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.
First, I’ll do an example which shows that you can solve small linear systems by brute force.
dx_1/dt = x_1' = −29x_1 − 48x_2,
dx_2/dt = x_2' = 16x_1 + 27x_2.
The idea is to solve for x1 and x2 in terms of t.
One approach is to use brute force. Solve the first equation for x2 , then differentiate to find x′2 :
x_2 = (1/48)(−x_1' − 29x_1),    x_2' = (1/48)(−x_1'' − 29x_1').
Plug these into the second equation:
(1/48)(−x_1'' − 29x_1') = 16x_1 + 27 · (1/48)(−x_1' − 29x_1),    which simplifies to    x_1'' + 2x_1' − 15x_1 = 0.
This is a constant coefficient linear homogeneous equation in x_1. The characteristic equation is m^2 + 2m − 15 = 0. The roots are m = −5 and m = 3. Therefore,
x_1 = c_1 e^{−5t} + c_2 e^{3t}.
x_2 = (1/48)(−x_1' − 29x_1) = (1/48)[−(−5c_1 e^{−5t} + 3c_2 e^{3t}) − 29(c_1 e^{−5t} + c_2 e^{3t})] = −(1/2)c_1 e^{−5t} − (2/3)c_2 e^{3t}.
The procedure works, but it’s clear that the computations would be pretty horrible for larger systems.
To describe a better approach, look at the coefficient matrix:
A = \begin{pmatrix} −29 & −48 \\ 16 & 27 \end{pmatrix}.
Compute its characteristic polynomial:
|A − λI| = \begin{vmatrix} −29 − λ & −48 \\ 16 & 27 − λ \end{vmatrix} = (λ + 29)(λ − 27) + 16 · 48 = λ^2 + 2λ − 15.
This is the same polynomial that appeared in the example. Since λ2 + 2λ − 15 = (λ + 5)(λ − 3), the
eigenvalues are λ = −5 and λ = 3.
Thus, you don’t need to go through the process of eliminating x2 and isolating x1 . You know that
x1 = c1 e−5t + c2 e3t
once you know the eigenvalues of the coefficient matrix. You can now finish the problem as above by plugging
x1 back in to solve for x2 .
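(A numerical aside, assuming numpy: the eigenvalues and characteristic polynomial of the coefficient matrix can be checked directly.)

import numpy as np

A = np.array([[-29.0, -48.0],
              [ 16.0,  27.0]])
print(np.linalg.eigvals(A))   # approximately -5 and 3
print(np.poly(A))             # characteristic polynomial coefficients [ 1.,  2., -15.]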
This is better than brute force, but it’s still cumbersome if the system has more than two variables.
I can improve things further by making use of eigenvectors as well as eigenvalues. Consider the system
~x′ = A~x.
Suppose λ is an eigenvalue of A with eigenvector v, so Av = λv.
I claim that ~x = ce^{λt} v is a solution to the equation, where c is a constant. To see this, plug it in:
~x' = cλe^{λt} v,    while    A~x = ce^{λt} Av = cλe^{λt} v.
The two sides agree, so ~x = ce^{λt} v is a solution.
To obtain the general solution to ~x′ = A~x, you should have “one arbitrary constant for each differenti-
ation”. In this case, you’d expect n arbitrary constants. This discussion should make the following result
plausible.
• Suppose the matrix A has n independent eigenvectors v1 , . . . , vn with corresponding eigenvalues λ1 ,
. . . , λn . Then the general solution to ~x′ = A~x is
~x = c_1 e^{λ_1 t} v_1 + · · · + c_n e^{λ_n t} v_n.
Example. Solve
dx_1/dt = x_1' = −29x_1 − 48x_2,
dx_2/dt = x_2' = 16x_1 + 27x_2.
This is the system from the brute force example. The eigenvalues of the coefficient matrix are λ = −5 and λ = 3. Row reducing A + 5I shows that (2, −1) is an eigenvector for λ = −5, and row reducing A − 3I shows that (3, −2) is an eigenvector for λ = 3. Hence the general solution is
~x = c_1 e^{−5t} \begin{pmatrix} 2 \\ −1 \end{pmatrix} + c_2 e^{3t} \begin{pmatrix} 3 \\ −2 \end{pmatrix},
which agrees with the brute force answer.
Example. Find the general solution (x(t), y(t)) to the linear system
dx/dt = x + y,
dy/dt = 6x + 2y.
The matrix form is
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 6 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.
Let
A = \begin{pmatrix} 1 & 1 \\ 6 & 2 \end{pmatrix}.
det(A − xI) = \begin{vmatrix} 1 − x & 1 \\ 6 & 2 − x \end{vmatrix} = x^2 − 3x − 4 = (x − 4)(x + 1).
The eigenvalues are x = 4 and x = −1.
For x = 4, I have
A − 4I = \begin{pmatrix} −3 & 1 \\ 6 & −2 \end{pmatrix} → \begin{pmatrix} 3 & −1 \\ 0 & 0 \end{pmatrix}.
If (a, b) is an eigenvector, then
3a − b = 0, b = 3a.
So
\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a \\ 3a \end{pmatrix} = a \begin{pmatrix} 1 \\ 3 \end{pmatrix}.
(1, 3) is an eigenvector.
For x = −1, I have
A + I = \begin{pmatrix} 2 & 1 \\ 6 & 3 \end{pmatrix} → \begin{pmatrix} 2 & 1 \\ 0 & 0 \end{pmatrix}.
If (a, b) is an eigenvector, then
2a + b = 0, b = −2a.
So
\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a \\ −2a \end{pmatrix} = a \begin{pmatrix} 1 \\ −2 \end{pmatrix}.
(1, −2) is an eigenvector.
The solution is
\begin{pmatrix} x \\ y \end{pmatrix} = c_1 e^{4t} \begin{pmatrix} 1 \\ 3 \end{pmatrix} + c_2 e^{−t} \begin{pmatrix} 1 \\ −2 \end{pmatrix}.
Example. Solve the system ~x' = A~x, where
A = \begin{pmatrix} 5 & 5 \\ −4 & −3 \end{pmatrix}.
The characteristic polynomial is
|A − λI| = \begin{vmatrix} 5 − λ & 5 \\ −4 & −3 − λ \end{vmatrix} = λ^2 − 2λ + 5.
The eigenvalues are λ = 1 ± 2i. You can check that the eigenvectors are:
λ = 1 − 2i :  b · \begin{pmatrix} −2 + i \\ 2 \end{pmatrix},    λ = 1 + 2i :  b · \begin{pmatrix} −2 − i \\ 2 \end{pmatrix}.
Observe that the eigenvectors are conjugates of one another. This is always true when you have a
complex eigenvalue.
The eigenvector method gives the following complex solution:
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = c_1 e^{(1−2i)t} \begin{pmatrix} −2 + i \\ 2 \end{pmatrix} + c_2 e^{(1+2i)t} \begin{pmatrix} −2 − i \\ 2 \end{pmatrix} =
e^t \begin{pmatrix} (−2(c_1 + c_2) + i(c_1 − c_2)) cos 2t + ((c_1 + c_2) + 2i(c_1 − c_2)) sin 2t \\ 2(c_1 + c_2) cos 2t − 2i(c_1 − c_2) sin 2t \end{pmatrix}.
Note that the constants occur in the combinations c1 + c2 and i(c1 − c2 ). Something like this will always
happen in the complex case. Set d1 = c1 + c2 and d2 = i(c1 − c2 ). The solution is
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = e^t \begin{pmatrix} (−2d_1 + d_2) cos 2t + (d_1 + 2d_2) sin 2t \\ 2d_1 cos 2t − 2d_2 sin 2t \end{pmatrix}.
In fact, if you’re given initial conditions for x1 and x2 , the new constants d1 and d2 will turn out to be
real numbers.
You can get a picture of the solution curves for a system ~x' = f(~x) even if you can't solve it, by sketching
the direction field. Suppose you have a two-variable linear system
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.
For example, consider
x' = x − y,    y' = x + y.
Thus, from the second line of the table, I’d draw the vector (−1, −1) starting at the point (1, 0).
Here's a sketch of the vectors:
While it’s possible to plot fields this way, it’s very tedious. You can use software to plot fields quickly.
Here is the same field as plotted by Mathematica:
The first picture shows the field as it would be if you plotted it by hand. As you can see, the vectors
overlap each other, making the picture a bit ugly. The second picture is the way Mathematica draws the
field by default: The vectors’ lengths are scaled so that the vectors don’t overlap. In subsequent examples,
I’ll adopt the second alternative when I display a direction field picture.
The arrows in the pictures show the direction of increasing t on the solution curves. You can see from
these pictures that the solution curves for this system appear to spiral out from the origin.
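(Not from the notes: if Mathematica isn't handy, matplotlib's quiver — assuming numpy and matplotlib are installed — draws the same field. Normalizing the vectors mimics the scaled picture described above.)

import numpy as np
import matplotlib.pyplot as plt

# Direction field for x' = x - y, y' = x + y
x, y = np.meshgrid(np.linspace(-2, 2, 15), np.linspace(-2, 2, 15))
u, v = x - y, x + y
norm = np.hypot(u, v)
norm[norm == 0] = 1.0                  # avoid dividing by zero at the origin
plt.quiver(x, y, u/norm, v/norm)       # unit vectors, like the scaled plot
plt.xlabel("x"); plt.ylabel("y")
plt.show()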
Example. (A compartment model) Two tanks hold 50 gallons of liquid each. The first tank starts with
25 pounds of dissolved salt, while the second starts with pure water. Pure water flows into the first tank at
3 gallons per minute; the well-stirred mixture flows into tank 2 at 4 gallons per minute. The mixture in tank
2 is pumped back into tank 1 at 1 gallon per minute, and also drains out at 3 gallons per minute. Find the
amount of salt in each tank after t minutes.
Let x be the number of pounds of salt dissolved in the first tank at time t and let y be the number of
pounds of salt dissolved in the second tank at time t. The rate equations are
dx/dt = (3 gal/min)(0 lbs/gal) + (1 gal/min)(y/50 lbs/gal) − (4 gal/min)(x/50 lbs/gal),
dy/dt = (4 gal/min)(x/50 lbs/gal) − (1 gal/min)(y/50 lbs/gal) − (3 gal/min)(y/50 lbs/gal).
Simplify:
x′ = −0.08x + 0.02y, y ′ = 0.08x − 0.08y.
Next, find the characteristic polynomial:
\begin{vmatrix} −0.08 − λ & 0.02 \\ 0.08 & −0.08 − λ \end{vmatrix} = λ^2 + 0.16λ + 0.0048 = (λ + 0.04)(λ + 0.12).
The eigenvalues are λ = −0.04, λ = −0.12.
Consider λ = −0.04:
A + 0.04I = \begin{pmatrix} −0.04 & 0.02 \\ 0.08 & −0.04 \end{pmatrix} → \begin{pmatrix} 1 & −1/2 \\ 0 & 0 \end{pmatrix}.
This says a − (1/2)b = 0, so a = (1/2)b. Therefore,
\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} (1/2)b \\ b \end{pmatrix} = b \begin{pmatrix} 1/2 \\ 1 \end{pmatrix}.
Taking b = 2, (1, 2) is an eigenvector.
Now consider λ = −0.12:
A + 0.12I = \begin{pmatrix} 0.04 & 0.02 \\ 0.08 & 0.04 \end{pmatrix} → \begin{pmatrix} 1 & 1/2 \\ 0 & 0 \end{pmatrix}.
This says a + (1/2)b = 0, so a = −(1/2)b. Therefore,
\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} −(1/2)b \\ b \end{pmatrix} = b \begin{pmatrix} −1/2 \\ 1 \end{pmatrix}.
Taking b = 2, (−1, 2) is an eigenvector.
The direction field for the system is shown in the first picture. In the second picture, I’ve sketched in
some solution curves.
The solution curve picture is referred to as the phase portrait.
The eigenvectors (1, 2) and (−1, 2) have slopes 2 and −2, respectively. These appear as the two lines
(linear solutions).
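(A numerical aside, assuming numpy and scipy: integrate the simplified system with the stated initial data — 25 pounds in tank 1, 0 in tank 2 — and check the eigenvalues.)

import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[-0.08,  0.02],
              [ 0.08, -0.08]])
sol = solve_ivp(lambda t, v: A @ v, (0, 60), [25.0, 0.0],
                t_eval=[0, 30, 60], rtol=1e-9)
print(sol.y)                     # pounds of salt in each tank at t = 0, 30, 60
print(np.linalg.eigvals(A))      # -0.04 and -0.12 (in some order)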
Recall that the real and imaginary parts of a complex number can be recovered using conjugation:
re(a + bi) = a = (1/2)((a + bi) + (a − bi)) = (1/2)((a + bi) + (a + bi)^*),
im(a + bi) = b = (1/2i)((a + bi) − (a − bi)) = (1/2i)((a + bi) − (a + bi)^*).
I'll apply this to e^{λt}~v, using the fact that
(e^{λt}~v)^* = e^{λ^* t} ~v^*.
Then
re(e^{λt}~v) = (1/2)(e^{λt}~v + e^{λ^* t}~v^*),
im(e^{λt}~v) = (1/2i)(e^{λt}~v − e^{λ^* t}~v^*).
The point is that since the terms on the right are independent solutions, so are the terms on the left.
The terms on the left, however, are real solutions. Here is what this means.
• If a linear system has a pair of complex conjugate eigenvalues, find the eigenvector solution for one
of them (the “eλt~v ” above). Then take the real and imaginary parts to obtain two independent real
solutions.
Example. Solve the system from the direction field example above:
x' = x − y,    y' = x + y.
Set
A = \begin{pmatrix} 1 & −1 \\ 1 & 1 \end{pmatrix}.
The eigenvalues are λ = 1 ± i.
Consider λ = 1 + i:
A − (1 + i)I = \begin{pmatrix} −i & −1 \\ 1 & −i \end{pmatrix} → \begin{pmatrix} 1 & −i \\ 0 & 0 \end{pmatrix}.
The last matrix says a − bi = 0, so a = bi. The eigenvectors are
\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} bi \\ b \end{pmatrix} = b \begin{pmatrix} i \\ 1 \end{pmatrix}.
Take b = 1. This yields the eigenvector (i, 1).
Write down the complex solution
e^{(1+i)t} \begin{pmatrix} i \\ 1 \end{pmatrix} = e^t e^{it} \begin{pmatrix} i \\ 1 \end{pmatrix} = e^t (cos t + i sin t) \begin{pmatrix} i \\ 1 \end{pmatrix} = e^t \begin{pmatrix} − sin t + i cos t \\ cos t + i sin t \end{pmatrix}.
Taking real and imaginary parts gives two independent real solutions, so the general solution is
~x = c_1 e^t \begin{pmatrix} − sin t \\ cos t \end{pmatrix} + c_2 e^t \begin{pmatrix} cos t \\ sin t \end{pmatrix}.
The eigenvector method produces a solution to a (constant coefficient homogeneous) linear system
whenever there are “enough eigenvectors”. There might not be “enough eigenvectors” if the characteristic
polynomial has repeated roots.
I’ll consider the case of repeated roots with multiplicity two or three (i.e. double or triple roots). The
general case can be handled by using the exponential of a matrix.
Consider the following linear system:
~x ′ = A~x.
Suppose λ is an eigenvalue of A of multiplicity 2, and ~v is an eigenvector for λ. eλt~v is one solution; I
want to find a second independent solution.
Recall that the constant coefficient equation (D − 3)2 y = 0 had independent solutions e3x and xe3x .
By analogy, it’s reasonable to guess a solution of the form
~x = te^{λt} ~w.
Here ~w is a constant vector.
Plug the guess into ~x' = A~x:
~x' = te^{λt} λ~w + e^{λt} ~w = A(te^{λt} ~w) = te^{λt} A~w.
Equating the coefficients of te^{λt} and e^{λt} on the two sides gives
A~w = λ~w    and    ~w = 0.
While it’s true that teλt · 0 = 0 is a solution, it’s not a very useful solution. I’ll try again, this time using
~x = te^{λt} ~w_1 + e^{λt} ~w_2.
Then
~x' = te^{λt} λ~w_1 + e^{λt} ~w_1 + λe^{λt} ~w_2.
Note that
A~x = te^{λt} A~w_1 + e^{λt} A~w_2.
Hence,
te^{λt} λ~w_1 + e^{λt} ~w_1 + λe^{λt} ~w_2 = te^{λt} A~w_1 + e^{λt} A~w_2.
Equate coefficients of te^{λt} and e^{λt}:
A~w_1 = λ~w_1,    so (A − λI)~w_1 = 0,
A~w_2 = ~w_1 + λ~w_2,    so (A − λI)~w_2 = ~w_1.
In other words, ~w_1 is an eigenvector, and ~w_2 is a vector which is mapped by A − λI to the eigenvector.
~w_2 is called a generalized eigenvector.
Example. Solve
~x' = \begin{pmatrix} −3 & −8 \\ 2 & 5 \end{pmatrix} ~x.
The characteristic polynomial is
\begin{vmatrix} −3 − λ & −8 \\ 2 & 5 − λ \end{vmatrix} = (λ + 3)(λ − 5) + 16 = λ^2 − 2λ + 1 = (λ − 1)^2.
Therefore, λ = 1 is an eigenvalue of multiplicity 2.
Now
A − I = \begin{pmatrix} −4 & −8 \\ 2 & 4 \end{pmatrix} → \begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix}.
The last matrix says a + 2b = 0, or a = −2b. Therefore,
\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} −2b \\ b \end{pmatrix} = b \begin{pmatrix} −2 \\ 1 \end{pmatrix}.
Take b = 1, so ~v = (−2, 1) is an eigenvector.
Write ~w = (c, d). The equation (A − I)~w = ~v becomes
\begin{pmatrix} −4 & −8 \\ 2 & 4 \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} −2 \\ 1 \end{pmatrix}.
Row reduce:
\begin{pmatrix} −4 & −8 & −2 \\ 2 & 4 & 1 \end{pmatrix} → \begin{pmatrix} 1 & 2 & 1/2 \\ 0 & 0 & 0 \end{pmatrix}.
The last matrix says that c + 2d = 1/2, so c = −2d + 1/2. In this situation, I may take d = 0; doing so
produces ~w = (1/2, 0).
This work generates the solution
te^t \begin{pmatrix} −2 \\ 1 \end{pmatrix} + e^t \begin{pmatrix} 1/2 \\ 0 \end{pmatrix}.
The general solution is
~x = c_1 e^t \begin{pmatrix} −2 \\ 1 \end{pmatrix} + c_2 \left( te^t \begin{pmatrix} −2 \\ 1 \end{pmatrix} + e^t \begin{pmatrix} 1/2 \\ 0 \end{pmatrix} \right).
The first picture shows the direction field; the second shows the phase portrait, with some typical
solution curves. This kind of phase portrait is called an improper node.
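(Unofficial check with sympy, assuming it's available: plug the generalized-eigenvector solution back into ~x' = A~x.)

import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[-3, -8], [2, 5]])
w1 = sp.Matrix([-2, 1])                   # eigenvector for lambda = 1
w2 = sp.Matrix([sp.Rational(1, 2), 0])    # generalized eigenvector
x = t*sp.exp(t)*w1 + sp.exp(t)*w2         # the second solution found above
print(sp.simplify(sp.diff(x, t) - A*x))   # zero vector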
Example. Solve ~x' = A~x, where
A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & −1 & 2 \end{pmatrix}.
Since A is lower triangular, the eigenvalues are the diagonal entries: λ = 2 and λ = 1 (double). For λ = 2, row reducing A − 2I gives the eigenvector (0, 0, 1).
For λ = 1,
A − I = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 2 & −1 & 1 \end{pmatrix} → \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & −1 \\ 0 & 0 & 0 \end{pmatrix}.
The last matrix implies that a = 0 and b = c, so the eigenvectors are
\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 0 \\ c \\ c \end{pmatrix} = c \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}.
Since there is only one independent eigenvector ~v = (0, 1, 1) for λ = 1, I need a generalized eigenvector ~w satisfying
(A − I)~w = ~v.
That is,
\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 2 & −1 & 1 \end{pmatrix} \begin{pmatrix} a' \\ b' \\ c' \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}.
Solving this system yields a' = 1, b' = c' + 1. I can take c' = 0, so b' = 1, and
~w = \begin{pmatrix} a' \\ b' \\ c' \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}.
The solution is
~x = c_1 e^{2t} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + c_2 e^t \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} + c_3 \left( te^t \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} + e^t \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \right).
I'll give a brief description of the situation for an eigenvalue λ of multiplicity 3. First, if there are three
independent eigenvectors ~u, ~v, ~w, the solution is
c_1 e^{λt} ~u + c_2 e^{λt} ~v + c_3 e^{λt} ~w.
If there is only one independent eigenvector ~u, find a generalized eigenvector ~v by solving
(A − λI)~v = ~u.
A second solution is
te^{λt} ~u + e^{λt} ~v.
Next, obtain another generalized eigenvector ~w by solving
(A − λI)~w = ~v;
a third solution is (t^2/2)e^{λt} ~u + te^{λt} ~v + e^{λt} ~w.
Finally, if there are exactly two independent eigenvectors ~u and ~v, they give two solutions e^{λt} ~u and e^{λt} ~v; a third comes from a generalized eigenvector ~w satisfying
(A − λI)~w = a~u + b~v
for suitable constants a and b (chosen so the equation is solvable), namely te^{λt}(a~u + b~v) + e^{λt} ~w.
Recall that the solution to the exponential growth equation
dy/dt = ky
is given by y = c_0 e^{kt}.
A constant coefficient linear system has a similar form, but we have vectors and a matrix instead of
scalars:
y ′ = Ay.
Thus, if A is an n × n real matrix, then y = (y1 , y2 , . . . , yn ).
It’s natural to ask whether you can solve a constant coefficient linear system using some kind of expo-
nential, as with the exponential growth equation.
If a solution to the system is to have the same form as the growth equation solution, it should look like
y = eAt y0 .
But “eAt ” seems to be e raised to a matrix power! How does that make any sense? It turns out that
the matrix exponential eAt can be defined in a reasonable way.
From calculus, the Taylor series for e^z is
e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!}.
Define the matrix exponential by substituting At for z:
e^{At} = \sum_{n=0}^{\infty} \frac{t^n A^n}{n!}.
The powers An make sense, since A is a square matrix. It is possible to show that this series converges
for all t and every matrix A.
As a consequence, I can differentiate the series term-by-term:
\frac{d}{dt} e^{At} = \sum_{n=0}^{\infty} n \frac{t^{n−1} A^n}{n!} = \sum_{n=1}^{\infty} \frac{t^{n−1} A^n}{(n − 1)!} = A \sum_{n=1}^{\infty} \frac{t^{n−1} A^{n−1}}{(n − 1)!} = A \sum_{m=0}^{\infty} \frac{t^m A^m}{m!} = Ae^{At}.
This shows that eAt solves the differential equation y ′ = Ay. The initial condition vector y(0) = y0
yields the particular solution
y = eAt y0 .
This works, because e0·A = I (by setting t = 0 in the power series).
Another familiar property of ordinary exponentials holds for the matrix exponential: If A and B com-
mute (that is, AB = BA), then
eA eB = eA+B .
You can prove this by multiplying the power series for the exponentials on the left. (eA is just eAt with
t = 1.)
Example. Compute e^{At} if
A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}.
The powers are
A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix},    A^2 = \begin{pmatrix} 4 & 0 \\ 0 & 9 \end{pmatrix},  . . . ,  A^n = \begin{pmatrix} 2^n & 0 \\ 0 & 3^n \end{pmatrix}.
Therefore,
e^{At} = \sum_{n=0}^{\infty} \frac{t^n}{n!} \begin{pmatrix} 2^n & 0 \\ 0 & 3^n \end{pmatrix} = \begin{pmatrix} \sum_{n=0}^{\infty} \frac{(2t)^n}{n!} & 0 \\ 0 & \sum_{n=0}^{\infty} \frac{(3t)^n}{n!} \end{pmatrix} = \begin{pmatrix} e^{2t} & 0 \\ 0 & e^{3t} \end{pmatrix}.
You can compute the exponential of an arbitrary diagonal matrix in the same way:
For example, if
A = \begin{pmatrix} −3 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 1.73 \end{pmatrix},    then    e^{At} = \begin{pmatrix} e^{−3t} & 0 & 0 \\ 0 & e^{4t} & 0 \\ 0 & 0 & e^{1.73t} \end{pmatrix}.
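(Not in the notes: scipy's expm computes matrix exponentials numerically, which makes a handy check. This assumes scipy is installed.)

import numpy as np
from scipy.linalg import expm

t = 0.5
A = np.diag([-3.0, 4.0, 1.73])
print(expm(A*t))                         # e^(At)
print(np.diag(np.exp(np.diag(A)*t)))     # same thing, built from scalar exponentials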
Using this idea, we can compute eAt when A is diagonalizable. First, I’ll give examples where we can
compute eAt : First, using a “lucky” pattern, and second, using our knowledge of how to solve a system of
differential equations.
Example. Compute e^{At} if
A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}.
Computing powers, A^2 = \begin{pmatrix} 1 & 4 \\ 0 & 1 \end{pmatrix}, A^3 = \begin{pmatrix} 1 & 6 \\ 0 & 1 \end{pmatrix}, and in general A^n = \begin{pmatrix} 1 & 2n \\ 0 & 1 \end{pmatrix}. Hence,
e^{At} = \sum_{n=0}^{\infty} \frac{t^n}{n!} \begin{pmatrix} 1 & 2n \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} \sum_{n=0}^{\infty} \frac{t^n}{n!} & \sum_{n=0}^{\infty} \frac{2nt^n}{n!} \\ 0 & \sum_{n=0}^{\infty} \frac{t^n}{n!} \end{pmatrix} = \begin{pmatrix} e^t & 2te^t \\ 0 & e^t \end{pmatrix}.
Here's where the last equality came from:
\sum_{n=0}^{\infty} \frac{t^n}{n!} = e^t,
\sum_{n=0}^{\infty} \frac{2nt^n}{n!} = 2t \sum_{n=1}^{\infty} \frac{t^{n−1}}{(n − 1)!} = 2t \sum_{m=0}^{\infty} \frac{t^m}{m!} = 2te^t.
In the last example, we got lucky in being able to recognize a pattern in the terms of the power series.
Here is what happens if we aren’t so lucky.
Example. Compute e^{At} if
A = \begin{pmatrix} 3 & −10 \\ 1 & −4 \end{pmatrix}.
If you compute powers of A as in the last two examples, there is no evident pattern:
A^2 = \begin{pmatrix} −1 & 10 \\ −1 & 6 \end{pmatrix},    A^3 = \begin{pmatrix} 7 & −30 \\ 3 & −14 \end{pmatrix},    A^4 = \begin{pmatrix} −9 & 50 \\ −5 & 26 \end{pmatrix}, . . . .
It looks like it would be difficult to compute the matrix exponential using the power series.
I’ll use the fact that eAt is the solution to a linear system. The system’s coefficient matrix is A, so the
system is
x′ = 3x − 10y
y ′ = x − 4y
You can solve this system by hand. For instance, the first equation gives
y = −(1/10)x' + (3/10)x,    so    y' = −(1/10)x'' + (3/10)x'.
x′′ + x′ − 2x = 0.
The solution is
x = c1 et + c2 e−2t .
Plugging this into the expression for y above and doing some ugly algebra gives
y = (1/5)c_1 e^t + (1/2)c_2 e^{−2t}.
Next, remember that if B is a 2 × 2 matrix,
B \begin{pmatrix} 1 \\ 0 \end{pmatrix} = first column of B    and    B \begin{pmatrix} 0 \\ 1 \end{pmatrix} = second column of B.
The solution vector is
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} c_1 e^t + c_2 e^{−2t} \\ (1/5)c_1 e^t + (1/2)c_2 e^{−2t} \end{pmatrix}.
Since e^{At}(1, 0) is the solution with x(0) = 1, y(0) = 0, and e^{At}(0, 1) is the solution with x(0) = 0, y(0) = 1, solving for c_1 and c_2 in each case gives the columns of e^{At}:
e^{At} = \frac{1}{3} \begin{pmatrix} 5e^t − 2e^{−2t} & −10e^t + 10e^{−2t} \\ e^t − e^{−2t} & −2e^t + 5e^{−2t} \end{pmatrix}.
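(A numerical cross-check, assuming scipy: compare the closed form above with scipy's expm at a sample t.)

import numpy as np
from scipy.linalg import expm

A = np.array([[3.0, -10.0],
              [1.0,  -4.0]])
t = 0.7
formula = (1/3) * np.array(
    [[5*np.exp(t) - 2*np.exp(-2*t), -10*np.exp(t) + 10*np.exp(-2*t)],
     [  np.exp(t) -   np.exp(-2*t),  -2*np.exp(t) +  5*np.exp(-2*t)]])
print(np.allclose(expm(A*t), formula))   # True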
We noted earlier that we can compute eAt fairly easily in case A is diagonalizable. Recall that an n × n
matrix A is diagonalizable if it has n independent eigenvectors. (This is true, for example, if A has n
distinct eigenvalues.)
Suppose A is diagonalizable with independent eigenvectors v1 , . . . , vn and corresponding eigenvalues
λ1 , . . . , λn . Let S be the matrix whose columns are the eigenvectors:
↑ ↑ ↑
S = v1 v2 · · · vn .
↓ ↓ ↓
Then
S^{−1} A S = \begin{pmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & λ_n \end{pmatrix} = D.
We saw earlier how to compute the exponential for the diagonal matrix D:
eλ1 t 0 ··· 0
0 eλ2 t ··· 0
eDt ...
= .. .. .
. .
0 0 · · · eλn t
Note that
(S^{−1}AS)^2 = S^{−1}AS S^{−1}AS = S^{−1}A^2 S.
Notice how each "SS^{−1}" pair cancels. Continuing in this way — you can give a formal proof using
induction — we have (S^{−1}AS)^n = S^{−1}A^n S. Therefore,
e^{Dt} = \sum_{n=0}^{\infty} \frac{t^n (S^{−1}AS)^n}{n!} = S^{−1} \left( \sum_{n=0}^{\infty} \frac{t^n A^n}{n!} \right) S = S^{−1} e^{At} S.
Then
S e^{Dt} S^{−1} = S S^{−1} e^{At} S S^{−1},
S e^{Dt} S^{−1} = e^{At}.
Notice that S and S −1 have “switched places” from the original diagonalization equation.
Hence,
e^{At} = S \begin{pmatrix} e^{λ_1 t} & 0 & \cdots & 0 \\ 0 & e^{λ_2 t} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & e^{λ_n t} \end{pmatrix} S^{−1}.
Thus, if A is diagonalizable, find the eigenvalues and use them to construct the diagonal matrix with
the exponentials in the middle. Find a set of independent eigenvectors and use them to construct S and
S −1 . Putting everything into the equation above gives eAt .
Example. Compute e^{At} if
A = \begin{pmatrix} 3 & 5 \\ 1 & −1 \end{pmatrix}.
The characteristic polynomial is (x − 4)(x + 2) and the eigenvalues are λ = 4, λ = −2. Since there are
two different eigenvalues and A is a 2 × 2 matrix, A is diagonalizable. The corresponding eigenvectors are (5, 1)
and (−1, 1). Thus,
S = \begin{pmatrix} 5 & −1 \\ 1 & 1 \end{pmatrix},    S^{−1} = \frac{1}{6} \begin{pmatrix} 1 & 1 \\ −1 & 5 \end{pmatrix}.
Hence,
e^{At} = S \begin{pmatrix} e^{4t} & 0 \\ 0 & e^{−2t} \end{pmatrix} S^{−1} = \frac{1}{6} \begin{pmatrix} 5e^{4t} + e^{−2t} & 5e^{4t} − 5e^{−2t} \\ e^{4t} − e^{−2t} & e^{4t} + 5e^{−2t} \end{pmatrix}.
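(Another unofficial scipy check: build e^{At} from the diagonalization and compare with expm.)

import numpy as np
from scipy.linalg import expm

A = np.array([[3.0, 5.0],
              [1.0, -1.0]])
S = np.array([[5.0, -1.0],
              [1.0,  1.0]])               # columns are the eigenvectors (5,1), (-1,1)
t = 0.3
via_diag = S @ np.diag([np.exp(4*t), np.exp(-2*t)]) @ np.linalg.inv(S)
print(np.allclose(via_diag, expm(A*t)))   # True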
Example. Compute e^{At} if
A = \begin{pmatrix} 5 & −6 & −6 \\ −1 & 4 & 2 \\ 3 & −6 & −4 \end{pmatrix}.
The characteristic polynomial is (x − 1)(x − 2)^2 and the eigenvalues are λ = 1 and λ = 2 (a double
root). The corresponding eigenvectors are (3, −1, 3) for λ = 1, and (2, 1, 0) and (2, 0, 1) for λ = 2. Since I
have 3 independent eigenvectors, the matrix is diagonalizable.
I have
S = \begin{pmatrix} 3 & 2 & 2 \\ −1 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix},    S^{−1} = \begin{pmatrix} −1 & 2 & 2 \\ −1 & 3 & 2 \\ 3 & −6 & −5 \end{pmatrix}.
From this, it follows that
e^{At} = S \begin{pmatrix} e^t & 0 & 0 \\ 0 & e^{2t} & 0 \\ 0 & 0 & e^{2t} \end{pmatrix} S^{−1} = \begin{pmatrix} −3e^t + 4e^{2t} & 6e^t − 6e^{2t} & 6e^t − 6e^{2t} \\ e^t − e^{2t} & −2e^t + 3e^{2t} & −2e^t + 2e^{2t} \\ −3e^t + 3e^{2t} & 6e^t − 6e^{2t} & 6e^t − 5e^{2t} \end{pmatrix}.
Here’s a quick check you can use when computing eAt . Plugging t = 0 into eAt gives e0 = I, the identity
matrix. For instance, in the last example, if you set t = 0 in the right side, it checks:
1 0 0
0 1 0.
0 0 1
However, this check isn't foolproof — just because you get I by setting t = 0 doesn't mean your answer
is right. On the other hand, if you don't get I, your answer is surely wrong!
A better check that is a little more work is to compute the derivative of eAt , and then set t = 0. You
should get A. To see this, note that
\frac{d}{dt} e^{At} = Ae^{At}.
Evaluating both sides of this equation at t = 0 gives \frac{d}{dt} e^{At} \big|_{t=0} = A. Since the theory of differential
dt t=0
equations tells us that the solution to an initial value problem of this kind is unique, if your answer passes
these checks then it is eAt . I think it’s good to do the first (easier) check, even if you don’t do the second.
If you try this in the previous example, you’ll find that the second check works as well.
Unfortunately, not every matrix is diagonalizable. How do we compute eAt for an arbitrary real matrix?
One approach is to use the Jordan canonical form for a matrix, but this would require a discussion
of canonical forms, a large subject in itself.
Note that any method for finding e^{At} requires finding the eigenvalues of A, which is, in general, a difficult
problem. (There are methods from numerical analysis for approximating the eigenvalues of a matrix.)
I’ll describe an iterative algorithm for computing eAt that only requires that one know the eigenvalues
of A. There are various such algorithms for computing the matrix exponential; this one, which is due to
Richard Williamson [1], seems to me to be the easiest for hand computation. It’s also possible to implement
this method using a computer algebra system like maxima or Mathematica.
Let A be an n × n matrix. Let {λ1 , λ2 , . . . , λn } be a list of the eigenvalues, with multiple eigenvalues
repeated according to their multiplicity.
The last phrase means that if the characteristic polynomial is (x − 1)3 (x − 5), the eigenvalue 1 is listed
3 times. So your list of eigenvalues might be {1, 1, 1, 5}. But you can list them in any order; if you wanted
to show off, you could make your list {1, 5, 1, 1}. It will probably make the computations easier and less
error-prone if you list the eigenvalues in some “nice” way (so either {1, 1, 1, 5} or {5, 1, 1, 1}).
Let
a_1 = e^{λ_1 t},
a_k = \int_0^t e^{λ_k (t−u)} a_{k−1}(u) du,    k = 2, . . . , n,
B_1 = I,
B_k = (A − λ_{k−1} I) · B_{k−1},    k = 2, . . . , n.
Then
e^{At} = a_1 B_1 + a_2 B_2 + · · · + a_n B_n.
Remark. If you’ve seen convolutions before, you might recognize that the expression for ak is a convolu-
tion:
a_k = e^{λ_k t} ⋆ a_{k−1}(t).
In general, the convolution of f and g is
(f ⋆ g)(t) = \int_0^t f(t − u) g(u) du.
If you haven’t seen this before, don’t worry: you do not need to know this! The important thing (which
gives the definition of ak ) is the integral on the right side.
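(Here is a small sketch of the algorithm in Python, using sympy for the integrals; sympy is an assumption, not something the notes rely on. It reproduces the diagonal 2 × 2 example computed earlier.)

import sympy as sp

def williamson_exp(A, eigenvalues):
    # e^(At) from a list of eigenvalues, repeated according to multiplicity
    t, u = sp.symbols('t u')
    n = A.shape[0]
    a = sp.exp(eigenvalues[0]*t)          # a_1
    B = sp.eye(n)                         # B_1
    total = a*B
    for k in range(1, n):
        # a_k = integral_0^t e^(lambda_k (t-u)) a_(k-1)(u) du
        a = sp.integrate(sp.exp(eigenvalues[k]*(t - u)) * a.subs(t, u), (u, 0, t))
        B = (A - eigenvalues[k-1]*sp.eye(n)) * B
        total += a*B
    return sp.simplify(total)

print(williamson_exp(sp.Matrix([[2, 0], [0, 3]]), [2, 3]))   # diag(e^(2t), e^(3t))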
To prove that this algorithm works, I’ll show that the expression on the right satisfies the differential
equation x′ = Ax. To do this, I’ll need two facts about the characteristic polynomial p(x).
1. (x − λ1 )(x − λ2 ) · · · (x − λn ) = p(x).
2. (Cayley-Hamilton Theorem) p(A) = 0.
Observe that if p(x) is the characteristic polynomial, then using the first fact and the definition of the
B’s,
p(x) = (x − λ1 )(x − λ2 ) · · · (x − λn )
p(A) = (A − λ1 I)(A − λ2 I) · · · (A − λn I)
= I(A − λ1 I)(A − λ2 I) · · · (A − λn I)
= B1 (A − λ1 I)(A − λ2 I) · · · (A − λn I)
= B2 (A − λ2 I) · · · (A − λn I)
..
.
= Bn (A − λn I)
By the Cayley-Hamilton Theorem,
Bn (A − λn I) = 0. (∗)
I will use this fact in the proof below. First, let's see an example of the Cayley-Hamilton theorem. Let
A = \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}.
The characteristic polynomial is p(x) = x^2 − 3x − 4, and
p(A) = A^2 − 3A − 4I = \begin{pmatrix} 10 & 9 \\ 6 & 7 \end{pmatrix} − \begin{pmatrix} 6 & 9 \\ 6 & 3 \end{pmatrix} − \begin{pmatrix} 4 & 0 \\ 0 & 4 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
Now I'll show that the algorithm produces e^{At}. Differentiating the integral formula for a_k gives
a′k = λk ak + ak−1 .
Therefore,
(a1 B1 + a2 B2 + . . . + an Bn )′ =
λ1 a1 B1 +
λ2 a2 B2 + a1 B2 +
λ3 a3 B3 + a2 B3 +
..
.
λn an Bn + an−1 Bn .
Expand the a_{i−1} B_i terms using B_i = (A − λ_{i−1} I)B_{i−1}, i.e. a_{i−1} B_i = a_{i−1} A B_{i−1} − λ_{i−1} a_{i−1} B_{i−1}:
λ1 a1 B1 +
λ2 a2 B2 + a1 AB1 − λ1 a1 B1 +
λ3 a3 B3 + a2 AB2 − λ2 a2 B2 +
..
.
λn an Bn + an−1 ABn−1 − λn−1 an−1 Bn−1 =
λn an Bn + A(a1 B1 + a2 B2 + . . . + an−1 Bn−1 ) =
λn an Bn − Aan Bn + A(a1 B1 + a2 B2 + . . . + an Bn ) =
−an (A − λn I)Bn + A(a1 B1 + a2 B2 + . . . + an Bn ) =
−an · 0 + A(a1 B1 + a2 B2 + . . . + an Bn ) =
A(a1 B1 + a2 B2 + . . . + an Bn )
(The result (*) proved above was used in the next-to-the-last equality.) Combining the results above,
I’ve shown that
(a1 B1 + a2 B2 + . . . + an Bn )′ = A(a1 B1 + a2 B2 + . . . + an Bn ).
This shows that M = a1 B1 + a2 B2 + . . . + an Bn satisfies M ′ = AM .
Using the power series expansion, I have e^{−At} A = Ae^{−At}. So
\frac{d}{dt}\left( e^{−At} M \right) = −Ae^{−At} M + e^{−At} M' = −Ae^{−At} M + e^{−At} AM = −Ae^{−At} M + Ae^{−At} M = 0.
(Remember that matrix multiplication is not commutative in general!) It follows that e^{−At} M is a
constant matrix.
Set t = 0. Since a_2 = · · · = a_n = 0, it follows that M(0) = I. In addition, e^{−A·0} = I. Therefore,
e^{−At} M = I, and hence M = e^{At}.
Example. Use the matrix exponential to solve
y' = \begin{pmatrix} 3 & −1 \\ 1 & 1 \end{pmatrix} y,    y(0) = \begin{pmatrix} 3 \\ 4 \end{pmatrix}.
The characteristic polynomial is (x − 2)^2, so λ = 2 is a double eigenvalue and the list is {2, 2}.
First, I'll compute the a_k's:
a_1 = e^{2t},
a_2 = e^{2t} ⋆ a_1(t) = \int_0^t e^{2(t−u)} e^{2u} du = e^{2t} \int_0^t du = te^{2t}.
Next, B_1 = I and B_2 = A − 2I = \begin{pmatrix} 1 & −1 \\ 1 & −1 \end{pmatrix}. Therefore,
e^{At} = e^{2t} I + te^{2t} \begin{pmatrix} 1 & −1 \\ 1 & −1 \end{pmatrix} = e^{2t} \begin{pmatrix} 1 + t & −t \\ t & 1 − t \end{pmatrix},
and the solution is
y = e^{At} \begin{pmatrix} 3 \\ 4 \end{pmatrix} = e^{2t} \begin{pmatrix} 3 − t \\ 4 − t \end{pmatrix}.
You can get the general solution by replacing (3, 4) with (c1 , c2 ).
Example. Find e^{At} if
A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ −1 & −1 & 2 \end{pmatrix}.
Since A is lower triangular, the eigenvalue list is {1, 1, 2}. Compute the a_k's:
a_1 = e^t,
a_2 = \int_0^t e^{t−u} e^u du = te^t,
a_3 = \int_0^t e^{2(t−u)} ue^u du = −te^t − e^t + e^{2t}.
The B_k's are B_1 = I, B_2 = A − I, and B_3 = (A − I)^2, and then e^{At} = a_1 B_1 + a_2 B_2 + a_3 B_3.
Example. Solve y' = \begin{pmatrix} 2 & −5 \\ 2 & −4 \end{pmatrix} y.
This example will demonstrate how the algorithm for e^{At} works when the eigenvalues are complex.
The characteristic polynomial is x^2 + 2x + 2. The eigenvalues are λ = −1 ± i. I will list them as
{−1 + i, −1 − i}.
First, I'll compute the a_k's. a_1 = e^{(−1+i)t}, and
a_2 = \int_0^t e^{(−1+i)(t−u)} e^{(−1−i)u} du = e^{(−1+i)t} \int_0^t e^{(1−i)u} e^{(−1−i)u} du =
e^{(−1+i)t} \int_0^t e^{−2iu} du = e^{(−1+i)t} · \frac{i}{2} \left( e^{−2it} − 1 \right) = \frac{i}{2} e^{(−1−i)t} − \frac{i}{2} e^{(−1+i)t}.
Next, I’ll compute the Bk ’s. B1 = I, and
B_2 = A − (−1 + i)I = \begin{pmatrix} 3 − i & −5 \\ 2 & −3 − i \end{pmatrix}.
Therefore,
e^{At} = e^{(−1+i)t} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \left( \frac{i}{2} e^{(−1−i)t} − \frac{i}{2} e^{(−1+i)t} \right) \begin{pmatrix} 3 − i & −5 \\ 2 & −3 − i \end{pmatrix}.
I want a real solution, so I'll use DeMoivre's Formula to simplify. Expanding the complex exponentials into sines and cosines and collecting terms gives
e^{At} = e^{−t} \begin{pmatrix} cos t + 3 sin t & −5 sin t \\ 2 sin t & cos t − 3 sin t \end{pmatrix}.
Notice that all the i’s have dropped out! This reflects the obvious fact that the exponential of a real
matrix must be a real matrix.
Finally, the general solution to the original system is
y = e^{−t} \begin{pmatrix} cos t + 3 sin t & −5 sin t \\ 2 sin t & cos t − 3 sin t \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}.
Example. Solve the following system using both the matrix exponential and the eigenvector methods.
y' = \begin{pmatrix} 2 & −1 \\ 1 & 2 \end{pmatrix} y.
The characteristic polynomial is x^2 − 4x + 5, so the eigenvalues are λ = 2 ± i. For λ = 2 + i, row reducing A − (2 + i)I shows that (1, −i) is an eigenvector. The corresponding solution is
e^{(2+i)t} \begin{pmatrix} 1 \\ −i \end{pmatrix} = e^{2t} \begin{pmatrix} cos t + i sin t \\ sin t − i cos t \end{pmatrix}.
Taking real and imaginary parts gives the independent real solutions e^{2t}(cos t, sin t) and e^{2t}(sin t, − cos t).
For the matrix exponential method, the eigenvalue list is {2 + i, 2 − i}; the algorithm gives a_1 = e^{(2+i)t}, a_2 = e^{2t} sin t, B_1 = I, and B_2 = A − (2 + i)I. Therefore,
e^{At} = e^{(2+i)t} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + e^{2t} sin t \begin{pmatrix} −i & −1 \\ 1 & −i \end{pmatrix} = e^{2t} \begin{pmatrix} cos t & − sin t \\ sin t & cos t \end{pmatrix}.
The solution is
y = e^{2t} \begin{pmatrix} cos t & − sin t \\ sin t & cos t \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}.
Taking into account some of the algebra I didn’t show for the matrix exponential, I think the eigenvector
approach is easier.
Example. Solve the following system using both the matrix exponential and the (generalized) eigenvector
methods.
y' = \begin{pmatrix} 5 & −8 \\ 2 & −3 \end{pmatrix} y.
y = (y1 , y2 ) is the solution vector.
I’ll do this first using the generalized eigenvector method, then using the matrix exponential.
The characteristic polynomial is x^2 − 2x + 1. The eigenvalue is λ = 1 (double).
A − I = \begin{pmatrix} 4 & −8 \\ 2 & −4 \end{pmatrix}.
Ignore the first row, and divide the second row by 2, obtaining the vector (1, −2). I want (a, b) such
that (1)a + (−2)b = 0. Swap 1 and −2 and negate the −2: I get (a, b) = (2, 1). This is an eigenvector for
λ = 1.
Since I only have one eigenvector, I need a generalized eigenvector. This means I need (a', b') such that
\begin{pmatrix} 4 & −8 \\ 2 & −4 \end{pmatrix} \begin{pmatrix} a' \\ b' \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}.
Solving, 2a' − 4b' = 1; take b' = 0, a' = 1/2, so (1/2, 0) is a generalized eigenvector. By this method the general solution is
y = c_1 e^t \begin{pmatrix} 2 \\ 1 \end{pmatrix} + c_2 \left( te^t \begin{pmatrix} 2 \\ 1 \end{pmatrix} + e^t \begin{pmatrix} 1/2 \\ 0 \end{pmatrix} \right).
Now use the matrix exponential: the eigenvalue list is {1, 1}, so a_1 = e^t, a_2 = te^t, B_1 = I, and B_2 = A − I.
Therefore,
e^{At} = e^t \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + te^t \begin{pmatrix} 4 & −8 \\ 2 & −4 \end{pmatrix} = \begin{pmatrix} e^t + 4te^t & −8te^t \\ 2te^t & e^t − 4te^t \end{pmatrix}.
The solution is
y = \begin{pmatrix} e^t + 4te^t & −8te^t \\ 2te^t & e^t − 4te^t \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}.
In this case, finding the solution using the matrix exponential may be a little bit easier.
[1] Richard Williamson, Introduction to differential equations. Englewood Cliffs, NJ: Prentice-Hall, 1986.
(b)
hx, ay + zi = hay + z, xi = a hy, xi + hz, xi = ahy, xi + hz, xi = a hx, yi + hx, zi .
If a ∈ R, then a = a, and so hx, ay + zi = a hx, yi + hx, zi.
(c) As in the proof of (b), I have hy, xi = hy, xi, so
Remarks. Why include complex conjugation in the symmetry axiom? Suppose the symmetry axiom had
read
hx, yi = hy, xi .
Then
0 < hix, ixi = i hx, ixi = i hix, xi = i · i hx, xi = − hx, xi .
This contradicts hx, xi > 0. That is, I can’t have both pure symmetry and positive definiteness.
Example. Suppose u, v, and w are vectors in a real inner product space V . Suppose
hv, vi = 4, hw, wi = 3.
1
(a) Compute hu + 3w, v − 5wi.
(b) Compute hv + w, v − wi.
(a) Using the linearity and symmetry properties, I have
hu + 3w, v − 5wi = hu, v − 5wi + h3w, v − 5wi = hu, vi − hu, 5wi + h3w, vi − h3w, 5wi =
(b)
hv + w, v − wi = hv, vi − hv, wi + hw, vi − hw, wi = hv, vi − hw, wi = 4 − 3 = 1.
(a1 , . . . , an ) · (b1 , . . . , bn ) = a1 b1 + · · · + an bn .
It's easy to verify that the axioms for an inner product hold. For example, suppose (a_1, . . . , a_n) ≠ ~0.
Then at least one of a_1, . . . , a_n is nonzero, so (a_1, . . . , a_n) · (a_1, . . . , a_n) = a_1^2 + · · · + a_n^2 > 0.
I can use an inner product to define lengths and angles. Thus, an inner product introduces (metric)
geometry into vector spaces.
Definition. Let V be an inner product space, and let x, y ∈ V. The norm (or length) of x is ‖x‖ = ⟨x, x⟩^{1/2}, and the angle θ between x and y is defined by
cos θ = ⟨x, y⟩ / (‖x‖ ‖y‖).
Remark. The definition of the angle between x and y wouldn't make sense if the expression ⟨x, y⟩ / (‖x‖ ‖y‖) was
greater than 1 or less than −1, since I'm asserting that it's the cosine of an angle.
In fact, the Cauchy-Schwarz inequality (which I'll prove below) will show that
−1 ≤ ⟨x, y⟩ / (‖x‖ ‖y‖) ≤ 1.
2
(b) kaxk = |a|kxk. (“|a|” denotes the absolute value of a.)
Proof. (a) Squaring kxk = hx, xi1/2 gives kxk2 = hx, xi.
(c) x 6= 0 implies hx, xi > 0, and hence kxk > 0. Conversely, if x = 0, then h0, 0i = 0, so kxk = 0.
0 ≤ hax − by, ax − byi = a2 hx, xi − 2ab hx, yi + b2 hy, yi = a2 kxk2 − 2ab hx, yi + b2 kyk2 .
The trick is to pick “nice” values for a and b. I will set a = kyk and b = kxk. (A rationale for this is
that I want the expression kxkkyk to appear in the inequality.)
I get
kyk2 kxk2 − 2kykkxk hx, yi + kxk2 kyk2 ≥ 0
2kxk2 kyk2 − 2kxkkyk hx, yi ≥ 0
2kxk2 kyk2 ≥ 2kxkkyk hx, yi
kxk2 kyk2 ≥ kxkkyk hx, yi
kxkkyk ≥ hx, yi .
In the last inequality, x and y are arbitrary vectors. So the inequality is still true if x is replaced by
−x. If I replace x with −x, then k − xk = kxk and h−x, yi = − hx, yi, and the inequality becomes
kxkkyk ≥ − hx, yi .
Since kxkkyk is greater than or equal to both hx, yi and − hx, yi, I have
kxkkyk ≥ | hx, yi |.
(e)
kx + yk2 = hx + y, x + yi = kxk2 + 2 hx, yi + kyk2 ≤ kxk2 + 2kxkkyk + kyk2 = (kxk + kyk)2 .
3
Example. R^3 is an inner product space using the standard dot product of vectors. The cosine of the angle
between (2, −2, 1) and (6, −8, 24) is
cos θ = \frac{(2, −2, 1) · (6, −8, 24)}{‖(2, −2, 1)‖ ‖(6, −8, 24)‖} = \frac{52}{3 · 26} = \frac{2}{3}.
Example. Let C[0, 1] denote the real vector space of continuous functions on the interval [0, 1]. Define an
inner product on C[0, 1] by
⟨f, g⟩ = \int_0^1 f(x) g(x) dx.
Given that this is a real inner product, I may apply the preceding proposition to produce some useful
results. For example, the Cauchy-Schwarz inequality says that
\left( \int_0^1 f(x)^2 dx \right)^{1/2} \left( \int_0^1 g(x)^2 dx \right)^{1/2} ≥ \left| \int_0^1 f(x) g(x) dx \right|.
hvi , vj i = δij .
Note that the n × n matrix whose (i, j)-th component is δij is the n × n identity matrix.
4
Example. Consider the following set of vectors in R^2:
{ (3/5, 4/5), (−4/5, 3/5) }.
I have
(3/5, 4/5) · (3/5, 4/5) = 9/25 + 16/25 = 1,
(3/5, 4/5) · (−4/5, 3/5) = −12/25 + 12/25 = 0,
(−4/5, 3/5) · (−4/5, 3/5) = 16/25 + 9/25 = 1.
It follows that the set is orthonormal relative to the dot product on R^2.
Example. Let C[0, 2π] denote the complex-valued continuous functions on [0, 2π]. Define an inner product
by
⟨f, g⟩ = \frac{1}{2π} \int_0^{2π} f(x) \overline{g(x)} dx.
Let m, n ∈ Z. Then
\frac{1}{2π} \int_0^{2π} e^{imx} e^{−inx} dx = δ_{mn}.
It follows that the following set is orthonormal in C[0, 2π] relative to this inner product:
\left\{ \frac{1}{\sqrt{2π}} e^{imx} : m = . . . , −1, 0, 1, . . . \right\}.
Proposition. Let {vi } be an orthogonal set of vectors, vi 6= 0 for all i. Then {vi } is independent.
Proof. Suppose
a1 vi1 + a2 vi2 + · · · + an vin = ~0.
Take the inner product of both sides with vi1 :
5
Proposition. Let {vi } be an orthonormal basis for V , and let v ∈ V . Then
X
v= hv, vi i vi .
i
Note: In fact, the sum above is a finite sum — that is, only finitely many terms are nonzero.
Proof. Since {v_i} is a basis, there are elements a_j ∈ F and v_{i_1}, . . . , v_{i_n} ∈ {v_i} such that
v = a_1 v_{i_1} + · · · + a_n v_{i_n}.
Take the inner product of both sides with v_{i_1}. As in the proof of the last proposition, all the inner product terms on the right vanish, except that
⟨v_{i_1}, v_{i_1}⟩ = 1 by orthonormality. Thus,
⟨v, v_{i_1}⟩ = a_1.
Taking the inner product of both sides of the original equation with v_{i_2}, . . . , v_{i_n} shows, in the same way, that ⟨v, v_{i_j}⟩ = a_j for each j.
Example. Let C[0, 2π] denote the complex inner product space of complex-valued continuous functions on
[0, 2π], where the inner product is defined by
2π
1
Z
hf, gi = f (x)g(x) dx.
2π 0
6
Suppose I try to compute the “components” of f (x) = x relative to this orthonormal set by taking inner
products — that is, using the approach of the preceding example.
For m = 0,
\frac{1}{\sqrt{2π}} \int_0^{2π} x dx = π \sqrt{2π}.
Suppose m ≠ 0. Then
\frac{1}{\sqrt{2π}} \int_0^{2π} x e^{−mix} dx = \frac{1}{\sqrt{2π}} \left[ \frac{i}{m} x e^{−mix} + \frac{1}{m^2} e^{−mix} \right]_0^{2π} = \frac{i\sqrt{2π}}{m}.
There are infinitely many nonzero components! Of course, the reason this does not contradict the earlier
result is that f (x) = x may not lie in the span of S. S is orthonormal, hence independent, but it is not a
basis for C[0, 2π].
In fact, since emix = cos mx + i sin mx, a finite linear combination of elements of S must be periodic.
It is still reasonable to ask whether (or in what sense) f(x) = x can be represented by the infinite
sum
π \sqrt{2π} + \sum_{m=1}^{\infty} \left( \frac{i\sqrt{2π}}{m} e^{mix} − \frac{i\sqrt{2π}}{m} e^{−mix} \right).
For example, it is reasonable to ask whether the series converges uniformly to f at each point of [0, 2π].
For example, it is reasonable to ask whether the series converges uniformly to f at each point of [0, 2π].
The answers to these kinds of questions would require an excursion into the theory of Fourier series.
Since it’s so easy to find the components of a vector relative to an orthonormal basis, it’s of interest to
have an algorithm which converts a given basis to an orthonormal one.
The Gram-Schmidt algorithm converts a basis to an orthonormal basis by “straightening out” the
vectors one by one.
[Figure: v_1, v_2, the projection proj_{v_1} v_2, and the perpendicular component v_2 − proj_{v_1} v_2.]
The picture shows the first step in the straightening process. Given vectors v1 and v2 , I want to replace
v2 with a vector perpendicular to v1 . I can do this by taking the component of v2 perpendicular to v1 , which
is
v_2 − \frac{⟨v_1, v_2⟩}{⟨v_1, v_1⟩} v_1.
Lemma. (Gram-Schmidt algorithm) Let {v1 , . . . , vk } be a set of nonzero vectors in an inner product
space V . Suppose v1 , . . . , vk−1 are pairwise orthogonal. Let
v_k' = v_k − \sum_{i=1}^{k−1} \frac{⟨v_i, v_k⟩}{⟨v_i, v_i⟩} v_i.
Then v_k' is orthogonal to each of v_1, . . . , v_{k−1}.
Proof. Let j ∈ {1, . . . , k − 1}. Then
⟨v_j, v_k'⟩ = ⟨v_j, v_k⟩ − \sum_{i=1}^{k−1} \frac{⟨v_i, v_k⟩}{⟨v_i, v_i⟩} ⟨v_j, v_i⟩.
Now ⟨v_j, v_i⟩ = 0 for i ≠ j because the set is orthogonal. Hence, the right side collapses to
⟨v_j, v_k⟩ − \frac{⟨v_j, v_k⟩}{⟨v_j, v_j⟩} ⟨v_j, v_j⟩ = ⟨v_j, v_k⟩ − ⟨v_j, v_k⟩ = 0.
Suppose that I start with an independent set {v1 , . . . , vn }. Apply the Gram-Schmidt procedure to the
set, beginning with v1′ = v1 . This produces an orthogonal set {v1′ , . . . , vn′ }. In fact, {v1′ , . . . , vn′ } is a nonzero
orthogonal set, so it is independent as well.
To see that each vk′ is nonzero, suppose
k−1
X hvi , vk i
0 = vk′ = vk − vi .
i=1
hvi , vi i
Then
k−1
X hvi , vk i
vk = vi .
i=1
hvi , vi i
This contradicts the independence of {vi }, because vk is expressed as the linear combination of v1 ,
. . . vk−1 .
In general, if the algorithm is applied iteratively to a set of vectors, the span is preserved at each stage.
That is,
hv1 , . . . , vk i = hv1′ , . . . , vk′ i.
This is true at the start, since v1 = v1′ . Assume inductively that
8
Example. (Gram-Schmidt) Apply Gram-Schmidt to the following set of vectors in R3 (relative to the
usual dot product):
(3, 0, 4),    (−1, 0, 7),    (2, 9, 11).
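(The worked computation isn't reproduced here, but the following numpy sketch — an aside, not part of the notes — carries out the straightening for these three vectors; it produces (3, 0, 4), (−4, 0, 3), (0, 9, 0).)

import numpy as np

def gram_schmidt(vectors):
    # Orthogonalize, straightening one vector at a time (classical Gram-Schmidt).
    ortho = []
    for v in vectors:
        w = v.astype(float).copy()
        for u in ortho:
            w -= (u @ v) / (u @ u) * u    # subtract the component of v along u
        ortho.append(w)
    return ortho

vs = [np.array([3, 0, 4]), np.array([-1, 0, 7]), np.array([2, 9, 11])]
for w in gram_schmidt(vs):
    print(w)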
Example. (Gram-Schmidt) Find an orthonormal basis (relative to the usual dot product) for the subspace
of R4 spanned by the vectors
(1, 1, 0, 13) − \frac{(1, 1, 0, 13) · (1, 0, 2, 2)}{(1, 0, 2, 2) · (1, 0, 2, 2)} (1, 0, 2, 2) − \frac{(1, 1, 0, 13) · (8, 1, −4, 0)}{(8, 1, −4, 0) · (8, 1, −4, 0)} (8, 1, −4, 0) =
(1, 1, 0, 13) − \frac{27}{9} (1, 0, 2, 2) − \frac{9}{81} (8, 1, −4, 0) = (1, 1, 0, 13) − (3, 0, 6, 6) − \left( \frac{8}{9}, \frac{1}{9}, −\frac{4}{9}, 0 \right) = \left( −\frac{26}{9}, \frac{8}{9}, −\frac{50}{9}, 7 \right).
If at any point you wind up with a vector with fractions, it’s a good idea to clear the fractions before
continuing. Since multiplying a vector by a number doesn’t change its direction, it remains perpendicular
to the vectors already constructed.
Thus, I'll multiply the last vector by 9 and use (−26, 8, −50, 63) instead. Finally, divide each of the three orthogonal vectors by its length to get an orthonormal basis.
(Recall the convention that a vector is a column vector:
(v_1, v_2, . . . , v_n) = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix},    (v_1, v_2, . . . , v_n)^T = [ v_1  v_2  · · ·  v_n ].)
Lemma. Let A be an invertible n × n real matrix. Then
⟨x, y⟩ = x^T A^T A y
defines an inner product on R^n.
10
Finally,
hx, xi = xT AT Ax = (Ax)T (Ax).
Now Ax is an n × 1 vector — I'll label its components this way:
Ax = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.
Then
⟨x, x⟩ = (Ax)^T (Ax) = [ u_1  u_2  · · ·  u_n ] \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} = u_1^2 + u_2^2 + · · · + u_n^2 ≥ 0.
That is, the inner product of a vector with itself is a nonnegative number. All that remains is to show
that if the inner product of a vector with itself is 0, then the vector is ~0.
Using the notation above, suppose
⟨x, x⟩ = u_1^2 + u_2^2 + · · · + u_n^2 = 0.
Then u1 = u2 = · · · = un = 0, because a nonzero u would produce a positive number on the right side
of the equation.
So
Ax = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = ~0.
Finally, I'll use the fact that A is invertible:
x = A^{−1}(Ax) = A^{−1} ~0 = ~0.
This proves that the function is positive definite, so it’s an inner product.
Example. The previous lemma provides lots of examples of inner products on Rn besides the usual dot
product. All I have to do is take an invertible matrix A and form AT A, defining the inner product as above.
For example, this 2 × 2 real matrix is invertible:
A = \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}.
Now
A^T A = \begin{pmatrix} 29 & 12 \\ 12 & 5 \end{pmatrix}.
(Notice that AT A will always be symmetric.) The inner product defined by this matrix is
⟨(x_1, x_2), (y_1, y_2)⟩ = [ x_1  x_2 ] \begin{pmatrix} 29 & 12 \\ 12 & 5 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = 29x_1 y_1 + 12x_2 y_1 + 12x_1 y_2 + 5x_2 y_2.
Definition. A matrix A in M (n, R) is orthogonal if AAT = I.
Proposition. Let A be an orthogonal matrix.
(a) det(A) = ±1.
(b) AAT = I = AT A — in other words, AT = A−1 .
(c) The rows of A form an orthonormal set. The columns of A form an orthonormal set.
(d) A preserves dot products — and hence, lengths and angles — in the sense that
(Ax) · (Ay) = x · y.
AAT = I
A−1 AAT = A−1 I
IAT = A−1 I
AT = A−1
(d) The ordinary dot product of vectors x = (x1 , x2 , . . . xn ) and y = (y1 , y2 , . . . yn ) can be written as a matrix
multiplication:
x · y = [ x_1  x_2  · · ·  x_n ] \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = x^T y.
(Remember the convention that vectors are column vectors.)
Suppose A is orthogonal. Then
(Ax) · (Ay) = (Ax)^T (Ay) = x^T A^T A y = x^T I y = x^T y = x · y.
In other words, orthogonal matrices preserve dot products. It follows that orthogonal matrices will also
preserve lengths of vectors and angles between vectors, because these are defined in terms of dot products.
Example. Find real numbers a and b such that the following matrix is orthogonal:
A = \begin{pmatrix} a & 0.6 \\ b & 0.8 \end{pmatrix}.
12
Since the columns of A must form an orthonormal set, I must have
0.6a + 0.8b = 0.
The easy way to get a solution is to swap 0.6 and 0.8 and negate one of them; thus, a = −0.8 and
b = 0.6.
Since ‖(−0.8, 0.6)‖ = 1, I'm done. (If the a and b I chose had made ‖(a, b)‖ ≠ 1, then I'd simply divide
(a, b) by its length.)
Example. Orthogonal 2 × 2 matrices represent rotations of the plane about the origin or reflections
across a line through the origin.
Rotations are represented by matrices
\begin{pmatrix} cos θ & − sin θ \\ sin θ & cos θ \end{pmatrix}.
You can check that this works by considering the effect of multiplying the standard basis vectors (1, 0)
and (0, 1) by this matrix.
Multiplying a vector by the following matrix product reflects the vector across the line L that makes an
angle θ with the x-axis:
\begin{pmatrix} cos θ & − sin θ \\ sin θ & cos θ \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix} \begin{pmatrix} cos θ & sin θ \\ − sin θ & cos θ \end{pmatrix}.
Reading from right to left, the first matrix rotates everything by −θ radians, so L coincides with the
x-axis. The second matrix reflects everything across the x-axis. The third matrix rotates everything by θ
radians. Hence, a given vector is rotated by −θ and reflected across the x-axis, after which the reflected
vector is rotated by θ. The net effect is to reflect across L.
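(A quick numpy illustration, not from the notes: build the reflection across the line at angle θ as the product above, and check that it is orthogonal and fixes a vector on L.)

import numpy as np

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = np.pi/6
flip_x = np.array([[1.0, 0.0], [0.0, -1.0]])
R = rotation(theta) @ flip_x @ rotation(-theta)      # reflect across L
print(np.allclose(R @ R.T, np.eye(2)))               # True: R is orthogonal
print(R @ np.array([np.cos(theta), np.sin(theta)]))  # a vector on L is unchanged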
Many transformation problems can be easily accomplished by doing transformations to reduce a general
problem to a special case.
2. Translations. A translation R2 → R2 has the form
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix} + \begin{pmatrix} e \\ f \end{pmatrix}.
e and f are numbers. This translation just translates everything by the vector he, f i.
A translation by a nonzero vector is not a linear map, because linear maps must send the zero vector
to the zero vector. However, translations are very useful in performing coordinate transformations. I’ll
introduce the following terminology for the composite of a linear transformation and a translation.
Definition. Let A be a real m × n matrix. An affine map is a function f : Rn → Rm of the form
f (x) = Ax + b, where x ∈ Rn and b ∈ Rm .
Example. Find an affine map which carries the unit square 0 ≤ u ≤ 1, 0 ≤ v ≤ 1 to the parallelogram in
the x-y plane with vertices A(2, −1), B(5, 3), C(4, −6), D(7, −2).
I’m going to make a rough sketch of the parallelogram first. I want to know the orientation of the
vertices — e.g. that B and C are next to A, and that D is opposite A.
I’m going to do the transformation in two steps. First, I’ll take the square to a parallelogram which is
the right size and shape, but which has a corner at the origin. Next, I’ll move the parallelogram so it’s at
the right place.
2
The vectors from A to B and to C are \vec{AB} = ⟨3, 4⟩ and \vec{AC} = ⟨2, −5⟩. I saw above that if I construct a
matrix with ⟨3, 4⟩ as the first column and ⟨2, −5⟩ as the second column, then it will multiply ⟨1, 0⟩ to ⟨3, 4⟩
and ⟨0, 1⟩ to ⟨2, −5⟩:
\begin{pmatrix} 3 & 2 \\ 4 & −5 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 4 \end{pmatrix},    \begin{pmatrix} 3 & 2 \\ 4 & −5 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ −5 \end{pmatrix}.
So the following transformation takes the unit square to a parallelogram of the right shape:
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 & 2 \\ 4 & −5 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}.
This transformation takes (0, 0) to (0, 0); I want (0, 0) to go to the point A (which I used as the base
point for my two vectors). To fix this, just translate by A:
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 & 2 \\ 4 & −5 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} + \begin{pmatrix} 2 \\ −1 \end{pmatrix}.
To see this, just check that the unit vectors ⟨1, 0⟩, ⟨0, 1⟩ go to the right places:
\begin{pmatrix} cos θ & − sin θ \\ sin θ & cos θ \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} cos θ \\ sin θ \end{pmatrix},    \begin{pmatrix} cos θ & − sin θ \\ sin θ & cos θ \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} − sin θ \\ cos θ \end{pmatrix}.
You can see from the picture that h1, 0i and h0, 1i have both been rotated by an angle θ. Other vectors
can be built out of these two vectors, so other vectors are rotated by θ as well.
4. Reflections. This is how to do a reflection across the x-axis:
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} u \\ −v \end{pmatrix}.
It is clear that this reflects things across the x-axis, because it simply negates the second component.
3
What about reflection across a line L making an angle θ with the x-axis? It's messy to do using analytic
geometry, but very easy using matrices. Simply do a rotation through −θ, which carries L to the x-axis.
Next, reflect across the x-axis. Finally, do a rotation through θ, which carries the x-axis back to L. Each
of these transformations can be accomplished by matrix multiplication; just multiply the three matrices to
do reflection across L.
Translations, rotations, and reflections are examples of rigid motions. They preserve distances between
points, as well as areas.
[Figure: the line L at angle π/3, rotated down to the x-axis, reflected, and rotated back.]
Thus,
A = \begin{pmatrix} cos(π/3) & − sin(π/3) \\ sin(π/3) & cos(π/3) \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix} \begin{pmatrix} cos(π/3) & sin(π/3) \\ − sin(π/3) & cos(π/3) \end{pmatrix} = \begin{pmatrix} −1/2 & \sqrt{3}/2 \\ \sqrt{3}/2 & 1/2 \end{pmatrix}.
(c) If z_1, z_2 ∈ C, then
\overline{z_1 · z_2} = \overline{z_1} · \overline{z_2}.
The proofs are easy; just write out the complex numbers (e.g. z_1 = a + bi and z_2 = c + di) and compute.
The conjugate of a matrix A is the matrix \overline{A} obtained by conjugating each element: That is,
(\overline{A})_{ij} = \overline{A_{ij}}.
\overline{kA + B} = \overline{k} · \overline{A} + \overline{B}    and    \overline{AB} = \overline{A} · \overline{B}.
You can prove these results by looking at individual elements of the matrices and using the properties
of conjugation of numbers given above.
Definition. If A is a complex matrix, A^* is the conjugate transpose of A:
A^* = \overline{A}^T.
Note that the conjugation and transposition can be done in either order: That is, \overline{A^T} = (\overline{A})^T. To see
this, consider the (i, j)-th element of the matrices:
Example. If
A = \begin{pmatrix} 1 + 2i & 2 − i & 3i \\ 4 & −2 + 7i & 6 + 6i \end{pmatrix},    then    A^* = \begin{pmatrix} 1 − 2i & 4 \\ 2 + i & −2 − 7i \\ −3i & 6 − 6i \end{pmatrix}.
Since the complex conjugate of a real number is the real number, if B is a real matrix, then B ∗ = B T .
Remark. Most people call A∗ the adjoint of A — though, unfortunately, the word “adjoint” has already
been used for the transpose of the matrix of cofactors in the determinant formula for A−1 . (Sometimes
1
people try to get around this by using the term “classical adjoint” to refer to the transpose of the matrix
of cofactors.) In modern mathematics, the word “adjoint” refers to a property of A∗ that I’ll prove below.
This property generalizes to other things which you might see in more advanced courses.
The ( )∗ operation is sometimes called the Hermitian — but this has always sounded ugly to me, so
I won’t use this terminology.
Since this is an introduction to linear algebra, I’ll usually refer to A∗ as the conjugate transpose,
which at least has the virtue of saying what the thing is.
Proposition. Let U and V be complex matrices, and let k ∈ C.
(a) (U ∗ )∗ = U .
(b) (kU + V)^* = \overline{k} U^* + V^*.
(c) (U V )∗ = V ∗ U ∗ .
(d) If u, v ∈ C^n, their dot product is given by
u · v = v^* u.
Recall that the dot product of u and v is
u · v = u_1 \overline{v_1} + u_2 \overline{v_2} + · · · + u_n \overline{v_n}.
Notice that you take the complex conjugates of the components of v before multiplying!
This can be expressed as the matrix multiplication
u · v = [ \overline{v_1}  \overline{v_2}  · · ·  \overline{v_n} ] \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} = v^* u.
2
It’s a common notational abuse to write the number “−4 + 13i” instead of writing it as a 1 × 1 matrix
“[−4 + 13i]”.
(b)
(1 − 8i)a + (2 + 3i)b = 0.
I can get a solution (a, b) by switching the numbers 1 − 8i and 2 + 3i and negating one of them:
(a, b) = (2 + 3i, −1 + 8i).
There are two points about the equation u · v = v ∗ u which might be confusing. First, why is it necessary
to conjugate and transpose v? The reason for the conjugation goes back to the need for inner products to
be positive definite (so u · u is a nonnegative real number).
The reason for the transpose is that I’m using the convention that vectors are column vectors. So if u
and v are n-dimensional column vectors and I want the product to be a number — i.e. a 1 × 1 matrix — I
have to multiply an n-dimensional row vector (1 × n) and an n-dimensional column vector (n × 1). To get
the row vector, I have to transpose the column vector.
Finally, why do u and v switch places in going from the left side to the right side? The reason you write
v ∗ u instead of u∗ v is because inner products are defined to be linear in the first variable. If you use u∗ v you
get a product which is linear in the second variable.
Of course, none of this makes any difference if you’re dealing with real numbers. So if x and y are
vectors in Rn , you can write
x · y = xT y or x · y = y T x.
3
(c) The columns of a unitary matrix form an orthonormal set.
Proof. (a)
(U x) · (U y) = (U y)∗ (U x) = y ∗ U ∗ U x = y ∗ Ix = y ∗ x = x · y.
Since U preserves inner products, it also preserves lengths of vectors, and the angles between them. For
example,
kxk2 = x · x = (U x) · (U x) = kU xk2 , so kxk = kU xk.
kU xk = kλxk = |λ|kxk.
Here ck T is the complex conjugate of the kth column ck , transposed to make it a row vector. If you look
at the dot products of the rows of U ∗ and the columns of U , and note that the result is I, you see that the
equation above exactly expresses the fact that the columns of U are orthonormal.
For example, take the first row c1 T . Its product with the columns c1 , c2 , and so on give the first row of
the identity matrix, so
c1 · c1 = 1, c1 · c2 = 0, . . . , c1 · cn = 0.
This says that c1 has length 1 and is perpendicular to the other columns. Similar statements hold for
c2 , . . . , cn .
(a, b) · (1 + 2i, 1 − i) = 0,
[ 1 − 2i  1 + i ] \begin{pmatrix} a \\ b \end{pmatrix} = 0.
This gives
(1 − 2i)a + (1 + i)b = 0.
4
I may take a = 1 + i and b = −1 + 2i. Then
‖(1 + i, −1 + 2i)‖ = \sqrt{7}.
So I need to divide each of a and b by \sqrt{7} to get a unit vector. Thus,
(c, d) = \left( \frac{1}{\sqrt{7}} (1 + i), \frac{1}{\sqrt{7}} (−1 + 2i) \right).
Au · v = u · A∗ v.
Proof.
u · A∗ v = (A∗ v)∗ u = v ∗ (A∗ )∗ u = v ∗ Au = Au · v.
Remark. If (·, ·) is any inner product on a vector space V and T : V → V is a linear transformation, the
adjoint T^* of T is the linear transformation which satisfies
(T(u), v) = (u, T^*(v))    for all u, v ∈ V.
(This definition assumes that there is such a transformation.) This explains why, in the special case
of the complex inner product, the matrix A∗ is called the adjoint. It also explains the term self-adjoint in
the next definition.
Corollary. (Adjointness) let A ∈ M (n, R) and let u, v ∈ Rn . Then
Au · v = u · AT v.
Proof. This follows from adjointness in the complex case, because A∗ = AT for a real matrix.
Definition. A complex matrix A is Hermitian (or self-adjoint) if A^* = A.
Note that a Hermitian matrix is automatically square.
For real matrices, A∗ = AT , and the definition above is just the definition of a symmetric matrix.
It is no accident that the diagonal entries are real numbers — see the result that follows.
Here’s a table of the correspondences between the real and complex cases:
5
Proposition. Let A be a Hermitian matrix.
(a) The diagonal elements of A are real numbers, and elements on opposite sides of the main diagonal are
conjugates.
Proof. (a) Since A = A^*, I have A_{ij} = \overline{A_{ji}}. This shows that elements on opposite sides of the main diagonal
are conjugates.
Taking i = j, I have
A_{ii} = \overline{A_{ii}}.
But a complex number is equal to its conjugate if and only if it's a real number, so A_{ii} is real.
Therefore, λ = λ — but a number that equals its complex conjugate must be real.
(c) Suppose µ is an eigenvalue of A with eigenvector u and λ is an eigenvalue of A with eigenvector v. Then
Example. Let
A = \begin{pmatrix} 1 & 2 − i \\ 2 + i & −3 \end{pmatrix}.
Show that the eigenvalues are real, and that eigenvectors for different eigenvalues are orthogonal.
The characteristic polynomial is x^2 + 2x − 8 = (x + 4)(x − 2), so the eigenvalues are −4 and 2, which are real.
For −4, row reducing A + 4I shows that (2 − i, −5) is an eigenvector.
For 2, the eigenvector matrix is
A − 2I = \begin{pmatrix} −1 & 2 − i \\ 2 + i & −5 \end{pmatrix}.
(2 − i, 1) is an eigenvector.
Note that
(2 − i, −5) · (2 − i, 1) = (2 + i)(2 − i) + (1)(−5) = 5 − 5 = 0.
Thus, the eigenvectors are orthogonal.
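(An unofficial numpy check: eigh is designed for Hermitian matrices, and it returns real eigenvalues and orthonormal eigenvectors.)

import numpy as np

A = np.array([[1, 2 - 1j],
              [2 + 1j, -3]])
vals, vecs = np.linalg.eigh(A)
print(vals)                              # real: approximately [-4.,  2.]
print(np.vdot(vecs[:, 0], vecs[:, 1]))   # approximately 0 (complex dot product)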
6
Since real symmetric matrices are Hermitian, the previous results apply to them as well. I’ll restate the
previous result for the case of a symmetric matrix.
Corollary. Let A be a symmetric matrix.
(a) The elements on opposite sides of the main diagonal are equal.
From (a), a diagonalizing matrix and the corresponding diagonal matrix are
P = \begin{pmatrix} 2 & 3 \\ −3 & 2 \end{pmatrix}    and    D = \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}.
Now P^{−1}AP = D, so
A = PDP^{−1} = \begin{pmatrix} 2 & 3 \\ −3 & 2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix} \frac{1}{13} \begin{pmatrix} 2 & −3 \\ 3 & 2 \end{pmatrix} = \frac{1}{13} \begin{pmatrix} 31 & 12 \\ 12 & 21 \end{pmatrix}.
Example. Let A be the 2 × 2 Hermitian matrix
A = \begin{pmatrix} p & q + ri \\ q − ri & s \end{pmatrix},    p, q, r, s ∈ R.
Compute the characteristic polynomial of A, and show directly that the eigenvalues must be real numbers.
|A − xI| = \begin{vmatrix} p − x & q + ri \\ q − ri & s − x \end{vmatrix} = (x − p)(x − s) − (q + ri)(q − ri) = x^2 − (p + s)x + [ps − (q^2 + r^2)].
The discriminant is
(p + s)^2 − 4(1)[ps − (q^2 + r^2)] = (p^2 + 2ps + s^2) − 4ps + 4(q^2 + r^2) = (p^2 − 2ps + s^2) + 4(q^2 + r^2) = (p − s)^2 + 4(q^2 + r^2).
Since this is a sum of squares, it can’t be negative. Hence, the roots of the characteristic polynomial —
the eigenvalues — must be real numbers.
A~v = λ~v .
Let
↑ ↑ ↑
U = ~v −
→2 . . . −
u u→n .
↓ ↓ ↓
Since the columns are orthonormal, U is unitary. Then
λ junk
0
U ∗ AU = [std → B] · A · [B → std] =
... .
B
0
The issue here is why the first column of the last matrix is what it is. To see this, notice that [std →
B] · A · [B → std] is the matrix [T ]B,B of the linear transformation T (~x) = A~x relative to B. But
Theorem. (Spectral Theorem) If A is a Hermitian matrix, there is a unitary matrix U and a diagonal matrix D such that
U^* A U = D.
(Note that since U is unitary, U^* = U^{−1}.)
Proof. Find a unitary matrix U such that U ∗ AU = T , where T is upper triangular. Then since A∗ = A,
(U ∗ AU )∗ = T ∗ , U ∗ A∗ U = T ∗ , U ∗ AU = T ∗ .
But then T = T ∗ . T is upper triangular, T ∗ (the conjugate transpose) is lower triangular, so T must
be diagonal.
Corollary. (The Principal Axis Theorem) If A is a real symmetric matrix, there is an orthogonal matrix
O and a diagonal matrix D such that
O T AO = D.
(Note that since O is orthogonal, O T = O −1 .)
Proof. Real symmetric matrices are Hermitian and real orthogonal matrices are unitary, so the result follows
from the Spectral Theorem.
I showed earlier that for a Hermitian matrix (or in the real case, a symmetric matrix), eigenvectors
corresponding to different eigenvalues are perpendicular. Consequently, if I have an n × n Hermitian matrix
(or in the real case, an n × n symmetric matrix) with n different eigenvalues, the corresponding eigenvectors
form an orthogonal basis. I can get an orthonormal basis — and hence, a unitary diagonalizing matrix (or
in the real case, an orthogonal diagonalizing matrix) — by simply dividing each vector by its length.
Things are a little more complicated if I have fewer than n eigenvalues. The Spectral Theorem guarantees
that I’ll have n independent eigenvectors, but some eigenvalues will have several eigenvectors. In this case,
I’d need to use Gram-Schmidt on the eigenvectors for each eigenvalue to get an orthogonal set of eigenvectors
for each eigenvalue. Eigenvectors corresponding to different eigenvalues are still perpendicular by the result
cited earlier, so the orthogonal sets for the eigenvalues fit together to form an orthogonal basis. As before, I
get an orthonormal basis by dividing each vector by its length.
To keep the computations simple, I’ll stick to the first case (n different eigenvalues) in the examples
below.
Example. Let
A = \begin{pmatrix} −1 & 2 & 0 \\ 2 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.
Find an orthogonal matrix O which diagonalizes A. Find O^{−1} and the corresponding diagonal matrix.
The characteristic polynomial is
(3 − x)[(x − 2)(x + 1) − (2)(2)] = −(x − 3)^2 (x + 2).
The eigenvalues are x = 3 and x = −2.
For x = 3, the eigenvector matrix is
A − 3I = \begin{pmatrix} −4 & 2 & 0 \\ 2 & −1 & 0 \\ 0 & 0 & 0 \end{pmatrix} → \begin{pmatrix} 2 & −1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
This gives the independent eigenvectors (1, 2, 0) and (0, 0, 1). Dividing them by their lengths, I get
\frac{1}{\sqrt{5}} (1, 2, 0)    and    (0, 0, 1).
For x = −2, the eigenvector matrix is
A + 2I = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix} → \begin{pmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
This gives the independent eigenvector (−2, 1, 0). Dividing it by its length, I get \frac{1}{\sqrt{5}} (−2, 1, 0).
5
Thus, the orthogonal diagonalizing matrix is
1 2
√ 0 −√
5 5
2 1
O= .
√ 0 √
5 5
0 1 0
Then
O^{−1} = O^T = \begin{pmatrix} 1/\sqrt{5} & 2/\sqrt{5} & 0 \\ 0 & 0 & 1 \\ −2/\sqrt{5} & 1/\sqrt{5} & 0 \end{pmatrix}.
The diagonal matrix is
O^T A O = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & −2 \end{pmatrix}.
Example. Let
A = \begin{pmatrix} 3 & 1 − 2i \\ 1 + 2i & −1 \end{pmatrix}.
Find a unitary matrix U which diagonalizes A.
Since A is Hermitian, the Spectral Theorem applies.
The characteristic polynomial is
det(A − λI) = det \begin{pmatrix} 3 − λ & 1 − 2i \\ 1 + 2i & −1 − λ \end{pmatrix} = λ^2 − 2λ − 8 = (λ − 4)(λ + 2).
The eigenvalues are λ = 4 and λ = −2.
For λ = 4,
A − 4I = \begin{pmatrix} −1 & 1 − 2i \\ 1 + 2i & −5 \end{pmatrix}.
(Since this is a 2 × 2 A − λI matrix, I know that the second row must be a multiple of the first.)
An eigenvector (a, b) must satisfy
(−1)a + (1 − 2i)b = 0.
I can get a solution by swapping the −1 and the 1 − 2i and negating the −1 to give 1: (a, b) = (1 − 2i, 1)
is an eigenvector for λ = 4.
For λ = −2, partial row reduction gives
A + 2I = \begin{pmatrix} 5 & 1 − 2i \\ 1 + 2i & 1 \end{pmatrix} → \begin{pmatrix} 5 & 1 − 2i \\ 0 & 0 \end{pmatrix}.
Using the same technique as I used for λ = 4, I see that (a, b) = (1 − 2i, −5) is an eigenvector for λ = −2.
The result I proved earlier says that these eigenvectors are automatically perpendicular. Check by
taking their complex dot product:
(1 − 2i, 1) · (1 − 2i, −5) = (1 − 2i)(1 + 2i) + (1)(−5) = 5 − 5 = 0.
Find the lengths of the eigenvectors:
‖(1 − 2i, 1)‖ = \sqrt{6},    ‖(1 − 2i, −5)‖ = \sqrt{30}.
The normalized eigenvectors are
\frac{1}{\sqrt{6}} (1 − 2i, 1),    \frac{1}{\sqrt{30}} (1 − 2i, −5),
and these are the columns of the unitary diagonalizing matrix U.
Fourier series arise in solving partial differential equations such as the one-dimensional heat equation
u_t = a^2 u_{xx}.
The solution to the one-dimensional heat equation with an arbitrary initial distribution is an infinite
sum
\sum_{n=1}^{\infty} b_n \exp\left( −\frac{n^2 π^2 a^2 t}{L^2} \right) \sin \frac{πnx}{L}.
Consider integrable functions f and g defined on the interval −L ≤ x ≤ L, where L > 0. Define the
inner product of f and g by
⟨f, g⟩ = \frac{1}{L} \int_{−L}^{L} f(x) g(x) dx.
Example. Verify the linearity and symmetry axioms for an inner product for
⟨f, g⟩ = \frac{1}{L} \int_{−L}^{L} f(x) g(x) dx.
For linearity,
⟨af_1 + f_2, g⟩ = \frac{1}{L} \int_{−L}^{L} (af_1(x) + f_2(x)) g(x) dx =
a · \frac{1}{L} \int_{−L}^{L} f_1(x) g(x) dx + \frac{1}{L} \int_{−L}^{L} f_2(x) g(x) dx = a ⟨f_1, g⟩ + ⟨f_2, g⟩.
Symmetry is easy:
⟨f, g⟩ = \frac{1}{L} \int_{−L}^{L} f(x) g(x) dx = \frac{1}{L} \int_{−L}^{L} g(x) f(x) dx = ⟨g, f⟩.
Note that this should be called “inner product” (with quotes), since it isn’t positive definite. Since
[f (x)]2 ≥ 0, it follows that
⟨f, f⟩ = \frac{1}{L} \int_{−L}^{L} f(x) · f(x) dx = \frac{1}{L} \int_{−L}^{L} [f(x)]^2 dx ≥ 0.
1
But you could have a function which was not identically zero — for example, a function which was 0 at every point
of −L ≤ x ≤ L except one, and nonzero at that single point — such that ⟨f, f⟩ = 0.
We can get around this problem by confining ourselves to continuous functions. Unfortunately, many
of the functions we’d like to apply Fourier series to aren’t continuous.
Example. I’ll illustrate the first formula with some numerical examples.
If I take different cosines, I should get 0. I'll try
\frac{1}{L} \int_{−L}^{L} \cos \frac{3πx}{L} \cos \frac{7πx}{L} dx.
Use the identity
1
cos a cos b = (cos(a + b) + cos(a − b)) .
2
I get
3πx 7πx 1 10πx 4πx
cos cos = cos + cos .
L L 2 L L
So
\frac{1}{L} \int_{−L}^{L} \cos \frac{3πx}{L} \cos \frac{7πx}{L} dx = \frac{1}{L} \int_{−L}^{L} \frac{1}{2} \left( \cos \frac{10πx}{L} + \cos \frac{4πx}{L} \right) dx =
\frac{1}{2L} \left[ \frac{L}{10π} \sin \frac{10πx}{L} + \frac{L}{4π} \sin \frac{4πx}{L} \right]_{−L}^{L} = 0.
(Remember that the sine of a multiple of π is 0!)
If I do the integral with both cosines the same, I use the double angle formula:
\frac{1}{L} \int_{−L}^{L} \cos \frac{3πx}{L} \cos \frac{3πx}{L} dx = \frac{1}{2L} \int_{−L}^{L} \left( 1 + \cos \frac{6πx}{L} \right) dx =
\frac{1}{2L} \left[ x + \frac{L}{6π} \sin \frac{6πx}{L} \right]_{−L}^{L} = 1.
There is nothing essentially different in the derivation of the general formulas.
In terms of the (almost) inner product defined above, the orthogonality relations are:
⟨cos(πjx/L), cos(πkx/L)⟩ = 0 if j ≠ k, and 1 if j = k;
⟨cos(πjx/L), sin(πkx/L)⟩ = 0;
⟨sin(πjx/L), sin(πkx/L)⟩ = 0 if j ≠ k, and 1 if j = k.
In other words, the sine and cosine functions form an orthonormal set (again, allowing that we don't
quite have an inner product here).
As a constant function, I'll use 1/\sqrt{2}. You can verify that this is perpendicular to the sines and cosines
above, and that it has length 1.
For vectors in an inner product space, you can get the components of a vector by taking the inner product
of the vector with elements of an orthonormal basis. For example, if {u_1, u_2, . . . , u_n} is an orthonormal
basis, then a vector x can be written as a linear combination of the u's by taking inner products of x with
the u's:
x = ⟨x, u_1⟩u_1 + ⟨x, u_2⟩u_2 + · · · + ⟨x, u_n⟩u_n.
By analogy, I might try to expand a function in terms of the sines and cosines, and the constant function 1/\sqrt{2}, by taking the inner
product of f with each of them. Doing so, I get
a_n = \frac{1}{L} \int_{−L}^{L} f(x) \cos \frac{πnx}{L} dx,
b_n = \frac{1}{L} \int_{−L}^{L} f(x) \sin \frac{πnx}{L} dx.
The cosine term in the series is a_n \cos \frac{πnx}{L}; the sine term is b_n \sin \frac{πnx}{L}.
If I set n = 0 in the a_n formula, then since \cos \frac{π·0·x}{L} = 1, I'd get
a_0 = \frac{1}{L} \int_{−L}^{L} f(x) dx.
So for the constant function 1/\sqrt{2}, the coefficient is
\frac{1}{L} \int_{−L}^{L} f(x) · \frac{1}{\sqrt{2}} dx = \frac{1}{\sqrt{2}} · \frac{1}{L} \int_{−L}^{L} f(x) dx = \frac{1}{\sqrt{2}} a_0.
But my constant function is 1/\sqrt{2}, so the constant term in the series will be
\frac{1}{\sqrt{2}} a_0 · \frac{1}{\sqrt{2}} = \frac{1}{2} a_0.
The result is the Fourier series
f(x) ∼ \frac{1}{2} a_0 + \sum_{n=1}^{\infty} \left( a_n \cos \frac{πnx}{L} + b_n \sin \frac{πnx}{L} \right).
You can interpret it as an expression for f as an infinite "linear combination" of the orthonormal sine
and cosine functions.
There are several issues here. For one thing, this is an infinite sum, not the usual finite linear combination
which expresses a vector in terms of a basis. Questions of convergence always arise with infinite sums.
Moreover, even if the infinite sum converges, why should it converge to f (x)?
In addition, we’re only doing this by analogy with our results on inner product spaces, because this
“inner product” only satisfies two of the inner product axioms.
3
It’s important to know that these issues exist, but their treatment will be deferred to an advanced course
in analysis.
Before doing some examples, here are some notes about computations.
You can often use the complex exponential (DeMoivre's formula) to simplify computations:
\exp \frac{πinx}{L} = \cos \frac{πnx}{L} + i \sin \frac{πnx}{L}.
Specifically, let
c_n = \frac{1}{L} \int_{−L}^{L} f(x) \exp \frac{πinx}{L} dx.
Then
a_n = re c_n,    b_n = im c_n.
This allows me to compute a single integral, then find the real and imaginary parts of the result to get
the sine and cosine coefficients.
On some occasions, symmetry can be used to obtain the values of some of the coefficients.
1. If a function is even — that is, if f (x) = f (−x) for all x, so the graph is symmetric about the y-axis
— then bn = 0 for all n.
2. If a function is odd — that is, if f (−x) = −f (x) for all x, so the graph is symmetric about the origin
— then an = 0 for all n.
This makes sense, since the cosine functions are even and the sine functions are odd.
Here's a final remark before I get to the examples. A sum of periodic functions is periodic. Since sine
and cosine are periodic, you can only expect to represent periodic functions by a Fourier series. Therefore,
outside the interval −L ≤ x ≤ L, you must "repeat" f(x) in periodic fashion (so f(x) = f(x + 2L) for all x).
For example, consider the function f (x) = x, −1 ≤ x < 1. L = 1, so I must “repeat” the function every
two units. Picture:
The Fourier expansion of f (x) only converges to y = x on the interval −1 < x < 1. Outside of that
interval, it converges to the periodic function in the picture (except perhaps at the jump discontinuities).
[Graph: the square wave of period 2, equal to −1 on −1 ≤ x < 0 and 1 on 0 ≤ x < 1.]
Example. Find the Fourier expansion of
f(x) = −1 if −1 ≤ x < 0,  1 if 0 ≤ x < 1,    and f(x) = f(x + 2) for all x.
Here L = 1. Compute
c_n = \int_{−1}^{0} (−1) e^{πinx} dx + \int_{0}^{1} e^{πinx} dx = \frac{i}{πn} \left[ e^{πinx} \right]_{−1}^{0} − \frac{i}{πn} \left[ e^{πinx} \right]_{0}^{1} = \frac{i}{πn} \left( 1 − e^{−πin} \right) − \frac{i}{πn} \left( e^{πin} − 1 \right).
Now
e−πni = cos πn − i sin πn = cos πn = (−1)n
eπni = cos πn + i sin πn = cos πn = (−1)n
So
c_n = \frac{2i}{πn} (1 − (−1)^n).
Now
a_n = re c_n = 0,    and    b_n = im c_n = \frac{2}{πn} (1 − (−1)^n).
So the Fourier expansion is
f(x) ∼ \sum_{n=1}^{\infty} \frac{2}{πn} (1 − (−1)^n) \sin πnx.
Here are the graphs of the first, fifth, and tenth partial sums:
Notice the “ringing” at the points of discontinuity, and how the graphs are approaching the original
square wave.
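(If you want to reproduce pictures like these yourself — an aside, assuming numpy and matplotlib — plot a few partial sums of the series just computed.)

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 2000)

def partial_sum(x, N):
    # sum_{n=1}^{N} (2/(pi n)) (1 - (-1)^n) sin(pi n x)
    s = np.zeros_like(x)
    for n in range(1, N + 1):
        s += (2/(np.pi*n)) * (1 - (-1)**n) * np.sin(np.pi*n*x)
    return s

for N in (1, 5, 10):
    plt.plot(x, partial_sum(x, N), label=f"N = {N}")
plt.legend()
plt.show()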
Note that if x = 0, all the terms of the series are 0, and the series converges to 0. In fact, under
reasonable conditions — specifically, if f is of bounded variation in an interval around a point c — the
Fourier series will converge at c to the average of the left- and right-hand limits at the point. In this case, the
left-hand limit is −1, the right-hand limit is 1, and their average is 0. On the other hand, f(0) was defined
to be 1.
[Graph of the function, repeated with period 2.]
Example. Find the Fourier expansion of
f(x) = 1 if −1 ≤ x < 0,  (x − 1)^2 if 0 ≤ x < 1,    and f(x) = f(x + 2) for all x.
Here L = 1. First,
a_0 = \int_{−1}^{1} f(x) dx = \int_{−1}^{0} 1 dx + \int_{0}^{1} (x − 1)^2 dx = [x]_{−1}^{0} + \frac{1}{3} \left[ (x − 1)^3 \right]_{0}^{1} = \frac{4}{3}.
Next, compute the higher order coefficients:
c_n = \int_{−1}^{0} e^{πinx} dx + \int_{0}^{1} (x − 1)^2 e^{πinx} dx =
\left[ −\frac{i}{πn} e^{πinx} \right]_{−1}^{0} + \left[ \left( −\frac{i}{πn} (x − 1)^2 + \frac{2}{π^2 n^2} (x − 1) + \frac{2i}{π^3 n^3} \right) e^{πinx} \right]_{0}^{1} =
−\frac{i}{πn} \left( 1 − e^{−πni} \right) + \frac{2i}{π^3 n^3} e^{πin} + \frac{i}{πn} + \frac{2}{π^2 n^2} − \frac{2i}{π^3 n^3}.
Therefore,
c_n = −\frac{i}{πn} (1 − (−1)^n) + \frac{2i}{π^3 n^3} (−1)^n + \frac{i}{πn} + \frac{2}{π^2 n^2} − \frac{2i}{π^3 n^3} =
\frac{i}{πn} (−1)^n + \frac{2i}{π^3 n^3} (−1)^n + \frac{2}{π^2 n^2} − \frac{2i}{π^3 n^3}.
Now take real and imaginary parts:
a_n = re c_n = \frac{2}{π^2 n^2},    b_n = im c_n = \frac{1}{πn} (−1)^n − \frac{2}{π^3 n^3} (1 − (−1)^n).
Example. Find the Fourier expansion of
f(x) = x + π if −π ≤ x < 0,  0 if 0 ≤ x < π,    and f(x) = f(x + 2π) for all x.
Here L = π. First,
a_0 = \frac{1}{π} \int_{−π}^{π} f(x) dx = \frac{1}{π} \int_{−π}^{0} (x + π) dx + \frac{1}{π} \int_{0}^{π} 0 dx = \frac{1}{π} \left[ \frac{1}{2} x^2 + πx \right]_{−π}^{0} = \frac{π}{2}.
Next, compute the higher order terms. As in the computation of a_0, I only need the integral from −π to
0, since the function equals 0 from 0 to π:
c_n = \frac{1}{π} \int_{−π}^{π} f(x) e^{πinx/π} dx = \frac{1}{π} \int_{−π}^{0} (x + π) e^{inx} dx = \frac{1}{π} \left[ \frac{1}{in} (x + π) e^{inx} + \frac{1}{n^2} e^{inx} \right]_{−π}^{0} =
\frac{1}{π} \left( \frac{1}{in} · π + \frac{1}{n^2} − \frac{1}{in} · 0 · e^{−πin} − \frac{1}{n^2} e^{−πin} \right) = \frac{1}{in} + \frac{1}{πn^2} − \frac{1}{πn^2} e^{−πin} =
−\frac{1}{n} i + \frac{1}{πn^2} − \frac{1}{πn^2} (−1)^n.
Thus,
a_n = \frac{1}{πn^2} − \frac{1}{πn^2} (−1)^n    and    b_n = −\frac{1}{n}.
The series is
f(x) ∼ \frac{π}{4} + \sum_{n=1}^{\infty} \left( \left( \frac{1}{πn^2} − \frac{1}{πn^2} (−1)^n \right) \cos nx − \frac{1}{n} \sin nx \right).