Linear Algebra Notes
I’ll review some basics about sets and some of the number systems we’ll use. If you’re familiar with
these things, you can skim this section and refer to it later when you need to.
A set is informally described as a collection of objects. The objects are elements of the set, or
members of the set. So a set is kind of like a bag, and its elements are things in the bag.
However, it’s easy to get into trouble with that informal description. Trouble can be fun, so let’s see
how you can get into trouble here. Russell’s paradox is named after the British mathematician Bertrand
Russell. Here’s how it goes. Suppose X is the set whose members are sets which are not members of
themselves. (Thus, X is a bag, and the members of X — the things in the bag — are themselves bags.)
Now is X a member of X? It either is or it isn’t.
If X is a member of X, then X is a set which is not a member of itself, by the definition of X. So X is
both a member of itself and not a member of itself.
If X is not a member of X, then X is a set which is not a member of itself, so it should be a member
of X, by definition of X. Again, X is both a member of itself and not a member of itself.
You may have to think about this a bit, since the words can make your eyes glaze over!
You can see that both alternatives lead to a contradiction. Something is wrong, and it lies in the way
we created the set X. It turns out that, to deal with sets in a rigorous way, you have to be careful about
how sets can be created. The Zermelo-Fraenkel-Choice Axioms do this, and they are one of the most
commonly accepted foundations for mathematics. If you’re interested in learning more, you should take a
course or read a book in set theory or mathematical logic.
We will stay out of trouble by avoiding weird sets like the one in Russell’s paradox.
Let’s start by reviewing common set constructions and notations. If A and B are sets, then:
(a) x ∈ A means that x is an element (or member) of A.
We’ll often use schematic diagrams (sometimes called Venn diagrams) to picture arbitrary sets. The sets are drawn as rectangles or ovals or other closed shapes, and you think of the elements of the
sets as being stuff inside. For instance, a dot labelled “x” drawn inside the shape for A pictures a particular element x of A.
(b) A and B are equal if they have exactly the same elements. In that case, we write A = B.
(c) A ⊂ B means that A is a subset of B — that is, every element of A is also an element of B. It is
possible in this case that A = B; if we don’t want to allow A = B, we’ll write A ⊊ B and say that A is a
proper subset of B.
(d) A ∩ B is the intersection of A and B — that is, all the elements which A and B have in common.
(e) A ∪ B is the union of A and B — that is, all the elements which are in A or are in B (or are in
both).
(f) A − B is the complement of B relative to A — that is, all the elements of A which are not elements
of B.
If there is some “big set” S and everything under discussion is contained in S, you can denote the
complement of a set A relative to S (sometimes called the absolute complement) by A̅ (“A-bar”, an A with a bar over it). In other words,
A̅ is short for S − A when the “big set” S is understood.
(g) The empty set is the set with no elements. It is denoted ∅ or { }. (But “{∅}” is not the empty
set — if the empty set is like an empty bag, then “{∅}” is like a bag containing an empty bag.)
(h) If S and T are sets, their (Cartesian) product consists of all ordered pairs (s, t), where s ∈ S
and t ∈ T . “Ordered” means, for example, that (2, 8) and (8, 2) are different ordered pairs.
For example, if S = {a, b} and T = {1, 2, 3}, then
S × T = {(a, 1), (a, 2), (a, 3), (b, 1), (b, 2), (b, 3)}.
The sets which make up standard number systems have special symbols.
(a) Z is the set of integers (positive, negative, and 0).
(b) Q is the set of rational numbers, which can be represented as quotients of integers. (Thus, Q includes
Z, since (for instance) 3 = 3/1.) The use of the letter “Q” is apparently due to the Italian mathematician
Giuseppe Peano and stands for the Italian word for “quotient”.
(c) R is the set of real numbers, which are often represented as decimals (finite or infinite). R includes
Q (and so it includes Z).
(d) C is the set of complex numbers: numbers that can be written in the form a + bi, where a and b
are real numbers and i = √−1. The old phrase “imaginary numbers” is not that good — there’s nothing
“imaginary” about C! — and so is less often used. C includes R — for instance, the real number 13 can be
written as 13 + 0 · i — so C also includes Q and Z. Using our subset notation, we could write Z ⊂ C, for
instance.
The convention is to use boldface capital letters (like “Z” or “Q”) or “blackboard bold” capital letters
(like Z, Q, R, C) for “number systems”. You may see other letters (like the quaternions H) used as well.
I’ll use blackboard bold, since it is easier to distinguish blackboard bold letters from ordinary capital letters
than boldface. It’s also the way you write these symbols on blackboards or in handwriting.
I’ll use these number systems informally based on your prior experience with them; actually constructing
them rigorously is pretty involved and belongs in other courses (though we can use matrices to construct
the complex numbers from the real numbers).
You can specify a finite set by listing its elements, placed between braces (“curly brackets”). Here’s a
set consisting of 5 elements:
{π, −117, 32.83, 19/17, the pepperoni pizza in my refrigerator}.
A set does not have duplicate elements, so don’t write things like “{1, 2, 2, 3}”. The order in which
you list the elements doesn’t matter: {a, b, c} is the same set as {c, a, b}. (It’s like a bag of stuff, as we
noted at the start.)
Sets can have other sets as elements:
{1, 2, {3, 4}}.
You can picture this set as a bag. Inside the bag are 1, 2, and another bag which contains 3 and 4. The
number of elements in this set is 3; in other words, in determining the number of elements in a big set,
you don’t peek inside any little sets inside the big set.
You can sometimes specify infinite sets by listing elements:
{1, 2, 3, 4, . . .}.
Most people would assume that you mean the elements to continue 5, 6, 7, and so on. Of course, when
you do this you’re assuming that the “pattern” the elements follow is clear; the “. . .” means “continue in
the same way”. But there’s some ambiguity here; perhaps the elements of the set above are actually
{1, 2, 3, 4, 10, 20, 30, 40, 100, 200, 300, 400, . . .}.
If there’s any chance of confusion, it’s better to use set constructor notation to make the “pattern”
the elements follow explicit. Set constructor notation has the form
T = {x ∈ S | P (x)}.
The braces “{” and “}” indicate that we’re building a set, and its name is T . (You may give a set a
name to make it easier to talk about, but the name isn’t required.) The first part “x ∈ S” tells us that the
elements of T come from some “big set” S. And “P (x)” is some property that a typical element x must
satisfy to belong to T .
For example, this is the set of positive integers:
{x ∈ Z | x > 0}.
And this is the set of positive real numbers:
{x ∈ R | x > 0}.
You can see how the specification of the “big set” from which the elements come (x ∈ Z or x ∈ R) can
be important.
Sometimes it’s convenient to relax the rules for set constructor notation a bit. Here’s the set of even
integers:
{x ∈ Z | x = 2y for some y ∈ Z}.
It says that even integers are integers which are twice another integer. We can save writing (and a
variable) by writing the set this way:
{2y | y ∈ Z}.
This says even integers are numbers which are twice another integer. Since twice an integer is automat-
ically an integer, the word “numbers” must mean “integers”.
We’ll often use this kind of shortcut in writing sets. Here’s another example. It’s the set of all points
on the graph of y = x2 in the x-y-plane:
{(x, x2 ) | x ∈ R}.
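If you like experimenting on a computer, set constructor notation translates almost directly into Python set comprehensions. Here is a small illustrative sketch (not part of the mathematics itself); since Python sets are finite, the variable is restricted to a finite range.

# The set {2y | y in Z}, restricted to -10 <= y <= 10:
evens = {2 * y for y in range(-10, 11)}

# The set {(x, x^2) | x in R}, sampled at a few real values of x:
parabola_points = {(x, x**2) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]}

print(sorted(evens))
print(sorted(parabola_points))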
Example. Consider the following set of integers:
S = {3n + 1 | n ∈ Z}.
(a) Is 46 an element of S?
(b) Is 17 an element of S?
(a) 46 is an element of S, since 46 = 3 · 15 + 1 and 15 ∈ Z.
(b) 17 is not an element of S. Suppose on the contrary that 17 ∈ S. Then 17 = 3n + 1 for some n ∈ Z.
Subtracting 1 from both sides, I get 16 = 3n for some integer n. This is impossible, since 16 is not “evenly
divisible” by 3.
You may have seen the notations R2 and R3 in a multivariable calculus course. R2 is the set of ordered
pairs of real numbers, like (−5, 11) or (√3, π) — the “x-y plane”. Likewise, R3 is the set of ordered triples
of real numbers, like (9, 0, −13) or (11/17, sin 1, 32) — 3-dimensional space. Using set constructor notation,
you can write these sets like this:
R2 = {(x, y) | x, y ∈ R} and R3 = {(x, y, z) | x, y, z ∈ R}.
If we have an infinite but countable number of objects, we can continue to use natural numbers as
subscripts:
x1 , x2 , . . . , xn , . . . .
The “. . .” on the right indicates that the number of objects is infinite.
In fact, you can take “countable” to mean that the objects are either finite in number, or can be
arranged in a sequential list as above. If the objects can’t be arranged in a sequential list as above, the
objects are uncountable. Thus, if you have an uncountable number of objects, you can’t use natural
number subscripts to index them. For example, suppose the objects are the real numbers. You can’t arrange
the real numbers in a sequential list.
So how do we describe an uncountable collection of objects?
Handling this kind of situation is actually simple. We take an index set — say I. We don’t assume
anything about the size of I, or what its elements look like. We use the elements of I as subscripts for our
objects. So if we’re discussing a particular object indexed by I, we refer to “xi , where i ∈ I”. If we need
another object indexed by I, we say “xj , where j ∈ I”. If we need 4 such objects, we can use subscripts
with subscripts, this way:
xi1 , xi2 , xi3 , xi4 , where i1 , i2 , i3 , i4 ∈ I.
If we wanted 4 possibly different objects, we’d have to specify that i1 , i2 , i3 , and i4 are distinct (though
maybe in a given situation our collection of objects contains duplicates, so different i’s might still give the
same object).
In these cases, we’re not saying what i, j, i1 , i2 , i3 , or i4 are, beyond that they’re elements of I, and
that I is some set.
Arbitrary intersections and unions.
As examples of arbitrary collections of objects, we’ll discuss arbitrary unions and intersections of
sets. Begin with an arbitrary collection of sets {Si }i∈I . This means that each Si is a set, and there is one
such set for each element i in the index set I. The index set I might be finite, or if it is infinite, it might be
countable or uncountable.
The intersection of the sets {Si}i∈I is denoted ⋂_{i∈I} Si. By definition, an element x is in ⋂_{i∈I} Si if and
only if x is in Si for all i ∈ I. In other words,

x ∈ ⋂_{i∈I} Si   if and only if   x ∈ Si for all i ∈ I.

The big intersection symbol “⋂_{i∈I}” is like a summation symbol “Σ_{i=1}^n” that you’ve probably seen elsewhere.
For just two sets U and V, the definition would say x ∈ U ∩ V if and only if x ∈ U and x ∈ V. This
agrees with the definition of “intersection” we gave earlier.
In similar fashion, the union of the sets {Si}i∈I is denoted ⋃_{i∈I} Si. By definition, an element x is in
⋃_{i∈I} Si if and only if x is in Si for some i ∈ I. (“For some” means “for at least one”.) In other words,

x ∈ ⋃_{i∈I} Si   if and only if   x ∈ Si for some i ∈ I.
We won’t have to deal with arbitrary intersections and unions that often, so don’t worry if this seems
a bit abstract. When it comes up, you’ll see that the notation isn’t that difficult to handle.
You might have heard the term range used in connection with functions, but we’ll avoid it because it
was (unfortunately) used in two different ways. Sometimes, it meant what I’ve called the codomain, but it
was also used to refer to what is now called the image. The codomain is simply a set which contains all
possible outputs of the function (the image), but not every element of the codomain must be an output of the
function.
Remark. (a) When you write the definition of a function like “f (x) = x2 + 1”, you can use whatever
variable you wish for the input. It would be the same to write “f (t) = t2 + 1”. You could use a word for
the input to the function (as is often done in computer programming) and write
f (input) = input2 + 1.
You could even use words for the whole definition: “The function f is defined by taking an input,
squaring it, and adding 1, and returning the result as the output.” This is the way math was written before
people started using symbols — and you can see why people started using symbols!
In math, unlike in computer programming, it has been traditional to save writing and use single letters
to name variables, such as the variables used to define functions. There are also some loose conventions
about what letters you use for various purposes. For instance, x is often used as the input variable in a
function definition, and i, j, and k are often used as index variables in summations.
(b) The variables used to define a function are only “active” within the function definition. Once you
reach the end of the expression “f (x) = x2 + 1”, there is no variable named “x” hanging around — you can’t
ask at that point “Is x equal to 3?” In computer programming terms, you might say that x isn’t a global
variable; it’s in scope only within the definition “f (x) = x2 + 1”.
You might think about the last paragraph if you start getting confused when we discuss composites and
inverse functions below.
The older terms (which some people still use) are one-to-one instead of injective, onto instead of
surjective, and one-to-one correspondence for bijective.
Remark. You can restate the definition of injective this way: f is injective if a ≠ b implies f (a) ≠ f (b)
for a, b ∈ X. (This is the contrapositive of the original definition.) In this form, the definition says that
different inputs always give different outputs.
If different inputs can give the same output, the function is not injective.
The definition of surjective means that everything in the codomain is an output of f — that is, im f = Y .
(Schematically: picture f : X → Y with its image im f sitting inside Y. If im f = Y, then f is surjective; if im f is smaller than Y, then f is not surjective.)
To say that f is bijective means that f “pairs up” the elements of X and the elements of Y — one
element of X paired with exactly one element of Y .
This picture shows a bijection f from the set {1, 2, 3} to the set {a, b, c}.
It is defined by
f (1) = c, f (2) = a, f (3) = b.
It is injective because two different inputs always produce two different outputs. It is surjective because
every element of the codomain {a, b, c} is an output of f .
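For finite sets you can check injectivity and surjectivity directly from the definitions. Here is a small Python sketch using the bijection above; the variable names are just for illustration.

# Represent f : {1, 2, 3} -> {a, b, c} as a dictionary.
f = {1: 'c', 2: 'a', 3: 'b'}
codomain = {'a', 'b', 'c'}

image = set(f.values())
injective = len(image) == len(f)    # different inputs give different outputs
surjective = image == codomain      # every element of the codomain is an output
bijective = injective and surjective

print(injective, surjective, bijective)   # True True True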
Example. (a) f : R → R is defined by f (x) = ex . Prove that f is injective but not surjective.
(b) f : R → R is defined by
f (x) = x + 1 if x ≤ 0, and f (x) = x if x > 0.
Prove that f is surjective but not injective.
(c) f : R → R is defined by f (x) = x3 + 13. Prove that f is bijective.
(a) Suppose f (a) = f (b). Then ea = eb , so ln ea = ln eb , and hence a = b. Thus, f is injective.
f is not surjective, because there is no x ∈ R such that f (x) = −5. This would imply ex = −5, but ex
is positive for all x.
(b) Let y ∈ R. If y ≤ 0, then y − 1 ≤ −1. Hence, f (y − 1) = (y − 1) + 1 = y. If y > 0, then f (y) = y. In
both cases, I’ve found an input to f which produces the output y. So f is surjective.
f is not injective: f (0) = 0 + 1 = 1, and f (1) = 1, so f (0) = f (1) but 0 ≠ 1.
The graph of f consists of the line y = x + 1 for x ≤ 0 together with the line y = x for x > 0.
In the case of functions R → R, you can interpret surjectivity this way: Every horizontal line hits the
graph at least once. You can interpret injectivity this way: No horizontal line hits the graph more than
once. Look at the graph and see how it shows that f is surjective, but not injective.
While this is visually appealing, this is a special case — do not remember these ideas as the definitions
of injective and surjective. For instance, they would not apply if you had a function R2 → R3 .
(c) Suppose f (a) = f (b). Then
a3 + 13 = b3 + 13
a3 = b3
(a3 )1/3 = (b3 )1/3
a = b
Hence, f is injective.
Suppose y ∈ R. I need x ∈ R such that f (x) = y. I will work backwards to “guess” what x should be,
then verify my guess.
f (x) = y means x3 + 13 = y, so x3 = y − 13, and x = (y − 13)1/3 . That’s my “guess” for x; I still have
to show it works. (Why doesn’t what I did prove it? Because I worked backwards, so I have to check that
all the steps I took are reversible.) Here’s the check:
f ((y − 13)1/3 ) = ((y − 13)1/3 )3 + 13 = (y − 13) + 13 = y.
Hence, f is surjective. Since f is both injective and surjective, it is bijective.
Example. f : R2 → R2 is defined by
f (x, y) = (x3 + 2, x + ey ).
Prove that f is injective but not surjective.
To show f is injective, suppose f (a, b) = f (c, d). Then
(a3 + 2, a + eb ) = (c3 + 2, c + ed ).
The first components give
a3 + 2 = c3 + 2
a3 = c3
a = c
Then the second components give
a + eb = c + ed
eb = ed
ln eb = ln ed
b = d
Hence, (a, b) = (c, d), and f is injective.
To show f is not surjective, I’ll show that there is no (x, y) such that f (x, y) = (10, 0). Suppose on the
contrary that f (x, y) = (10, 0). The first component gives x3 + 2 = 10, so x3 = 8 and x = 2. Then the
second component gives
x + ey = 0
2 + ey = 0
ey = −2
This is a contradiction, since ey > 0 for all y. So there is no input (x, y) such that f (x, y) = (10, 0), and
hence f is not surjective.
The thought process I used in choosing (10, 0) was this. I saw that the expression ey was restricted
in the values it can take: It’s always positive. So I tried to get a contradiction by forcing it to equal a
negative number. I could do this by forcing x in x + ey to be positive. I could force x > 0 by setting the
first component x3 + 2 to a value so that solving for x would produce a positive number — 10 happens to
work.
Definition. Let X, Y , and Z be sets, and let f : X → Y and g : Y → Z be functions. The composite of
f and g is the function g ◦ f : X → Z defined by
(g ◦ f )(x) = g(f (x)) for all x ∈ X.
Note that “g ◦ f ” does not mean multiplication. I will often write “g(f (x))” for the composite, to ensure
that there’s no confusion. Also, “g ◦ f ” is read from right to left, so it means “do f first, then g”.
Example. (a) Suppose f : R → R is defined by f (x) = x2 and g : R → R is defined by g(x) = x + 2. Find
g(f (x)), f (g(x)), g(g(x)), and f (f (3)).
(b) Suppose f : R → R2 is given by f (x) = (x + 3, ex ) and g : R2 → R is given by g(s, t) = s − t. Find
g(f (x)) and f (g(s, t)).
(c) Suppose f : R2 → R2 is given by f (x, y) = (x + 3y, 2x − y) and g(s, t) = (st, s + t). Find g(f (x, y)) and
f (g(s, t)).
(a)
g(f (x)) = g(x2 ) = x2 + 2.
f (g(x)) = f (x + 2) = (x + 2)2 .
g(g(x)) = g(x + 2) = (x + 2) + 2 = x + 4.
f (f (3)) = f (32 ) = f (9) = 92 = 81.
Note that g(f (x)) ≠ f (g(x)).
(b)
g(f (x)) = g(x + 3, ex ) = x + 3 − ex .
f (g(s, t)) = f (s − t) = ((s − t) + 3, es−t ).
(c)
g(f (x, y)) = g(x + 3y, 2x − y) = ((x + 3y)(2x − y), (x + 3y) + (2x − y)) = ((x + 3y)(2x − y), 3x + 2y).
f (g(s, t)) = f (st, s + t) = (st + 3(s + t), 2st − (s + t)).
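Composites are easy to experiment with in code. Here is a Python sketch of part (c); compose is a helper name introduced just for illustration, and “g ◦ f” means “do f first, then g”, as above.

def f(x, y):
    return (x + 3*y, 2*x - y)

def g(s, t):
    return (s*t, s + t)

def compose(outer, inner):
    # "outer o inner": apply inner first, then outer
    return lambda *args: outer(*inner(*args))

g_of_f = compose(g, f)
print(g_of_f(1, 2))    # f(1, 2) = (7, 0), then g(7, 0) = (0, 7)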
Definition. Let A and B be sets, and let f : A → B and g : B → A be functions. f and g are inverses if
g(f (a)) = a for all a ∈ A, and f (g(b)) = b for all b ∈ B.
The inverse of f is denoted f −1 . (Note that “f −1 ” is not a “−1 power”, in the sense of a reciprocal.)
Using this notation, I can write the equations in the inverse definition as
f −1 (f (a)) = a for all a ∈ A, and f (f −1 (b)) = b for all b ∈ B.
The identity function on a set X is the function idX defined by idX (x) = x for all x ∈ X. We may
write “id” instead of “idX ” if it’s clear what set is intended.
Using this definition and the notation for the composite of functions, we can write the two equations
which say that f and g are inverses as
g ◦ f = idA and f ◦ g = idB .
Example. Define f : R → R by f (x) = x3 + 7 and g : R → R by g(x) = (x − 7)1/3 . Show that f and g are
inverses.
Let x ∈ R.
f (g(x)) = f ((x − 7)1/3 ) = [(x − 7)1/3 ]3 + 7 = x − 7 + 7 = x.
g(f (x)) = g(x3 + 7) = [(x3 + 7) − 7]1/3 = (x3 )1/3 = x.
Since f (g(x)) = x and g(f (x)) = x for all x ∈ R, it follows that f and g are inverses.
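A numerical spot check in Python (which is not a proof, but is reassuring) looks like this. The cube-root helper handles negative arguments explicitly, since a fractional power of a negative float in Python does not give the real cube root.

def f(x):
    return x**3 + 7

def g(x):
    # real cube root of (x - 7)
    y = x - 7
    return y**(1/3) if y >= 0 else -((-y)**(1/3))

for x in [-2.0, 0.0, 1.5, 10.0]:
    print(x, g(f(x)), f(g(x)))    # both columns should be (approximately) x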
Example. Let f : R → R be given by f (x) = 10 − x1/5 . Find f −1 and verify that it’s the inverse by
checking the inverse definition.
I guess f −1 by working backwards. Suppose f −1 (y) = x. Then f (x) = y, so 10 − x1/5 = y. Solving for x
gives x1/5 = 10 − y, so x = (10 − y)5 . Thus, my guess is f −1 (x) = (10 − x)5 . Now check the inverse definition:
f (f −1 (x)) = f ((10 − x)5 ) = 10 − ((10 − x)5 )1/5 = 10 − (10 − x) = x,
f −1 (f (x)) = (10 − (10 − x1/5 ))5 = (x1/5 )5 = x.
Hence, f −1 (x) = (10 − x)5 .
Not every function has an inverse — in fact, we’ll see below that having an inverse is equivalent to being
bijective.
Example. Define f : R → R by f (x) = x2 + 10. Prove that f does not have an inverse.
If you have some experience with proof writing, you may know that a negative statement (“does not
have an inverse”) is often proved by contradiction: You assume the negation of the statement to be proved
and try to get a contradiction. If the statement to be proved is a negative statement, its negation will be a
positive statement.
So suppose that f has an inverse f −1 . Then f −1 (f (x)) = x for all x. In particular, f −1 (f (2)) = 2 and
f −1 (f (−2)) = −2. But f (2) = 14 and f (−2) = 14, so I get
f −1 (14) = 2 and f −1 (14) = −2.
This is a contradiction, because f −1 is a function, so it can’t produce two different outputs (2 and −2)
for the same input (14). Hence, f does not have an inverse.
Note: Instead of 2 and −2, you could choose 5 and −5 or 17 and −17, and so on. Lots of pairs of
numbers would work. You could also get a contradiction using the equation f (f −1 (x)) = x: Take x = 6 and
see what happens!
Example. Let f : R2 → R2 be given by f (x, y) = (x1/3 − 4, −x + y). Find f −1 and verify that it’s the
inverse by checking the inverse definition.
I’ll guess f −1 by working backwards, then verify that my guess works. Suppose f −1 (a, b) = (x, y). Then
a = x1/3 − 4, b = −x + y.
The first equation gives x1/3 = a + 4, so x = (a + 4)3 . Plugging this into the second equation gives
b = −(a + 4)3 + y, so y = (a + 4)3 + b.
Thus, my guess is
f −1 (a, b) = (x, y) = ((a + 4)3 , (a + 4)3 + b).
I check the inverse definition:
f (f −1 (a, b)) = f ((a + 4)3 , (a + 4)3 + b) = ([(a + 4)3 ]1/3 − 4, −(a + 4)3 + (a + 4)3 + b) =
(a + 4 − 4, b) = (a, b).
f −1 (f (x, y)) = f −1 (x1/3 −4, −x+y) = ([(x1/3 −4)+4]3, [(x1/3 −4)+4]3 +(−x+y)) = ((x1/3 )3 , (x1/3 )3 −x+y) =
(x, x − x + y) = (x, y).
The inverse definition checks, so f −1 (a, b) = (x, y) = ((a + 4)3 , (a + 4)3 + b).
Note: While the names of the input variables to a function are arbitrary, I used a and b as the input
variables to f −1 rather than x and y to avoid confusion — I was already using x and y as the input variables
for f .
The next result says that having an inverse is the same as being bijective.
Theorem. Let X and Y be sets, and let f : X → Y be a function. f is bijective if and only if it has an
inverse — that is, f −1 exists.
Proof. To prove a statement of the form A if and only if B, I must do two things: First, I assume that A
is true and prove that B is true; next, I assume that B is true and prove that A is true.
First, suppose that f −1 exists, so
f −1 (f (x)) = x for all x ∈ X, and f (f −1 (y)) = y for all y ∈ Y.
To show f is injective, suppose f (a) = f (b) for some a, b ∈ X. Applying f −1 to both sides gives
a = f −1 (f (a)) = f −1 (f (b)) = b.
Therefore, f is injective.
To show f is surjective, suppose y ∈ Y . I must find an element of X which f maps to y. This is easy,
since f −1 (y) ∈ X, and
f (f −1 (y)) = y.
Therefore, f is surjective. Hence, f is bijective.
Next, suppose f is bijective. I must show that f −1 exists. Now f −1 should be a function from Y to X,
so I need to start with y ∈ Y and define where f −1 maps it. Since f is bijective, it is surjective. Therefore,
f (x) = y for some x ∈ X. I’d like to define f −1 (y) = x.
There’s a possible problem. In general, it’s possible that f (x1 ) = y and f (x2 ) = y for x1 , x2 ∈ X. In
that case, how would I define f −1 (y)?
Fortunately, f is injective. So if f (x1 ) = y and f (x2 ) = y, then f (x1 ) = f (x2 ), and so x1 = x2 by
injectivity. In other words, there is one and only one x ∈ X such that f (x) = y, and it’s safe for me to define
f −1 (y) = x.
(Notice how I used both parts of the definition of bijective — surjectivity and injectivity — to define
f −1 .)
All I have to do is to check the inverse definition. First, if x ∈ X, then by definition f −1 (f (x)) is the
element of X which f takes to f (x) — but that element is x. Hence, f −1 (f (x)) = x.
Next, if y ∈ Y , then by definition f −1 (y) is the element x ∈ X such that f (x) = y. So
f (f −1 (y)) = f (x) = y.
This checks the inverse definition, so f −1 is indeed the inverse of f .
Definition. A ring is a set R with two operations, addition (written a + b) and multiplication (written a · b), which satisfy the following axioms.
1. Addition is associative: If a, b, c ∈ R, then
a + (b + c) = (a + b) + c.
2. There is an identity for addition, denoted 0: For all a ∈ R,
a + 0 = a and 0 + a = a.
3. Every element of R has an additive inverse. That is, if a ∈ R, there is an element −a ∈ R which
satisfies
a + (−a) = 0 and (−a) + a = 0.
4. Addition is commutative: If a, b ∈ R, then
a + b = b + a.
5. Multiplication is associative: If a, b, c ∈ R, then
a · (b · c) = (a · b) · c.
6. Multiplication distributes over addition: If a, b, c ∈ R, then
a · (b + c) = a · b + a · c and (a + b) · c = a · c + b · c.
It’s common to drop the “·” in “a · b” and just write “ab”. I’ll do this except where the “·” is needed
for clarity.
As a convenience, we can define subtraction using additive inverses. If R is a ring and a, b ∈ R, then
a − b is defined to be a + (−b). That is, subtraction is defined as adding the additive inverse.
You might notice that we now have three of the usual four arithmetic operations: Addition, subtraction,
and multiplication. We don’t necessarily have a “division operation” in a ring; we’ll discuss this later.
If you’ve never seen axioms for a mathematical structure laid out like this, you might wonder: What
am I supposed to do? Do I memorize these? Actually, if you look at the axioms, they say things that are
“obvious” from your experience. For example, Axiom 4 says addition is commutative. So as an example for
real numbers,
117 + 33 = 33 + 117.
You can see that, as abstract as they look, these axioms are not that big a deal. But when you do
mathematics carefully, you have to be precise about what the rules are. You will not have much to do in
this course with writing proofs from these axioms, since that belongs in an abstract algebra course. A good
rule of thumb might be to try to understand by example what an axiom says. And if it seems “obvious” or
“familiar” based on your experience, don’t worry about it. Where you should pay special attention is when
things don’t work in the way you expect.
If you look at the axioms carefully, you might notice that some familiar properties of multiplication are
missing. We will single them out next.
Definition. A ring R is commutative if the multiplication is commutative. That is, for all a, b ∈ R,
ab = ba.
Note: The word “commutative” in the phrase “commutative ring” always refers to multiplication —
since addition is always assumed to be commutative, by Axiom 4.
Definition. A ring R is a ring with identity if there is an identity for multiplication. That is, there is an
element 1 ∈ R such that
1 · a = a and a · 1 = a for all a ∈ R.
Note: The word “identity” in the phrase “ring with identity” always refers to an identity for multipli-
cation — since there is always an identity for addition (called “0”), by Axiom 2.
A commutative ring which has an identity element is called a commutative ring with identity.
In a ring with identity, you usually also assume that 1 ≠ 0. (Nothing stated so far requires this, so you
have to take it as an axiom.) In fact, you can show that if 1 = 0 in a ring R, then R consists of 0 alone —
which means that it’s not a very interesting ring!
Example. The number systems Z, Q, R, and C, with their usual addition and multiplication, are all rings.
Each of these is a commutative ring with identity. In fact, all of them except Z are fields. I’ll discuss
fields below.
By the way, it’s conventional to use a capital letter with the vertical or diagonal stroke “doubled” (as
in Z or R) to stand for number systems. It is how you would write them by hand. If you’re typing them,
you usually use a special font; a common one is called Blackboard Bold.
You might wonder why I singled out the commutativity and identity axioms, and didn’t just make
them part of the definition of a ring. (Actually, many people add the identity axiom to the definition of
a ring automatically.) In fact, there are situations in mathematics where you deal with rings which aren’t
commutative, or (less often) lack an identity element. We’ll see, for instance, that matrix multiplication is
usually not commutative.
The idea is to write proofs using exactly the properties you need. In that way, the things that you prove
can be used in a wider variety of situations. Suppose I had included commutativity of multiplication in the
definition of a ring. Then if I proved something about rings, you would not know whether it applied to
noncommutative rings without carefully checking the proof to tell whether commutativity was used or not.
If you really need a ring to be commutative in order to prove something, it is better to state that assumption
explicitly, so everyone knows not to assume your result holds for noncommutative rings.
The next example (or collection of examples) of rings may not be familiar to you. These rings are the
integers mod n. For these rings, n will denote an integer. Actually, n can be any integer if I modify the
discussion a little, but to keep things simple, I’ll take n ≥ 2.
The integers mod n is the set
Zn = {0, 1, 2, . . . , n − 1}.
Zn becomes a commutative ring with identity under the operations of addition mod n and multipli-
cation mod n. I won’t prove this; I’ll just show you how to work with these operations, which is sufficient
for a linear algebra course. You’ll see a rigorous treatment of Zn in abstract algebra.
(a) To add x and y mod n, add them as integers to get x + y. Then divide x + y by n and take the
remainder — call it r. Then x + y = r.
(b) To multiply x and y mod n, multiply them as integers to get xy. Then divide xy by n and take the
remainder — call it r. Then xy = r.
Since modular arithmetic may be unfamiliar to you, let’s do an extended example. Suppose n = 6, so
the ring is Z6 .
4+5 = 9 (Add them as integers . . . )
= 3 (Divide 9 by 6 and take the remainder, which is 3)
Hence, 4 + 5 = 3 in Z6 .
You can picture arithmetic mod 6 this way:
(Picture the numbers 0, 1, 2, 3, 4, 5 arranged clockwise around a circle.)
You count around the circle clockwise, but when you get to where “6” would be, you’re back to 0. To
see how 4 + 5 works, start at 0. Count 4 numbers clockwise to get to 4, then from there, count 5 numbers
clockwise. You’ll find yourself at 3.
Here is multiplication:
2 · 5 = 10 (Multiply them as integers . . . )
= 4 (Divide 10 by 6 and take the remainder, which is 4)
Hence, 2 · 5 = 4 in Z6 .
You can see that as you do computations, you might in the middle get numbers outside {0, 1, 2, 3, 4, 5}.
But when you divide by 6 and take the remainder, you’ll always wind up with a number in {0, 1, 2, 3, 4, 5}.
Try it with a big number:
80 = 6 · 13 + 2 = 2.
Using our circle picture, if you start at 0 and do 80 steps clockwise around the circle, you’ll find yourself
at 2. (Maybe you don’t have the patience to actually do this!) When we divide by 6 then “discard” the
multiples of 6, that is like the fact that you return to 0 on the circle after 6 steps.
Notice that if you start with a number that is divisible by 6, you get a remainder of 0:
84 = 6 · 14 + 0 = 0.
We see that in doing arithmetic mod 6, multiples of 6 are equal to 0. And in general, in doing arithmetic
mod n, multiples of n are equal to 0.
Other arithmetic operations work as you’d expect. For example,
3^4 = 81 = 6 · 13 + 3 = 3.
Hence, 3^4 = 3 in Z6 .
Negative numbers in Z6 are additive inverses. Thus, −2 = 4 in Z6 , because 4 + 2 = 0. To deal with
negative numbers in general, add a positive multiple of 6 to get a number in the set {0, 1, 2, 3, 4, 5}. For
example,
(−3) · 5 = −15 (Multiply them as integers . . . )
= −15 + 18 (Add 18, which is 3 · 6)
= 3
Hence, (−3) · 5 = 3 in Z6 .
The reason you can add 18 (or any multiple of 6) is that 18 divided by 6 leaves a remainder of 0. In
other words, “18 = 0” in Z6 , so adding 18 is like adding 0. In a similar way, you can always convert a
negative number mod n to a positive number in {0, 1, . . . n − 1} by adding multiples of n. For instance,
−14 = −14 + 18 = 4.
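If you want to check computations like these on a computer, Python’s % operator gives the remainder on division by n, and it already returns a number in {0, 1, . . . , n − 1} even for negative inputs. Here is a small sketch; the function names are just for illustration.

def add_mod(x, y, n):
    return (x + y) % n

def mul_mod(x, y, n):
    return (x * y) % n

print(add_mod(4, 5, 6))    # 3, since 4 + 5 = 9 = 6*1 + 3
print(mul_mod(2, 5, 6))    # 4, since 2 * 5 = 10 = 6*1 + 4
print(mul_mod(-3, 5, 6))   # 3, matching (-3)*5 = -15 = -15 + 18 = 3
print(-14 % 6)             # 4, matching -14 = -14 + 18 = 4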
Subtraction works by adding the additive inverse. For example, in Z6 , since −2 = 4,
1 − 2 = 1 + 4 = 5.
We haven’t discussed division yet, but maybe the last example tells you how to do it. Just as subtraction
is defined as adding the additive inverse, division should be defined as multiplying by the multiplicative
inverse. Let’s give the definition.
Definition. Let R be a ring with identity, and let x ∈ R. The multiplicative inverse of x is an element
x−1 ∈ R which satisfies
x · x−1 = 1 and x−1 · x = 1.
If we were dealing with real numbers, then 3−1 = 1/3, for instance. But going back to the Z6 example,
we don’t have fractions in Z6 . So what is (say) 5−1 in Z6 ? By definition, 5−1 is the element (if there is one)
in Z6 which satisfies
5 · 5−1 = 1.
(I could say 5−1 · 5 = 1, but multiplication is commutative in Z6 , so the order doesn’t matter.)
We just check cases. Remember that if I get a product that is 6 or bigger, I have to reduce mod 6 by
dividing and taking the remainder.
5·0 =0
5·1 =5
5 · 2 = 10 = 4
5 · 3 = 15 = 3
5 · 4 = 20 = 2
5 · 5 = 25 = 1
I got 25 = 1 by dividing 25 by the modulus 6 — it goes in 4 times, with a remainder of 1.
Thus, according to the definition, 5−1 = 5. In other words, 5 is its own multiplicative inverse. This isn’t
unheard of: You know that in the real numbers, 1 is its own multiplicative inverse.
This also means that if you want to divide by 5 in Z6 , you should multiply by 5.
What about 4−1 in Z6 ? Unfortunately, if you take cases as I did with 5, you’ll see that for every number
n in Z6 , you do not have 4 · n = 1. Here’s a proof by contradiction which avoids taking cases. Suppose
4n = 1. Multiply both sides by 3:
4n = 1
3 · 4n = 3 · 1
12n = 3
0=3
I made the last step using the fact that 12n is a multiple of 6 (since 12 = 6 · 2), and multiples of 6 are
equal to 0 mod 6. Since “0 = 3” is a contradiction, 4 · n = 1 is impossible. So 4−1 is undefined in Z6 .
It happens to be true that in Z6 , the elements 0, 2, 3, and 4 do not have multiplicative inverses; 1 and
5 do.
And in Z10 , the elements 0, 2, 4, 5, 6, and 8 do not have multiplicative inverses; 1, 3, 7, and 9 do.
Do you see a pattern?
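You can also let a computer list which elements have inverses. Here is a brute-force Python sketch (the function name units is chosen just for illustration); comparing the output with the factorizations of 6 and 10 suggests the pattern.

def units(n):
    # elements x of Z_n for which some y in Z_n satisfies x*y = 1
    return [x for x in range(n) if any((x * y) % n == 1 for y in range(n))]

print(units(6))    # [1, 5]
print(units(10))   # [1, 3, 7, 9]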
You probably don’t need much practice working with familiar number systems like the real numbers R,
so we’ll give some examples which involve arithmetic in Zn .
Example. Compute each of the following.
(a) 22 in Z5 .
(b) −21 in Z4 .
(c) 5 + 11 in Z12 .
(d) 13 · 5 in Z17 .
(e) 11 − 14 in Z20 .
(f) 4^3 in Z11 .
(g) 25! = 1 · 2 · 3 · 4 · · · · · 25 in Z23 .
It’s understood that for a Zn problem your final answer should be a number in {0, 1, . . . , n − 1}. You can
simplify as you do each step, or simplify at the end (divide by n and take the remainder).
(a) 22 = 4 · 5 + 2 = 2.
(b) −21 = −21 + 24 = 3.
Notice that 24 is a multiple of 4, so it’s equal to 0 in Z4 . You can also do this by dividing by 4 if you
do it carefully:
−21 = 4 · (−6) + 3 = 3.
(c) 5 + 11 = 16 = 12 + 4 = 4.
(d) 13 · 5 = 65 = 17 · 3 + 14 = 14.
(e) 11 − 14 = −3 = −3 + 20 = 17.
Notice that I added 20, a multiple of the modulus, to get a number in {0, 1, . . . , 19}.
(f) 4^3 = 64 = 11 · 5 + 9 = 9.
(g) 1 · 2 · 3 · 4 · · · · · 25 includes all the numbers from 1 to 25; in particular, it includes 23. So the product is
a multiple of the modulus 23, and
25! = 1 · 2 · 3 · 4 · · · · · 25 = 0.
(b) Suppose 6n = 1 for some n in Z10 . Then
6n = 1
5 · 6n = 5 · 1
30n = 5
0=5
The last step follows from the fact that 30n is a multiple of 10, so it equals 0 mod 10. Since “0 = 5” is
a contradiction, 6n = 1 is impossible, and 6 does not have a multiplicative inverse.
For example, in Z11 ,
7 · 8 = 56 = 55 + 1 = 11 · 5 + 1 = 1.
Hence, 7 and 8 are multiplicative inverses of each other in Z11 .
To find 8−1 in Z13 , take multiples of 8 and reduce mod 13, stopping when you reach 1:
8 · 1 = 8, 8 · 2 = 16 = 3, 8 · 3 = 24 = 11, 8 · 4 = 32 = 6, 8 · 5 = 40 = 1.
Thus, 8 · 5 = 1, so 8−1 = 5 in Z13 .
Alternatively, take multiples of 13 and add 1, stopping when you get a number divisible by 8:
13 · 1 + 1 = 14 Not divisible by 8
13 · 2 + 1 = 27 Not divisible by 8
13 · 3 + 1 = 40 Divisible by 8
Then 40/8 = 5, so 8−1 = 5.
Even this approach is too tedious to use with large numbers. The systematic way to find inverses is to
use the Extended Euclidean Algorithm.
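We won’t develop the Extended Euclidean Algorithm here, but as a preview, here is a short Python sketch of it. It finds integers a and b with a·x + b·n = gcd(x, n); when the gcd is 1, reducing a mod n gives x−1 in Zn. The function names are illustrative.

def extended_gcd(x, n):
    # returns (g, a, b) with a*x + b*n = g = gcd(x, n)
    old_r, r = x, n
    old_a, a = 1, 0
    old_b, b = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_a, a = a, old_a - q * a
        old_b, b = b, old_b - q * b
    return old_r, old_a, old_b

def inverse_mod(x, n):
    g, a, _ = extended_gcd(x, n)
    if g != 1:
        raise ValueError("no inverse: gcd(x, n) is not 1")
    return a % n

print(inverse_mod(8, 13))   # 5
print(inverse_mod(5, 6))    # 5
print(inverse_mod(7, 11))   # 8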
We saw that in a commutative ring with identity, an element x might not have a multiplicative inverse
x−1 . That in turn would prevent you from “dividing” by x. From the point of view of linear algebra, this is
inconvenient. Hence, we single out rings which are “nice” in that every nonzero element has a multiplicative
inverse.
Definition. A field F is a commutative ring with identity in which 1 ≠ 0 and every nonzero element has a
multiplicative inverse.
By convention, you don’t write “1/x” instead of “x−1 ” unless the ring happens to be a ring with “real”
fractions (like Q, R, or C). You don’t write fractions in (say) Z7 .
If an element x has a multiplicative inverse, you can divide by x by multiplying by x−1 . Thus, in a field,
you can divide by any nonzero element. (You’ll learn in abstract algebra why it doesn’t make sense to divide
by 0.)
The rationals Q, the reals R, and the complex numbers C are fields. Many of the examples will use
these number systems.
The ring of integers Z is not a field. For example, 2 is a nonzero integer, but it does not have a
multiplicative inverse which is an integer. (1/2 is not an integer — it’s a rational number.)
Q, R, and C are all infinite fields — that is, they all have infinitely many elements. But (for example)
Z5 is a field.
For applications, it’s important to consider finite fields like Z5 . Before I give some examples, I need
some definitions.
Definition. Let R be a commutative ring with identity. The characteristic of R is the smallest positive
integer n such that n · 1 = 0.
Notation: char R = n.
If there is no positive integer n such that n · 1 = 0, then char R = 0.
In fact, if char R = n, then n · x = 0 for all x ∈ R.
Z, Q, R, and C are all rings of characteristic 0. On the other hand, char Zn = n.
Definition. An integer n > 1 is prime if its only positive divisors are 1 and n.
The first few prime numbers are
2, 3, 5, 7, 11, . . . .
An integer n > 1 which is not prime is composite. The first few composite numbers are
4, 6, 8, 9, . . . .
Since the characteristic of Zn is n, the first theorem implies the following result:
Corollary. Zn is a field if and only if n is prime.
The Corollary tells us that Z2 , Z13 , and Z61 are fields, since 2, 13, and 61 are prime.
On the other hand, Z6 is not a field, since 6 isn’t prime (because 6 = 2 · 3). In fact, we saw it directly
when we showed that 4 does not have a multiplicative inverse in Z6 . Note that Z6 is a commutative ring
with identity.
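Here is a brute-force Python check of the Corollary for small moduli (a sketch, not a proof; the function name is illustrative):

def is_field(n):
    # Z_n is a field exactly when every nonzero element has an inverse
    return all(any((x * y) % n == 1 for y in range(n)) for x in range(1, n))

for n in [2, 3, 5, 6, 10, 13]:
    print(n, is_field(n))
# True for the primes 2, 3, 5, 13; False for 6 and 10.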
For simplicity, the fields of prime characteristic that I use in this course will almost always be finite.
But what would an infinite field of prime characteristic look like?
As an example, start with Z2 = {0, 1}. Form the field of rational functions Z2 (x). Thus, elements
of Z2 (x) have the form p(x)/q(x), where p(x) and q(x) are polynomials with coefficients in Z2 . Here are some
examples of elements of Z2 (x):
1/x, (x2 + x + 1)/(x100 + 1), 1, x7 + x3 + 1.
You can find multiplicative inverses of nonzero elements by taking reciprocals; for instance,
((x2 + x + 1)/(x100 + 1))−1 = (x100 + 1)/(x2 + x + 1).
I won’t go through and check all the axioms, but in fact, Z2 (x) is a field. Moreover, since 2 · 1 = 0 in
Z2 (x), it’s a field of characteristic 2. It has an infinite number of elements; for example, it contains
1, x, x2 , x3 , ....
What about fields of characteristic p other than Z2 , Z3 , and so on? As noted above, these are called
Galois fields. For instance, there is a Galois field with 5^3 = 125 elements. To keep the computations simple,
we will rarely use them in this course. But here’s an example of a Galois field with 2^2 = 4 elements, so you
can see what it looks like.
GF (4) is the Galois field with 4 elements, and here are its addition and multiplication tables:
Addition:            Multiplication:

+ | 0 1 a b          · | 0 1 a b
--+--------          --+--------
0 | 0 1 a b          0 | 0 0 0 0
1 | 1 0 b a          1 | 0 1 a b
a | a b 0 1          a | 0 a b 1
b | b a 1 0          b | 0 b 1 a
Notice that
1 + 1 = 0, a + a = 0, b + b = 0.
You can check by examining the multiplication table that multiplication is commutative, that 1 is the
multiplicative identity, and that the nonzero elements (1, a, and b) all have multiplicative inverses. For
instance, a−1 = b, because a · b = 1.
Since we’ve already seen a lot of weird things with these new number systems, we might as well see
another one.
Example. Find the roots of x2 + 5x + 6 in Z10 .
Make a table:

x            | 0 1 2 3 4 5 6 7 8 9
x2 + 5x + 6  | 6 2 0 0 2 6 2 0 0 2
For instance, plugging x = 4 into x2 + 5x + 6 gives
4^2 + 5 · 4 + 6 = 42 = 40 + 2 = 2.
From the table, the roots are x = 2, 3, 7, and 8. So a quadratic polynomial can have more than two roots
when the coefficients come from a ring like Z10 !
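A quick Python check of the table, by brute-force search over Z10:

roots = [x for x in range(10) if (x*x + 5*x + 6) % 10 == 0]
print(roots)    # [2, 3, 7, 8]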
A matrix is a rectangular array of numbers. Here is an example:

[  1   −3   0    5    5 ]
[  0  −17   9   −5   −4 ]
[ −1   −2   4   −3    3 ]

(A matrix can also have entries which are fractions, like 17/3 or 5/2, or decimals, like −3.14.)
In this case, the numbers are elements of Q (or R). In general, the entries will be elements of some
commutative ring or field.
In this section, I’ll explain operations with matrices by example. I’ll discuss and prove some of the
properties of these operations later on.
Dimensions of matrices. An m × n matrix is a matrix with m rows and n columns. Sometimes this is
expressed by saying that the dimensions of the matrix are m × n.
a 2 × 3 matrix:
[ 1 2 3 ]
[ 4 5 6 ]

a 4 × 2 matrix:
[ 0 0 ]
[ 0 0 ]
[ 0 0 ]
[ 0 0 ]

a 1 × 1 matrix:
[ π ]
A 1 × n matrix is called an n-dimensional row vector. For example, here’s a 3-dimensional row
vector:
[ 4   17 − √5   42 ]
Likewise, an n × 1 matrix is called an n-dimensional column vector. Here’s a 3-dimensional column
vector:
[ 0        ]
[ 17 − e2  ]
[ ∛617     ]
An n × n matrix is called a square matrix. For example, here is a 2 × 2 square matrix:
[ 1    x  ]
[ z2   32 ]
Notice that by convention the number of rows is listed first and the number of columns second. This
convention also applies to referring to specific elements of matrices. Consider the following matrix:
[ 11      −0.4   33      √2     ]
[ a + b    x2    sin y   j − k  ]
[ −8       0     13      114.71 ]
The (2, 4)th element is the element in row 2 and column 4, which is j − k. (Note that the first row is
row 1, not row 0, and similarly for columns.) The (3, 3)th element is the element in row 3 and column 3,
which is 13.
Equality of matrices. Two matrices are equal if they have the same dimensions and the corresponding
entries are equal. For example, suppose
[ 7   0   3 ]   [ a   b   3 ]
[ 5  −4  10 ] = [ 5  −4   c ]
Then if I match corresponding elements, I see that a = 7, b = 0, and c = 10.
Definition. If R is a commutative ring, then M (n, R) is the set of n × n matrices with entries in R.
Note: Some people use the notation Mn (R).
For example, M (2, R) is the set of 2 × 2 matrices with real entries. M (3, Z5 ) is the set of 3 × 3 matrices
with entries in Z5 .
Adding and subtracting matrices. For these examples, I’ll assume that the matrices have entries in R.
You can add (or subtract) matrices by adding (or subtracting) corresponding entries.
[ 1  −1  0 ]   [ 7  1   3 ]   [ 1 + 7   −1 + 1   0 + 3    ]   [ 8   0  3 ]
[ 2  −4  6 ] + [ 0  3  −6 ] = [ 2 + 0   −4 + 3   6 + (−6) ] = [ 2  −1  0 ]
(A + B) + C = A + (B + C).
This means that if you are adding several matrices, you can group them any way you wish:
( [ 1   −2 ] + [ 0  2 ] ) + [  4  3 ]  =  [  1   0 ] + [  4  3 ]  =  [ 5  3 ]
( [ 5  −11 ]   [ 9  8 ] )   [ −7  6 ]     [ 14  −3 ]   [ −7  6 ]     [ 7  3 ]

[ 1   −2 ] + ( [ 0  2 ] + [  4  3 ] )  =  [ 1   −2 ] + [ 4   5 ]  =  [ 5  3 ]
[ 5  −11 ]   ( [ 9  8 ]   [ −7  6 ] )     [ 5  −11 ]   [ 2  14 ]     [ 7  3 ]
Here’s an example of subtraction:
[ 1  −3 ]   [ 6  3 ]   [ −5  −6 ]
[ 2   0 ] − [ 2  π ] = [  0  −π ]
[ 4  √2 ]   [ 0  0 ]   [  4  √2 ]
Matrix addition is commutative: if A and B are matrices with the same dimensions, then
A + B = B + A.
Note that in the second example, there were some negative numbers in the middle of the computation,
but the final answer was expressed entirely in terms of elements of Z5 = {0, 1, 2, 3, 4}.
Definition. A zero matrix 0 is a matrix all of whose entries are 0.
[ 0 0 0 ]
[ 0 0 0 ]    [ 0 0 ]
[ 0 0 0 ]    [ 0 0 ]
[ 0 0 0 ]
There is an m × n zero matrix for every pair of positive dimensions m and n.
If you add the m × n zero matrix to another m × n matrix A, you get A:
[ 31   97 ]   [ 0  0 ]   [ 31   97 ]
[ 24  −53 ] + [ 0  0 ] = [ 24  −53 ]
In symbols, if 0 is a zero matrix and A is a matrix of the same size, then
A + 0 = A and 0 +A = A.
A zero matrix is said to be an identity element for matrix addition.
Note: At some point, I may just write “0” instead of “0” (with the boldface) for a zero matrix, and rely
on context to tell it apart from the number 0.
Multiplying matrices by numbers. You can multiply a matrix by a number by multiplying each entry
by the number. Here is an example with real numbers:
    [ 3  −3 ]   [ 7 · 3   7 · (−3) ]   [ 21  −21 ]
7 · [ 4  −1 ] = [ 7 · 4   7 · (−1) ] = [ 28   −7 ]
    [ 0   2 ]   [ 7 · 0   7 · 2    ]   [  0   14 ]
Things work in the same way over Zn , but all the arithmetic is done in Zn . Here is an example over Z5 :
    [ 2  1 ]   [ 3 · 2   3 · 1 ]   [ 1  3 ]
3 · [ 4  1 ] = [ 3 · 4   3 · 1 ] = [ 2  3 ]
Notice that, as usual with Z5 , I simplified my final answer so that all the entries of the matrix were in
the set {0, 1, 2, 3, 4}.
Unlike the operations I’ve discussed so far, matrix multiplication (multiplying two matrices) does not
work the way you might expect: You don’t just multiply corresponding elements of the matrices, the way
you add corresponding elements to add matrices.
To explain matrix multiplication, I’ll remind you first of how you take the dot product of two vectors;
you probably saw this in a multivariable calculus course. (If you’re seeing this for the first time, don’t worry
— it’s easy!) Here’s an example of a dot product of two 3-dimensional vectors of real numbers:
           [  2 ]
[ −3 7 9 ] [  0 ] = (−3) · 2 + 7 · 0 + 9 · 10 = −6 + 0 + 90 = 84.
           [ 10 ]
Note that the vectors must be the same size, and that the product is a number. This is actually an
example of matrix multiplication, and we’ll see that the result should technically be a 1 × 1 matrix. But for
dot products, we will write
84 instead of [ 84 ] .
If you’ve seen dot products in multivariable calculus, you might have seen this written this way:
(−3, 7, 9) · (2, 0, 10) = 84.
But for what I’ll do next, I want to distinguish between the row (first) vector and the column (second)
vector.
Multiplying matrices. To compute the product AB of two matrices, take the dot products of the rows
of A with the columns of B. In this example, assume all the matrices have real entries.
[  2  1  4 ]  [ 1   6 ]     [ (2 + 5 + 8 = 15)    ·  ]
[ −1  0  3 ]  [ 5   1 ]  =  [        ·            ·  ]
              [ 2  −2 ]

[  2  1  4 ]  [ 1   6 ]     [ 15   (12 + 1 − 8 = 5) ]
[ −1  0  3 ]  [ 5   1 ]  =  [  ·           ·        ]
              [ 2  −2 ]

[  2  1  4 ]  [ 1   6 ]     [ 15                  5 ]
[ −1  0  3 ]  [ 5   1 ]  =  [ (−1 + 0 + 6 = 5)    · ]
              [ 2  −2 ]

[  2  1  4 ]  [ 1   6 ]     [ 15            5          ]
[ −1  0  3 ]  [ 5   1 ]  =  [  5   (−6 + 0 − 6 = −12)  ]
              [ 2  −2 ]

[  2  1  4 ]  [ 1   6 ]     [ 15    5  ]
[ −1  0  3 ]  [ 5   1 ]  =  [  5  −12  ]
              [ 2  −2 ]
You can see that you take the dot products of the rows of the first matrix with the columns of the second
matrix to produce the 4 elements of the product.
In order for the multiplication to work, the matrices must have compatible dimensions: The number of
columns in A should equal the number of rows of B. Thus, if A is an m × n matrix and B is an n × p matrix,
AB will be an m × p matrix. For example, this won’t work:
[ −2   1   0  8 ]   [ −5  −5 ]
[  1  11  13  9 ]   [  2  13 ]      (Won’t work!)
                    [  1   7 ]
Do you see why? The rows of the first matrix have 4 entries, but the columns of the second matrix have
3 entries. You can’t take the dot products, because the entries won’t match up.
Here are two more examples, again using matrices with real entries:
            [ 2   0 ]
[ −2 3 0 ]  [ 9  −4 ] = [ 23  −12 ].
            [ 1   5 ]

[ 4  −2 ] [ 1   1 ]   [ 4  10 ]
[ 1   1 ] [ 0  −3 ] = [ 1  −2 ]
Here is an example with matrices in M (2, Z3 ). All the arithmetic is done in Z3 .
[ 2  1 ] [ 1  2 ]   [ 4  5 ]   [ 1  2 ]
[ 0  2 ] [ 2  1 ] = [ 4  2 ] = [ 1  2 ]
Notice that I simplify the final result so that all the entries are in Z3 = {0, 1, 2}.
If you multiply a matrix by a zero matrix, you get a zero matrix:
[ 0  0 ] [ 9  −3   43   ]   [ 0  0  0 ]
[ 0  0 ] [ 5   0  −1.2  ] = [ 0  0  0 ]
In symbols, if 0 is a zero matrix and A is a matrix compatible with it for multiplication, then
A·0= 0 and 0 · A = 0.
Matrix multiplication takes a little practice, but it isn’t hard. The big principle to take away is that you
take the dot products of the rows of the first matrix with the columns of the second matrix. This picture of
matrix multiplication will be very important for our work with matrices.
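Here is a short Python sketch of matrix multiplication written exactly this way: the (i, j)th entry of AB is the dot product of row i of A with column j of B. Matrices are represented as lists of rows; the function name is just for illustration.

def mat_mul(A, B):
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "number of columns of A must equal number of rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[2, 1, 4],
     [-1, 0, 3]]
B = [[1, 6],
     [5, 1],
     [2, -2]]
print(mat_mul(A, B))    # [[15, 5], [5, -12]]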
I should say something about a point I mentioned earlier: Why not multiply matrices like this?
[ 4  −3 ] [ 2  6 ]          [ 4 · 2    (−3) · 6 ]
[ 5   2 ] [ 0  4 ]  gives   [ 5 · 0     2 · 4   ]  ?
You could define a matrix multiplication this way, but unfortunately, it would not be useful for applica-
tions (such as solving systems of linear equations). Also, when we study the relationship between matrices and
linear transformations, we’ll see that the matrix multiplication we defined using dot products corresponds
to the composite of transformations. So even though the “corresponding elements” definition seems simpler,
it doesn’t match up well with the way matrices are used.
Here’s a preview of how matrices are related to systems of linear equations.
Example. Write the system of equations which correspond to the matrix equation
[  2  3   0 ]  [ a ]     [ 13 ]
[ −1  5  17 ]  [ b ]  =  [  0 ]
               [ c ]
Multiply out the left side:
[ 2a + 3b       ]   [ 13 ]
[ −a + 5b + 17c ] = [  0 ]
Two matrices are equal if their corresponding entries are equal, so equate corresponding entries:
2a + 3b = 13
−a + 5b + 17c = 0
Example. Write the following system of linear equations as a matrix multiplication equation:
x + 4y − 6z = 3
−2x + 10z = 3
x + y + z = 5
Take the 9 coefficients of the variables x, y, and z on the left side and make them into a 3 × 3 matrix. Put
the variables into a 3-dimensional column vector. Make the 3 numbers on the right side into a 3-dimensional
column vector. Here’s what we get:
[  1  4  −6 ]  [ x ]     [ 3 ]
[ −2  0  10 ]  [ y ]  =  [ 3 ]
[  1  1   1 ]  [ z ]     [ 5 ]
Work out the multiplication on the left side of this equation and you’ll see how this represents the
original system.
This is really important! It allows us to represent a system of equations as a matrix equation. Eventually,
we’ll figure out how to solve the original system by working with the matrices.
Identity matrices. There are special matrices which serve as identities for multiplication: The n × n
identity matrix is the square matrices with 1’s down the main diagonal — the diagonal running from
northwest to southeast — and 0’s everywhere else. For example, the 3 × 3 identity matrix is
    [ 1  0  0 ]
I = [ 0  1  0 ]
    [ 0  0  1 ]
If I is the n × n identity and A is a matrix which is compatible for multiplication with I, then
AI = A and IA = A.
For example,
[ 3     −7    10     ]   [ 1  0  0 ]   [ 3     −7    10     ]
[ 1/2    π    −1.723 ]   [ 0  1  0 ] = [ 1/2    π    −1.723 ]
[ 0     19    712    ]   [ 0  0  1 ]   [ 0     19    712    ]
Matrix multiplication obeys some of the algebraic laws you’re familiar with. For example, matrix multi-
plication is associative: If A, B, and C are matrices and their dimensions are compatible for multiplication,
then
(AB)C = A(BC).
However, matrix multiplication is not commutative in general. That is, it need not be true that AB = BA
for matrices A and B.
One trivial way to get a counterexample is to let A be 3 × 5 and let B be 5 × 3. Then AB is 3 × 3 while
BA is 5 × 5. Since AB and BA have different dimensions, they can’t be equal.
However, it’s easy to come up with counterexamples even when AB and BA have the same dimensions.
For example, consider the following matrices in M (2, R):
A = [ 1  −1 ]   and   B = [ 1  0 ]
    [ 2   3 ]             [ 4  1 ]

Then

AB = [ −3  −1 ]   while   BA = [ 1  −1 ]
     [ 14   3 ]                [ 6  −1 ]

so AB ≠ BA.
Many of the properties of matrix arithmetic are things you’d expect — for example, that matrix addition
is commutative, or that matrix multiplication is associative. You should pay particular attention when things
don’t work the way you’d expect, and this is such a case. It is very significant that matrix multiplication is
not always commutative.
Transposes. If A is a matrix, the transpose AT of A is obtained by swapping the rows and columns of A.
For example,
[ 1  2  3 ]T   [ 1  4 ]
[ 4  5  6 ]  = [ 2  5 ]
               [ 3  6 ]
Notice that the transpose of an m × n matrix is an n × m matrix.
Example. Consider the following matrices with real entries:
A = [ 1  0   1  2 ] ,   B = [ c  c ] ,   C = [ 0   1 ]
    [ 2  1  −1  0 ]         [ 1  1 ]         [ 2  −1 ]
                                             [ 0   0 ]
                                             [ 1   1 ]
(a) Compute CB + AT .
(b) Compute AC − 2B.
(a)
           [ 0   1 ]            [ 1   2 ]   [ 1        1      ]   [ 1   2 ]   [ 2        3     ]
CB + AT =  [ 2  −1 ] [ c  c ] + [ 0   1 ] = [ 2c − 1   2c − 1 ] + [ 0   1 ] = [ 2c − 1   2c    ]
           [ 0   0 ] [ 1  1 ]   [ 1  −1 ]   [ 0        0      ]   [ 1  −1 ]   [ 1        −1    ]
           [ 1   1 ]            [ 2   0 ]   [ c + 1    c + 1  ]   [ 2   0 ]   [ c + 3    c + 1 ]
(b)
                          [ 0   1 ]
AC − 2B = [ 1  0   1  2 ] [ 2  −1 ] − 2 [ c  c ] = [ 2  3 ] − [ 2c  2c ] = [ 2 − 2c   3 − 2c ]
          [ 2  1  −1  0 ] [ 0   0 ]     [ 1  1 ]   [ 2  1 ]   [ 2    2 ]   [ 0        −1     ]
                          [ 1   1 ]
The inverse of a matrix. The inverse of an n × n matrix A is a matrix A−1 which satisfies
AA−1 = I and A−1 A = I,
where I is the n × n identity matrix.
There is no such thing as matrix division in general, because some matrices do not have inverses. But
if A has an inverse, you can simulate division by multiplying by A−1 . This is often useful in solving matrix
equations.
An n × n matrix which does not have an inverse is called singular.
We’ll discuss matrix inverses and how you find them in detail later. For now, here’s a formula that we’ll
use frequently.
Proposition. Consider the following matrix with entries in a commutative ring with identity R:
A = [ a  b ]
    [ c  d ]
If ad − bc is invertible in R, then A has an inverse, and
A−1 = (ad − bc)−1 [  d  −b ]
                  [ −c   a ]
Note: The number ad − bc is called the determinant of the matrix. We’ll discuss determinants later.
Remember that not every element of a commutative ring with identity has an inverse. For example,
4−1 is undefined in Z6 . If we’re dealing with real numbers, then ad − bc has a multiplicative inverse if and
only if it’s nonzero.
Proof. To show that the formula gives the inverse of A, I have to check that AA−1 = I and A−1 A = I:
AA−1 = [ a  b ] · (ad − bc)−1 [  d  −b ] = (ad − bc)−1 [ ad − bc      0     ] = [ 1  0 ] ,
       [ c  d ]               [ −c   a ]               [    0     ad − bc  ]   [ 0  1 ]

A−1 A = (ad − bc)−1 [  d  −b ] [ a  b ] = (ad − bc)−1 [ ad − bc      0     ] = [ 1  0 ] .
                    [ −c   a ] [ c  d ]               [    0     ad − bc  ]   [ 0  1 ]
This proves that the formula gives the inverse of a 2 × 2 matrix.
Here’s an example of this formula for a matrix with real entries. Notice that 3 · (−1) − 5 · (−2) = 7. So
[  3   5 ]−1           [ −1  −5 ]
[ −2  −1 ]    = (1/7) · [  2   3 ] .
(If I have a fraction outside a matrix, I may choose not to multiply it into the matrix to make the result
look nicer. Generally, if there is an integer outside a matrix, I will multiply it in.)
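The same formula works over Zn as long as the determinant has an inverse mod n. Here is a Python sketch; it uses pow(d, -1, n) (available in Python 3.8 and later) to compute a multiplicative inverse mod n, and the function name is just for illustration.

def inverse_2x2_mod(A, n):
    (a, b), (c, d) = A
    det_inv = pow((a*d - b*c) % n, -1, n)    # raises ValueError if no inverse exists
    return [[( det_inv * d) % n, (-det_inv * b) % n],
            [(-det_inv * c) % n, ( det_inv * a) % n]]

A = [[2, 1],
     [0, 2]]
Ainv = inverse_2x2_mod(A, 3)
print(Ainv)    # [[2, 2], [0, 2]]; multiplying A by Ainv and reducing mod 3 gives the identity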
Recall that a matrix which does not have an inverse is called singular. Suppose we have a real matrix
A = [ a  b ]
    [ c  d ]
The formula above will produce an inverse if ad−bc is invertible, which for real numbers means ad−bc ≠
0. So the matrix is singular if ad − bc = 0.
Example. For what values of x is the following real matrix singular?
[ x − 1     3   ]
[   4     x − 5 ]
ad − bc = (x − 1)(x − 5) − (3)(4) = 0.
Solve for x:
(x2 − 6x + 5) − 12 = 0
x2 − 6x − 7 = 0
(x − 7)(x + 1) = 0
This gives x = 7 or x = −1. The matrix is singular for x = 7 and for x = −1.
Here is an example of a matrix in M (2, Z6 ) which does not have an inverse:
[ 5  1 ]
[ 3  1 ]
However, its determinant is 5 · 1 − 1 · 3 = 2, which is nonzero. The point is that 2 is not invertible in
Z6 , even though it’s nonzero.
(You should multiply any numbers outside the matrix into the matrix, and simplify all the numbers in
the final answer in Z5 .)
That is, when I multiply the inverse and the original matrix, I get the identity matrix. Check for yourself
that it also works if I multiply them in the opposite order.
We’ll discuss solving systems of linear equations later, but here’s an example of this which puts a lot of
the ideas we’ve discussed together.
Example. (Solving a system of equations) Solve the following system over R for x and y using the
inverse of a matrix.
x + 3y = 7
2x − 2y = −2
In matrix form, the system is
[ 1   3 ]  [ x ]     [  7 ]
[ 2  −2 ]  [ y ]  =  [ −2 ]
The coefficient matrix has determinant 1 · (−2) − 3 · 2 = −8, so by the formula above its inverse is
−(1/8) [ −2  −3 ]
       [ −2   1 ]
Multiply both sides by the inverse matrix:
−(1/8) [ −2  −3 ] [ 1   3 ] [ x ]  =  −(1/8) [ −2  −3 ] [  7 ]
       [ −2   1 ] [ 2  −2 ] [ y ]            [ −2   1 ] [ −2 ]
On the left, the square matrix and its inverse cancel, since they multiply to I. (Do you see how that is
like “dividing by the matrix”?) On the right,
−(1/8) [ −2  −3 ] [  7 ]     [ 1 ]
       [ −2   1 ] [ −2 ]  =  [ 2 ]
Therefore,
[ x ]   [ 1 ]
[ y ] = [ 2 ]
The solution is x = 1, y = 2.
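Here is the same computation as a Python sketch, using the 2 × 2 inverse formula with real entries (the function name is illustrative):

def solve_2x2(A, rhs):
    (a, b), (c, d) = A
    det = a * d - b * c
    inv = [[ d / det, -b / det],
           [-c / det,  a / det]]
    return [inv[0][0] * rhs[0] + inv[0][1] * rhs[1],
            inv[1][0] * rhs[0] + inv[1][1] * rhs[1]]

print(solve_2x2([[1, 3], [2, -2]], [7, -2]))    # [1.0, 2.0]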
Definition. Let A and B be m × n matrices. Their sum A + B and difference A − B are the m × n matrices
whose (i, j)th entries are
(A + B)ij = Aij + Bij and (A − B)ij = Aij − Bij .
This definition says that if two matrices have the same dimensions, you can add or subtract them by
adding or subtracting corresponding entries.
Definition. The m × n zero matrix 0 is the m × n matrix whose (i, j)th entry is given by 0ij = 0.
Proposition. Let A, B, and C be m × n matrices, and let 0 denote the m × n zero matrix. Then:
(a) (Associativity of Addition)
(A + B) + C = A + (B + C).
(b) (Commutativity of Addition)
A + B = B + A.
(Do you understand what this says? ((A + B) + C)ij is the (i, j)th entry of (A + B) + C, while
(A + (B + C))ij is the (i, j)th entry of A + (B + C).)
Since this is the first proof of this kind that I’ve done, I’ll show the justification for each step.
(A + B)ij = (B + A)ij .
(c) To prove that A + 0 = A, I have to show that their corresponding entries are equal:
(A + 0)ij = Aij .
By definition of matrix addition and the zero matrix,
0 + A = A + 0 = A.
Definition. If A is a matrix and k is a number, then kA is the matrix with the same dimensions as A whose
(i, j)th entry is
(kA)ij = k · Aij .
(It’s considered ugly to write a number on the right side of a matrix if you want to multiply. For the
record, I’ll define Ak to be the same as kA.)
This definition says that to multiply a matrix by a number, multiply each entry by the number.
Definition. If A is a matrix, then −A is the matrix having the same dimensions as A, and whose entries
are given by
(−A)ij = −Aij .
Proposition. Let A and B be matrices with the same dimensions, and let k be a number. Then:
(a) k(A + B) = kA + kB and k(A − B) = kA − kB.
(b) 0 · A = 0.
(c) 1 · A = A.
(d) (−1) · A = −A.
(e) A − B = A + (−B).
Note that in (b), the 0 on the left is the number 0, while the 0 on the right is the zero matrix.
Proof. I’ll prove (a) and (c) by way of example and leave the proofs of the other parts to you.
First, I want to show that k(A + B) = kA + kB. I have to show that corresponding entries are equal,
which means
(k(A + B))ij = (kA + kB)ij .
I apply the definitions of matrix addition and multiplication of a matrix by a number:
(k(A + B))ij = k(A + B)ij = k(Aij + Bij ) = kAij + kBij = (kA)ij + (kB)ij = (kA + kB)ij .
Therefore, k(A + B) = kA + kB.
The proof for subtraction is similar. On one hand,
(k(A − B))ij = k(A − B)ij = k(Aij − Bij ) = kAij − kBij .
On the other hand,
(kA − kB)ij = (kA)ij − (kB)ij = kAij − kBij .
Therefore, (k(A − B))ij = (kA − kB)ij , so k(A − B) = kA − kB.
For (c), I want to show that 1 · A = A. As usual, this means I must show that they have the same
(i, j)th entries:
(1 · A)ij = Aij .
I use the definition of multiplying a matrix by a number:
(1 · A)ij = 1 · Aij = Aij .
Therefore, 1 · A = A.
Next, let’s look at matrix multiplication in terms of entries. Suppose A is an m × n matrix and B is an
n × p matrix. To get the (i, j)th entry of the product AB, take row i of A, whose entries are Ai1 , Ai2 , . . . , Ain ,
and column j of B, whose entries are B1j , B2j , . . . , Bnj .
Corresponding elements are multiplied, and then the products are summed:

Ai1 B1j + Ai2 B2j + · · · + Ain Bnj .

Look at the pattern of the subscripts in this sum. You can see that the “inner” matching subscripts are
going from 1 to n, while the “outer” “i” and “j” don’t change. Hence, I can write the (i, j)th entry of the
product in summation form as

Σ_{k=1}^n Aik Bkj .

That is,

(AB)ij = Σ_{k=1}^n Aik Bkj .
It’s often useful to have a symbol which you can use to compare two quantities i and j — specifically,
a symbol which equals 1 when i = j and equals 0 when i 6= j.
Definition. The Kronecker delta is defined by
δij = 1 if i = j, and δij = 0 if i ≠ j.
For example,
δ12 = 0, δ17,17 = 1, δ84 = 0.
Lemma. Σ_{j=1}^n δij aj = ai .
Proof. To see what’s happening, write out the sum:
Σ_{j=1}^n δij aj = δi1 a1 + δi2 a2 + δi3 a3 + · · · + δin an .
By definition, each δ with unequal subscripts is 0. The only δ that is not 0 is the one with equal
subscripts. Since i is fixed, the δ that is not 0 is δii , which equals 1. Thus,
Σ_{j=1}^n δij aj = δii ai = 1 · ai = ai .
Definition. The n × n identity matrix In is the n × n matrix whose (i, j)th entry is given by (In )ij = δij .
Sometimes we need to interchange the order of a double summation. For example, consider the double
sum Σ_{i=1}^2 Σ_{j=0}^3 (ai + bj ). Writing out the terms, I get

(a1 + b0 ) + (a1 + b1 ) + (a1 + b2 ) + (a1 + b3 )+
(a2 + b0 ) + (a2 + b1 ) + (a2 + b2 ) + (a2 + b3 ).

You can see I “cycled” through the inner (“j”) index from 0 to 3 first, while holding i = 1. Then I
changed i to 2, and cycled through the j index again. To interchange the order of summation means that I
will get the same sum if I cycle through i first, then j. That is,
Σ_{i=1}^2 Σ_{j=0}^3 (ai + bj ) = Σ_{j=0}^3 Σ_{i=1}^2 (ai + bj ).
Here is how Σ_{j=0}^3 Σ_{i=1}^2 (ai + bj ) looks if I write out the terms:
(a1 + b0 ) + (a2 + b0 )+
(a1 + b1 ) + (a2 + b1 )+
(a1 + b2 ) + (a2 + b2 )+
(a1 + b3 ) + (a2 + b3 )
You can see that I get exactly the same terms, just in a different order. Therefore, the sums are the
same.
In general, if aij is some expression involving i and j, then
Σ_{i=1}^{m} Σ_{j=1}^{n} aij = Σ_{j=1}^{n} Σ_{i=1}^{m} aij .
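You can confirm the interchange of summation numerically. A tiny Python sketch (mine), using the same ranges as the example above:

    a = [None, 3, 5]     # a[1], a[2]  (index 0 unused)
    b = [2, 4, 6, 8]     # b[0], ..., b[3]

    s1 = sum(a[i] + b[j] for i in range(1, 3) for j in range(0, 4))   # i outer, j inner
    s2 = sum(a[i] + b[j] for j in range(0, 4) for i in range(1, 3))   # j outer, i inner
    print(s1, s2, s1 == s2)   # same terms in a different order, so the sums are equal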
Proposition.
(a) (Associativity of Matrix Multiplication) If A, B, and C are matrices which are compatible for
multiplication, then
(AB)C = A(BC).
(b) (Distributivity of Multiplication over Addition) If A, B, C, D, E, and F are matrices
compatible for addition and multiplication, then
A(B + C) = AB + AC and (D + E)F = DF + EF.
(c) If j and k are numbers and A and B are matrices which are compatible for multiplication, then
(jA)(kB) = (jk)(AB).
(d) (Multiplication by the Identity) If A is an m × n matrix, then
AIn = A and Im A = A.
The “compatible for addition” and “compatible for multiplication” assumptions mean that the matrices
should have dimensions which make the operations in the equations legal — but otherwise, there are no
restrictions on what the dimensions can be.
Proof. I’ll prove (a) and part of (d) by way of example, and leave the proofs of the other parts to you.
Before starting, I should say that this proof is rather technical, but try to follow along as best you can.
I’ll use i, j, k, and l as subscripts.
Suppose that A is an m × n matrix, B is an n × p matrix, and C is a p × q matrix. I want to prove that
(AB)C = A(BC). I have to show that corresponding entries are equal, i.e.
((AB)C)il = (A(BC))il .
((AB)C)il = Σ_{k=1}^{p} (AB)ik Ckl = Σ_{k=1}^{p} ( Σ_{j=1}^{n} Aij Bjk ) Ckl ,
while
(A(BC))il = Σ_{j=1}^{n} Aij (BC)jl = Σ_{j=1}^{n} Aij ( Σ_{k=1}^{p} Bjk Ckl ).
If you stare at those two terrible double sums for a while, you can see that they involve the same A, B,
and C terms, and they involve the same summations — but in different orders. I’m allowed to convert one
into the other by interchanging the order of summation, and using the distributive law:
Σ_{k=1}^{p} ( Σ_{j=1}^{n} Aij Bjk ) Ckl = Σ_{k=1}^{p} Σ_{j=1}^{n} (Aij Bjk Ckl ) = Σ_{j=1}^{n} Σ_{k=1}^{p} (Aij Bjk Ckl ) = Σ_{j=1}^{n} Aij ( Σ_{k=1}^{p} Bjk Ckl ).
Hence, ((AB)C)il = (A(BC))il for all i and l, so (AB)C = A(BC).
For part of (d), I’ll show that Im A = A when A is an m × n matrix. By the definition of matrix multiplication and of the identity matrix,
(Im A)ij = Σ_{k=1}^{m} (Im )ik Akj = Σ_{k=1}^{m} δik Akj .
Using the lemma I proved on the Kronecker delta, I get
Σ_{k=1}^{m} δik Akj = Aij .
Hence, (Im A)ij = Aij for all i and j, so Im A = A.
Definition. If A is an m × n matrix, the transpose AT of A is the n × m matrix whose (i, j)th entry is given by
(AT )ij = Aji .
That is, the rows of AT are the columns of A (and vice versa).
Proposition. Let A and B be matrices with the same dimensions, and let k be a number. Then:
(a) (AT )T = A.
(b) (A + B)T = AT + B T .
(c) (kA)T = kAT .
Proof. I’ll prove (b) by way of example and leave the proofs of the other parts for you.
I want to show that (A + B)T = AT + B T . I have to show the corresponding entries are equal:
((A + B)T )ij = (AT + B T )ij .
Now
((A + B)T )ij = (A + B)ji = Aji + Bji = (AT )ij + (B T )ij = (AT + B T )ij .
Therefore, (A + B)T = AT + B T .
Proposition. Suppose A and B are matrices which are compatible for multiplication. Then
(AB)T = B T AT .
Proof. Let (AT )ij denote the (i, j)th entry of AT , and likewise for B and AB. Then
[(AB)T ]ji = (AB)ij = Σ_{k=1}^{n} Aik Bkj = Σ_{k=1}^{n} (AT )ki (B T )jk = Σ_{k=1}^{n} (B T )jk (AT )ki .
The product on the right is the (j, i)th entry of B T AT , while [(AB)T ]ji is the (j, i)th entry of (AB)T .
Therefore, (AB)T = B T AT , since their corresponding entries are equal.
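If you want to see the transpose rules in action, here is a short Python sketch (an ad hoc check, not a proof) that verifies (A + B)T = AT + B T and (AB)T = B T AT on random matrices; transpose, add, and matmul are helper names I'm defining just for this check.

    import random

    def transpose(A):
        # (A^T)[i][j] = A[j][i]
        return [list(row) for row in zip(*A)]

    def add(A, B):
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

    def matmul(A, B):
        return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

    A = [[random.randint(-5, 5) for _ in range(3)] for _ in range(2)]   # 2 x 3
    B = [[random.randint(-5, 5) for _ in range(3)] for _ in range(2)]   # 2 x 3
    C = [[random.randint(-5, 5) for _ in range(4)] for _ in range(3)]   # 3 x 4

    print(transpose(add(A, B)) == add(transpose(A), transpose(B)))        # True
    print(transpose(matmul(A, C)) == matmul(transpose(C), transpose(A)))  # True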
Definition.
(a) A matrix X is symmetric if X T = X.
(b) A matrix X is skew symmetric if X T = −X.
Both definitions imply that X is a square matrix.
Using the definition of transpose, I can express these definitions in terms of elements.
X is symmetric if
Xij = Xji for all i, j.
X is skew symmetric if
Xij = −Xji for all i, j.
Visually, a symmetric matrix is symmetric across its main diagonal (the diagonal running from north-
west to southeast). For example, this real matrix is symmetric:
0 √2 −9
2 3 4 .
−9 4 5
In a skew symmetric matrix, entries which are symmetrically located across the main diagonal are negatives of one another. The
entries on the main diagonal must be 0, since each diagonal entry must equal its own negative.
The next result is pretty easy, but it illustrates how you can use the definitions of symmetry and skew
symmetry in writing proofs. In these proofs, in contrast to earlier proofs, I don’t need to write out the
entries of the matrices, since I can use properties I’ve already proved.
Proposition.
(a) The sum of symmetric matrices is symmetric.
(b) The sum of skew symmetric matrices is skew symmetric.
Proof. (a) Let A and B be symmetric. I must show that A + B is symmetric. Now
(A + B)T = AT + B T = A + B.
The first equality follows from a property I proved for transposes. The second equality follows from the
fact that A is symmetric (so AT = A) and B is symmetric (so B T = B).
Since (A + B)T = A + B, it follows that A + B is symmetric.
(b) Let A and B be skew symmetric, so AT = −A and B T = −B. I must show that A+B is skew symmetric.
Now
(A + B)T = AT + B T = −A + (−B) = −(A + B).
Therefore, A + B is skew symmetric.
Warning: Row reduction uses row operations. There are similar operations for columns which can be used
in other situations (like computing determinants), but not here.
There are three kinds of row operations. (Actually, there is some redundancy here — you can get away
with two of them.) In the examples below, assume that we’re using matrices with real entries (but see the
notes under (b)).
(a) You may swap two rows. Notation: ri ↔ rj means “swap row i and row j”.
(b) You may multiply (or divide) a row by a number that has a multiplicative inverse.
If your number system is a field, a number has a multiplicative inverse if and only if it’s nonzero. This
is the case for most of our work in linear algebra.
If your number system is more general (e.g. a commutative ring with identity), it requires more
care to ensure that a number has a multiplicative inverse.
(c) You may add a multiple of one row to another row, replacing that row with the result. Notation: r2 → r2 + 2r1 means “add 2 times row 1 to row 2 (and replace row 2 with the result)”.
Notice in “r2 → r2 + 2r1 ” row 2 is the row that changes; row 1 is unchanged. You figure the “2r1 ” on
scratch paper, but you don’t actually change row 1.
You can do the arithmetic for these operations in your head, but use scratch paper if you need to. Here’s
the arithmetic for the last operation:
r2 = −2 4 2
2r1 = 2 −2 10
r2 + 2r1 = 0 2 12
Since the “multiple” can be negative, you may also subtract a multiple of a row from another row.
For example, I’ll subtract 4 times row 1 from row 2. Notation: r2 → r2 − 4r1 .
1 2 3 1 2 3
4 5 6 → 0 −3 −6
7 8 9 7 8 9
Notice that row 1 was not affected by this operation. Likewise, if you do r17 → r17 − 56r31 , row 17
changes and row 31 does not.
Operation (c) is probably the one that you will use the most.
You may wonder: Why are these operations allowed, but not others? We’ll see that these row operations
imitate operations we can perform when we’re solving a system of linear equations.
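Since the three row operations come up constantly, here is a small Python sketch of them (my own helper functions, acting in place on a matrix stored as a list of rows). Rows are numbered from 0 in the code, while the notes number them from 1.

    def swap(M, i, j):
        # Operation (a): ri <-> rj.
        M[i], M[j] = M[j], M[i]

    def scale(M, i, c):
        # Operation (b): ri -> c * ri (c must have a multiplicative inverse).
        M[i] = [c * x for x in M[i]]

    def add_multiple(M, i, j, c):
        # Operation (c): ri -> ri + c * rj.  Row j is not changed.
        M[i] = [x + c * y for x, y in zip(M[i], M[j])]

    M = [[1, -1, 5],
         [-2, 4, 2]]
    add_multiple(M, 1, 0, 2)   # r2 -> r2 + 2 r1 (rows 0 and 1 in the code)
    print(M)                   # [[1, -1, 5], [0, 2, 12]] -- matches the scratch work above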
Example. In each case, tell whether the operation is a valid single row operation on a matrix with real
numbers. If it is, say what it does (in words).
(a) r5 ↔ r3 .
(b) r6 → r6 − 7.
(c) r3 → r3 + πr17 .
(d) r6 → 5r6 + 11r2 .
(e) r3 → r3 + r4 and r4 → r4 + r3 .
(a) This is a valid row operation. It swaps row 5 and row 3.
(b) This isn’t a valid row operation. You can’t add or subtract a number from the elements in a row.
(c) This adds π times row 17 to row 3 (and replaces row 3 with the result). Row 17 is not changed.
(d) This isn’t a valid row operation, though you could accomplish it using two row operations: First, multiply
row 6 by 5; next, add 11 times row 2 to the new row 6.
(e) This is not a valid row operation. It’s actually two row operations, not one. The only row operation that
changes two rows at once is swapping two rows.
Matrices can be used to represent systems of linear equations. Row operations are intended to mimic
the algebraic operations you use to solve a system. Row-reduced echelon form corresponds to the “solved
form” of a system.
A matrix is in row reduced echelon form if the following conditions are satisfied:
(a) The first nonzero element in each row (if any) is a “1” (a leading entry).
(b) Each leading entry is the only nonzero element in its column.
(c) All the all-zero rows (if any) are at the bottom of the matrix.
(d) The leading entries form a “stairstep pattern” from northwest to southeast:
0 1 6 0 0 2 ...
0 0 0 1 0 −1 . . .
0 0 0 0 1 4 ...
0 0 0 0 0 0 ...
. . .
In this matrix, the leading entries are in positions (1, 2), (2, 4), (3, 5), . . . .
Here are some matrices in row reduced echelon form. This is the 3 × 3 identity matrix:
1 0 0
0 1 0
0 0 1
The leading entries are in the (1, 1), (2, 2), and (3, 3) positions.
This matrix is in row reduced echelon form; its leading entries are in the (1, 1) and (2, 3) positions.
1 5 0 2
0 0 1 3
0 0 0 0
A leading entry must be the only nonzero number in its column. In this case, the “5” in the (1, 2)
position does not violate the definition, because it is not in the same column as a leading entry. Likewise for
the “2” and “3” in the fourth column.
This matrix has more rows than columns. It is in row reduced echelon form.
1 0
0 1
0 0
0 0
This row reduced echelon matrix has leading entries in the (1, 1), (2, 2), and (3, 4) positions.
1 0 6 0 2
0 1 1 0 −3
0 0 0 1 4
0 0 0 0 0
The nonzero numbers in the third and fifth columns don’t violate the definition, because they aren’t in
the same column as a leading entry.
A zero matrix is in row-reduced echelon form, though it won’t normally come up during a row reduc-
tion.
0 0 0 0
0 0 0 0
0 0 0 0
Note that conditions (a), (b), and (d) of the definition are vacuously satisfied, since there are no leading
entries.
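Here is a rough Python sketch (my own way of making the definition concrete, not anything official) of a function that checks the four conditions for row reduced echelon form:

    def is_rref(M):
        leading_cols = []
        seen_zero_row = False
        for row in M:
            nonzero = [j for j, x in enumerate(row) if x != 0]
            if not nonzero:
                seen_zero_row = True              # zero rows are fine, but...
                continue
            if seen_zero_row:
                return False                      # (c) ...they must all be at the bottom
            lead = nonzero[0]
            if row[lead] != 1:
                return False                      # (a) first nonzero entry must be a 1
            if leading_cols and lead <= leading_cols[-1]:
                return False                      # (d) leading entries step to the right
            leading_cols.append(lead)
        for i, col in enumerate(leading_cols):    # (b) a leading entry is alone in its column
            if any(M[r][col] != 0 for r in range(len(M)) if r != i):
                return False
        return True

    print(is_rref([[1, 5, 0, 2], [0, 0, 1, 3], [0, 0, 0, 0]]))   # True
    print(is_rref([[1, 0, 0], [0, 7, 0], [0, 0, 1]]))            # False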
Just as you may have wondered why only certain operations are allowed as row operations, you might
wonder what row reduced echelon form is for. “Why do we want a matrix to look this way?” As with the
question about row operations, the rationale for row reduced echelon form involves solving systems of linear
equations. If you want, you can jump ahead to the section on solving systems of linear equations and see
how these questions are answered. In the rest of this section, I’ll focus on the process of row reducing a
matrix and leave the reasons for later.
Example. The following real number matrices are not in row reduced echelon form. In each case, explain
why.
1 0 0
(a) 0 7 0
0 0 1
1 0 −3
(b) 0 1 5 .
0 0 1
0 0 0 0
(c) 0 1 0 −1 .
0 0 1 9
1 37 2 −1
(d) 0 1 −3 0 .
0 0 0 0
0 1 7 10
(e) 1 0 4 −5 .
0 0 0 0
(a) The first nonzero element in row 2 is a “7”, rather than a “1”.
(b) The leading entry in row 3 is not the only nonzero element in its column.
(c) The all-zero row is not at the bottom of the matrix.
(d) The leading entry in row 2 is not the only nonzero element in its column.
(e) The leading entries do not form a “stairstep pattern” from northwest to southeast.
Row reduction is the process of using row operations to transform a matrix into a row reduced echelon
matrix. As the algorithm proceeds, you move in stairstep fashion from “northwest” to “southeast” through
different positions in the matrix. In the description below, when I say that the current position is (i, j),
I mean that your current location is in row i and column j. The current position refers to a location,
not the element at that location (which I’ll sometimes call the current element or current entry). The
current row means the row of the matrix containing the current position and the current column means
the column of the matrix containing the current position.
Some notes:
1. There are many ways to arrange the algorithm. For instance, another approach gives the LU -
decomposition of a matrix.
2. Trying to learn to row reduce by following the steps below is pretty tedious, and most people will
want to learn by doing examples. The steps are there so that, as you’re learning to do this, you have some
idea of what to do if you get stuck. Skim the steps first, then move on to the examples; go back to the steps
if you get stuck.
3. As you gain experience, you may notice shortcuts you can take which don’t follow the steps below.
But you can get very confused if you focus on shortcuts before you’ve really absorbed the sense of the
algorithm. I think it’s better to learn to use a correct algorithm “by the book” first, the test being whether
you can reliably and accurately row reduce a matrix. Then you can consider using shortcuts.
4. The algorithm is set up so that if you stop in the middle of a row reduction — maybe you want to
take a break to have lunch — and forget where you were, you can restart the algorithm from the very start.
The algorithm will quickly take you through the steps you already did without redoing the computations and
leave you where you left off.
5. There’s no point in doing row reductions by hand forever, and for larger matrices (as would occur
in real world applications) it’s impractical. At some point, you’ll use a computer. However, I think it’s
important to do enough examples by hand that you understand the algorithm.
We’ll assume that we’re using a number system which is a field. Remember that this means that every
nonzero number has a multiplicative inverse — or equivalently, that you can divide by any nonzero number.
Step 1. Start with the current position at (1, 1).
Step 2. Test the element at the current position. If it’s nonzero, go to Step 2(a); if it’s 0, go to Step 2(b).
Step 2(a). If the element at the current position is nonzero, then:
(i) Divide all the elements in the current row by the current element. This makes the current element
1.
(ii) Add or subtract multiples of the current row from the other rows in the matrix so that all the
elements in the current column (except for the current element) are 0.
(iii) Move the current position to the next row and the next column — that is, move one step down
and one step to the right. If doing either of these things would take you out of the matrix, then stop: The
matrix is in row-reduced echelon form. Otherwise, return to the beginning of Step 2.
Step 2(b). If the element at the current position is 0, then look at the elements in the current column
below the current element. There are two possibilities.
(i) If all the elements below the current element are 0, then move the current position to the next
column (in the same row) — that is, move one step right. If doing this would take you out of the matrix,
then stop: The matrix is in row-reduced echelon form. Otherwise, return to the beginning of Step 2.
(ii) If some element below the current element is nonzero, then swap the current row and the row
containing the nonzero element. Then return to the beginning of Step 2.
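Here is a Python sketch of the algorithm just described, using exact rational arithmetic (Python's Fraction) so there is no roundoff. It follows Steps 1 and 2 fairly literally; it's an illustration I wrote to match the steps, not a production routine.

    from fractions import Fraction

    def rref(M):
        A = [[Fraction(x) for x in row] for row in M]    # work on an exact copy
        rows, cols = len(A), len(A[0])
        i = j = 0                                        # Step 1: current position
        while i < rows and j < cols:                     # Step 2
            if A[i][j] != 0:                             # Step 2(a)
                pivot = A[i][j]
                A[i] = [x / pivot for x in A[i]]         # (i) divide the current row
                for r in range(rows):                    # (ii) clear the rest of the column
                    if r != i and A[r][j] != 0:
                        c = A[r][j]
                        A[r] = [x - c * y for x, y in zip(A[r], A[i])]
                i += 1                                   # (iii) move down and to the right
                j += 1
            else:                                        # Step 2(b)
                for r in range(i + 1, rows):
                    if A[r][j] != 0:                     # (ii) swap a nonzero entry up
                        A[i], A[r] = A[r], A[i]
                        break
                else:
                    j += 1                               # (i) all zeros below: move right
        return A

    print(rref([[0, 0, 1, -1, -2],
                [2, -4, -2, 4, 18],
                [-1, 2, 3, -5, -16]]))
    # [[1, -2, 0, 0, 4], [0, 0, 1, 0, 1], [0, 0, 0, 1, 3]]  (as Fractions)

The last test matrix is the one worked by hand in an example below, so you can compare.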
Example. For each of the following real number matrices, assume that the current position is (1, 1). What
is the next step in the row-reduction algorithm?
(a)
2 −4 0 8
3 0 0 1
−1 0 1 1
(b)
1 −2 0 4
3 0 0 1
−1 0 1 1
(c)
1 −2 0 4
0 6 0 −11
0 −2 1 5
(d)
0 2 1 −1
0 8 −11 4
7 5 0 −2
(e)
0 1 0 3
0 1 5 17
0 2 −3 0
(a) The element in the current position is 2, which is nonzero. So I divide the first row by 2 to make the
element 1:
2 −4 0 8
3 0 0 1
−1 0 1 1

→ (r1 → (1/2) r1)

1 −2 0 4
3 0 0 1
−1 0 1 1
(b) The element in the current position is 1. There are nonzero elements below it in the first column, and I
want to turn those into zeros. To turn the 3 in row 2, column 1 into a 0, I need to subtract 3 times row 1
from row 2:
1 −2 0 4
3 0 0 1
−1 0 1 1

→ (r2 → r2 − 3r1)

1 −2 0 4
0 6 0 −11
−1 0 1 1
(c) The element in the current position is 1, and it’s the only nonzero element in its column. So I move the
current position down one row and to the right one column. No row operation is performed, and the matrix
doesn’t change.
1 −2 0 4
0 6 0 −11
0 −2 1 5
(d) The element in the current position is 0. I look below it and see a nonzero element in the same column
in row 3. So I swap row 1 and row 3; the current position remains the same, and I return to the start of
Step 2.
0 2 1 −1
0 8 −11 4
7 5 0 −2

→ (r1 ↔ r3)

7 5 0 −2
0 8 −11 4
0 2 1 −1
(e) The element in the current position is 0. There are no nonzero elements below it in the same column. I
don’t perform any row operations; I just move the current position to the next column (in the same row).
The matrix does not change.
0 1 0 3
0 1 5 17
0 2 −3 0
1. Why does the algorithm terminate? (Could you ever get stuck and never finish?)
2. When the algorithm does terminate, why is the final matrix in row-reduced echelon form?
The first question is easy to answer. As you execute the algorithm, the current position moves through
the matrix in “stairstep” fashion:
The cases in Step 2 cover all the possibilities, and in each case, you perform a finite number of row
operations (no larger than the number of rows in the matrix, plus one) before you move the current position.
Since you’re always moving the current position to the right (Step 2(b)(i)) or to the right and down (Step
2(a)(iii)), and since the matrix has only finitely many rows and columns, you must eventually reach the edge
of the matrix and the algorithm will terminate.
As for the second question, I’ll give a very informal argument using the matrix with the “stairstep”
path pictured above.
First, if you moved the current position down and to the right, the previous current element was a 1,
and every other element in its column must be 0. In the matrix with the “stairstep” path I gave above, this
means that each spot where a curved arrow starts must be a 1, and all the other elements in the column
with a 1 must be 0. Hence, the matrix must look like this:
1 ∗ ∗ 0 0 ∗ 0 ∗
0 ∗ ∗ 1 0 ∗ 0 ∗
0 ∗ ∗ 0 1 ∗ 0 ∗
0 ∗ ∗ 0 0 ∗ 1 ∗
0 ∗ ∗ 0 0 ∗ 0 ∗
0 ∗ ∗ 0 0 ∗ 0 ∗
(The ∗’s stand for elements which I don’t know.)
Next, notice that if you moved the current position to the right (but not down), then the previous
current element and everything below it must have been 0. In terms of the picture, every spot where a right
arrow starts must be a 0, and all the elements below it must be 0. Now I know that the matrix looks like
this:
1 ∗ ∗ 0 0 ∗ 0 ∗
0 0 0 1 0 ∗ 0 ∗
0 0 0 0 1 ∗ 0 ∗
0 0 0 0 0 0 1 ∗
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
In either case, you don’t move your current position until the column you’re in is “fixed” — that is,
your column and everything to the left of it is in row reduced echelon form. When the algorithm terminates,
the whole matrix is in row reduced echelon form.
Notice that this matrix is in row-reduced echelon form.
Row reduction is a key algorithm in linear algebra, and you should work through enough examples so
that you understand how it works. So here we go with some full examples — try to do as much of the work
by yourself as possible. I will start by giving lots of details, but gradually give fewer so that this doesn’t
become too long and tedious.
Example. Row reduce the following matrix over R to row-reduced echelon form.
0 0 1 −1 −2
2 −4 −2 4 18
−1 2 3 −5 −16
I will follow the steps in the algorithm I described earlier. Since this is our first example, I’ll explain
the steps and the reasons for them in a lot of detail. This will make the solution a bit long!
We start out in the upper-left hand corner (row 1, column 1), where the entry is 0. According to the
algorithm, I look below that entry to see if I can find a nonzero entry in the same column. The entry in row
2, column 1 is 2, which is nonzero. So I swap row 1 and row 2:
0 0 1 −1 −2
2 −4 −2 4 18
−1 2 3 −5 −16

→ (r1 ↔ r2)

2 −4 −2 4 18
0 0 1 −1 −2
−1 2 3 −5 −16
Note that the current position hasn’t changed: It is still row 1, column 1. But the entry there is now
2, which is nonzero. The algorithm says that if the entry is nonzero but not equal to 1, then divide the row
by the entry. So I divide row 1 by 2:
2 −4 −2 4 18
0 0 1 −1 −2
−1 2 3 −5 −16

→ (r1 → (1/2) r1)

1 −2 −1 2 9
0 0 1 −1 −2
−1 2 3 −5 −16
Now the element in the current position is 1 — it is what I want for the leading entry in each row.
The algorithm then tells me to look below and above the current entry in its column. Wherever I see a
nonzero entry above or below, I add or subtract a multiple of the row to make the nonzero entry 0. Most of
the row operations you do will be operations of this kind.
There are no entries above the 1 in row 1, column 1. Below it there is 0 in row 2 and −1 in row 3. The
0 in row 2 is already what I want. So I just have to fix the −1 in row 3. What multiple of 1 do you add to
−1 to get 0? Obviously, you just add (the “multiple” is 1). So I add row 1 to row 3:
1 −2 −1 2 9
0 0 1 −1 −2
−1 2 3 −5 −16

→ (r3 → r3 + r1)

1 −2 −1 2 9
0 0 1 −1 −2
0 0 2 −3 −7

Column 1 is now done, so the current position moves to row 2, column 2. The entry there is 0, and so is the entry below it, so I move one step right, to row 2, column 3. The entry there is 1, and I use it to clear out the rest of column 3:

→ (r1 → r1 + r2, then r3 → r3 − 2r2)

1 −2 0 1 7
0 0 1 −1 −2
0 0 0 −1 −3

The current position moves to row 3, column 4. The entry there is −1, so I multiply row 3 by −1:

→ (r3 → −r3)

1 −2 0 1 7
0 0 1 −1 −2
0 0 0 1 3

Finally, I use the 1 I just created in row 3, column 4 to “zero out” the two entries above it in column 4:

→ (r1 → r1 − r3, then r2 → r2 + r3)

1 −2 0 0 4
0 0 1 0 1
0 0 0 1 3

The matrix is now in row reduced echelon form.
As I noted at the start, the solution above is pretty long, but that’s only because I explained every step.
After the first few examples, I will just do the steps without explanation. And eventually, I’ll just give the
starting matrix and the row-reduced echelon matrix without showing the steps.
As you do row reduction problems, refer to the algorithm and examples if you forget what steps you
should do. With practice, you’ll find that you know what to do, and your biggest problem may be avoiding
arithmetic mistakes.
At times, you may see shortcuts you can take which don’t follow the algorithm exactly. I think you
should avoid shortcuts at the start, or you will not learn the algorithm properly. The algorithm is guaranteed
to work; shortcuts can lead you to go in circles. When you can consistently get the right answer, you can
take shortcuts where appropriate.
Example. Row reduce the following matrix to row-reduced echelon form over Z3 :
2 2 0 1 2
2 1 2 1 0
1 0 2 1 0
I start with the current position at row 1, column 1. The element there is 2, which is nonzero. If I were
working with real numbers, I would divide row 1 by 2. Since I’m working over Z3 , I have to multiply row 1
by 2−1 — multiplying by the inverse of 2 is the equivalent in this case to dividing by 2. Now
2 · 2 = 1 in Z3 ,
so 2−1 = 2, and I multiply row 1 by 2:

2 2 0 1 2
2 1 2 1 0
1 0 2 1 0

→ (r1 → 2r1)

1 1 0 2 1
2 1 2 1 0
1 0 2 1 0

Now the element at the current position is 1, which is what I want for the leading entry in row 1. I use
it to clear out the numbers below it in the first column.
Since there is a 2 in row 2 and 2 + 1 = 0, I add row 1 to row 2. Since there is a 1 in row 3 and 1 + 2 ·1 = 0,
I add 2 times row 1 to row 3. Here’s what happens:
1 1 0 2 1
2 1 2 1 0
1 0 2 1 0

→ (r3 → r3 + 2r1)

1 1 0 2 1
2 1 2 1 0
0 2 2 2 2

→ (r2 → r2 + r1)

1 1 0 2 1
0 2 2 0 1
0 2 2 2 2
You can see that column 1 is now fixed: The leading entry 1 in row 1 is the only nonzero element in
the column. I move the current position down one row and to the right one column. The current position
is now at row 2, column 2. The entry there is 2, which is nonzero. As I did with the first row, I turn
the 2 into a 1 by multiplying row 2 by 2:
1 1 0 2 1
0 2 2 0 1
0 2 2 2 2

→ (r2 → 2r2)

1 1 0 2 1
0 1 1 0 2
0 2 2 2 2
Now I use the leading entry 1 in row 2, column 2 to clear out the second column. I need to add 2 times
row 2 to row 1, and I need to add row 2 to row 3. Here’s the work:
1 1 0 2 1
0 1 1 0 2
0 2 2 2 2

→ (r1 → r1 + 2r2)

1 0 2 2 2
0 1 1 0 2
0 2 2 2 2

→ (r3 → r3 + r2)

1 0 2 2 2
0 1 1 0 2
0 0 0 2 1
Column 2 is now fixed: The leading entry in row 2 is 1, and it is the only nonzero element in column 2.
I move the current position down one row and to the right one column. The current position is now at row
3, column 3 — but the element there is 0. I can’t make this into 1 by multiplying by the inverse, because 0
has no multiplicative inverse.
In this situation, the next thing you try is to look below the current position. If there is a nonzero
element in the same column in a lower row, you swap that row with the current row to get a nonzero element
into the current position. Unfortunately, we’re in the bottom row, so there are no rows below the current
position.
Note: You do not swap either of the rows above (that is, row 1 or row 2) with row 3. While that would
get a nonzero entry into the current position, swapping in those ways will mess up either the first or second
column.
The algorithm says that in this situation, we should move the current position one column right (staying
in the same row). The current position is now row 3, column 4. Fortunately, the element there is 2, which
is nonzero. As before, I can turn it into a 1 by multiplying row 3 by 2:
1 0 2 2 2
0 1 1 0 2
0 0 0 2 1

→ (r3 → 2r3)

1 0 2 2 2
0 1 1 0 2
0 0 0 1 2
Finally, I use the 1 in row 3, column 4 to clear out column 4. The only nonzero element above the
leading entry is the 2 in row 1. So I add row 3 to row 1:
1 0 2 2 2
0 1 1 0 2
0 0 0 1 2

→ (r1 → r1 + r3)

1 0 2 0 1
0 1 1 0 2
0 0 0 1 2
While row reducing matrices over Zn requires a little more thought in doing the arithmetic, you can see
(for the examples we’ll do) that the arithmetic is pretty simple. You don’t have to worry about computations
with fractions or decimals, for instance.
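Over a finite field Zp the same algorithm works, with division replaced by multiplication by the inverse mod p. A Python sketch (mine; pow(a, -1, p) computes the inverse of a mod p, which exists for nonzero a when p is prime):

    def rref_mod_p(M, p):
        A = [[x % p for x in row] for row in M]
        rows, cols = len(A), len(A[0])
        i = j = 0
        while i < rows and j < cols:
            if A[i][j] != 0:
                inv = pow(A[i][j], -1, p)        # multiply by the inverse instead of dividing
                A[i] = [(inv * x) % p for x in A[i]]
                for r in range(rows):
                    if r != i and A[r][j] != 0:
                        c = A[r][j]
                        A[r] = [(x - c * y) % p for x, y in zip(A[r], A[i])]
                i += 1
                j += 1
            else:
                for r in range(i + 1, rows):
                    if A[r][j] != 0:
                        A[i], A[r] = A[r], A[i]
                        break
                else:
                    j += 1
        return A

    print(rref_mod_p([[2, 2, 0, 1, 2],
                      [2, 1, 2, 1, 0],
                      [1, 0, 2, 1, 0]], 3))
    # [[1, 0, 2, 0, 1], [0, 1, 1, 0, 2], [0, 0, 0, 1, 2]] -- the Z3 example above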
Here are some additional examples. I’m not explaining the individual steps, though they are labelled.
If you’re convinced by now that you could do the arithmetic for the row operations yourself, you might want
to go through the examples by just thinking about what steps to do, and taking the arithmetic that is shown
for granted. Or perhaps do a few of the row operations by hand for yourself to practice.
Example. Row reduce the following matrix over R to row reduced echelon form:
1 1 −4 2
3 4 −10 7
−4 −1 22 5
I could have asked to row reduce over Q, the rational numbers, since all of the entries in the matrices
will be rational numbers.
Here’s the row reduction; check at least a few of the steps yourself.
1 1 −4 2
3 4 −10 7
−4 −1 22 5

→ (r3 → r3 + 4r1, then r2 → r2 − 3r1)

1 1 −4 2
0 1 2 1
0 3 6 13

→ (r1 → r1 − r2, then r3 → r3 − 3r2)

1 0 −6 1
0 1 2 1
0 0 0 10

→ (r3 → (1/10) r3)

1 0 −6 1
0 1 2 1
0 0 0 1

→ (r1 → r1 − r3, then r2 → r2 − r3)

1 0 −6 0
0 1 2 0
0 0 0 1
The next example is over Z5 . Remember that Z5 = {0, 1, 2, 3, 4} and you do all the arithmetic mod 5.
So, for example, 3 + 4 = 2 and 4 · 2 = 3. Be careful as you do problems to pay attention to the number
system!
Some calculators and math software have programs that do row reduction, but assume that the row
reduction is done with real numbers. If you use such a program on an example like the next one, you could
very well get the wrong answer (even if the answer looks like it could be right). Be careful if you’re using
such a program to check your answers. There is software which does row reduction in Zn , but you may have
to do some searching to find programs like that.
Example. Row reduce the following matrix over Z5 to row reduced echelon form:
1 0 1 4
3 1 0 1
2 4 4 1
Here is some Z5 arithmetic that will come up:
1 + 4 = 0, 2 + 3 = 0,
2 · 3 = 1, 4 · 4 = 1.
Here’s the row reduction:
1 0 1 4
3 1 0 1
2 4 4 1

→ (r3 → r3 + 3r1, then r2 → r2 + 2r1)

1 0 1 4
0 1 2 4
0 4 2 3

→ (r3 → r3 + r2)

1 0 1 4
0 1 2 4
0 0 4 2

→ (r3 → 4r3)

1 0 1 4
0 1 2 4
0 0 1 3

→ (r1 → r1 + 4r3, then r2 → r2 + 3r3)

1 0 0 1
0 1 0 3
0 0 1 3
As an example, here’s the scratch work for the first step:
r3 = 2 4 4 1
3r1 = 3 0 3 2
r3 + 3r1 = 0 4 2 3
You may need to do this kind of scratch work as you do row reductions, particularly when you’re doing
arithmetic in Zn .
Time for a bit of honesty! You may have noticed that the examples (in earlier sections as well as this
one) have tended to have “nice numbers”. It’s intentional: The examples are constructed so that the numbers
are “nice”. If the numbers and the computations were ugly, it would make it harder to see the principles,
and it wouldn’t really add anything.
As a small example, here’s a row reduction with rational numbers:
1 −6 4 −2
−1 −5 0 4
2 7 −3 1

→ (r3 → r3 − 2r1, then r2 → r2 + r1)

1 −6 4 −2
0 −11 4 2
0 19 −11 5

→ (r2 → −(1/11) r2)

1 −6 4 −2
0 1 −4/11 −2/11
0 19 −11 5

→ (r1 → r1 + 6r2)

1 0 20/11 −34/11
0 1 −4/11 −2/11
0 19 −11 5

→ (r3 → r3 − 19r2)

1 0 20/11 −34/11
0 1 −4/11 −2/11
0 0 −45/11 93/11

→ (r3 → −(11/45) r3)

1 0 20/11 −34/11
0 1 −4/11 −2/11
0 0 1 −31/15

→ (r1 → r1 − (20/11) r3, then r2 → r2 + (4/11) r3)

1 0 0 2/3
0 1 0 −14/15
0 0 1 −31/15
You can see that the computations are getting messy even though the original matrix had entries which
were integers. Imagine having to add and reduce all those fractions! Then imagine what it would look like
if the starting matrix was
a matrix with entries that are messy fractions like 53/2, 11/31, 8/3, 17/9, 3/4, and 213/7, or a matrix with entries like −7.193, 81.4, −217.911, 31.1745, π, 71.3, 1001.7, −0.00013, and √2.
The ugly computations would add nothing to your understanding of the idea of row reduction.
It’s true that matrices that come up in the real world often don’t have nice numbers. They might also
be very large: Matrices with hundreds of rows or columns (or more). In those cases, computers are used to
do the computations, and then other considerations (like accuracy or round-off) come up. You will see these
topics in courses in numerical analysis, for instance.
Row reduction is used in many places in linear algebra. Many of those uses involve solving systems of
linear equations, which we will look at next.
A system of linear equations in the variables x1 , . . . , xn is a collection of equations of the form
a11 x1 + a12 x2 + · · · + a1n xn = b1 , . . . , am1 x1 + am2 x2 + · · · + amn xn = bm .
All the a’s and b’s are assumed to come from some number system.
For example, here are some systems of linear equations with coefficients in R:
5x − 2y = 7
−2x + y = 3

x + y − z = 3
−x − y + 5z = −7

x + y = 13
2x + y = 10
x − y = 5
(Systems are often written with a curly brace around them to make them stand out; if there’s no chance of
confusion, the brace isn’t necessary.)
A solution to a system of linear equations is a set of values for the variables which makes all the
equations true. For example,
x = 13, y = 29 is a solution to

5x − 2y = 7
−2x + y = 3

since

5 · 13 − 2 · 29 = 65 − 58 = 7
−2 · 13 + 29 = −26 + 29 = 3
Note that we consider the pair (x, y) = (13, 29) a single solution to the system. In fact, it’s the only
solution to this system of equations.
Likewise, you can check that (x, y, z) = (−1, 3, −1) and (x, y, z) = (2, 0, −1) are solutions to

x + y − z = 3
−x − y + 5z = −7
They aren’t the only solutions: In fact, this system has an infinite number of solutions.
This system has no solutions:
x + y = 13
2x + y = 10
x−y = 5
Here’s one way to see this. Add the first and third equations to get 2x = 18, which means x = 9. Plug
x = 9 into the second equation 2x + y = 10 to get 18 + y = 10, or y = −8. But now if you plug x = 9,
y = −8 into the first equation x + y = 13, you get 9 + (−8) = 13, which is a contradiction.
What happened in these examples is typical of what happens in general: A system of linear equations
can have exactly one solution, many solutions, or no solutions.
Let’s connect row operations with operations on the equations of a system. Consider this system over R:

x + 2y = −4
2x + 3y = 5

The row operation that swaps two rows of a matrix corresponds to swapping the equations. Here we
swap equations 1 and 2:

x + 2y = −4
2x + 3y = 5

→ (r1 ↔ r2)

2x + 3y = 5
x + 2y = −4
It may seem pointless to do this with equations, but we saw that swapping rows was sometimes needed
in reducing a matrix to row reduced echelon form. We’ll see the connection shortly.
The row operation that multiplies (or divides) a row by a number corresponds to multiplying (or
dividing) an equation by a number. For example, I can divide the second equation by 2:
x + 2y = −4
2x + 3y = 5

→ (r2 → r2 /2)

x + 2y = −4
x + (3/2)y = 5/2
The row operation that adds a multiple of one row to another corresponds to adding a multiple of one
equation to another. For example, I can add −2 times the first equation to the second equation (that is,
subtract 2 times the first equation from the second equation):
x + 2y = −4
2x + 3y = 5

→ (r2 → r2 − 2r1)

x + 2y = −4
−y = 13
You can see that row operations certainly correspond to “legal” operations that you can perform on
systems of equations. You can even see in the last example how these operations might be used to solve a
system.
To make the connection with row reducing matrices, suppose we take our system and just write down
the coefficients:
x + 2y = −4
2x + 3y = 5

→

1 2 −4
2 3 5
This matrix is called the augmented matrix or auxiliary matrix for the system.
Watch what happens when I row reduce the matrix to row-reduced echelon form!
1 2 −4
2 3 5

→ (r2 → r2 − 2r1)

1 2 −4
0 −1 13

→ (r2 → −r2)

1 2 −4
0 1 −13

→ (r1 → r1 − 2r2)

1 0 22
0 1 −13
Since row operations operate on rows, they will “preserve” the columns for the x-numbers, the y-
numbers, and the constants: The coefficients of x in the equations will always be in the first column,
regardless of what operations I perform, and so on for the y-coefficients and the constants. So at the end, I
can put the variables back to get equations, reversing what I did at the start:
The columns correspond to x, y, and the constants:

1 0 22           1·x + 0·y = 22
0 1 −13    →     0·x + 1·y = −13
That is, x = 22 and y = −13 — I solved the system!
Geometrically, the graphs of x + 2y = −4 and 2x + 3y = 5 are lines. Solving simultaneously amounts to
finding where the lines intersect — in this case, in the point (22, −13).
(I will often write solutions to systems in the form (x, y) = (22, −13), or even (22, −13). In the second
form, it’s understood that the variables occur in the same order in which they occurred in the equations.)
You can see several things from this example. First, we noted that row operations imitate the operations
you might perform on whole equations to solve a system. This explains why row operations are what they
are.
Second, you can see that having row reduced echelon form as the target of a row reduction corresponds
to producing equations with “all the variables solved for”. Think particularly of the requirements that
leading entries (the 1’s) must be the only nonzero elements in their columns, and the “stairstep” pattern of
the leading entries.
Third, by working with the coefficients of the variables and the constants, we save ourselves the trouble
of writing out the equations. We “get lazy” and only write exactly what we need to write to do the problem:
The matrix preserves the structure of the equations for us. (This idea of “detaching coefficients” is useful in
other areas of math.)
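As a sketch of “detaching coefficients” in code (my own illustration, and it assumes a square system with exactly one solution), here is a small Gauss–Jordan routine that row reduces the augmented matrix and reads the solution off the constants column:

    from fractions import Fraction

    def solve(aug):
        # aug is the augmented matrix [A | b] of a square system with a unique solution.
        A = [[Fraction(x) for x in row] for row in aug]
        n = len(A)
        for i in range(n):
            p = next(r for r in range(i, n) if A[r][i] != 0)   # find a usable pivot row
            A[i], A[p] = A[p], A[i]
            A[i] = [x / A[i][i] for x in A[i]]                 # make the leading entry 1
            for r in range(n):
                if r != i and A[r][i] != 0:
                    c = A[r][i]
                    A[r] = [x - c * y for x, y in zip(A[r], A[i])]
        return [row[-1] for row in A]                          # the constants column

    # x + 2y = -4, 2x + 3y = 5
    print(solve([[1, 2, -4],
                 [2, 3, 5]]))    # [Fraction(22, 1), Fraction(-13, 1)], i.e. (22, -13)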
Let’s see some additional examples.
Example. Use row reduction to solve the following system of linear equations over R:
x + y + z = 5
x + 3y + z = 9
4x − y + z = −2
Write down the matrix for the system and row reduce it to row reduced echelon form:
1 1 1 5
1 3 1 9
4 −1 1 −2

→ (r3 → r3 − 4r1, then r2 → r2 − r1)

1 1 1 5
0 2 0 4
0 −5 −3 −22

→ (r2 → (1/2) r2, then r1 → r1 − r2)

1 0 1 3
0 1 0 2
0 −5 −3 −22

→ (r3 → r3 + 5r2, then r3 → −(1/3) r3)

1 0 1 3
0 1 0 2
0 0 1 4

→ (r1 → r1 − r3)

1 0 0 −1
0 1 0 2
0 0 1 4
The last matrix represents the equations x = −1, y = 2, z = 4. So the solution is (−1, 2, 4).
In this case, the three equations represent planes in space, and the point (−1, 2, 4) is the intersection of
the three planes.
Everything works in similar fashion for systems of linear equations over a finite field. Here’s an example
over Z3 .
Example. Solve the following system of linear equations over Z3 :
x + 2y = 2
2x + y + z = 1
2x + z = 2
Here is some Z3 arithmetic that will come up: 1 + 2 = 0, 2 + 2 = 1, 2 · 2 = 1.
The last equation tells you that when you would think of dividing by 2, you multiply by 2 instead.
Write down the corresponding matrix and row reduce it to row reduced echelon form:
1 2 0 2
2 1 1 1
2 0 1 2

→ (r3 → r3 + r1, then r2 → r2 + r1)

1 2 0 2
0 0 1 0
0 2 1 1

→ (r2 ↔ r3, then r2 → 2r2)

1 2 0 2
0 1 2 2
0 0 1 0

→ (r1 → r1 + r2, then r1 → r1 + r3)

1 0 0 1
0 1 2 2
0 0 1 0

→ (r2 → r2 + r3)

1 0 0 1
0 1 0 2
0 0 1 0
The last matrix represents the equations x = 1, y = 2, z = 0. Hence, the solution is (1, 2, 0).
It’s possible for a system of linear equations to have no solutions. Such a system is said to be incon-
sistent. You can tell a system of linear equations is inconsistent if at any point one of the equations gives
a contradiction, such as “0 = 13” or “0 = −3/2”.
Example. Solve the following system of equations over R:
3x + y + 3z = 2
x + 2z = −3
2x + y + z = 4
Write down the augmented matrix for the system and row reduce it to row reduced echelon form. The final matrix represents the equations

x + 2z = 0
y − 3z = 0
0 = 1

The last equation, “0 = 1”, is a contradiction, so the system has no solutions: it is inconsistent.
Example. Solve the following system of linear equations over R:

a + 2b + d = 3
−a − 2b + c − 2e = −2
−2a − 4b + c − d − 2e = −5

Write down the augmented matrix and row reduce it to row reduced echelon form. The final matrix represents the equations

a + 2b + d = 3
c + d − 2e = 1
Notice that the variables are not solved for as particular numbers. This means that the system will have
more than one solution. In this case, we’ll write the solution in parametric form.
The variables a and c correspond to leading entries in the row reduced echelon matrix. b, d, and e are
called free variables. Solve for the leading entry variables a and c in terms of the free variables b, d, and e:
a = −2b − d + 3
c = −d + 2e + 1
Next, assign parameters to each of the free variables b, d, and e. Often variables like s, t, u, and v are
used as parameters. Try to pick variables “on the other side of the alphabet” from the original variables to
avoid confusion. I’ll use
b = s, d = t, e = u.
Plugging these into the equations for a and c gives
a = −2s − t + 3, c = −t + 2u + 1.
a = −2s − t + 3
b = s
c = −t + 2u + 1
d = t
e = u
(I lined up the variables so you can see the structure of the solution. You don’t need to do this in
problems for now, but it will be useful when we discuss how to find the basis for the null space of a matrix.)
Each assignment of numbers to the parameters s, t, and u produces a solution. For example, if s = 1,
t = 0, and u = 2,
a = (−2) · 1 − 0 + 3 = 1, c = −0 + 2 · 2 + 1 = 5.
The solution (in this case) is (a, b, c, d, e) = (1, 1, 5, 0, 2). Since you can assign any real number to each
of s, t, and u, there are infinitely many solutions.
Geometrically, the 3 original equations represented sets in 5-dimensional space R5 . The solution to the
system represents the intersection of those 3 sets. Since the solution has 3 parameters, the intersection is
some kind of 3-dimensional set. Since the sets are in 5-dimensional space, it would not be easy to draw a
picture!
Example. Solve the following system over Z5 :
w + x + y + 2z = 1
2x + 2y + z = 0
2w + 2y + z = 1
Write down the matrix for the system and row reduce:
1 1 1 2 1
0 2 2 1 0
2 0 2 1 1

→ (r3 → r3 − 2r1, then r2 → 3r2)

1 1 1 2 1
0 1 1 3 0
0 3 0 2 4

→ (r1 → r1 − r2, then r3 → r3 − 3r2)

1 0 0 4 1
0 1 1 3 0
0 0 2 3 4

→ (r3 → 3r3, then r2 → r2 − r3)

1 0 0 4 1
0 1 0 4 3
0 0 1 4 2
Putting the variables back, the corresponding equations are
w + 4z = 1, x + 4z = 3, y + 4z = 2.
Solve for the leading entry variables w, x, and y in terms of the free variable z. Note that 4z + z = 0,
so I can solve by adding z to each of the 3 equations:
w = z + 1, x = z + 3, y = z + 2.
To write the solution in parametric form, assign a parameter to the “non-leading entry” variable z.
Thus, set z = t:
w = t + 1, x = t + 3, y = t + 2, z = t.
Remember that the number system is Z5 = {0, 1, 2, 3, 4}. So there are 5 possible values for the parameter
t, and each value of t gives a solution. For instance, if t = 1, I get
w = 2, x = 4, y = 3, z = 1.
Remember that if A is a square (n × n) matrix, the inverse A−1 of A is an n × n matrix such that
A A−1 = I and A−1 A = I.
Here I is the n × n identity matrix. Note that not every square matrix has an inverse.
Row reduction provides a systematic way to find the inverse of a matrix. The algorithm gives another
application of row reduction, so I’ll describe the algorithm and give an example. We’ll justify the algorithm
later.
To invert a matrix, adjoin a copy of the identity matrix to the original matrix. “Adjoin” means to place
a copy of the identity matrix of the same size as the original matrix on the right of the original matrix to
form a larger matrix (the augmented matrix).
Next, row reduce the augmented matrix. When the block corresponding the original matrix becomes
the identity, the block corresponding to the identity will have become the inverse.
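Here is a Python sketch (mine, using exact Fraction arithmetic and assuming the input matrix really is invertible) of the adjoin-and-row-reduce procedure: place I to the right of A, row reduce, and read the inverse out of the right-hand block.

    from fractions import Fraction

    def inverse(A):
        n = len(A)
        # Adjoin the n x n identity on the right: [A | I].
        M = [[Fraction(x) for x in A[i]] + [Fraction(int(i == j)) for j in range(n)]
             for i in range(n)]
        for i in range(n):
            p = next(r for r in range(i, n) if M[r][i] != 0)   # assumes A is invertible
            M[i], M[p] = M[p], M[i]
            M[i] = [x / M[i][i] for x in M[i]]
            for r in range(n):
                if r != i and M[r][i] != 0:
                    c = M[r][i]
                    M[r] = [x - c * y for x, y in zip(M[r], M[i])]
        return [row[n:] for row in M]                          # right-hand block is the inverse

    print(inverse([[2, -1],
                   [5, -3]]))   # [[3, -1], [5, -2]]  (as Fractions) -- see the example below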
Example. Invert the following matrix over R:
2 −1
.
5 −3
Place a copy of the 2 × 2 identity matrix on the right of the original matrix, then row reduce the
augmented matrix:
2 −1 1 0
5 −3 0 1

→ (r1 → (1/2) r1)

1 −1/2 1/2 0
5 −3 0 1

→ (r2 → r2 − 5r1)

1 −1/2 1/2 0
0 −1/2 −5/2 1

→ (r2 → −2r2)

1 −1/2 1/2 0
0 1 5 −2

→ (r1 → r1 + (1/2) r2)

1 0 3 −1
0 1 5 −2
You can see that the original matrix in the left-hand 2 × 2 block has turned into the 2 × 2 identity
matrix. The identity matrix that was in the right-hand 2 × 2 block has at the same time turned into the
inverse.
Therefore, the inverse is
3 −1
5 −2
You can check that

[ 2 −1 ] [ 3 −1 ]   [ 1 0 ]          [ 3 −1 ] [ 2 −1 ]   [ 1 0 ]
[ 5 −3 ] [ 5 −2 ] = [ 0 1 ]   and    [ 5 −2 ] [ 5 −3 ] = [ 0 1 ]
Of course, there’s a formula for inverting a 2 × 2 matrix. But the procedure above works with (square)
matrices of any size. To explain why this algorithm works, I’ll need to examine the relationship between row
operations and inverses more closely.
You might have discussed lines and planes in 3 dimensions in a calculus course; if so, you probably
saw problems like the next example. The rule of thumb is: You can often find the intersection of things by
solving equations simultaneously.
Example. Find the line of intersection of the planes in R3 :
x + 2y − z = 4 and x − y − z = 3.
Think of the equations as a system of linear equations over R. Write down the coefficient matrix and
row reduce:
1 2 −1 4
1 −1 −1 3

→ (r2 → r2 − r1)

1 2 −1 4
0 −3 0 −1

→ (r2 → −(1/3) r2)

1 2 −1 4
0 1 0 1/3

→ (r1 → r1 − 2r2)

1 0 −1 10/3
0 1 0 1/3
The last matrix gives the equations
x − z = 10/3 ,   y = 1/3 .

Thus, x = z + 10/3 and y = 1/3. Letting z = t, the parametric solution is

x = t + 10/3 ,   y = 1/3 ,   z = t.
These are the parametric equations of the line of intersection.
The solution of simultaneous linear equations dates back 2000 years to the Jiuzhang Suanshu, a collection
of problems compiled in China. The systematic algebraic process of eliminating variables seems to be due to
Isaac Newton. Matrix notation developed in the second half of the 1800’s; what we call Gaussian elimination,
applied to matrices, was developed in the first half of the 1900’s by various mathematicians. Grcar [1] contains
a nice historical account.
[1] Joseph Grcar, Mathematics of Gaussian elimination, Notices of the American Mathematical Society,
58(6)(2011), 782–792.
A system of linear equations can be written in matrix form as
Ax = b,
where A is the matrix of coefficients, x is the column vector of variables, and b is the column vector of constants.
(One reason for using matrix notation is that it saves writing!) If A has an inverse A−1 , I can multiply
both sides by A−1 :
A−1 Ax = A−1 b
I · x = A−1 b
x = A−1 b
I’ve solved for the vector x of variables.
Not every matrix has an inverse — an obvious example is the zero matrix, but here’s a nonzero non-
invertible matrix over the real numbers:
1 1
0 0

Suppose this matrix had an inverse

a b
c d

Then

[ 1 1 ] [ a b ]   [ 1 0 ]        [ a + c b + d ]   [ 1 0 ]
[ 0 0 ] [ c d ] = [ 0 1 ] , so   [ 0     0     ] = [ 0 1 ]
But equating entries in row 2, column 2 gives the contradiction 0 = 1. Hence, the original matrix does
not have an inverse.
If we want to know whether a matrix has an inverse, we could try to do what we did in this example —
set up equations and see if we can solve them. But you can see that it could be pretty tedious if the matrix
was large or the entries were messy. And we saw earlier that you can solve systems of linear equations using
row reduction.
In this section, we’ll see how you can use row reduction to determine whether a matrix has an inverse
— and, if it does, how to find the inverse. We’ll begin by explaining the connection between elementary row
operations and matrices.
Definition. An elementary matrix is a matrix which represents an elementary row operation. “Repre-
sents” means that multiplying on the left by the elementary matrix performs the row operation.
Here are the elementary matrices that represent our three types of row operations. In the pictures
below, the elements that are not shown are the same as those in the identity matrix. In particular, all of the
elements that are not on the main diagonal are 0, and all the main diagonal entries — except those shown
— are 1.
Multiplying by this matrix swaps rows i and j:
[Picture: the identity matrix, except that rows i and j have been interchanged — there are 1’s in the (i, j) and (j, i) positions and 0’s in the (i, i) and (j, j) positions.]
The “i” and “j” on the borders of the matrix label rows and columns so you can see where the elements
are.
This is the same as the identity matrix, except that rows i and j have been swapped. In fact, you obtain
this matrix by applying the row operation (“swap rows i and j”) to the identity matrix. This is true for our
other elementary matrices.
Multiplying by this matrix multiplies row i by the number c:
[Picture: the identity matrix, except that the (i, i) entry is c.]
This is the same as the identity matrix, except that row i has been multiplied by c. Note that this is
only a valid operation if the number c has a multiplicative inverse. For instance, if we’re working over the
real numbers, c can be any nonzero number.
Multiplying by this matrix replaces row i with row i plus c times row j.
[Picture: the identity matrix, except that there is an extra entry c in the (i, j) position.]
To get this matrix, apply the operation “add c times row j to row i” to the identity matrix.
While we could give formal proofs that these matrices do what we want — we would have to write
formulas for elements of the matrices, then use the definition of matrix multiplication — I don’t think the
proofs would be very enlightening. It’s good, however, to visualize the multiplications for yourself to see why
these matrices work: Take rows of the elementary matrix and picture them multiplying rows of the original
matrix. For example, consider the elementary matrix that swaps row i and row j.
[Picture: the elementary matrix that swaps rows i and j — the identity matrix with rows i and j interchanged.]
When you multiply the original matrix on the left by a given row of this matrix, you get the corresponding row of the product.
So multiplying the original matrix by the first row of this matrix gives the first row of the product, and so on.
Let’s look at what happens when you multiply the original matrix by row i of this matrix.
[Picture: row i of the elementary matrix — all 0’s except for a 1 in the jth position — multiplying the original matrix, whose rows are row 1 through row m.]
Row i has 0’s everywhere except for a 1 in the j th position. So when it multiplies the original matrix,
all the rows of the original matrix get multiplied by 0, except for the j th row, which is multiplied by 1. The
net result is the j th row of the original matrix. Thus, the ith row of the product is the j th row of the original
matrix.
If you picture this process one row at a time, you’ll see that the original matrix is replaced with the
same matrix with the i and j rows swapped.
Let’s try some examples.
This elementary matrix should swap rows 2 and 3 in a 3 × 3 matrix:
1 0 0
0 0 1
0 1 0
Notice that it’s the identity matrix with rows 2 and 3 swapped.
Multiply a 3 × 3 matrix by it on the left:
[ 1 0 0 ] [ a b c ]   [ a b c ]
[ 0 0 1 ] [ d e f ] = [ g h i ]
[ 0 1 0 ] [ g h i ]   [ d e f ]
This elementary matrix should multiply row 2 of a 2 × 2 matrix by 13:
1 0
0 13
Notice that it’s the identity matrix with row 2 multiplied by 13. (We’ll assume that we’re in a number
system where 13 is invertible.)
Multiply a 2 × 2 matrix by it on the left:
[ 1 0  ] [ a b ]   [ a   b   ]
[ 0 13 ] [ c d ] = [ 13c 13d ]

Row 2 of the original matrix was multiplied by 13.
This elementary matrix should add 5 times row 1 to row 3:
1 0 0
0 1 0
5 0 1
Notice that it’s the identity matrix with 5 times row 1 added to row 3.
Multiply a 3 × 3 matrix by it on the left:
[ 1 0 0 ] [ a b c ]   [ a      b      c      ]
[ 0 1 0 ] [ d e f ] = [ d      e      f      ]
[ 5 0 1 ] [ g h i ]   [ g + 5a h + 5b i + 5c ]
You can see that 5 times row 1 was added to row 3.
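A quick Python sketch (my own) that builds the three kinds of elementary matrices by applying the corresponding row operation to the identity, then checks that multiplying on the left really performs the operation:

    def identity(n):
        return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

    def matmul(A, B):
        return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

    def swap_matrix(n, i, j):
        E = identity(n); E[i], E[j] = E[j], E[i]; return E     # identity with rows i, j swapped

    def scale_matrix(n, i, c):
        E = identity(n); E[i][i] = c; return E                 # identity with c in position (i, i)

    def add_matrix(n, i, j, c):
        E = identity(n); E[i][j] = c; return E                 # identity with c in position (i, j)

    M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]      # rows are numbered from 0 in the code
    print(matmul(swap_matrix(3, 1, 2), M))     # [[1, 2, 3], [7, 8, 9], [4, 5, 6]]
    print(matmul(add_matrix(3, 2, 0, 5), M))   # [[1, 2, 3], [4, 5, 6], [12, 18, 24]]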
The inverses of these matrices are, not surprisingly, the elementary matrices which represent the inverse
row operations. The inverse of a row operation is the row operation which undoes the original row operation.
Let’s look at the three operations in turn.
The inverse of swapping rows i and j is swapping rows i and j — to undo swapping two things, you
swap the two things back! So the inverse of the “swap rows i and j” elementary matrix is the same matrix:
[Picture: the same swap matrix — the identity with rows i and j interchanged.]
The inverse of multiplying row i by c is dividing row i by c. To account for the fact that a number
system might not have “division”, I’ll say “multiplying row i by c−1 ”. Just take the original “multiply row
i by c” elementary matrix above and replace c with c−1 :
[Picture: the identity matrix, except that the (i, i) entry is c−1 .]
The inverse of adding c times row j to row i is subtracting c times row j from row i. To write down the
inverse, I just replace c with −c in the matrix for “row i goes to row i plus c times row j”:
[Picture: the identity matrix, except that there is a −c in the (i, j) position.]
Example. In each case, tell what row operation is performed by multiplying on the left by the elementary
matrix. Then find the inverse of the elementary matrix, and tell what row operation is performed by
multiplying on the left by the inverse. (All the matrices are real number matrices.)
1 0 2
(a) 0 1 0
0 0 1
0 1 0
(b) 1 0 0
0 0 1
1 0 0
(c) 0 17 0
0 0 1
(a) Multiplying on the left by

1 0 2
0 1 0
0 0 1

adds 2 times row 3 to row 1. The inverse is

1 0 −2
0 1 0
0 0 1

which subtracts 2 times row 3 from row 1.
(b) Multiplying on the left by

0 1 0
1 0 0
0 0 1

swaps rows 1 and 2. This matrix is its own inverse, since swapping rows 1 and 2 a second time undoes the swap.

(c) Multiplying on the left by

1 0 0
0 17 0
0 0 1

multiplies row 2 by 17. The inverse is

1 0 0
0 1/17 0
0 0 1

which divides row 2 by 17.
Definition. Matrices A and B are row equivalent if A can be transformed to B by a finite sequence of
elementary row operations.
Remark. Since row operations may be performed by multiplying by elementary matrices, A and B are row
equivalent if and only if there are elementary matrices E1 , . . . , En such that
E1 · · · En A = B.
Proposition. Row equivalence is an equivalence relation.
Proof. Let’s recall what it means (in this situation) to be an equivalence relation. I have to show three
things:
(a) (Reflexivity) Every matrix row reduces to itself.
(b) (Symmetry) If A row reduces to B, then B row reduces to A.
(c) (Transitivity) If A row reduces to B and B row reduces to C, then A row reduces to C.
(a) is obvious, since I can row reduce a matrix to itself by performing the identity row operation.
For (b), suppose A row reduces to B. Then there are elementary matrices E1 , . . . En such that
E1 · · · En A = B.
Hence,
A = En−1 · · · E1−1 B.
Since the inverse of an elementary matrix is an elementary matrix, each Ei−1 is an elementary matrix.
This equation gives a sequence of row operations which row reduces B to A.
To prove (c), suppose A row reduces to B and B row reduces to C. Then there are elementary matrices
E1 , . . . , Em and F1 , . . . , Fn such that
E1 · · · Em A = B and F1 · · · Fn B = C.
Hence,
F1 · · · Fn E1 · · · Em A = C.
This equation gives a sequence of row operations which row reduces A to C.
Therefore, row equivalence is an equivalence relation.
Definition. Let A be an n × n matrix. A is invertible if there is an n × n matrix B such that AB = I and BA = I.
In this case, B is the inverse of A (or A is the inverse of B), and we write A−1 for B (or B −1 for A).
Notation. If A is a square matrix, then An = A · A · · · A (n factors) if n > 0, A0 = I, and
An = A−1 · A−1 · · · A−1 (|n| factors) if n < 0.
Note that An for n < 0 only makes sense if A is invertible.
The usual rules for powers hold:
(a) Am An = Am+n .
(b) (Am )n = Amn .
The proofs involve induction and taking cases. I don’t think they’re that enlightening, so I will skip
them.
Example. Consider the real matrix
A =
3 1
2 0
Compute A2 and A−2 .
A2 = A · A = [ 3 1 ] [ 3 1 ] = [ 11 3 ]
             [ 2 0 ] [ 2 0 ]   [ 6  2 ]

Using the formula for the inverse of a 2 × 2 matrix,

A−1 = 1/(3 · 0 − 2 · 1) · [ 0 −1 ] = [ 0 0.5  ]
                          [ −2 3 ]   [ 1 −1.5 ]

Therefore,

A−2 = A−1 · A−1 = [ 0 0.5  ] [ 0 0.5  ] = [ 0.5  −0.75 ]
                  [ 1 −1.5 ] [ 1 −1.5 ]   [ −1.5 2.75  ]
Remember that matrix multiplication isn’t commutative, so AB is not necessarily equal to BA. So
what is (AB)2 ? Since X 2 is shorthand for X · X, and in this case X = AB, the best we can say is that
(AB)2 = ABAB.
Example. Give a specific example of two 2 × 2 real matrices A and B for which
(AB)2 ≠ A2 B 2 .
How should I construct this counterexample? On the one hand, I want to avoid matrices which are “too
special”, because I might accidentally get a case where the equation holds. (The example in this problem
just shows that the equation “(AB)2 = A2 B 2 ” isn’t always true; this doesn’t mean that it’s never true.) For
instance, I should not take A to be the identity matrix or the zero matrix (in which case the equation would
actually be true).
On the other hand, I’d like to keep the matrices simple — first, so that I don’t struggle to do the
computation, and second, so that a reader can easily see that the computation works. For instance, this
would be a bad idea: a matrix A whose entries are messy numbers like −1.378, π, √2, 1013, and 171.
When you’re trying to construct a counterexample, try to keep these ideas in mind. In the end, you
may have to make a few trials to get a suitable counterexample — you will not know whether something
works without trying.
I will take
A =
1 0
1 1

and

B =
0 1
1 0
Following the ideas above, I tried to make matrices which were simple without being too special.
Then
A2 =
1 0
2 1

and

B2 =
1 0
0 1

so

A2 B 2 =
1 0
2 1
On the other hand,
AB =
0 1
1 1

so

(AB)2 =
1 1
1 2
So for these two matrices, A2 B 2 ≠ (AB)2 .
Proposition.
(a) If A and B are invertible n × n matrices, then AB is invertible, and its inverse is given by
(AB)−1 = B −1 A−1 .
(b) If A is an invertible n × n matrix, then AT is invertible, and its inverse is given by
(AT )−1 = (A−1 )T .
Proof. (a) Remember that AB is not necessarily equal to BA, since matrix multiplication is not necessarily
commutative.
I have
(B −1 A−1 )(AB) = B −1 IB = B −1 B = I,
(AB)(B −1 A−1 ) = AIA−1 = AA−1 = I.
Since B −1 A−1 gives the identity when multiplied by AB, it means that B −1 A−1 must be the inverse of
AB — that is, (AB)−1 = B −1 A−1 .
(b) I have
AT (A−1 )T = (A−1 A)T = I T = I and (A−1 )T AT = (A A−1 )T = I T = I.
Since (A−1 )T gives the identity when multiplied by AT , it means that (A−1 )T must be the inverse of
AT — that is, (AT )−1 = (A−1 )T .
Remark. Look over the proofs of the two parts of the last proposition and be sure you understand why the
computations proved the things that were to be proved. The idea is that the inverse of a matrix is defined
by a property, not by appearance. By analogy, it is like the difference between the set of mathematicians (a
set defined by a property) and the set of people with purple hair (a set defined by appearance).
A matrix C is the inverse of a matrix D if it has the property that multiplying C by D (in both orders)
gives the identity I. So to check whether a matrix C really is the inverse of D, you multiply C by D (in
both orders) and see whether you get I.
Example. Suppose that A and B are n × n invertible matrices. Simplify the following expression:
(AB)−2 A (BA)2 A.
(AB)−2 A(BA)2 A = [(AB)2 ]−1 A(BA)2 A
= (ABAB)−1 A(BABA)A
= B −1 A−1 B −1 A−1 ABABAA
= B −1 A−1 B −1 BABAA       (A−1 A = I)
= B −1 A−1 ABAA             (B −1 B = I)
= B −1 BAA                  (A−1 A = I)
= A2                        (B −1 B = I)
Example. (Solving a matrix equation) Solve the following matrix equation for X, assuming that A and
B are invertible:
A2 X B A = A B.
A2 XBA = AB
A−2 A2 XBA = A−2 AB
XBA = A−1 B
XBAA−1 = A−1 BA−1
XB = A−1 BA−1
XBB −1 = A−1 BA−1 B −1
X = A−1 BA−1 B −1
Notice that I can multiply both sides of a matrix equation by the same thing, but I must multiply on
the same side of both sides. So when I multiplied by A−2 , I had to put A−2 on the left side of both sides of
the equation.
Once again, the reason I have to be careful is that in general, M N ≠ N M — matrix multiplication is
not commutative.
Example. Give a specific example of two invertible 2 × 2 real matrices A and B for which
(A + B)−1 ≠ A−1 + B −1 .
One way to get a counterexample is to choose A and B so that A + B isn’t invertible. For instance,
A =
1 0
0 1

and

B =
−1 0
0 −1

But

A + B =
0 0
0 0
The zero matrix is not invertible, because 0 · C = 0 for any matrix C — so for no matrix C can 0 · C = I.
But this feels like “cheating”, because the left side (A + B)−1 of the equation isn’t defined. Okay —
can we find two matrices A and B for which (A + B)−1 ≠ A−1 + B −1 and both sides of the equation are
defined? Thus, we need A, B, and A + B to all be invertible.
I’ll use
A =
1 0
1 1

and

B =
0 1
1 0
Then
A−1 =
1 0
−1 1

and

B −1 =
0 1
1 0
(You can find the inverses using the formula for the inverse of a 2 × 2 matrix which I gave when I
discussed matrix arithmetic. You can also find the inverses using row reduction.)
Thus,
A−1 + B −1 =
1 1
0 1
On the other hand,
A + B =
1 1
2 1

so

(A + B)−1 =
−1 1
2 −1
Thus, (A + B)−1 ≠ A−1 + B −1 even though both sides of the equation are defined.
The next result connects several of the ideas we’ve looked at: Row reduction, elementary matrices,
invertibility, and solving systems of equations.
Theorem. Let A be an n × n matrix. The following are equivalent:
(a) A is row equivalent to I.
(b) A is a product of elementary matrices.
(c) A is invertible.
(d) The only solution to the following system is the vector x = 0:
Ax = 0.
(e) For any n-dimensional vector b, the following system has a unique solution:
Ax = b.
Proof. When you are trying to prove several statements are equivalent, you must prove that if you assume
any one of the statements, you can prove any of the others. I can do this here by proving that (a) implies
(b), (b) implies (c), (c) implies (d), (d) implies (e), and (e) implies (a).
(a) ⇒ (b): Let E1 , . . . , Ep be elementary matrices which row reduce A to I:
E1 · · · Ep A = I.
Then
A = Ep−1 · · · E1−1 .
Since the inverse of an elementary matrix is an elementary matrix, A is a product of elementary matrices.
(b) ⇒ (c): Write A as a product of elementary matrices:
A = F1 · · · Fq .
Now
F1 · · · Fq · Fq−1 · · · F1−1 = I,
Fq−1 · · · F1−1 · F1 · · · Fq = I.
Hence,
A−1 = Fq−1 · · · F1−1 .
(c) ⇒ (d): Suppose A is invertible. The system Ax = 0 has at least one solution, namely x = 0.
Moreover, if y is any other solution, then
Ay = 0, so A−1 Ay = A−1 0, or y = 0.
Hence, x = 0 is the only solution to Ax = 0, which proves (d).
(d) ⇒ (e): Suppose the only solution to Ax = 0 is x = 0. Row reduce the augmented matrix [A | 0] to row reduced echelon form. Since x = 0 is the only solution, there are no free variables, so every one of the first n columns must contain a leading entry.
Ignoring the last column (which never changes), this means there is a sequence of row operations E1 ,
. . . , En which reduces A to the identity I — that is, A is row equivalent to I. (I’ve actually proved (d) ⇒
(a) at this point.)
Let b = ⟨b1 , . . . , bn ⟩ be an arbitrary n-dimensional vector. Applying the same row operations to the augmented matrix [A | b] gives

E1 · · · En [A | b] = [I | b′ ]

for some vector b′ = ⟨b′1 , . . . , b′n ⟩. The last matrix represents the equations x1 = b′1 , . . . , xn = b′n , so the system Ax = b has at least one solution, namely x = b′ .

To see that the solution is unique, suppose y and z are both solutions, so Ay = b and Az = b. Then

A(y − z) = Ay − Az = b − b = 0.

Since the only solution to Ax = 0 is 0, I get y − z = 0, so y = z. Hence, the solution is unique, and this proves (e).
If A is invertible, the theorem implies that A can be written as a product of elementary matrices. To
do this, row reduce A to the identity, keeping track of the row operations you’re using. Write each row
operation as an elementary matrix, and express the row reduction as a matrix multiplication. Finally, solve
the resulting equation for A.
Example. (Writing an invertible matrix as a product of elementary matrices) Express the following
real matrix as a product of elementary matrices:
$$A = \begin{bmatrix} 2 & -4 \\ -2 & 5 \end{bmatrix}.$$
First, row reduce A to the identity, keeping track of the row operations:
$$\begin{bmatrix} 2 & -4 \\ -2 & 5 \end{bmatrix} \;\overset{r_1 \to \frac{1}{2} r_1}{\to}\; \begin{bmatrix} 1 & -2 \\ -2 & 5 \end{bmatrix} \;\overset{r_2 \to r_2 + 2r_1}{\to}\; \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} \;\overset{r_1 \to r_1 + 2r_2}{\to}\; \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
Next, represent each row operation as an elementary matrix:
$$r_1 \to \tfrac{1}{2} r_1 \;\text{corresponds to}\; \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & 1 \end{bmatrix}, \quad r_2 \to r_2 + 2r_1 \;\text{corresponds to}\; \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}, \quad r_1 \to r_1 + 2r_2 \;\text{corresponds to}\; \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}.$$
Using the elementary matrices, write the row reduction as a matrix multiplication. A must be multiplied
on the left by the elementary matrices in the order in which the operations were performed:
$$\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & 1 \end{bmatrix} \cdot A = I.$$
Solve the last equation for A, being careful to get the inverses in the right order:
$$A = \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}.$$
You can check your answer by multiplying the matrices on the right.
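Here is a NumPy sketch of that check (not part of the original notes); E1, E2, E3 are names I'm using for the elementary matrices of the three row operations above.

```python
import numpy as np

# Verify the factorization A = E1^{-1} E2^{-1} E3^{-1} numerically.
E1 = np.array([[0.5, 0.0], [0.0, 1.0]])   # r1 -> (1/2) r1
E2 = np.array([[1.0, 0.0], [2.0, 1.0]])   # r2 -> r2 + 2 r1
E3 = np.array([[1.0, 2.0], [0.0, 1.0]])   # r1 -> r1 + 2 r2

product = np.linalg.inv(E1) @ np.linalg.inv(E2) @ np.linalg.inv(E3)
print(product)   # should reproduce the matrix A from the example
```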
For two n × n matrices A and B to be inverses, I must have
AB = I and BA = I.
Since in general AB need not equal BA, it seems as though I must check that both equations hold in
order to show that A and B are inverses. It turns out that, thanks to the earlier theorem, we only need to
check that AB = I.
Corollary. If A and B are n × n matrices and AB = I, then A = B −1 and BA = I.
Proof. Suppose A and B are n × n matrices and AB = I. The system Bx = 0 certainly has x = 0 as a
solution. I’ll show it’s the only solution.
Suppose y is another solution, so
By = 0.
Multiply both sides by A and simplify:
ABy = A · 0
Iy = 0
y=0
Thus, 0 is a solution, and it’s the only solution.
Thus, B satisfies condition (d) of the Theorem. Since the five conditions are equivalent, B also satisfies
condition (c), so B is invertible. Let B −1 be the inverse of B. Then
\begin{align*}
AB &= I \\
ABB^{-1} &= IB^{-1} \\
AI &= B^{-1} \\
A &= B^{-1}
\end{align*}
Then BA = BB^{-1} = I, which completes the proof.
Next, here is an algorithm for computing the inverse of a matrix using row reduction. Suppose A is invertible, and let E1 , . . . , Ep be elementary matrices which row reduce A to I:
$$E_1 \cdots E_p A = I.$$
But this equation says that E1 · · · Ep is the inverse of A, since multiplying A by E1 · · · Ep gives the
identity. Thus,
A−1 = E1 · · · Ep = E1 · · · Ep · I.
We can interpret the last expression E1 · · · Ep · I as applying the row operations for E1 , . . . Ep to the
identity matrix I. And the same row operations row reduce A to I. So form an augmented matrix by placing
the identity matrix next to A:
$$[\, A \mid I \,]$$
Row reduce the augmented matrix. The left-hand block A will row reduce to the identity; at the same
time, the right-hand block I will be transformed into A−1 .
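Here is a sketch of this augmented-matrix algorithm in Python/NumPy (not part of the original notes; the function name and the partial-pivoting detail are my own choices, and real entries and invertibility are assumed). It is applied to the matrix of the worked example below.

```python
import numpy as np

def inverse_by_row_reduction(A):
    """Row reduce [A | I]; the right-hand block becomes A^{-1} (A assumed invertible, real)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    aug = np.hstack([A, np.eye(n)])                       # the augmented matrix [A | I]
    for col in range(n):
        pivot = np.argmax(np.abs(aug[col:, col])) + col   # pick a usable pivot row
        aug[[col, pivot]] = aug[[pivot, col]]             # swap it into place
        aug[col] /= aug[col, col]                         # scale the pivot row to get a 1
        for row in range(n):
            if row != col:                                # clear the rest of the column
                aug[row] -= aug[row, col] * aug[col]
    return aug[:, n:]                                     # right-hand block is A^{-1}

A = [[1, 2, -1], [-1, -1, 3], [0, 1, 1]]
print(inverse_by_row_reduction(A))   # compare with the worked example below
```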
Example. Invert the following matrix over R:
$$\begin{bmatrix} 1 & 2 & -1 \\ -1 & -1 & 3 \\ 0 & 1 & 1 \end{bmatrix}.$$
Form the augmented matrix by putting the 3 × 3 identity matrix on the right of the original matrix:
$$\left[\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ -1 & -1 & 3 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{array}\right].$$
Next, row reduce the augmented matrix. The row operations are entirely determined by the block
on the left, which is the original matrix. The row operations turn the left block into the identity, while
simultaneously turning the identity in the right block into the inverse.
$$\left[\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ -1 & -1 & 3 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{array}\right] \overset{r_2 \to r_2 + r_1}{\to} \left[\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & 2 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{array}\right] \overset{r_3 \to r_3 - r_2}{\to} \left[\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & 2 & 1 & 1 & 0 \\ 0 & 0 & -1 & -1 & -1 & 1 \end{array}\right]$$
$$\overset{r_1 \to r_1 - 2r_2}{\to} \left[\begin{array}{ccc|ccc} 1 & 0 & -5 & -1 & -2 & 0 \\ 0 & 1 & 2 & 1 & 1 & 0 \\ 0 & 0 & -1 & -1 & -1 & 1 \end{array}\right] \overset{r_3 \to -r_3}{\to} \left[\begin{array}{ccc|ccc} 1 & 0 & -5 & -1 & -2 & 0 \\ 0 & 1 & 2 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & -1 \end{array}\right]$$
$$\overset{r_1 \to r_1 + 5r_3}{\to} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 4 & 3 & -5 \\ 0 & 1 & 2 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & -1 \end{array}\right] \overset{r_2 \to r_2 - 2r_3}{\to} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 4 & 3 & -5 \\ 0 & 1 & 0 & -1 & -1 & 2 \\ 0 & 0 & 1 & 1 & 1 & -1 \end{array}\right].$$
Thus,
$$\begin{bmatrix} 1 & 2 & -1 \\ -1 & -1 & 3 \\ 0 & 1 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 4 & 3 & -5 \\ -1 & -1 & 2 \\ 1 & 1 & -1 \end{bmatrix}.$$
In the future, I won’t draw the vertical bar between the two blocks; you can draw it if it helps you keep
the computation organized.
Example. (Inverting a matrix over Zp ) Find the inverse of the following matrix over Z3 :
$$\begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ 2 & 1 & 1 \end{bmatrix}.$$
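Here is a sketch (not part of the original notes) of how the same augmented-matrix algorithm can be carried out over Z_p by computer: over a finite field, “dividing by the pivot” means multiplying by the modular inverse of the pivot. The function name is my own; `pow(x, -1, p)` needs Python 3.8 or later.

```python
def inverse_mod_p(A, p):
    """Row reduce [A | I] over Z_p; returns A^{-1} (A assumed invertible over Z_p)."""
    n = len(A)
    aug = [[A[i][j] % p for j in range(n)] + [int(i == j) for j in range(n)]
           for i in range(n)]                                 # augmented matrix [A | I]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] % p != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]           # swap a nonzero pivot up
        inv = pow(aug[col][col], -1, p)                       # modular inverse of the pivot
        aug[col] = [(x * inv) % p for x in aug[col]]          # scale the pivot row to get a 1
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [(aug[r][j] - factor * aug[col][j]) % p for j in range(2 * n)]
    return [row[n:] for row in aug]                           # right-hand block is A^{-1}

A = [[1, 0, 2],
     [1, 1, 1],
     [2, 1, 1]]
print(inverse_mod_p(A, 3))   # [[0, 2, 1], [1, 0, 1], [2, 2, 1]]
```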
The theorem also tells us about the number of solutions to a system of linear equations.
Proposition. Let F be a field, and let Ax = b be a system of linear equations over F . Then:
(a) If F is infinite, then the system has either no solutions, exactly one solution, or infinitely many
solutions.
(b) If F is a finite field with $p^n$ elements, where p is prime and n ≥ 1, then the system has either no
solutions, exactly one solution, or at least $p^n$ solutions.
Proof. Suppose the system has more than one solution. I must show that there are infinitely many solutions
if F is infinite, or at least $p^n$ solutions if F is a finite field with $p^n$ elements.
Since there is more than one solution, there are at least two different solutions. So let x1 and x2 be two
different solutions to Ax = b:
Ax1 = b and Ax2 = b.
Subtracting the equations gives
$$A(x_1 - x_2) = Ax_1 - Ax_2 = b - b = 0.$$
Now let t ∈ F and consider the vector x1 + t(x1 − x2). Since
$$A(x_1 + t(x_1 - x_2)) = Ax_1 + tA(x_1 - x_2) = b + t \cdot 0 = b,$$
x1 + t(x1 − x2) is a solution to Ax = b. Moreover, the only way two solutions of the form
x1 + t(x1 − x2) can be the same is if they have the same t. For if x1 + t1(x1 − x2) = x1 + t2(x1 − x2), then (t1 − t2)(x1 − x2) = 0, and since x1 ≠ x2 this forces t1 = t2. So different values of t give different solutions: infinitely many if F is infinite, and at least $p^n$ if F has $p^n$ elements.
$$D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow ax + y \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix} = a \cdot D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow x \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix} + D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow y \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}.$$
The determinant of a matrix A is denoted det A or |A|.
You still want a formula or a recipe for computing determinants — right? Well, in some of the examples
below, we’ll see how you can compute determinants just using the axioms, or using row reduction. Be
patient, and you’ll feel better shortly!
Let’s see how the axioms look in particular cases.
The first axiom (linearity) is probably the hardest to understand. It allows you to add or subtract, or
move constants in and out, in a single row, assuming that all the other rows stay the same. It is easier to
show you what this means than to describe it in words.
For example, here are two determinants being combined into one. The two third rows are added, and
the other two rows (which must be the same in both matrices) are unchanged:
$$D\begin{bmatrix} a & b & c \\ d & e & f \\ x_1 & x_2 & x_3 \end{bmatrix} + D\begin{bmatrix} a & b & c \\ d & e & f \\ y_1 & y_2 & y_3 \end{bmatrix} = D\begin{bmatrix} a & b & c \\ d & e & f \\ x_1 + y_1 & x_2 + y_2 & x_3 + y_3 \end{bmatrix}.$$
You can also take a single determinant apart into two determinants. In this example, we have subtraction
instead of addition. All the action takes place in row 2; the first and third rows are the same in all of the
matrices.
$$D\begin{bmatrix} a & b & c \\ x_1 - y_1 & x_2 - y_2 & x_3 - y_3 \\ d & e & f \end{bmatrix} = D\begin{bmatrix} a & b & c \\ x_1 & x_2 & x_3 \\ d & e & f \end{bmatrix} - D\begin{bmatrix} a & b & c \\ y_1 & y_2 & y_3 \\ d & e & f \end{bmatrix}.$$
Linearity also allows you to factor a constant out of a single row, in this case row 1:
$$D\begin{bmatrix} kx_1 & kx_2 & kx_3 \\ a & b & c \\ d & e & f \end{bmatrix} = k \cdot D\begin{bmatrix} x_1 & x_2 & x_3 \\ a & b & c \\ d & e & f \end{bmatrix}.$$
You can do the opposite: Multiply a constant outside the determinant into a single row:
$$5 \cdot D\begin{bmatrix} a & b \\ c & d \end{bmatrix} = D\begin{bmatrix} 5a & 5b \\ c & d \end{bmatrix}.$$
So the axioms describe functions which start with a matrix and produce a number. In fact, the three axioms above are enough to be able to compute
determinants (though not very efficiently). Here’s an example.
Suppose I have a determinant function D for 2 × 2 real matrices — so D satisfies the three axioms
above. Using only the axioms, I’ll compute
$$D\begin{bmatrix} 1 & -1 \\ 3 & 2 \end{bmatrix}.$$
First, I’ll break up the second row into a sum of a multiple of the first row and another vector:
$$(3, 2) = (3, -3) + (0, 5) = 3 \cdot (1, -1) + (0, 5).$$
Then I can use linearity to break the determinant up into two pieces.
$$D\begin{bmatrix} 1 & -1 \\ 3 & 2 \end{bmatrix} = D\begin{bmatrix} 1 & -1 \\ 3+0 & -3+5 \end{bmatrix} = D\begin{bmatrix} 1 & -1 \\ 3 & -3 \end{bmatrix} + D\begin{bmatrix} 1 & -1 \\ 0 & 5 \end{bmatrix} = 3 \cdot D\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} + 5 \cdot D\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} = 3 \cdot 0 + 5 \cdot D\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} = 5 \cdot D\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}.$$
Notice that in the second equality the first row stays the same, while the new second rows are (3, −3)
and (0, 5).
Notice that linearity also allows me to factor 3 and 5 out of the second rows for the third equality.
The alternating axiom says that a matrix with two equal rows has determinant 0, and that gave me the
fourth equality.
Now I do a similar trick with the first row:
$$5 \cdot D\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} = 5 \cdot D\begin{bmatrix} 1+0 & 0+(-1) \\ 0 & 1 \end{bmatrix} = 5 \cdot D\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + 5 \cdot D\begin{bmatrix} 0 & -1 \\ 0 & 1 \end{bmatrix} = 5 \cdot 1 + (-1) \cdot 5 \cdot D\begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix} = 5 + (-5) \cdot 0 = 5.$$
The second equality used linearity applied to the first row: (1 + 0, 0 + −1) = (1, 0) + (0, −1). The third
equality used the fact that the determinant of the identity matrix is 1, and used linearity to factor −1 out
of the first row. The fourth equality used the fact that the determinant of a matrix with two equal rows
is 0.
Thus,
$$D\begin{bmatrix} 1 & -1 \\ 3 & 2 \end{bmatrix} = 5.$$
Notice that we computed a determinant using only the axioms for a determinant. We don’t have a
formula at the moment (though you may have seen a formula for 2 × 2 determinants before). It’s true that
the computation took a lot of steps, and this is not the best way to do this — but this example gives some
evidence that our axioms actually tell what determinants are.
The next two results are often useful in computations.
First, you might suspect that a matrix with an all-zero row has determinant 0, and it’s easy to prove
using linearity. Rather than give a formal proof, I’ll illustrate the idea with a particular example.
$$D\begin{bmatrix} 0 & 0 & 0 \\ 2 & 0 & 17 \\ 7 & -6 & 1 \end{bmatrix} = D\begin{bmatrix} 0+0 & 0+0 & 0+0 \\ 2 & 0 & 17 \\ 7 & -6 & 1 \end{bmatrix} = D\begin{bmatrix} 0 & 0 & 0 \\ 2 & 0 & 17 \\ 7 & -6 & 1 \end{bmatrix} + D\begin{bmatrix} 0 & 0 & 0 \\ 2 & 0 & 17 \\ 7 & -6 & 1 \end{bmatrix}.$$
The last equation says “(stuff) = (stuff) + (stuff)”. This means that (stuff) = 0. So
$$D\begin{bmatrix} 0 & 0 & 0 \\ 2 & 0 & 17 \\ 7 & -6 & 1 \end{bmatrix} = 0.$$
The same idea can be used to prove the result in general.
The next result tells us what happens to a determinant when we swap two rows of the matrix.
Lemma. If D : M(n, R) → R is a function which is linear in the rows (Axiom 1) and is 0 when a matrix
has equal rows (Axiom 2), then swapping two rows multiplies the value of D by −1:
$$D\begin{bmatrix} \vdots \\ \leftarrow r_i \rightarrow \\ \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \end{bmatrix} = -D\begin{bmatrix} \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \\ \leftarrow r_i \rightarrow \\ \vdots \end{bmatrix}.$$
Proof. The proof will use the first and second axioms repeatedly. The idea is to swap rows i and j by
adding or subtracting rows.
In the diagrams below, the gray rectangles represent rows which are unchanged and the same in all of
the matrices. All the “action” takes place in row i and row j.
(Schematic diagram: rows i and j are swapped by a sequence of row additions and subtractions, using Axiom 1 at each step and Axiom 2 to kill the determinants with two equal rows.)
Notice that in each addition or subtraction step (the steps that use Axiom 1), only one of row i or row
j changes at a time.
Remarks. (a) I’ll show later that it’s enough to assume (instead of Axiom 2) that D(A) = 0 vanishes
whenever two adjacent rows of A are equal. (This is a technical point which you can forget about until we
need it.)
(b) Suppose that D : M (n, r) → R is a function satisfying Axioms 1 and 3, and suppose that swapping two
rows multiplies the value of D by −1. Must D satisfy Axiom 2? In other words, is “swapping multiplies the
value by −1” equivalent to “equal rows means determinant 0”?
Assuming that swapping two rows multiplies the value of D by −1, I have
$$D(A) = D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow x \rightarrow \\ \vdots \\ \leftarrow x \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix} = -D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow x \rightarrow \\ \vdots \\ \leftarrow x \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix} = -D(A).$$
(I swapped the two equal x-rows, which is why the matrix didn’t change. But by assumption, this
useless swap multiplies D by −1.)
Hence, 2 · D(A) = 0.
If R is R, Q, C, or Zn for n prime and not equal to 2, then 2 · D(A) = 0 implies D(A) = 0. However, if
R = Z2 , then 2x = 0 for all x. Hence, 2 · D(A) = 0, no matter what D(A) is. I can’t conclude that D(A) = 0
in this case. Therefore, Axiom 2 need not hold. You can see, however, that it will hold if R is a field of
characteristic other than 2.
Fortunately, since I took “equal rows means determinant 0” as an axiom for determinants, and since the
lemma shows that this implies that “swapping rows multiplies the determinant by −1”, I know that both of
these properties will hold for determinant functions.
Example. (Computing determinants using the axioms) Suppose that D : M (3, R) → R is a determi-
nant function and
$$D\begin{bmatrix} \leftarrow a_1 \rightarrow \\ \leftarrow b \rightarrow \\ \leftarrow c \rightarrow \end{bmatrix} = 5 \quad\text{and}\quad D\begin{bmatrix} \leftarrow a_2 \rightarrow \\ \leftarrow b \rightarrow \\ \leftarrow c \rightarrow \end{bmatrix} = -3.$$
Compute
$$D\begin{bmatrix} \leftarrow c \rightarrow \\ \leftarrow b \rightarrow \\ \leftarrow a_1 + 4a_2 \rightarrow \end{bmatrix}.$$
$$D\begin{bmatrix} \leftarrow c \rightarrow \\ \leftarrow b \rightarrow \\ \leftarrow a_1 + 4a_2 \rightarrow \end{bmatrix} = -D\begin{bmatrix} \leftarrow a_1 + 4a_2 \rightarrow \\ \leftarrow b \rightarrow \\ \leftarrow c \rightarrow \end{bmatrix} = -\left( D\begin{bmatrix} \leftarrow a_1 \rightarrow \\ \leftarrow b \rightarrow \\ \leftarrow c \rightarrow \end{bmatrix} + 4\, D\begin{bmatrix} \leftarrow a_2 \rightarrow \\ \leftarrow b \rightarrow \\ \leftarrow c \rightarrow \end{bmatrix} \right) = -(5 + 4 \cdot (-3)) = 7.$$
Elementary row operations are used to reduce a matrix to row reduced echelon form, and as a conse-
quence, to solve systems of linear equations. We can use them to compute determinants with more ease than
using the axioms directly — and, even when we have some better algorithms (like expansion by cofactors),
row operations will be useful in simplifying computations. How are determinants affected by elementary row
operations?
Adding a multiple of a row to another row does not change the determinant. Suppose, for example, I’m
performing the operation ri → ri + a · rj . Let
$$A = \begin{bmatrix} \vdots \\ \leftarrow r_i \rightarrow \\ \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \end{bmatrix}.$$
Then
$$D\begin{bmatrix} \vdots \\ \leftarrow r_i + a \cdot r_j \rightarrow \\ \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \end{bmatrix} = D\begin{bmatrix} \vdots \\ \leftarrow r_i \rightarrow \\ \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \end{bmatrix} + a \cdot D\begin{bmatrix} \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \end{bmatrix} = D\begin{bmatrix} \vdots \\ \leftarrow r_i \rightarrow \\ \vdots \\ \leftarrow r_j \rightarrow \\ \vdots \end{bmatrix} = D(A).$$
Therefore, this kind of row operation leaves the determinant unchanged.
The alternating property implies that swapping two rows multiplies the determinant by −1. For example,
$$D\begin{bmatrix} a & b \\ c & d \end{bmatrix} = -D\begin{bmatrix} c & d \\ a & b \end{bmatrix}.$$
Our third kind of row operation involves multiplying a row by a number (which must be invertible in
the ring from which the entries of the matrix come). So if I wanted to multiply the second row of a real
matrix by 19, I could do this:
$$D\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \frac{1}{19} \cdot 19 \cdot D\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \frac{1}{19} D\begin{bmatrix} a & b \\ 19c & 19d \end{bmatrix}.$$
Thus, multiplying the second row by 19 leaves a factor of $\frac{1}{19}$ outside.
However, when you’re using row operations to compute a determinant, you usually want to factor a
number out of a row, which you can do using the linearity axiom. Thus:
$$D\begin{bmatrix} 6a & 6b \\ c & d \end{bmatrix} = 6 \cdot D\begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$
For example, over R:
$$D\begin{bmatrix} 1 & 5 \\ -3 & 4 \end{bmatrix} \;\overset{r_2 \to r_2 + 3r_1}{=}\; D\begin{bmatrix} 1 & 5 \\ 0 & 19 \end{bmatrix} \;\overset{\text{(linearity)}}{=}\; 19 \cdot D\begin{bmatrix} 1 & 5 \\ 0 & 1 \end{bmatrix} \;\overset{r_1 \to r_1 - 5r_2}{=}\; 19 \cdot D\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 19 \cdot 1 = 19.$$
In Z5 , I have 1 = 2 · 3. So
$$D\begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix} = D\begin{bmatrix} 2 \cdot 1 & 2 \cdot 3 \\ 3 & 4 \end{bmatrix} \;\overset{\text{(linearity)}}{=}\; 2 \cdot D\begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix} \;\overset{r_2 \to r_2 + 2r_1}{=}\; 2 \cdot D\begin{bmatrix} 1 & 3 \\ 0 & 0 \end{bmatrix} = 2 \cdot 0 = 0.$$
I used the fact that a matrix with an all-zero row has determinant 0.
Your experience with row reducing matrices tells you that either the row reduced echelon form will be the
identity, or it will have an all-zero row at the bottom. In the second case, we’ve seen that the determinant is
0. In the first case, there may be constants multiplying the determinant, and the determinant of the identity
is 1 — and so, you know the value of the determinant by multiplying everything together.
Row reduction gives you a way of computing determinants that is a little more practical than applying
the axioms directly. It should also convince you that, starting with the three determinant axioms, we now
have something which takes a square matrix and produces a number.
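Here is one possible Python sketch of this idea (not part of the original notes; the function name is mine, and it uses the common variant of reducing only to upper-triangular form, so the determinant is the tracked sign times the product of the diagonal entries).

```python
import numpy as np

def det_by_row_reduction(A):
    """Determinant via row reduction to upper-triangular form (real entries assumed)."""
    A = np.asarray(A, dtype=float).copy()
    n = A.shape[0]
    sign = 1.0
    for col in range(n):
        pivot = np.argmax(np.abs(A[col:, col])) + col
        if A[pivot, col] == 0:
            return 0.0                      # no pivot available: determinant is 0
        if pivot != col:
            A[[col, pivot]] = A[[pivot, col]]
            sign = -sign                    # each row swap multiplies the determinant by -1
        for row in range(col + 1, n):       # adding a multiple of a row changes nothing
            A[row] -= (A[row, col] / A[col, col]) * A[col]
    return sign * np.prod(np.diag(A))       # det of a triangular matrix = product of diagonal

print(det_by_row_reduction([[1, 5], [-3, 4]]))   # 19.0, matching the example above
```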
There are still some questions we need to address. We need to prove that there really are functions
which satisfy the three axioms. (Just being able to compute in particular cases is not a proof.) This is called
an existence question. We will answer this question by producing an algorithm which gives a determinant
function for square matrices. It is called expansion by cofactors.
Could there be multiple determinant functions? Could more than one function on square matrices
satisfy the axioms? This seems unlikely given that we were able to start with numerical matrices and
compute specific numbers — but maybe a different approach might produce a different answer. This is
called a uniqueness question.
We will show that, in fact, there is only one determinant function — a function satisfying the three
axioms — on square matrices. It can be computed in various ways, but you’ll get the same answer in all
cases.
Along the way, we’ll find another approach which uses permutations to compute determinants. We’ll
also prove some important properties of determinants, such as the rule for products.
For 1 × 1 matrices, D[a] = a defines a determinant function: the alternating axiom holds vacuously (there are not two rows that could be equal), the determinant of the 1 × 1 identity is D[1] = 1, and D is linear in the single row, since
$$D[ax + y] = ax + y = a \cdot D[x] + D[y].$$
All three axioms have been verified, so D is a determinant function.
To save writing, we often write
$$\begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} \quad\text{for}\quad D\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}.$$
This is okay, since we’ll eventually find out that there is only one determinant function on n×n matrices.
You can also write “det” for the determinant function.
For example, on M (2, R),
$$\begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} = 4 - 6 = -2.$$
Let’s move on to the inductive step for the main result. Earlier I discussed the connection between
“swapping two rows multiplies the determinant by −1” and “when two rows are equal the determinant is 0”.
The next lemma is another piece of this picture. It says for a function which is linear in the rows, if “when
two adjacent rows are equal the determinant is 0”, then “swapping two rows multiplies the determinant by
−1”. (Axiom 2 does not require that the two equal rows be adjacent.) It’s a technical result which is used
in the proof of the main theorem and nowhere else, and the proof is rather technical as well. You could skip
it and refer to the result when it’s needed in the main theorem proof.
Lemma. Let f : M (n, r) → R be a function which is linear in each row and satisfies f (A) = 0 whenever
two adjacent rows are equal. Then swapping any two rows multiplies the value of f by −1.
Proof. First, I’ll show that swapping two adjacent rows multiplies the value of f by −1. I’ll show the
required manipulations in schematic form. The adjacent rows are those with the x’s and y’s; as usual, the
vertical dots represent the other rows, which are the same in all of the matrices.
In schematic form, showing only the two adjacent rows (the other rows are the same in every matrix):
$$f\begin{bmatrix} x \\ y \end{bmatrix} = f\begin{bmatrix} y + (x-y) \\ y \end{bmatrix} = f\begin{bmatrix} y \\ y \end{bmatrix} + f\begin{bmatrix} x-y \\ y \end{bmatrix} = f\begin{bmatrix} x-y \\ y \end{bmatrix} = f\begin{bmatrix} x-y \\ x + (y-x) \end{bmatrix} = f\begin{bmatrix} x-y \\ x \end{bmatrix} + f\begin{bmatrix} x-y \\ y-x \end{bmatrix} =$$
$$f\begin{bmatrix} x \\ x \end{bmatrix} - f\begin{bmatrix} y \\ x \end{bmatrix} - f\begin{bmatrix} x-y \\ x-y \end{bmatrix} = -f\begin{bmatrix} y \\ x \end{bmatrix}.$$
In the last step, the two values of f on matrices with equal adjacent rows are 0, leaving −f applied to the matrix with x and y swapped.
Next, I’ll show that you can swap any two rows by swapping adjacent rows an odd number of times.
Then since each swap multiplies f by −1, and since (−1)(odd number) = −1, it follows that swapping two
non-adjacent rows multiplies the value of f by −1.
To illustrate the idea, suppose the rows to be swapped are rows 1 and n. I’ll indicate how to do the
swaps by just displaying the row numbers. First, I swap row 1 with the adjacent row below it n − 1 times
to move it from the top of the matrix to the n-th (bottom) position:
$$(r_1, r_2, r_3, \ldots, r_{n-1}, r_n) \to (r_2, r_1, r_3, \ldots, r_{n-1}, r_n) \to (r_2, r_3, r_1, \ldots, r_{n-1}, r_n) \to \cdots \to (r_2, r_3, \ldots, r_1, r_n) \to (r_2, r_3, \ldots, r_n, r_1)$$
Next, I swap (the old) row n with the adjacent row above it n − 2 times to move it to the top of the
matrix:
$$(r_2, r_3, \ldots, r_{n-1}, r_n, r_1) \to (r_2, r_3, \ldots, r_n, r_{n-1}, r_1) \to \cdots \to (r_2, r_n, r_3, \ldots, r_{n-1}, r_1) \to (r_n, r_2, r_3, \ldots, r_{n-1}, r_1)$$
The original rows 1 and n have swapped places, and I needed (n − 1) + (n − 2) = 2n − 3 swaps of adjacent
rows to do this. Now 2n − 3 is an odd number, and $(-1)^{2n-3} = -1$.
In general, if you want to swap row i and row j, where i < j, following the procedure above requires
2j − 2i − 1 swaps of adjacent rows, an odd number. Once again, $(-1)^{2j-2i-1} = -1$.
Thus, swapping two non-adjacent rows multiplies the value of f by −1.
Definition. Let A ∈ M (n, r). Let A(i | j) be the (n − 1) × (n − 1) matrix obtained by deleting the i-th row
and j-th column of A. If D is a determinant function, then D[A(i | j)] is called the (i, j)th minor of A.
In the picture below, the i-th row and j-th column are shown in gray; they will be deleted, and the
remaining (n − 1) × (n − 1) matrix is A(i | j). Its determinant is the (i, j)th minor.
The (i, j)th cofactor of A is (−1)i+j times the (i, j)th minor, i.e. (−1)i+j · D[A(i | j)].
Find the (2, 3)th minor and the (2, 3)th cofactor.
To find the (2, 3)th minor, remove the 2nd row and the 3rd column (i.e. the row and column containing
the (2, 3)th element):
$$\begin{bmatrix} 1 & 2 & * \\ * & * & * \\ 7 & 8 & * \end{bmatrix}$$
The (2, 3)th minor is the determinant of what remains:
$$\begin{vmatrix} 1 & 2 \\ 7 & 8 \end{vmatrix} = 8 - 14 = -6.$$
To get the (2, 3)th cofactor, multiply this by (−1)2+3 = (−1)5 = −1. The (2, 3)th cofactor is (−1)·(−6) =
6.
Note: The easy way to remember whether to multiply by +1 or −1 is to make a checkerboard pattern
of +’s and −’s:
$$\begin{bmatrix} + & - & + \\ - & + & - \\ + & - & + \end{bmatrix}$$
Use the sign in the (i, j)th position. For example, there’s a minus sign in the (2, 3)th position, which
agrees with the sign I computed using (−1)i+j .
The main result says that we may use cofactors to extend a determinant function on (n − 1) × (n − 1)
matrices to a determinant function on n × n matrices.
Theorem. (Expansion by cofactors) Let R be a commutative ring with identity, and let C be a determinant
function on M(n − 1, R). Let A ∈ M(n, R). For any j ∈ {1, . . . , n}, define
$$D(A) = \sum_{i=1}^{n} (-1)^{i+j} A_{ij}\, C(A(i \mid j)).$$
Then D is a determinant function on M(n, R).
Notice that the summation is on i, which is the row index. The index j is fixed, and it indexes columns.
This means you’re moving down the j th column as you sum. Consequently, this is a cofactor expansion
by columns.
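Here is a small Python sketch of that sum (not part of the original notes; the function name is mine). It is a direct, recursive translation of the formula, not an efficient algorithm.

```python
def det_by_cofactors(A, j=0):
    """Determinant by cofactor expansion down column j (0-indexed)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        minor = [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]  # A(i | j)
        # (-1)**(i+j) * det(minor) is the (i, j) cofactor; multiply it by the entry A[i][j]
        total += (-1) ** (i + j) * A[i][j] * det_by_cofactors(minor)
    return total

print(det_by_cofactors([[1, 3, -5], [1, 0, -2], [6, 1, 1]], j=1))   # -42, as in the 3x3 example below
```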
Proof. Linearity: I have to show that D is linear in each row; that is,
$$D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow ax + y \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix} = a \cdot D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow x \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix} + D\begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow y \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}.$$
All the action is taking place in the k-th row — the one with the x’s and y’s — and the other rows are
the same in the three matrices.
Label the three matrices above:
$$P = \begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow ax + y \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}, \quad Q = \begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow x \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}, \quad R = \begin{bmatrix} \leftarrow r_1 \rightarrow \\ \vdots \\ \leftarrow y \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}.$$
I’m going to show that the ith term in the sum on the left equals the sum of the ith terms of the sums
on the right. To do this, I will consider two cases: i ≠ k and i = k.
First, consider a term in the cofactor sum where i ≠ k — that is, where the row that is deleted is not the
k th row. The ith row and j th column are deleted from each matrix, as shown in gray in the picture below.
Since C is a determinant function, I can apply linearity to the ax + y row, and I get the following equation:
$$C(P(i \mid j)) = a \cdot C(Q(i \mid j)) + C(R(i \mid j)).$$
The matrices P(i | j), Q(i | j), and R(i | j) are the same except in the k-th row (the one with the x’s and y’s), which is where linearity of C is applied.
Thus, the terms of the summation on the two sides agree for i ≠ k:
$$(-1)^{i+j} P_{ij}\, C(P(i \mid j)) = a \cdot (-1)^{i+j} Q_{ij}\, C(Q(i \mid j)) + (-1)^{i+j} R_{ij}\, C(R(i \mid j)),$$
since P_{ij} = Q_{ij} = R_{ij} when i ≠ k (row i is one of the unchanged rows).
Next, consider the case where i = k, so the row that is deleted is the k-th row. Now P, Q, and R only
differ in row k, which is the row being deleted, so P(k | j) = Q(k | j) = R(k | j), and hence the values of C on these minors are equal. The k-th row of P is ax + y, so P_{kj} = ax_j + y_j, while Q_{kj} = x_j and R_{kj} = y_j. Therefore,
$$(-1)^{k+j}(ax_j + y_j)\, C(P(k \mid j)) = (-1)^{k+j} a x_j\, C(Q(k \mid j)) + (-1)^{k+j} y_j\, C(R(k \mid j)).$$
Thus, the terms on the left and right are the same for all i, and D is linear.
Alternating: I have to show that if two rows of A are the same, then D(A) = 0.
First, I’ll show that if rows 1 and 2 are equal, then D(A) = 0.
Suppose then that rows 1 and 2 are equal. Here’s a typical term in the cofactor expansion: $(-1)^{i+j} A_{ij}\, C(A(i \mid j))$.
Suppose row i, the row that is deleted, is a row other than row 1 or row 2.
Then the matrix that results after deletion will have two equal rows, since row 1 and row 2 were equal.
Therefore, C (A(i | j)) = 0, and the term in the cofactor expansion is 0.
Thus, all the terms in the cofactor expansion are 0 except the first and second (i = 1 and i = 2). These
terms are
$$(-1)^{1+j} A_{1j}\, C(A(1 \mid j)) + (-1)^{2+j} A_{2j}\, C(A(2 \mid j)).$$
Now A1j = A2j , since the first and second rows are equal. And since row 1 and row 2 are equal, I get
the same matrix by deleting either row 1 or row 2:
That is, A(1 | j) = A(2 | j). Therefore, C (A(1 | j)) = C (A(2 | j)).
Thus, the only way in which the two terms above differ is in the signs (−1)1+j and (−1)2+j . But 1 + j
and 2 + j are consecutive integers, so one must be even and the other must be odd. Hence, (−1)1+j and
(−1)2+j are either +1 and −1 or −1 and +1. In either case, the terms cancel, and the sum of the two terms
is 0.
Hence, the cofactor expansion is equal to 0, and D(A) = 0, as I wished to prove.
You can give a similar argument if A has two adjacent rows equal other than rows 1 and 2 (so, for
instance, if rows 4 and 5 are equal). I will skip the details.
Thus, I know that D(A) = 0 if two adjacent rows of A are equal. Since I proved that D satisfies the
linearity axiom, the hypotheses of the previous technical lemma are satisfied, and I can apply it. It says that
swapping two rows multiplies the determinant by −1.
Now take the general case: Two rows of A are equal, but they aren’t necessarily adjacent.
I can swap the rows of A until the two equal rows are adjacent, and each swap multiplies the value of
the determinant by −1. Let’s say that k swaps are needed to get the two equal rows to be adjacent. That
is, after k row swaps, I get a matrix B which has adjacent equal rows. Then D(B) = 0 by the adjacent row
case above, so
D(A) = (−1)k D(B) = (−1)k · 0 = 0.
This completes the proof that D(A) = 0 if A has two equal rows, and the Alternating Axiom has been
verified.
The identity has determinant 1: Suppose A = I. Since the entries of the identity matrix are 0 except
on the main diagonal, I have Aij = 0 unless i = j. When i = j, I have Ajj = 1. Therefore, the cofactor
expansion of D(A) has only one nonzero term, which is
$$(-1)^{j+j} A_{jj}\, C(I(j \mid j)) = 1 \cdot C(I_{n-1}) = 1,$$
since I(j | j) is the (n − 1) × (n − 1) identity matrix and C is a determinant function. Hence, D(I) = 1, and the third axiom holds.
Example. (Expanding by cofactors) Compute the following determinant by expanding by cofactors of the second column:
$$\begin{vmatrix} 1 & 3 & -5 \\ 1 & 0 & -2 \\ 6 & 1 & 1 \end{vmatrix} = -(3)\begin{vmatrix} 1 & -2 \\ 6 & 1 \end{vmatrix} + (0)\begin{vmatrix} 1 & -5 \\ 6 & 1 \end{vmatrix} - (1)\begin{vmatrix} 1 & -5 \\ 1 & -2 \end{vmatrix} = -42.$$
This diagram shows where the terms in the cofactor expansion come from:
(Diagram: three copies of the matrix; in each one, the row and column through one of the second-column entries 3, 0, 1 are crossed out.)
For each element (3, 0, 1) in the second column, compute the cofactor for that element by crossing out
the row and column containing the element (cross-outs shown in gray), computing the determinant of the
2 × 2 matrix that is left, and multiplying by the sign (+ or −) that comes from the “checkerboard pattern”.
Then multiply the cofactor by the column element.
So for the first term, the element is 3, the sign is “−”, and after crossing out the first row and second
column, the 2 × 2 determinant that is left is
$$\begin{vmatrix} 1 & -2 \\ 6 & 1 \end{vmatrix}.$$
As usual, this is harder to describe in words than it is to actually do. Try a few computations yourself.
Finally, I computed the 2 × 2 determinants using the 2 × 2 determinant formula I derived earlier.
You can often simplify a cofactor expansion by doing row operations first. For instance, if you can
produce a row or a column with lots of zeros, you can expand by cofactors of that row or column.
Example. (Computing a determinant using row operations and cofactors) Compute the determi-
nant of the following matrix in M (3, Z3 ):
$$\begin{bmatrix} 1 & 2 & 1 \\ 2 & 2 & 1 \\ 2 & 0 & 1 \end{bmatrix}$$
I’ll do a couple of row operations first to make some zeros in the first column. Remember that adding
a multiple of a row to another row does not change the determinant.
$$\begin{bmatrix} 1 & 2 & 1 \\ 2 & 2 & 1 \\ 2 & 0 & 1 \end{bmatrix} \overset{r_2 \to r_2 + r_1}{\to} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 2 & 0 & 1 \end{bmatrix} \overset{r_3 \to r_3 + r_1}{\to} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 2 & 2 \end{bmatrix}.$$
Now I expand by cofactors of column 1. The two zeros make the computation easy; I’ll write out those
terms just so you can see the cofactors:
$$\begin{vmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 2 & 2 \end{vmatrix} = 1 \cdot \begin{vmatrix} 1 & 2 \\ 2 & 2 \end{vmatrix} - 0 \cdot \begin{vmatrix} 2 & 1 \\ 2 & 2 \end{vmatrix} + 0 \cdot \begin{vmatrix} 2 & 1 \\ 1 & 2 \end{vmatrix} = (2 - 4) - 0 + 0 = -2 = 1.$$
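As a quick machine check (not part of the original notes), one can reuse the `det_by_cofactors` sketch from earlier on the original integer matrix and reduce the result mod 3 at the end.

```python
# Check the Z_3 computation: compute the integer determinant, then reduce mod 3.
A = [[1, 2, 1],
     [2, 2, 1],
     [2, 0, 1]]
print(det_by_cofactors(A) % 3)   # 1, matching the row-operation computation above
```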
Theorem. Let R be a commutative ring with identity, and let D : M(n, R) → R be a function which is alternating and linear in each row. Then for every A ∈ M(n, R),
$$D(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}\, D(I).$$
Proof. This proof is a little complicated; if you wish to skip it for now, at least try to understand the
statement of the theorem (and see the discussion that follows).
Write the matrix in terms of its rows:
$$A = \begin{bmatrix} \leftarrow r_1 \rightarrow \\ \leftarrow r_2 \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}.$$
I can use linearity applied to row 1 to expand the determinant of A into a sum of determinants (here $e_j$ denotes the row vector with a 1 in position j and 0s elsewhere, so that $r_1 = \sum_j A_{1j} e_j$):
$$D(A) = D\begin{bmatrix} \leftarrow \sum_j A_{1j} e_j \rightarrow \\ \leftarrow r_2 \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix} = \sum_j A_{1j}\, D\begin{bmatrix} \leftarrow e_j \rightarrow \\ \leftarrow r_2 \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}.$$
Use linearity applied to row 2 to expand the determinant terms in the last D-sum:
$$D(A) = \sum_j \sum_k A_{1j} A_{2k}\, D\begin{bmatrix} \leftarrow e_j \rightarrow \\ \leftarrow e_k \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}.$$
Continue in this way for all n rows. I’ll switch notation and use j1 , j2 , . . . , jn as the summation indices:
$$D(A) = \sum_{j_1} \sum_{j_2} \cdots \sum_{j_n} A_{1 j_1} A_{2 j_2} \cdots A_{n j_n}\, D\begin{bmatrix} \leftarrow e_{j_1} \rightarrow \\ \leftarrow e_{j_2} \rightarrow \\ \vdots \\ \leftarrow e_{j_n} \rightarrow \end{bmatrix}.$$
If two of the j’s are equal, then the e’s are equal — e.g. if j3 = j7 , then ej3 = ej7 . But D is 0 on matrices
with equal rows. So terms with two of the j’s equal are 0. Hence, I only need to consider terms where all
the j’s are distinct numbers in the set {1, 2, . . . n}. This means that {j1 , j2 , . . . , jn } is a permutation of
{1, 2, . . . , n}. So I can just sum over all permutations σ ∈ Sn:
$$D(A) = \sum_{\sigma \in S_n} A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}\, D\begin{bmatrix} \leftarrow e_{\sigma(1)} \rightarrow \\ \leftarrow e_{\sigma(2)} \rightarrow \\ \vdots \\ \leftarrow e_{\sigma(n)} \rightarrow \end{bmatrix}.$$
Sorting the rows $e_{\sigma(1)}, \ldots, e_{\sigma(n)}$ into the identity matrix takes a sequence of row swaps, and each swap multiplies D by −1; the net factor is exactly sgn(σ). Therefore,
$$D(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}\, D(I).$$
Corollary. Let R be a commutative ring with identity, and let A ∈ M (n, R).
$$\det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, A_{1\sigma(1)} \cdots A_{n\sigma(n)}.$$
Proof. Since the determinant function det defined by expansion by cofactors is alternating and linear in
each row, it satisfies the conditions of the theorem. Since det I = 1, the formula in the theorem becomes
$$\det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, A_{1\sigma(1)} \cdots A_{n\sigma(n)}.$$
Look at one of the products $A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}$: it contains one entry from each row of A, and the entries come from columns σ(1), σ(2), . . . , σ(n). This is a permutation of {1, 2, . . . , n}, which means each of the numbers from 1 to n is
chosen exactly once. This means that we’re also choosing the entries so that one comes from each column.
We’re summing over all permutations of {1, 2, . . . n}, so we’re choosing entries for our products in all such
ways. Let’s see how this looks for small matrices.
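Here is a direct Python sketch of the permutation formula (not part of the original notes; the function name is mine, and the sign is computed by counting inversions, which has the same parity as the number of swaps used in the notes).

```python
from itertools import permutations

def det_by_permutations(A):
    """Determinant via the permutation formula: sum over sigma of sgn(sigma) * prod A[i][sigma(i)]."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        # sign of sigma: +1 for an even number of inversions, -1 for an odd number
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
        sign = -1 if inversions % 2 else 1
        term = sign
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

print(det_by_permutations([[2, 1, 4], [1, -1, 2], [5, 3, 1]]))   # 27, as in the 3x3 example below
```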
Consider a 2 × 2 matrix:
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
I have to choose 2 entries at a time, so that the 2 entries come from different rows and columns. I multiply
the 2 chosen entries to get one of the product terms. There are 2! = 2 ways to do this; they are ad and bc. Attaching the signs of the corresponding permutations gives the familiar 2 × 2 formula ad − bc.
Next, consider the 3 × 3 matrix
$$\begin{bmatrix} 2 & 1 & 4 \\ 1 & -1 & 2 \\ 5 & 3 & 1 \end{bmatrix}$$
I have to choose 3 entries from the matrix at a time, in such a way that there is one entry from each
row and each column. For each such choice, I take the product of the three elements and multiply by the
sign of the permutation of the elements, which I’ll describe below. Finally, I add up the results.
In order to do this systematically, focus on the first column. I can choose 2, 1, or 5 from column 1.
If I choose 2 from column 1, I can either choose −1 from column 2 and 1 from column 3, or 3 from
column 2 and 2 from column 3. (Remember that I can’t have two elements from the same row or column.)
$$\begin{bmatrix} 2 & * & * \\ * & -1 & * \\ * & * & 1 \end{bmatrix} \qquad \begin{bmatrix} 2 & * & * \\ * & * & 2 \\ * & 3 & * \end{bmatrix}$$
If I choose 1 from column 1, I can either choose 1 from column 2 and 1 from column 3, or 3 from column
2 and 4 from column 3.
$$\begin{bmatrix} * & 1 & * \\ 1 & * & * \\ * & * & 1 \end{bmatrix} \qquad \begin{bmatrix} * & * & 4 \\ 1 & * & * \\ * & 3 & * \end{bmatrix}$$
Finally, if I choose 5 from column 1, I can either choose 1 from column 2 and 2 from column 3, or −1
from column 2 and 4 from column 3.
$$\begin{bmatrix} * & 1 & * \\ * & * & 2 \\ 5 & * & * \end{bmatrix} \qquad \begin{bmatrix} * & * & 4 \\ * & -1 & * \\ 5 & * & * \end{bmatrix}$$
This gives me 6 products:
$$(2)(-1)(1), \quad (2)(3)(2), \quad (1)(1)(1), \quad (1)(3)(4), \quad (5)(1)(2), \quad (5)(-1)(4).$$
Next, I have to attach a sign to each product. To do this, I count the number of row swaps I need to
move the 1’s in the identity matrix into the same positions as the numbers in the product. I’ll illustrate with
two examples.
$$\begin{bmatrix} * & * & 4 \\ 1 & * & * \\ * & 3 & * \end{bmatrix}: \qquad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \overset{r_1 \leftrightarrow r_2}{\to} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \overset{r_1 \leftrightarrow r_3}{\to} \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$
It took 2 row swaps to move the 1’s into the same positions as 1, 3, and 4. Since 2 is even, the sign of
(1)(3)(4) is +1.
$$\begin{bmatrix} * & * & 4 \\ * & -1 & * \\ 5 & * & * \end{bmatrix}: \qquad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \overset{r_1 \leftrightarrow r_3}{\to} \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$
It took 1 row swap to move the 1’s into the same positions as 5, −1, and 4. Since 1 is odd, the sign of
(5)(−1)(4) is −1.
Continuing in this fashion, I get
$$\det\begin{bmatrix} 2 & 1 & 4 \\ 1 & -1 & 2 \\ 5 & 3 & 1 \end{bmatrix} = (2)(-1)(1) - (2)(3)(2) - (1)(1)(1) + (1)(3)(4) + (5)(1)(2) - (5)(-1)(4) = 27.$$
Notice how ugly the computation was! While the permutation formula can be used for computations,
it’s easier to use row or column operations or expansion by cofactors. The main point of the permutation
formula lies in the following Corollary. It says there is only one function on n × n matrices which satisfies
the three axioms for a determinant — the determinant function is unique. Row reduction, expansion by
cofactors, and the permutation formula give different ways of computing the same thing.
The permutation formula is connected to a trick for computing determinants of 3 × 3 matrices. You
may have seen this trick in other math courses, or in physics courses. I’ll illustrate with the matrix in the
last example.
Warning: This only works on determinants which are 3 × 3!
Begin by making copies of the first two columns of the matrix. Put the copies to the right of the original
matrix:
$$\begin{array}{ccccc} 2 & 1 & 4 & 2 & 1 \\ 1 & -1 & 2 & 1 & -1 \\ 5 & 3 & 1 & 5 & 3 \end{array}$$
Next, draw diagonal lines through the elements as shown below. Three lines down and to the right,
three lines up and to the right:
$$\begin{array}{ccccc} 2 & 1 & 4 & 2 & 1 \\ 1 & -1 & 2 & 1 & -1 \\ 5 & 3 & 1 & 5 & 3 \end{array}$$
(Picture: three diagonal lines drawn down and to the right, and three drawn up and to the right, through the array.)
Form products by multiplying the elements along each line. The products of the “down and right” lines
get plus signs, and the products of the “up and right” lines get minus signs:
$$(2)(-1)(1) + (1)(2)(5) + (4)(1)(3) - (4)(-1)(5) - (2)(2)(3) - (1)(1)(1) = 27.$$
You can see that we got the same terms as we got with the permutation formula, with the factors and
the terms in a different order.
Again, I emphasize that this trick only works on matrices which are 3 × 3! You can’t use it on matrices
of any other size. It’s not bad for 3 × 3 determinants you’re computing by hand, so feel free to use it if you
wish. Don’t try to use it on determinants which are 2 × 2, 4 × 4, and so on.
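For completeness, here is the trick written out as Python code (not part of the original notes; the function name is mine, and it is only valid for 3 × 3 matrices).

```python
def det3_by_sarrus(A):
    """Determinant of a 3x3 matrix by the diagonal-lines trick (rule of Sarrus)."""
    return (A[0][0]*A[1][1]*A[2][2] + A[0][1]*A[1][2]*A[2][0] + A[0][2]*A[1][0]*A[2][1]
            - A[0][2]*A[1][1]*A[2][0] - A[0][0]*A[1][2]*A[2][1] - A[0][1]*A[1][0]*A[2][2])

print(det3_by_sarrus([[2, 1, 4], [1, -1, 2], [5, 3, 1]]))   # 27
```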
Corollary. Let R be a commutative ring with identity. There is a unique determinant function | · | :
M (n, R) → R.
Proof. The permutation formula says
$$\det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, A_{1\sigma(1)} \cdots A_{n\sigma(n)}.$$
But the right side only depends on the entries of the matrix A. So D(A) is completely determined by
A, and there can be only one determinant function on n × n matrices.
We know that the determinant function defined by cofactor expansion satisfies the axioms for a deter-
minant function. Therefore, it is the only determinant function on n × n matrices.
This doesn’t mean that you can’t compute the determinant in different ways; in fact, the permutation
formula gives a different way of computing determinants than cofactor expansion. To say that there’s only
one determinant function means that any function satisfying the determinant axioms will give the same
answer as any other function satisfying the determinant axioms, for a given matrix.
Remark. Here’s another way to express the theorem. Suppose det denotes the determinant function on
M(n, R). If D : M(n, R) → R is alternating and linear in each row, then
$$D(A) = (\det A)\, D(I) \quad\text{for all } A \in M(n, R).$$
In other words, a function which satisfies the first two axioms for a determinant function is a multiple
of the “real” determinant function, the multiple being the value the function takes on the identity matrix.
In the case of the “real” determinant function, the third axiom says det I = 1, so the multiple is 1 and D is
the “real” determinant function.
Corollary. Let R be a commutative ring with identity, and let A ∈ M(n, R). Then $|A^T| = |A|$.
Proof. Using the permutation formula,
$$|A^T| = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} A^T_{i\sigma(i)} = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} A_{\sigma(i)i} = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{j=1}^{n} A_{j\sigma^{-1}(j)} = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma^{-1}) \prod_{j=1}^{n} A_{j\sigma^{-1}(j)} =$$
$$\sum_{\sigma^{-1} \in S_n} \operatorname{sgn}(\sigma^{-1}) \prod_{j=1}^{n} A_{j\sigma^{-1}(j)} = \sum_{\tau \in S_n} \operatorname{sgn}(\tau) \prod_{j=1}^{n} A_{j\tau(j)} = |A|.$$
In the fourth equality, I went from summing over σ in Sn to σ −1 in Sn . This is valid because permutations
are bijective functions, so they have inverse functions which are also permutations. So summing over all
permutations in Sn is the same as summing over all their inverses in Sn — you will get the same terms in
the sum, just in a different order.
I got the next-to-the-last equality by letting τ = σ −1 . This just makes it easier to recognize the
next-to-last expression as the permutation formula for |A|.
Remark. We’ve used row operations as an aid to computing determinants. Since the rows of A are the
columns of AT and vice versa, the Corollary implies that you can also use column operations to compute
determinants. The allowable operations are swapping two columns, multiplying a column by a number, and
adding a multiple of a column to another column. They have the same effects on the determinant as the
corresponding row operations.
This also means that you can compute determinants using cofactors of rows as well as columns.
In proving the uniqueness of determinant functions, we showed that if D is a function on n × n matrices
which is alternating and linear on the rows, then D(M ) = (det M )D(I). We will use this to prove the
product rule for determinants.
Theorem. Let R be a commutative ring with identity, and let A, B ∈ M (n, R). Then |AB| = |A||B|.
Proof. Fix B, and define
D(A) = |AB|.
I will show that D is alternating and linear, then apply a result I derived in showing uniqueness of
determinant functions.
Let ri denote the i-th row of A. Then
$$D(A) = \begin{vmatrix} \leftarrow r_1 B \rightarrow \\ \leftarrow r_2 B \rightarrow \\ \vdots \\ \leftarrow r_n B \rightarrow \end{vmatrix}.$$
Now | · | is alternating, so interchanging two rows in the determinant above multiplies D(A) by −1.
Hence, D is alternating.
Next, I’ll show that D is linear:
$$D\begin{bmatrix} \vdots \\ \leftarrow kx + y \rightarrow \\ \vdots \end{bmatrix} = \begin{vmatrix} \vdots \\ \leftarrow (kx + y)B \rightarrow \\ \vdots \end{vmatrix} = k \cdot \begin{vmatrix} \vdots \\ \leftarrow xB \rightarrow \\ \vdots \end{vmatrix} + \begin{vmatrix} \vdots \\ \leftarrow yB \rightarrow \\ \vdots \end{vmatrix} = k \cdot D\begin{bmatrix} \vdots \\ \leftarrow x \rightarrow \\ \vdots \end{bmatrix} + D\begin{bmatrix} \vdots \\ \leftarrow y \rightarrow \\ \vdots \end{bmatrix}.$$
This proves that D is linear in each row.
Since D is a function on M (n, R) which is alternating and linear in the rows, the result I mentioned
earlier shows
D(A) = |A|D(I).
But D(A) = |AB| and D(I) = |IB| = |B|, so we get
$$|AB| = |A|\,|B|.$$
In other words, the determinant of a product is the product of the determinants. A similar result holds
for powers.
Corollary. Let R be a commutative ring with identity, and let A ∈ M (n, R). Then for every m ≥ 0,
|Am | = |A|m .
Proof. This follows from the previous result using induction. The result is obvious for m = 0 and m = 1
(note that A0 = I, the identity matrix), and the case m = 2 follows from the previous result if we take
B = A.
Suppose the result is true for m, so |Am | = |A|m . We need to show that the result holds for m + 1. We
have
$$|A^{m+1}| = |A^m A| = |A^m|\,|A| = |A|^m\,|A| = |A|^{m+1}.$$
We used the case m = 2 to get the second equality, and the induction assumption was used to get the
third equality. This proves the result for m + 1, so it holds for all m ≥ 0 by induction.
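Here is a quick numerical illustration of the product and power rules (not part of the original notes); the random integer matrices are my own choice, and floating-point comparison is done with a tolerance.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(3, 3)).astype(float)
B = rng.integers(-3, 4, size=(3, 3)).astype(float)

# |AB| = |A||B| and |A^3| = |A|^3, up to floating-point roundoff
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))           # True
print(np.isclose(np.linalg.det(np.linalg.matrix_power(A, 3)), np.linalg.det(A) ** 3))  # True
```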
While the determinant of a product is the product of the determinants, the determinant of a sum is not
necessarily the sum of the determinants.
Example. Give a specific example of 2 × 2 real matrices A and B for which det(A + B) ≠ det A + det B.
Take A = I and B = −I (the 2 × 2 identity and its negative). Then
$$\det\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 1 \quad\text{and}\quad \det\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} = 1.$$
But
$$\det\left( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} \right) = \det\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = 0.$$
The rule for products gives us an easy criterion for the invertibility of a matrix. First, I’ll prove the
result in the special case where the entries of the matrix are elements of a field.
Theorem. Let F be a field, and let A ∈ M (n, F ).
A is invertible if and only if |A| ≠ 0.
Proof. If A is invertible, then
|A||A−1 | = |AA−1 | = |I| = 1.
This equation implies that |A| ≠ 0 (since |A| = 0 would yield “0 = 1”).
Conversely, suppose that |A| ≠ 0. Suppose that A row reduces to the row reduced echelon matrix R,
and consider the effect of elementary row operations on |A|. Swapping two rows multiplies the determinant
by −1. Adding a multiple of a row to another row leaves the determinant unchanged. And multiplying a
row by a nonzero number multiplies the determinant by that nonzero number. Clearly, no row operation
will make the determinant 0 if it was nonzero to begin with. Since |A| ≠ 0, it follows that |R| ≠ 0.
Since R is a row reduced echelon matrix with nonzero determinant, it can’t have any all-zero rows. An
n × n row reduced echelon matrix with no all-zero rows must be the identity, so R = I. Since A row reduces
to the identity, A is invertible.
Corollary. Let F be a field, and let A ∈ M (n, F ). If A is invertible, then
|A−1 | = |A|−1 .
Example. Suppose A, B, and C are square matrices of the same size, with |A| = 18, |B| = 5, and |C| = 3. Compute $|A^T B^2 C^{-1}|$.
We have $|A^T| = |A| = 18$ and $|C^{-1}| = \dfrac{1}{|C|} = \dfrac{1}{3}$. Using the product rule for determinants,
$$|A^T B^2 C^{-1}| = |A^T|\,|B|^2\,|C^{-1}| = 18 \cdot 5^2 \cdot \frac{1}{3} = 150.$$
Definition. Let R be a commutative ring with identity. Matrices A, B ∈ M (n, R) are similar if there is an
invertible matrix P ∈ M (n, R) such that P AP −1 = B.
Similar matrices come up in many places, for instance in changing bases for vector spaces.
Corollary. Let R be a commutative ring with identity. Similar matrices in M (n, R) have equal determinants.
Proof. Suppose A and B are similar, so P AP −1 = B for some invertible matrix P . Then
$$|B| = |PAP^{-1}| = |P|\,|A|\,|P^{-1}| = |P|\,|P^{-1}|\,|A| = |PP^{-1}|\,|A| = |I|\,|A| = |A|.$$
In the third equality, I used the fact that |P −1 | and |A| are numbers — elements of the ring R — and
multiplication in R is commutative. That allows me to commute |P −1 | and |A|.
Definition. Let R be a commutative ring with identity, and let A ∈ M (n, R). The adjugate adj A is the
matrix whose i-j-th entry is
$$(\operatorname{adj} A)_{ij} = (-1)^{i+j} |A(j \mid i)|.$$
In other words, adj A is the transpose of the matrix of cofactors.
Remark. In the past, adj A was referred to as the adjoint, or the classical adjoint. But the term “adjoint”
is now used to refer to something else: The conjugate transpose, which we’ll see when we discuss the
spectral theorem. So the term “adjugate” has come to replace it for the matrix defined above. One
advantage of the word “adjugate” is that you can use the same abbreviation “adj” as was used for “adjoint”!
Example. Compute the adjugate of
$$A = \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 1 \\ 1 & -1 & 2 \end{bmatrix}.$$
First, I’ll compute the cofactors. The first line shows the cofactors of the first row, the second line the
cofactors of the second row, and the third line the cofactors of the third row.
$$+\begin{vmatrix} 1 & 1 \\ -1 & 2 \end{vmatrix} = 3, \quad -\begin{vmatrix} 0 & 1 \\ 1 & 2 \end{vmatrix} = 1, \quad +\begin{vmatrix} 0 & 1 \\ 1 & -1 \end{vmatrix} = -1,$$
$$-\begin{vmatrix} 0 & 3 \\ -1 & 2 \end{vmatrix} = -3, \quad +\begin{vmatrix} 1 & 3 \\ 1 & 2 \end{vmatrix} = -1, \quad -\begin{vmatrix} 1 & 0 \\ 1 & -1 \end{vmatrix} = 1,$$
$$+\begin{vmatrix} 0 & 3 \\ 1 & 1 \end{vmatrix} = -3, \quad -\begin{vmatrix} 1 & 3 \\ 0 & 1 \end{vmatrix} = -1, \quad +\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1.$$
The adjugate is the transpose of the matrix of cofactors:
$$\operatorname{adj} A = \begin{bmatrix} 3 & -3 & -3 \\ 1 & -1 & -1 \\ -1 & 1 & 1 \end{bmatrix}.$$
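Here is a short Python sketch of the adjugate computation (not part of the original notes; it reuses the `det_by_cofactors` sketch from earlier, and the function name is mine).

```python
def adjugate(A):
    """Adjugate: transpose of the matrix of cofactors."""
    n = len(A)
    cof = [[(-1) ** (i + j) *
            det_by_cofactors([row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i])
            for j in range(n)] for i in range(n)]          # cof[i][j] = (i, j) cofactor
    return [[cof[j][i] for j in range(n)] for i in range(n)]   # transpose of the cofactor matrix

A = [[1, 0, 3], [0, 1, 1], [1, -1, 2]]
print(adjugate(A))   # [[3, -3, -3], [1, -1, -1], [-1, 1, 1]]
```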
The next result shows that adjugates and tranposes can be interchanged: The adjugate of the transpose
equals the transpose of the adjugate.
Proposition. Let R be a commutative ring with identity, and let A ∈ M(n, R). Then
$$\operatorname{adj}(A^T) = (\operatorname{adj} A)^T.$$
Proof. Consider the (i, j)th elements of the matrices on the two sides of the equation.
Example. Let’s illustrate with the real matrix $\begin{bmatrix} 1 & 2 & 4 \\ 1 & -1 & 0 \\ 2 & 3 & -2 \end{bmatrix}$. Expanding by cofactors of the 3rd row gives the determinant:
$$\begin{vmatrix} 1 & 2 & 4 \\ 1 & -1 & 0 \\ 2 & 3 & -2 \end{vmatrix} = (2)\begin{vmatrix} 2 & 4 \\ -1 & 0 \end{vmatrix} - (3)\begin{vmatrix} 1 & 4 \\ 1 & 0 \end{vmatrix} + (-2)\begin{vmatrix} 1 & 2 \\ 1 & -1 \end{vmatrix} = 26.$$
Now suppose I make a mistake: I multiply the cofactors of the 3rd row by elements of the 1st row (which
are 1, 2, 4). Here’s what I get:
$$(1)\begin{vmatrix} 2 & 4 \\ -1 & 0 \end{vmatrix} - (2)\begin{vmatrix} 1 & 4 \\ 1 & 0 \end{vmatrix} + (4)\begin{vmatrix} 1 & 2 \\ 1 & -1 \end{vmatrix} = 4 + 8 - 12 = 0.$$
Or suppose I multiply the cofactors of the 3rd row by elements of the 2nd row (which are 1, −1, 0).
Here’s what I get:
$$(1)\begin{vmatrix} 2 & 4 \\ -1 & 0 \end{vmatrix} - (-1)\begin{vmatrix} 1 & 4 \\ 1 & 0 \end{vmatrix} + (0)\begin{vmatrix} 1 & 2 \\ 1 & -1 \end{vmatrix} = 4 - 4 + 0 = 0.$$
These examples suggest that if I try to do a cofactor expansion by using the cofactors of one row
multiplied by the elements from another row, I get 0. It turns out that this is true in general, and is the key
step in the next proof.
Theorem. Let R be a commutative ring with identity, and let A ∈ M (n, R). Then
|A| · I = A · adj A.
Proof. This proof is a little tricky, so you may want to skip it for now.
We expand |A| by cofactors of row i:
$$|A| = \sum_{j} (-1)^{i+j} A_{ij} |A(i \mid j)|.$$
First, suppose k 6= i. Construct a new matrix B by replacing row k of A with row i of A. Thus, the
elements of B are the same as those of A, except that B’s row k duplicates A’s row i.
In symbols,
$$B_{lj} = \begin{cases} A_{lj} & \text{if } l \ne k \\ A_{ij} & \text{if } l = k \end{cases}$$
Expanding |B| by cofactors of row k gives
$$|B| = \sum_{j=1}^{n} (-1)^{k+j} B_{kj} |B(k \mid j)| = \sum_{j=1}^{n} (-1)^{k+j} A_{ij} |A(k \mid j)|.$$
Why is |B(k | j)| = |A(k | j)|? To compute |B(k | j)|, you delete row k and column j from B. To
compute |A(k | j)|, you delete row k and column j from A. But A and B only differ in row k, which is being
deleted in both cases. Hence, |B(k | j)| = |A(k | j)|.
(Picture: A and B with row k and column j shaded for deletion; the deleted row k is the only row in which A and B differ.)
On the other hand, B has two equal rows — its row i and row k are both equal to row i of A — so the
determinant of B is 0. Hence,
$$\sum_{j=1}^{n} (-1)^{k+j} A_{ij} |A(k \mid j)| = 0.$$
This is the point we illustrated prior to stating the theorem: if you do a cofactor expansion by using
the cofactors of one row multiplied by the elements from another row, you get 0. The last equation is what
we get for k 6= i. In case k = i, we just get the cofactor expansion for |A|:
$$\sum_{j=1}^{n} (-1)^{i+j} A_{ij} |A(i \mid j)| = |A|.$$
j=1
We can combine the two equations into one using the Kronecker delta function:
$$\sum_{j} (-1)^{k+j} A_{ij} |A(k \mid j)| = \delta_{ik} |A| \quad\text{for all } i, k.$$
Remember that δik = 1 if i = k, and δik = 0 if i 6= k. These are the two cases above.
Interpret this equation as a matrix equation, where the two sides represent the (i, k)-th entries of their
respective matrices. What are the respective matrices? Since δik is the (i, k)-th entry of the identity matrix,
the right side is the (i, k)-th entry of |A| · I.
The left side is the (i, k)-th entry of A · adj A, because
$$(A \cdot \operatorname{adj} A)_{ik} = \sum_{j} A_{ij} (\operatorname{adj} A)_{jk} = \sum_{j} A_{ij} (-1)^{j+k} |A(k \mid j)|.$$
Therefore,
$$|A| \cdot I = A \cdot \operatorname{adj} A.$$
I can use the theorem to obtain an important corollary. I already know that a matrix over a field is
invertible if and only if its determinant is nonzero. The next result explains what happens over a commutative
ring with identity, and also provides a formula for the inverse of a matrix.
Corollary. Let R be a commutative ring with identity. A matrix A ∈ M (n, R) is invertible if and only if
|A| is invertible in R, in which case
A−1 = |A|−1 adj A.
Proof. First, suppose A is invertible. Then AA−1 = I, so
$$|A|\,|A^{-1}| = |AA^{-1}| = |I| = 1.$$
Hence, |A| is invertible in R (its inverse is |A−1|).
Conversely, suppose |A| is invertible in R. Multiplying both sides of |A| · I = A · adj A by |A|−1 gives
$$I = A \cdot |A|^{-1} \operatorname{adj} A.$$
Hence, A is invertible, and A−1 = |A|−1 adj A.
Corollary. Let R be a commutative ring with identity, and let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in M(2, R)$. If ad − bc is invertible in R, then
$$A^{-1} = (ad - bc)^{-1} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$
Proof.
$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc \quad\text{and}\quad \operatorname{adj}\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$
Hence, the result follows from the adjugate formula.
To see the difference between the general case of a commutative ring with identity and a field, consider
the following matrices over Z6 :
$$\begin{bmatrix} 5 & 3 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}$$
In the first case,
$$\det\begin{bmatrix} 5 & 3 \\ 1 & 1 \end{bmatrix} = 2.$$
2 is not invertible in Z6 — do you know how to prove it? Hence, even though the determinant is nonzero,
the matrix is not invertible.
$$\det\begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} = 5.$$
5 is invertible in Z6 — in fact, 5 · 5 = 1. Hence, the second matrix is invertible. You can find the inverse
using the formula in the last corollary.
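Here is a small Python sketch of that check over Z_n (not part of the original notes; the function name is mine, and `pow(x, -1, n)` needs Python 3.8 or later). It applies the 2 × 2 adjugate formula when the determinant is invertible mod n.

```python
from math import gcd

def inverse_2x2_mod_n(M, n):
    """2x2 inverse over Z_n via the adjugate formula, when det is invertible mod n."""
    a, b = M[0]
    c, d = M[1]
    det = (a * d - b * c) % n
    if gcd(det, n) != 1:
        return None                      # det is not invertible in Z_n: no inverse exists
    det_inv = pow(det, -1, n)            # modular inverse of the determinant
    adj = [[d, -b], [-c, a]]             # adjugate of a 2x2 matrix
    return [[(det_inv * adj[i][j]) % n for j in range(2)] for i in range(2)]

print(inverse_2x2_mod_n([[5, 3], [1, 1]], 6))   # None: det = 2 is not invertible in Z_6
print(inverse_2x2_mod_n([[2, 1], [1, 3]], 6))   # [[3, 1], [1, 4]]
```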
The adjugate formula can be used to find the inverse of a matrix. It’s not very good for big matrices
from a computational point of view: The usual row reduction algorithm uses fewer steps. However, it’s not
too bad for small matrices — say 3 × 3 or smaller.
Example. Compute the inverse of the following real matrix using the adjugate formula.
$$A = \begin{bmatrix} 1 & -2 & -2 \\ 3 & -2 & 0 \\ 1 & 1 & 1 \end{bmatrix}.$$
First, I’ll compute the cofactors. The first line shows the cofactors of the first row, the second line the
cofactors of the second row, and the third line the cofactors of the third row. I’m showing the “checkerboard”
pattern of pluses and minuses as well.
$$+\begin{vmatrix} -2 & 0 \\ 1 & 1 \end{vmatrix} = -2, \quad -\begin{vmatrix} 3 & 0 \\ 1 & 1 \end{vmatrix} = -3, \quad +\begin{vmatrix} 3 & -2 \\ 1 & 1 \end{vmatrix} = 5,$$
$$-\begin{vmatrix} -2 & -2 \\ 1 & 1 \end{vmatrix} = 0, \quad +\begin{vmatrix} 1 & -2 \\ 1 & 1 \end{vmatrix} = 3, \quad -\begin{vmatrix} 1 & -2 \\ 1 & 1 \end{vmatrix} = -3,$$
$$+\begin{vmatrix} -2 & -2 \\ -2 & 0 \end{vmatrix} = -4, \quad -\begin{vmatrix} 1 & -2 \\ 3 & 0 \end{vmatrix} = -6, \quad +\begin{vmatrix} 1 & -2 \\ 3 & -2 \end{vmatrix} = 4.$$
The adjugate is the transpose of the matrix of cofactors:
$$\operatorname{adj} A = \begin{bmatrix} -2 & 0 & -4 \\ -3 & 3 & -6 \\ 5 & -3 & 4 \end{bmatrix}.$$
Expanding by cofactors of the first row, the determinant is
$$|A| = (1)(-2) + (-2)(-3) + (-2)(5) = -6,$$
so
$$A^{-1} = \frac{1}{|A|} \operatorname{adj} A = -\frac{1}{6} \begin{bmatrix} -2 & 0 & -4 \\ -3 & 3 & -6 \\ 5 & -3 & 4 \end{bmatrix}.$$
Another consequence of the formula |A| · I = A · adj A is Cramer’s rule, which gives a formula for the
solution of a system of linear equations.
Corollary. (Cramer’s rule) If A is an invertible n × n matrix, the unique solution to Ax = y is given by
$$x_i = \frac{|B_i|}{|A|},$$
where $B_i$ is the matrix obtained from A by replacing its i-th column with y.
Proof. The solution is x = A−1 y = |A|−1 (adj A) y, so
$$x_i = |A|^{-1} \sum_{j} (\operatorname{adj} A)_{ij}\, y_j = |A|^{-1} \sum_{j} (-1)^{i+j} |A(j \mid i)|\, y_j.$$
But the last sum is a cofactor expansion of A along column i, where instead of the elements of A’s
column i I’m using the components of y. This is exactly |Bi |.
Example. Use Cramer’s Rule to solve the following system over R:
2x + y + z = 1
x + y − z = 5
3x − y + 2z = −2
I replace the successive columns of the coefficient matrix with (1, 5, −2), in each case computing the
determinant of the resulting matrix and dividing by the determinant of the coefficient matrix:
$$x = \frac{\begin{vmatrix} 1 & 1 & 1 \\ 5 & 1 & -1 \\ -2 & -1 & 2 \end{vmatrix}}{\begin{vmatrix} 2 & 1 & 1 \\ 1 & 1 & -1 \\ 3 & -1 & 2 \end{vmatrix}} = \frac{-10}{-7} = \frac{10}{7}, \qquad y = \frac{\begin{vmatrix} 2 & 1 & 1 \\ 1 & 5 & -1 \\ 3 & -2 & 2 \end{vmatrix}}{\begin{vmatrix} 2 & 1 & 1 \\ 1 & 1 & -1 \\ 3 & -1 & 2 \end{vmatrix}} = \frac{-6}{-7} = \frac{6}{7}, \qquad z = \frac{\begin{vmatrix} 2 & 1 & 1 \\ 1 & 1 & 5 \\ 3 & -1 & -2 \end{vmatrix}}{\begin{vmatrix} 2 & 1 & 1 \\ 1 & 1 & -1 \\ 3 & -1 & 2 \end{vmatrix}} = \frac{19}{-7} = -\frac{19}{7}.$$
This looks pretty simple, doesn’t it? But notice that you need to compute four 3 × 3 determinants to do
this (and I didn’t write out the work for those computations!). It becomes more expensive to solve systems
this way as the matrices get larger.
As with the adjugate formula for the inverse of a matrix, Cramer’s rule is not computationally efficient:
It’s better to use row reduction to solve large systems. Cramer’s rule is not too bad for solving systems of
two linear equations in two variables; for anything larger, you’re probably better off using row reduction.
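If you want to experiment, here is a short NumPy sketch of Cramer's rule applied to the system above (not part of the original notes; the function name is mine, and the determinants are computed numerically).

```python
import numpy as np

def cramer_solve(A, y):
    """Solve Ax = y by Cramer's rule: x_i = det(B_i)/det(A), where B_i is A with column i replaced by y."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.linalg.det(A)
    x = []
    for i in range(A.shape[0]):
        Bi = A.copy()
        Bi[:, i] = y                      # replace column i with the right-hand side
        x.append(np.linalg.det(Bi) / d)
    return x

A = [[2, 1, 1], [1, 1, -1], [3, -1, 2]]
y = [1, 5, -2]
print(cramer_solve(A, y))   # approximately [10/7, 6/7, -19/7]
```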
You may have seen vectors before — in physics or engineering courses, or in multivariable calculus. In
those courses, you tend to see particular kinds of vectors, and it could lead you to think that those particular
kinds of vectors are the only kinds of vectors. We’ll discuss vectors by giving axioms for vectors. When
you define a mathematical object using a set of axioms, you are describing how it behaves. Why is this
important?
One common intuitive description of vectors is that they’re things which “have a magnitude and a
direction”. This isn’t a bad description for certain kinds of vectors, but it has some shortcomings. Consider
the following example: Take a book and hold it out in front of you with the cover facing toward you.
(Pictures: the book after rotating “90 degrees away, then 90 degrees left”, and after rotating “90 degrees left, then 90 degrees away” — the two final positions are different.)
First, rotate the book 90◦ away from you, then (without returning the book to its original position)
rotate the book 90◦ to your left. The first three pictures illustrate the result.
Next, return the book to its original position facing you. Rotate the book 90◦ to your left, then
(without returning the book to its original position) rotate the book 90◦ away from you. The next three
pictures illustrate the result.
In other words, we’re doing two rotations — 90◦ away from you and 90◦ to your left — one after the
other, in the two possible orders. Note that the final positions are different.
It’s certainly reasonable to say that rotations by 90◦ away from you or to your left are things with
“magnitudes” and “directions”. And it seems reasonable to “add” such rotations by doing one followed by
the other. However, we saw that when we performed the “addition” in different orders, we got different
results. Symbolically,
A + B 6= B + A.
The addition fails to be commutative. But it happens that we really do want vector addition to be
commutative, for all of the “vectors” that come up in practice.
It is not enough to tell what a thing “looks like” (“magnitude and direction”); we need to say how the
thing behaves. This example also showed that words like “magnitude” and “direction” are ambiguous.
Other descriptions of vectors — as “arrow”, or as “lists of numbers” — also describe particular kinds
of vectors. And as with “magnitude and direction”, they’re incomplete: They don’t tell how the “vectors”
in question behave.
Our axioms for a vector space describe how vectors should behave — and if they behave right, we don’t
care what they look like! It’s okay to think of “magnitude and direction” or “arrow” or “list of numbers”,
as long as you remember that these are only particular kinds of vectors.
Let’s see the axioms for a vector space.
Definition. A vector space V over a field F is a set V equipped with two operations. The first is called
(vector) addition; it takes vectors u and v and produces another vector u + v.
The second operation is called scalar multiplication; it takes an element a ∈ F and a vector u ∈ V
and produces a vector au ∈ V .
These operations satisfy the following axioms:
1. Vector addition is associative: If u, v, w ∈ V , then
(u + v) + w = u + (v + w).
2. Vector addition is commutative: If u, v ∈ V , then
u + v = v + u.
3. There is a zero vector 0 ∈ V which satisfies
0 + u = u = u + 0 for all u ∈ V.
Note: Some people prefer to write something like “~0” for the zero vector to distinguish it from the
number 0 in the field F . I’ll be a little lazy and just write “0” and rely on you to determine whether it’s the
zero vector or the number zero from the context.
4. For every vector u ∈ V , there is a vector −u ∈ V which satisfies
u + (−u) = 0 = (−u) + u.
5. If a, b ∈ F and x ∈ V , then
a(bx) = (ab)x.
6. If a, b ∈ F and x ∈ V , then
(a + b)x = ax + bx.
7. If a ∈ F and x, y ∈ V , then
a(x + y) = ax + ay.
8. If x ∈ V , then
1 · x = x.
The elements of V are called vectors; the elements of F are called scalars. As usual, the use of words
like “multiplication” does not imply that the operations involved look like ordinary “multiplication”.
Note that Axiom (4) allows us to define subtraction of vectors this way:
x − y = x + (−y).
An easy (and trivial) vector space (over any field F ) is the zero vector space V = {0}. It consists of
a zero vector (which is required by Axiom 3) and nothing else. The scalar multiplication is a · 0 = 0 for any
a ∈ F . You can easily check that all the axioms hold.
The most important example of a vector space over a field F is given by the “standard” vector space F n .
In fact, every (finite-dimensional) vector space over F is isomorphic to F n for some nonnegative integer n.
We’ll discuss isomorphisms later; let’s give the definition of F n .
If F is a field and n ≥ 1, then F n denotes the set
F n = {(a1 , . . . , an ) | a1 , . . . , an ∈ F }.
If you know about (Cartesian) products, you can see that F n is the product of n copies of F .
We can also define F 0 to be the zero vector space {0}.
If v ∈ F n and v = (v1 , v2 , . . . vn ), I’ll often refer to v1 , v2 , . . . vn as the components of v.
Proposition. F n becomes a vector space over F with the following operations:
$$(u_1, \ldots, u_n) + (v_1, \ldots, v_n) = (u_1 + v_1, \ldots, u_n + v_n) \quad\text{and}\quad a \cdot (u_1, \ldots, u_n) = (a u_1, \ldots, a u_n) \;\text{for } a \in F.$$
Proof. I’ll check commutativity of addition as a sample, and leave the other axioms to you. Let u = (u1 , . . . , un ) and v = (v1 , . . . , vn ) be elements of F n . Thus, ui , vi ∈ F for i = 1, . . . , n.
Then
$$u + v = (u_1 + v_1, \ldots, u_n + v_n) = (v_1 + u_1, \ldots, v_n + u_n) = (v_1, \ldots, v_n) + (u_1, \ldots, u_n) = v + u.$$
The second equality used the fact that ui + vi = vi + ui for each i, because the u’s and v’s are elements
of the field F , and addition in F is commutative.
The zero vector is 0 = (0, 0, . . . 0) (n components). I’ll use “0” to denote the zero vector as well as the
number 0 in the field F ; the context should make it clear which of the two is intended. For instance, if v is
a vector and I write “0 + v”, the “0” must be the zero vector, since adding the number 0 to the vector v is
not defined.
If u = (u1 , u2 , . . . un ) ∈ F n , then
$$0 + u = (0 + u_1, 0 + u_2, \ldots, 0 + u_n) = (u_1, u_2, \ldots, u_n) = u.$$
Since I already showed that addition of vectors is commutative, it follows that u + 0 = u as well. This
verifies Axiom 3.
If u = (u1 , u2 , . . . un ) ∈ F n , then I’ll define −u = (−u1 , −u2 , . . . , −un ). Then
$$u + (-u) = (u_1 - u_1, u_2 - u_2, \ldots, u_n - u_n) = (0, 0, \ldots, 0) = 0,$$
and commutativity gives (−u) + u = 0 as well, which verifies Axiom 4.
You can see that checking the axioms amounts to writing out the vectors in component form, applying
the definitions of vector addition and scalar multiplication, and using the axioms for a field.
While all vector spaces “look like” F n (at least if “n” is allowed to be infinite — the fancy word is
“isomorphism”), you should not assume that a given vector space is F n , unless you’re explicitly told that it
is. We’ll see examples (like C[0, 1] below) where it’s not easy to see why a given vector space “looks like”
F n.
In discussing matrices, we’ve referred to a matrix with a single row as a row vector and a matrix with
a single column as a column vector.
$$[\,1\;\;2\;\;3\,] \;\text{(a row vector)} \qquad \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \;\text{(a column vector)}$$
The elements of F n are just ordered n-tuples, not matrices. In certain situations, we may “identify” a
vector in F n with a row vector or a column vector in the obvious way:
$$(1, 2, 3) \;\leftrightarrow\; [\,1\;\;2\;\;3\,] \;\leftrightarrow\; \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$
(Very often, we’ll use a column vector, for reasons we’ll see later.) I will mention this identification
explicitly if we’re doing this, but for right now just think of (1, 2, 3) as an ordered triple, not a matrix.
You may have seen examples of F n -vector spaces before.
For instance, R3 consists of 3-dimensional vectors with real components, like
$$(3, -2, \pi) \quad\text{or}\quad \left( \tfrac{1}{2},\, 0,\, -1.234 \right).$$
You’re probably familiar with addition and scalar multiplication for these vectors:
(1, −2, 4) + (4, 5, 2) = (1 + 4, −2 + 5, 4 + 2) = (5, 3, 6).
7 · (−2, 0, 3) = (7 · (−2), 7 · 0, 7 · 3) = (−14, 0, 21).
Note: Some people write (3, −2, π) as “h3, −2, πi”, using angle brackets to distinguish vectors from
points.
Recall that Z3 is the field {0, 1, 2}, where the operations are addition and multiplication mod 3. Thus,
$Z_3^2$ consists of 2-dimensional vectors with components in Z3 . Since each of the two components can be any
element in {0, 1, 2}, there are 3 · 3 = 9 such vectors:
(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2).
Here are examples of vector addition and scalar multiplication in $Z_3^2$:
(1, 2) + (1, 1) = (1 + 1, 2 + 1) = (2, 0).
2 · (2, 1) = (2 · 2, 2 · 1) = (1, 2).
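Here is a tiny Python sketch of this componentwise arithmetic mod 3 (not part of the original notes; the function names are mine).

```python
p = 3   # working in Z_3

def vec_add(u, v):
    """Componentwise addition mod p."""
    return tuple((ui + vi) % p for ui, vi in zip(u, v))

def scalar_mul(a, u):
    """Multiply every component by the scalar a, mod p."""
    return tuple((a * ui) % p for ui in u)

print(vec_add((1, 2), (1, 1)))   # (2, 0), as above
print(scalar_mul(2, (2, 1)))     # (1, 2), as above
```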
We can picture elements of $F^n$ as points in n-dimensional space. Let’s look at $R^2$, since it’s easy to
draw the pictures. $R^2$ is the x-y plane, and elements of $R^2$ are points in the plane:
(Picture: the x-y plane with the points (2, 3), (2, −2), (−3, 0), and (−1, −4) marked.)
In the picture above, each grid square is 1 × 1. The vectors (2, 3), (2, −2), (−3, 0), and (−1, −4) are
shown.
Vectors in Rn are often drawn as arrows going from the origin (0, 0, . . . , 0) to the corresponding point.
Here’s how the vectors in the previous picture look when represented with arrows:
y
(2,3)
(-3,0)
x
(2,-2)
(-1,-4)
If the x-component is negative, the arrow goes to the left; if the y-component is negative, the arrow
goes down.
When you represent vectors in $R^n$ as arrows, the arrows do not have to start at the origin. For instance,
in $R^2$ the vector (3, 2) can be represented by any arrow which goes 3 units in the x-direction and 2 units in
the y-direction, from the start of the arrow to the end of the arrow. All of the arrows in the picture below
represent the vector (3, 2):
(Picture: several parallel arrows of the same length and direction, each representing (3, 2).)
As long as the length and direction of the arrow don’t change as it is moved around, it represents the
same vector.
Representing vectors in Rn as arrows gives us a way of picturing vector addition, vector subtraction,
and scalar multiplication.
To add vectors a and b represented as arrows, move one of the arrows — say b — so that it starts at
the end of the vector a. As you move b, keep its length and direction the same:
(Picture: b is moved so it starts at the end of a; the arrow from the start of a to the end of b is a + b.)
As we noted earlier, if you don’t change an arrow’s length or direction, it represents the same vector.
So the new vector is still b. The sum a + b is represented by the arrow that goes from the start of a to the
end of b.
You can also leave b alone and move a so it starts at the end of b. Then b + a is the arrow going from
the start of b to the end of a. Notice that it’s the same as the arrow a + b, which reflects the commutativity
of vector addition: a + b = b + a.
(Picture: the parallelogram with sides a and b; the arrows a + b and b + a are the same diagonal.)
This picture also shows that you can think of a + b (or b + a) as the arrow given by the diagonal of the
parallelogram whose sides are a and b.
You can add more than two vectors in the same way. Move the vectors to make a chain, so that the
next vector’s arrow starts at the end of the previous vector’s arrow. The sum is the arrow that goes from
the start of the first arrow to the end of the last arrow:
[Figure: the arrows a, b, c, d chained head to tail; the sum a + b + c + d goes from the start of a to the end of d.]

To subtract a vector b from a vector a — that is, to do a − b — draw the arrow from the end of b to
the end of a. This assumes that the arrows for a and b start at the same point:

[Figure: arrows a and b with a common starting point, and the arrow a − b drawn from the end of b to the end of a.]
To see that this picture is correct, interpret it as an addition picture, where we’re adding a − b to b.
The sum (a − b) + b = a should be the arrow from the start of b to the end of a − b, which it is.
When a vector a is multiplied by a real number k to get ka, the arrow representing the vector is scaled
by a factor of |k|. In addition, if k is negative, the arrow is "flipped" 180°, so it points in the opposite
direction to the arrow for a.

[Figure: a vector a, together with 2a (twice as long, same direction) and −3a (three times as long, opposite direction).]
In the picture above, the vector 2a is twice as long as a and points in the same direction as a. The
vector −3a is 3 times as long as a, but points in the opposite direction.
As an example, suppose u, v, and w are the vectors shown below, and we want to construct u − 2v + 3w.

[Figure: the vectors u, v, and w.]

I start by constructing 2v, an arrow twice as long as v in the same direction as v. I place it so it starts
at the same place as u. Then the arrow that goes from the end of 2v to the end of u is u − 2v.

[Figure: the arrows u, 2v, u − 2v, 3w, and u − 2v + 3w.]
Next, I construct 3w, an arrow 3 times as long as w in the same direction as w. I move 3w so it starts
at the end of u − 2v. Then the arrow from the start of u − 2v to the end of 3w is u − 2v + 3w.
While we can draw pictures of vectors when the field of scalars is the real numbers R, pictures don’t
work quite as well with other fields. As an example, suppose the field is Z5 = {0, 1, 2, 3, 4}. Remember that
the operations in Z5 are addition mod 5 and multiplication mod 5. So, for instance,
4 + 3 = 2 and 2 · 4 = 3.
We saw that R^2 is just the x-y plane. What about Z_5^2? It consists of pairs (a, b) where a and b are
elements of Z_5. Since there are 5 choices for a and 5 choices for b, there are 5 · 5 = 25 elements in Z_5^2. We
can picture it as a 5 × 5 grid of dots:

[Figure: a 5 × 5 grid of dots, with coordinates 0 through 4 on each axis; the dot at (3, 2) is circled.]
The dot corresponding to the vector (3, 2) is circled as an example.
Picturing vectors as arrows seems to work until we try to do vector arithmetic. For example, suppose
v = (3, 4) in Z_5^2. We can represent it with an arrow from the origin to the point (3, 4).
Suppose we multiply v by 2. You can check that 2v = (1, 3).
Here's a picture showing v = (3, 4) and 2v = (1, 3):

[Figure: two 5 × 5 grids; the left shows the arrow for v = (3, 4), the right shows the arrow for 2v = (1, 3).]
In R^2, we'd expect 2v to have the same direction as v and twice the length. You can see that it doesn't
work that way in Z_5^2.
What about vector addition in Z_5^2? Suppose we add (2, 1) and (2, 4):
(2, 1) + (2, 4) = (2 + 2, 1 + 4) = (4, 0).
If we represent the vectors as arrows and try to add the arrows as we did in R^2, we encounter problems.
First, when I move (2, 4) so that it starts at the end of (2, 1), the end of (2, 4) sticks outside of the 5 × 5 grid
which represents Z_5^2.

[Figure: the arrow for (2, 4) moved to the end of the arrow for (2, 1); its tip lands outside the 5 × 5 grid.]

If I ignore this problem and I draw the arrow from the start of (2, 1) to the end of (2, 4), the diagonal
arrow which should represent the sum looks very different from the actual sum arrow (4, 0) (the horizontal
arrow in the picture) — and as with (2, 4), the end of the sum arrow sticks outside the grid which represents
Z_5^2.
You can see that thinking of vectors as arrows has limitations. It's okay for vectors in R^n.
What about thinking of vectors as "lists of numbers"? That seemed to work in the examples above in
R^n and in Z_5^2. In general, this works for the F^n vector spaces for finite n, but those aren't the only vector
spaces.
Here are some examples of vector spaces which are not F^n's, at least for finite n.
The set R[x] of polynomials with real coefficients is a vector space over R, using the standard operations
on polynomials: you add polynomials and multiply them by numbers in the usual ways. Can we think of a
polynomial as a "list of numbers"? We can, if we list its coefficients, starting with the constant term; the
list may need to be infinite. For example,
x^3 + 2x^2 + 10x − 6 ↔ (−6, 10, 2, 1, 0, 0, . . .),   and   −12x^2 + 40 ↔ (40, 0, −12, 0, 0, . . .).
We have to begin with the lowest degree coefficient and work our way up, because polynomials can have
arbitrarily large degree. So a polynomial whose highest power term was 3x^100 might have nonzero numbers
from the zeroth slot up to the "3" in the 101st slot, followed by an infinite number of zeros.
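If you want to compute with this "coefficient list" picture, the operations are just componentwise. The following Python sketch is my own illustration (not from the notes); it stores a polynomial as a list of coefficients starting with the constant term:

    # Polynomials represented as lists of coefficients, constant term first.

    def poly_add(p, q):
        """Add two polynomials given as coefficient lists."""
        n = max(len(p), len(q))
        p = p + [0] * (n - len(p))
        q = q + [0] * (n - len(q))
        return [a + b for a, b in zip(p, q)]

    def poly_scale(c, p):
        """Multiply a polynomial (coefficient list) by the number c."""
        return [c * a for a in p]

    # x^3 + 2x^2 + 10x - 6  and  -12x^2 + 40:
    p = [-6, 10, 2, 1]
    q = [40, 0, -12]
    print(poly_add(p, q))      # [34, 10, -10, 1]  <->  x^3 - 10x^2 + 10x + 34
    print(poly_scale(3, q))    # [120, 0, -36]     <->  -36x^2 + 120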
Not bad! It’s hard to see how we could think of these as “arrows”, but at least we have something like
our earlier examples.
However, sometimes you can’t represent an element of a vector space as a “list of numbers”, even if you
allow an “infinite list”.
Let C([0, 1]) denote the continuous real-valued functions defined on the interval 0 ≤ x ≤ 1. You add
functions pointwise:
(f + g)(x) = f (x) + g(x) for f, g ∈ C([0, 1]).
From calculus, you know that the sum of continuous functions is a continuous function. For instance, if
f(x) = e^x and g(x) = sin(x^3 + 1), then (f + g)(x) = e^x + sin(x^3 + 1), which is again a continuous function
on [0, 1].

Proposition. Let V be a vector space over a field F. Then:
(a) 0 · x = 0 for all x ∈ V.
Note: The "0" on the left is the zero scalar in F; the "0" on the right is the zero vector in V.
(b) k · 0 = 0 for all k ∈ F .
Note: On both the left and right, “0” denotes the zero vector in V .
(c) (−1) · x = −x for all x ∈ V .
(d) −(−x) = x for all x ∈ V .
Proof. (a) As I noted above, the “0” on the left is the zero in F , whereas the “0” on the right is the zero
vector in V . We use a little trick, writing 0 as 0 + 0:
0 · x = (0 + 0) · x = 0 · x + 0 · x.
The first step used the definition of the number zero: “Zero plus anything gives the anything”, so take
the “anything” to be the number 0 itself. The second step used distributivity.
Next, I’ll subtract 0 · x from both sides. Just this once, I’ll show all the steps using the axioms. Start
with the equation above, and add −(0 · x) to both sides:
0·x = 0·x+0·x
0 · x + [−(0 · x)] = (0 · x + 0 · x) + [−(0 · x)]
0 = (0 · x + 0 · x) + [−(0 · x)] (Axiom (4))
0 = 0 · x + (0 · x + [−(0 · x)]) (Axiom (1))
0 = 0·x+0 (Axiom (4))
0=0·x (Axiom (3))
Normally, I would just say: “Subtracting 0 · x from both sides, I get 0 = 0 · x.” It’s important to go
through a few simple proofs based on axioms to ensure that you really can do them. But the result isn’t very
surprising: You’d expect “zero times anything to equal zero”. In the future, I won’t usually do elementary
proofs like this one in such detail.
(b) Note that “0” on both the left and right denotes the zero vector, not the number 0 in F . I use the same
idea as in the proof of (a):
k · 0 = k · (0 + 0) = k · 0 + k · 0.
The first step used the definition of the zero vector, and the second used distributivity. Now subtract
k · 0 from both sides to get 0 = k · 0.
(c) (The "−1" on the left is the scalar −1; the "−x" on the right is the "negative" of x ∈ V.) Using
distributivity and part (a),
x + (−1) · x = 1 · x + (−1) · x = (1 + (−1)) · x = 0 · x = 0,
so (−1) · x is the additive inverse of x; that is, (−1) · x = −x.
(d)
−(−x) = (−1) · [(−1) · x] = [(−1) · (−1)]x = 1 · x = x.
Definition. Let V be a vector space over a field F, and let W ⊂ V, W ≠ ∅. W is a subspace of V if:
(a) If u, v ∈ W, then u + v ∈ W.
(b) If k ∈ F and u ∈ W, then ku ∈ W.
In other words, W is closed under addition of vectors and under scalar multiplication.
If we draw the vectors as arrows, we can picture the axioms in this way:

[Figure: u, v, and u + v all lying inside W; u and ku all lying inside W.]
Remember that not all vectors can be drawn as arrows, so in general these pictures are just aids to your
intuition.
A subspace W of a vector space V is itself a vector space, using the vector addition and scalar multi-
plication operations from V . If you go through the axioms for a vector space, you’ll see that they all hold
in W because they hold in V , and W is contained in V . Thus, the subspace axioms simply ensure that the
vector addition and scalar multiplication operations from V “don’t take you outside of W ” when applied to
vectors in W .
Remark. If W is a subspace, then axiom (a) says that the sum of two vectors in W is in W. You can show
using induction that if x1, . . . , xn ∈ W, then x1 + · · · + xn ∈ W for any n ≥ 1.
For example, in R^2 the subspaces are {0}, R^2, and lines passing through the origin.

[Figure: two lines A and B through the origin in the x-y plane.]

In R^3, the subspaces are {0}, R^3, and lines or planes passing through the origin.
Subspaces of other vector spaces may not look much like lines or planes, though. For example, the following
subset of Z_5^2 is a subspace; it consists of all the scalar multiples of the vector (2, 3):
S = {(0, 0), (2, 3), (4, 1), (1, 4), (3, 2)}.

[Figure: the five points of S on the 5 × 5 grid picturing Z_5^2.]
While 4 of the points lie on a “line”, the “line” is not a line through the origin. The origin is in S, but
it doesn’t lie on the same “line” as the other points.
Or consider the vector space C(R) over R, consisting of continuous functions R → R. There is a subspace
consisting of all multiples of e^x — so things like 2e^x, −πe^x, 1.79132e^x, 0 · e^x, and so on. Here's a picture
which shows the graph of some of the elements of this subspace:

[Figure: graphs of several multiples of e^x.]
Of course, there are actually an infinite number of “graphs” (functions) in this subspace — I’ve only
drawn a few. You can see our subspace is pretty far from “a line through the origin”, even though it consists
of all multiples of a single vector.
In what follows, we’ll look at properties of subspaces, and discuss how to check whether a set is or is
not a subspace.
First, every vector space contains at least two “obvious” subspaces, as described in the next result.
Proposition. If V is a vector space over a field F, then {0} and V are subspaces of V.
Proof. I’ll do the proof for {0} by way of example. First, I have to take two vectors in {0} and show
that their sum is in {0}. But {0} contains only the zero vector 0, so my “two” vectors are 0 and 0 — and
0 + 0 = 0, which is in {0}.
Next, I have to take k ∈ F and a vector in {0} — which, as I just saw, must be 0 — and show that
their product is in {0}. But k · 0 = 0 ∈ {0}. This verifies the second axiom, and so {0} is a subspace.
Obviously, the very uninteresting vector space consisting of just a zero vector (i.e. V = {0}) has only
the one subspace V = {0}.
If a vector space V is nonzero and one-dimensional — roughly speaking, if V “looks like” a line — then
{0} and V are the only subspaces, and they are distinct. In this case, V consists of all multiples kv of any
nonzero vector v ∈ V by all scalars k ∈ F .
Beyond those cases, a vector space V always has subspaces other than {0} and V . For example, if
V ≠ {0}, take a nonzero vector x ∈ V and consider the set of all multiples kx of x by scalars k ∈ F. You can
check that this is a subspace — the “line” passing through x.
If you want to show that a subset of a vector space is a subspace, you can combine the verifications for
the two subspace axioms into a single verification.
Proposition. Let V be a vector space over a field F, and let W be a nonempty subset of V.
W is a subspace of V if and only if u, v ∈ W and k ∈ F implies ku + v ∈ W.
Proof. Suppose W is a subspace of V , and let u, v ∈ W and k ∈ F . Since W is closed under scalar
multiplication, ku ∈ W . Since W is closed under vector addition, ku + v ∈ W .
Conversely, suppose u, v ∈ W and k ∈ F implies ku + v ∈ W. First, take any u ∈ W (possible because
W is nonempty) and apply the assumption with v = u and k = −1: this gives (−1)u + u = 0 ∈ W. Next, take
k = 1: our assumption says that if u, v ∈ W, then u + v ∈ W. This proves that W is closed under vector
addition. Finally, take v = 0, which we just showed is in W: the assumption then says that if u ∈ W and
k ∈ F, then ku = ku + 0 ∈ W. This proves that W is closed under scalar multiplication. Hence, W is a
subspace.
Note that the two axioms for a subspace are independent: Both can be true, both can be false, or one
can be true and the other false. Hence, some of our examples will ask that you check each axiom separately,
proving that it holds if it’s true and disproving it by a counterexample if it’s false.
Lemma. Let W be a subspace of a vector space V .
(a) The zero vector is in W .
(b) If w ∈ W , then −w ∈ W .
Note: These are not part of the axioms for a subspace: They are properties a subspace must have. So
if you are checking the axioms for a subspace, you don’t need to check these properties. But on the other
hand, if a subset does not have one of these properties (e.g. the subset doesn’t contain the zero vector), then
it can’t be a subspace.
Proof. (a) Take any vector w ∈ W (which you can do because W is nonempty), and take 0 ∈ F . Since W
is closed under scalar multiplication, 0 · w ∈ W . But 0 · w = 0, so 0 ∈ W .
(b) Since w ∈ W and −1 ∈ F , (−1) · w = −w is in W .
Example. Consider the real vector space R2 , the usual x-y plane.
(a) Show that the following sets are subspaces of R^2:
W1 = {(x, 0) | x ∈ R}   and   W2 = {(0, y) | y ∈ R}.
(These are just the x and y-axes.)
(b) Show that the union W1 ∪ W2 is not a subspace.
(a) I'll check that W1 is a subspace. (The proof for W2 is similar.) First, I have to show that two elements
of W1 add to an element of W1. An element of W1 is a pair with the second component 0. So (x1, 0), (x2, 0)
are two arbitrary elements of W1. Add them:
(x1, 0) + (x2, 0) = (x1 + x2, 0).
The second component of the sum is 0, so the sum is in W1. Next, if k ∈ R, then k · (x1, 0) = (kx1, 0), whose
second component is also 0, so it's in W1. Hence, W1 is closed under addition and scalar multiplication, and
it is a subspace.
(b) Note that (3, 0) ∈ W1 and (0, 17) ∈ W2, and
(3, 0) + (0, 17) = (3, 17).
Now (3, 17) ∉ W1 because its second component isn't 0. And (3, 17) ∉ W2 because its first component isn't
0. Since (3, 17) isn't in either W1 or W2, it's not in their union. Hence, W1 ∪ W2 is not closed under addition,
so it is not a subspace.
Pictorially, it’s easy to see: (3, 17) doesn’t lie in either the x-axis (W1 ) or the y-axis (W2 ):
[Figure: the point (3, 17) in the plane, together with (3, 0) on the x-axis and (0, 17) on the y-axis.]

Example. Determine whether the following subset of R^3 is a subspace:
W = {(x, y, 1) | x, y ∈ R}.
If you’re trying to decide whether a set is a subspace, it’s always good to check whether it contains
the zero vector before you start checking the axioms. In this case, the set consists of 3-dimensional vectors
whose third components are equal to 1. Obviously, the zero vector (0, 0, 0) doesn’t satisfy this condition.
Since W doesn’t contain the zero vector, it’s not a subspace of R3 .
Example. Consider the following subset of the vector space R2 :
W = {(x, sin x) | x ∈ R} .
Check each axiom for a subspace (i.e. closure under addition and closure under scalar multiplication).
If the axiom holds, prove it; if the axiom doesn’t hold, give a specific counterexample.
Notice that this problem is open-ended, in that you aren’t told at the start whether a given axiom holds
or not. So you have to decide whether you’re going to try to prove that the axiom holds, or whether you’re
going to try to find a counterexample. In these kinds of situations, look at the statement of the problem —
in this case, the definition of W . See if your mathematical experience causes you to lean one way or another
— if so, try that approach first.
If you can’t make up your mind, pick either “prove” or “disprove” and get started! Usually, if you pick
the wrong approach you’ll know it pretty quickly — in fact, getting stuck taking the wrong approach may
give you an idea of how to make the right approach work.
Suppose I start by trying to prove that the set is closed under sums. I take two vectors in W — say
(x, sin x) and (y, sin y). I add them:
(x, sin x) + (y, sin y) = (x + y, sin x + sin y).
The last vector isn’t in the right form — it would be if sin x + sin y was equal to sin(x + y). Based
on your knowledge of trigonometry, you should know that doesn’t sound right. You might reason that if a
simple identity like “sin x + sin y = sin(x + y)” was true, you probably would have learned about it!
I now suspect that the sum axiom doesn’t hold. I need a specific counterexample — that is, two vectors
in W whose sum is not in W .
To choose things for a counterexample, you should try to choose things which are not too “special”
or your “counterexample” might accidentally satisfy the axiom, which is not what you want. At the same
time, you should avoid things which are too “ugly”, because it makes the counterexample less convincing
if a computer is needed (for instance) to compute the numbers. You may need a few tries to find a good
counterexample. Remember that the things in your counterexample should involve specific numbers, not
"variables".
Returning to our problem, I need two vectors in W whose sum isn't in W. I'll use (π/2, sin(π/2)) and
(π, sin π). Note that
(π/2, sin(π/2)) = (π/2, 1) ∈ W   and   (π, sin π) = (π, 0) ∈ W.
On the other hand,
(π/2, sin(π/2)) + (π, sin π) = (π/2, 1) + (π, 0) = (3π/2, 1).
But (3π/2, 1) ∉ W because sin(3π/2) = −1 ≠ 1.
How did I choose the two vectors? I decided to use a multiple of π in the first component, because the
sine of a multiple of π (in the second component) comes out to “nice numbers”. If I had used (say) (1, sin 1),
I’d have needed a computer to tell me that sin 1 ≈ 0.841470984808 . . ., and a counterexample would have
looked kind of ugly. In addition, an approximation of this kind really isn’t a proof.
How did I know to use π/2 and π? Actually, I didn't know till I did the work that these numbers would
produce a counterexample — you often can't know without trying whether the numbers you've chosen will
work.
Thus, W is not closed under vector addition, and so it is not a subspace. If that was the question, I'd
be done, but I was asked to check each axiom. It is possible for one of the axioms to hold even if the other one
does not. So I'll consider scalar multiplication.
I'll give a counterexample to show that the scalar multiplication axiom doesn't hold either. I need a vector in
W; I'll use (π/2, sin(π/2)) again. I also need a real number; I'll use 2. Now
2 · (π/2, sin(π/2)) = (π, 2 sin(π/2)) = (π, 2).
But (π, 2) ∉ W, because sin π = 0 ≠ 2. Hence, W is not closed under scalar multiplication.
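Here is a quick numerical spot-check of both counterexamples (this snippet is my addition, not part of the notes): a pair (x, y) lies in W exactly when y = sin x.

    import math

    # W = {(x, sin x)}: a pair (x, y) is in W exactly when y == sin(x).
    def in_W(point, tol=1e-12):
        x, y = point
        return abs(y - math.sin(x)) < tol

    u = (math.pi / 2, math.sin(math.pi / 2))   # in W
    v = (math.pi, math.sin(math.pi))           # in W

    s = (u[0] + v[0], u[1] + v[1])             # the sum (3*pi/2, 1)
    m = (2 * u[0], 2 * u[1])                   # the scalar multiple (pi, 2)

    print(in_W(u), in_W(v))   # True True
    print(in_W(s))            # False: sin(3*pi/2) = -1, not 1
    print(in_W(m))            # False: sin(pi) = 0, not 2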
Example. Let A and B be matrices of the same size over a field F, chosen so that the products Av and Bv
are defined for v ∈ F^n, and let
W = {v ∈ F^n | Av = Bv}.
Show that W is a subspace of F^n.
I'll organize the proof of closure under addition as a "fill-in" proof: write the hypotheses (u ∈ W and
v ∈ W) at the top, write the conclusion (u + v ∈ W) at the bottom, and then fill in the steps that connect
them.

u ∈ W        v ∈ W
      ⋮
u + v ∈ W
Next, use the definition of W to translate each of the statements: u ∈ W means Au = Bu, so put
“Au = Bu” below “u ∈ W ”. Likewise, v ∈ W means Av = Bv, so put “Av = Bv” below “v ∈ W ”. On the
other hand, u + v ∈ W means A(u + v) = B(u + v), but since “u + v ∈ W ” is what we want to conclude,
put “A(u + v) = B(u + v)” above “u + v ∈ W ”.
u ∈ W                  v ∈ W
Au = Bu                Av = Bv
          ⋮
A(u + v) = B(u + v)
u + v ∈ W
At this point, you can either work downwards from Au = Bu and Av = Bv, or upwards from A(u + v) =
B(u + v). But if you work upwards from A(u + v) = B(u + v), you must ensure that the algebra you do is
reversible — that it works downwards as well.
I’ll work downwards from Au = Bu and Av = Bv. What algebra could I do which would get me closer
to A(u + v) = B(u + v)? Since the target involves addition, it’s natural to add the equations:
u ∈ W                  v ∈ W
Au = Bu                Av = Bv
Au + Av = Bu + Bv              (add the equations)
A(u + v) = B(u + v)
u + v ∈ W
At this point, I’m almost done. To finish, I have to explain how to go from Au + Av = Bu + Bv to
A(u + v) = B(u + v). You can see that I just need to factor A out of the left side and factor B out of the
right side:
u ∈ W                  v ∈ W
Au = Bu                Av = Bv
Au + Av = Bu + Bv
A(u + v) = B(u + v)            (factor A out of the left side and B out of the right side)
u + v ∈ W
The proof is complete! If you were just writing the proof for yourself, the sketch above might be good
enough. If you were writing this proof more formally — say for an assignment or a paper — you might add
some explanatory words.
For instance, you might say: "I need to show that W is closed under addition. Let u ∈ W and let
v ∈ W. By definition of W, this means that Au = Bu and Av = Bv. Adding the equations, I get
Au + Av = Bu + Bv. Factoring A out of the left side and B out of the right side, I get A(u + v) = B(u + v).
By definition of W, this means that u + v ∈ W. Hence, W is closed under addition."
By the way, be careful not to write things like “A(u + v) + B(u + v) ∈ W ” — do you see why this doesn’t
make sense? “A(u + v) = B(u + v)” is an equation that u + v satisfies. You can’t write “∈ W ”, since an
equation can’t be an element of W . Elements of W are vectors. You say “u + v ∈ W ”, as in the last line.
Here’s a sketch of a similar “fill-in” proof for closure under scalar multiplication:
u ∈ W        k ∈ F
Au = Bu
kAu = kBu
A(ku) = B(ku)
ku ∈ W
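A small numpy spot-check of both closure properties (my addition; the matrices A and B below are made up so that W = {v : Av = Bv} is nontrivial):

    import numpy as np

    # Hypothetical matrices chosen so that A - B has a nontrivial null space.
    A = np.array([[2.0, 1.0], [3.0, 4.0]])
    B = np.array([[1.0, 2.0], [1.0, 6.0]])

    def in_W(v):
        """True if Av = Bv (up to floating-point tolerance)."""
        return np.allclose(A @ v, B @ v)

    u = np.array([1.0, 1.0])
    v = np.array([2.0, 2.0])
    k = 5.0

    print(in_W(u), in_W(v))      # True True
    print(in_W(u + v))           # True: closed under addition
    print(in_W(k * u))           # True: closed under scalar multiplication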
Example. Let V = {f ∈ R[x] | f(2) = 1}, the set of polynomials which give 1 when you plug in 2. The zero
polynomial gives 0, not 1, when you plug in 2, so V does not contain the zero vector, and hence V is not a
subspace of R[x].
Alternatively, the constant polynomial f(x) = 1 is an element of V — it gives 1 when you plug in 2 —
but 2 · f(x) is not. So V is not closed under scalar multiplication.
See if you can give an example which shows that V is not closed under vector addition.
Proposition. If A is an m × n matrix over the field F , consider the set V of n-dimensional vectors x which
satisfy
Ax = 0.
Then V is a subspace of F n .
Proof. Suppose x, y ∈ V, so Ax = 0 and Ay = 0. Then
A(x + y) = Ax + Ay = 0 + 0 = 0.
Hence, x + y ∈ V.
Suppose x ∈ V and k ∈ F . Then Ax = 0, so
A(kx) = k(Ax) = k · 0 = 0.
Therefore, kx ∈ V .
Thus, V is a subspace.
The subspace defined in the last proposition is called the null space of A.
As a specific example of the last proposition, consider the following system of linear equations over R:
            [w]
[1 1 0 1]   [x]   [0]
[0 0 1 3]   [y] = [0] .
            [z]
You can show by row reduction that the general solution can be written as
w = −s − t, x = s, y = −3t, z = t.
Thus,

[w]   [−s − t]
[x]   [  s   ]
[y] = [ −3t  ] .
[z]   [  t   ]
The Proposition says that the set of all vectors of this form constitute a subspace of R4 .
For example, if you add two vectors of this form, you get another vector of this form:
[−s − t]   [−s′ − t′]   [−(s + s′) − (t + t′)]
[  s   ]   [   s′   ]   [       s + s′       ]
[ −3t  ] + [  −3t′  ] = [     −3(t + t′)     ] .
[  t   ]   [   t′   ]   [       t + t′       ]
You can check directly that the set is also closed under scalar multiplication.
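If you want to double-check the computation, sympy can produce the null space directly; the basis vectors it returns correspond to the parameters s and t above. (This snippet is my addition.)

    from sympy import Matrix

    A = Matrix([[1, 1, 0, 1],
                [0, 0, 1, 3]])

    # Basis vectors for the null space {x : Ax = 0}.
    for v in A.nullspace():
        print(v.T)
    # Matrix([[-1, 1, 0, 0]])   <- the s-direction: s*(-1, 1, 0, 0)
    # Matrix([[-1, 0, -3, 1]])  <- the t-direction: t*(-1, 0, -3, 1)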
In terms of systems of linear equations, a vector (x1 , . . . , xn ) ∈ F n is in the null space of the matrix
A = (aij ) if it’s a solution to the system
a11 x1 + · · · + a1n xn = 0
a21 x1 + · · · + a2n xn = 0
⋮
am1 x1 + · · · + amn xn = 0
In this situation, we say that the vectors (x1 , . . . , xn ) make up the solution space of the system. Since
the solution space of the system is another name for the null space of A, the solution space is a subspace of
F n.
We’ll study the null space of a matrix in more detail later.
Example. C(R) denotes the real vector space of continuous functions R → R. Consider the following subset
of C(R):

S = { f ∈ C(R) | f(x) = ∫_0^x e^t f(t) dt for all x } .
Prove that S is a subspace of C(R).
Let f, g ∈ S. Then
f(x) = ∫_0^x e^t f(t) dt   and   g(x) = ∫_0^x e^t g(t) dt.
Adding the two equations and using the fact that "the integral of a sum is the sum of the integrals",
we have
f(x) + g(x) = ∫_0^x e^t f(t) dt + ∫_0^x e^t g(t) dt = ∫_0^x e^t [f(t) + g(t)] dt.
This proves that f (x) + g(x) ∈ S, so S is closed under addition.
Let f ∈ S, so
f(x) = ∫_0^x e^t f(t) dt.
Let c ∈ R. Using the fact that constants can be moved into integrals, we have
c · f(x) = c · ∫_0^x e^t f(t) dt = ∫_0^x e^t [c · f(t)] dt.
This proves that c · f (x) ∈ S, so S is closed under scalar multiplication. Thus, S is a subspace of C(R).
Intersections of subspaces.
We’ve seen that the union of subspaces is not necessarily a subspace. For intersections, the situation
is different: The intersection of any number of subspaces is a subspace. The only significant issue with the
proof is that we will deal with an arbitrary collection of sets — possibly infinite, and possibly uncountable.
Except for taking care with the notation, the proof is fairly easy.
Theorem. Let V be a vector space over a field F, and let {U_i}_{i∈I} be a collection of subspaces of V. Then the
intersection ⋂_{i∈I} U_i is a subspace of V.
Proof. We have to show that ⋂_{i∈I} U_i is closed under vector addition and under scalar multiplication.
Suppose x, y ∈ ⋂_{i∈I} U_i. For x and y to be in the intersection of the U_i, they must be in each U_i for
all i ∈ I. So pick a particular i ∈ I; we have x, y ∈ U_i. Now U_i is a subspace, so it's closed under vector
addition. Hence, x + y ∈ U_i. Since this is true for all i ∈ I, I have x + y ∈ ⋂_{i∈I} U_i.
Thus, ⋂_{i∈I} U_i is closed under vector addition.
Next, suppose k ∈ F and x ∈ ⋂_{i∈I} U_i. For x to be in the intersection of the U_i, it must be in each U_i
for all i ∈ I. So pick a particular i ∈ I; we have x ∈ U_i. Now U_i is a subspace, so it's closed under scalar
multiplication. Hence, kx ∈ U_i. Since this is true for all i ∈ I, I have kx ∈ ⋂_{i∈I} U_i.
Thus, ⋂_{i∈I} U_i is closed under scalar multiplication.
Hence, ⋂_{i∈I} U_i is a subspace of V.
You can see that the proof was pretty easy, the two parts being pretty similar. The key idea is that
something is in the intersection of a bunch of sets if and only if it’s in each of the sets. How many sets there
are in the bunch doesn’t matter. If you’re still feeling a little uncomfortable, try writing out the proof for
the case of two subspaces: If U and V are subspaces of a vector space W over a field F , then U ∩ V is a
subspace of W . The notation is easier for two subspaces, but the idea of the proof is the same as the idea
for the proof above.
Example. Consider the set of three vectors {(1, 3, 2), (−5, 1, 0), (−4, 4, 2)} in R^3. Is this set a subspace of R^3?
Now (1, 3, 2) + (−5, 1, 0) = (−4, 4, 2), which is in the set. But (−5, 1, 0) + (−4, 4, 2) = (−9, 5, 2) is not
in the set, so the set is not closed under vector addition. And 2 · (1, 3, 2) = (2, 6, 4) is not in the set, so the
set is not closed under scalar multiplication. The set is not a subspace of R^3.
Suppose we try to fix this by finding a subspace of R3 which contains the three vectors. Well, that’s
easy: We could take R3 ! This would work for any set of vectors in R3 , but for that reason, it isn’t a very
interesting answer.
It might be better to ask for the “smallest” subspace of R3 which contains the three vectors. This set
will be called the span of the given set of vectors. We might approach this by asking: What vectors do
we need to add to the set of three vectors to make it a subspace? Since subspaces are closed under vector
addition, we’ll need to throw in all the vectors which you get by adding the three vectors. You can check
that we’d need to throw in two more vectors, so we now have 5. But subspaces are also closed under scalar
multiplication, so we need to throw in multiples of these 5 vectors by every element of R. Now we have an
infinite number of vectors. Oops! We need to ensure that the set is closed under addition, so we need to
go back and throw in all sums of the infinitely many vectors we have so far. Then, since we threw in some
additional vectors, we need to take scalar multiples of those, and . . . .
Our approach is a reasonable one, but we clearly need a more systematic way of carrying it out. We
will see how to do it in this section.
Building subspaces involves forming sums and scalar multiples. Rather than thinking of doing these
two operations separately, we’ll define a concept which does both at once.
Definition. If v1 , v2 , . . . , vn are vectors in a vector space V over a field F , a linear combination of the
v’s is a vector
k1 v1 + k2 v2 + · · · + kn vn for k1 , k2 , . . . kn ∈ F.
Notice that if you take two vectors v1 and v2 and take the scalars to be k1 = 1 and k2 = 1, you get
v1 + v2 , which is vector addition. And if you just take one vector v1 , then k1 v1 represents a scalar multiple
of that vector. Thus, the idea of a linear combination contains the operations of vector addition and scalar
multiplication, but allows us to do many such operations all at once.
Let’s see a numerical example. Take u = (1, 2) and v = (−3, 7) in R2 . Here is a linear combination of u
and v:
2u − 5v = 2 · (1, 2) − 5 · (−3, 7) = (17, −31).
(√2 − 17)u + (π^2/4)v is also a linear combination of u and v. And u and v are themselves linear combinations
of u and v, as is the zero vector:
u = 1 · u + 0 · v,   v = 0 · u + 1 · v,   0 = 0 · u + 0 · v.
On the other hand, there are vectors in R2 which are not linear combinations of p = (1, −2) and
q = (−2, 4). Do you see how this pair is different from the first?
Definition. If S is a set of vectors in a vector space V, the span ⟨S⟩ of S is the set of all linear combinations
of vectors in S.
If S is a subset of a subspace W, then S spans W (or S is a spanning set for W, or S generates W)
if W = ⟨S⟩.
If S is a finite set of vectors, we could just describe ⟨S⟩ as the set of all linear combinations of the
elements of S. If S is infinite, a particular linear combination of vectors from S involves finitely many
vectors from S — we "grab" finitely many vectors from S at a time — but we're doing this in all possible
ways.
Note that S ⊂ ⟨S⟩. For if s is a vector in S, then 1 · s = s is a linear combination of a vector from S,
so it is in ⟨S⟩.
For example, take two nonparallel vectors u and v in R^2, drawn as arrows starting at the origin:

[Figure: the vectors u and v.]
It turns out that the span of u and v is all of R2 ; that is, if w ∈ R2 , then w can be written as a linear
combination of u and v. To see why this is reasonable, take a vector w and “project” w onto the lines
containing the vectors u and v:
[Figure: w projected onto the lines containing u and v.]
I can scale u up by multiplying by an appropriate number to get a vector au which is the projection of
w on the line of u. Likewise, I can scale v up by multiplying by an appropriate number to get a vector bv
which is the projection of w on the line of v.
[Figure: the projections au and bv, with w = au + bv as the diagonal of the parallelogram they form.]
As you can see in the picture, w is the diagonal of the parallelogram whose sides are au and bv, so
w = au + bv.
Try this with some other vectors in place of w. Of course, a picture isn’t a proof, but this example
should help you get an idea of what the span of a set of vectors means geometrically.
Theorem. If S is a subset of a vector space V, the span ⟨S⟩ of S is a subspace of V which contains S.
Proof. I noted above that S ⊂ ⟨S⟩, so I just have to show that ⟨S⟩ is closed under addition and scalar
multiplication. Two typical elements of the span look like
j1 u1 + j2 u2 + · · · + jn un   and   k1 v1 + k2 v2 + · · · + km vm .
Here the j’s and k’s are scalars and the u’s and v’s are elements of S.
Take two elements of the span and add them:
(j1 u1 + j2 u2 + · · · + jn un ) + (k1 v1 + k2 v2 + · · · + km vm ) = j1 u1 + j2 u2 + · · · + jn un + k1 v1 + k2 v2 + · · · + km vm .
This sum is an element of the span, because it’s a sum of vectors in S, each multiplied by a scalar —
that is, a linear combination of elements of S. Thus, the span is closed under taking sums.
Take an element of the span and multiply it by a scalar c:
c · (k1 v1 + k2 v2 + · · · + km vm ) = (ck1 )v1 + (ck2 )v2 + · · · + (ckm )vm .
This is an element of the span, because it’s a linear combination of elements of S. Thus, the span is
closed under scalar multiplication.
Therefore, the span is a subspace.
Example. Prove that the span of (3, 1, 0) and (2, 1, 0) in R3 is
V = {(a, b, 0) | a, b ∈ R} .
To show that two sets are equal, you need to show that each is contained in the other. To do this, take
a typical element of the first set and show that it’s in the second set. Then take a typical element of the
second set and show that it’s in the first set.
Let W be the span of (3, 1, 0) and (2, 1, 0) in R3 . A typical element of W is a linear combination of the
two vectors:
x · (3, 1, 0) + y · (2, 1, 0) = (3x + 2y, x + y, 0).
Since the sum is a vector of the form (a, b, 0) for a, b ∈ R, it is in V . This proves that W ⊂ V .
Now let (a, b, 0) ∈ V. I have to show that this vector is a linear combination of (3, 1, 0) and (2, 1, 0).
This means that I have to find real numbers x and y such that
x · (3, 1, 0) + y · (2, 1, 0) = (a, b, 0),   that is,   3x + 2y = a   and   x + y = b.
The solution is
x = a − 2b, y = −a + 3b.
In other words,
(a − 2b) · (3, 1, 0) + (−a + 3b) · (2, 1, 0) = (a, b, 0).
Since (a, b, 0) is a linear combination of (3, 1, 0) and (2, 1, 0), it follows that (a, b, 0) ∈ W . This proves
that V ⊂ W .
Since W ⊂ V and V ⊂ W , I have W = V .
Example. Let
S = {(1, 1, 0), (0, 1, 1)} ⊂ R3 .
(a) Prove or disprove: (3, −1, −4) is in the span of S.
(b) Prove or disprove: (5, −2, 6) is in the span of S.
(a) The vector (3, −1, −4) is in the span of (1, 1, 0) and (0, 1, 1) if it can be written as a linear combination of
(1, 1, 0) and (0, 1, 1). So I try to find numbers a and b such that
a · (1, 1, 0) + b · (0, 1, 1) = (3, −1, −4).
I'll convert this to a matrix equation by writing the vectors as column vectors:

    [1]       [0]   [ 3]
a · [1] + b · [1] = [−1] .
    [0]       [1]   [−4]

By using column vectors (rather than row vectors), we get a familiar kind of matrix equation:

[1 0]         [ 3]
[1 1] [a]  =  [−1] .
[0 1] [b]     [−4]

Row reducing the augmented matrix for this system gives the solution a = 3, b = −4; you can check that
3 · (1, 1, 0) + (−4) · (0, 1, 1) = (3, −1, −4). Hence, (3, −1, −4) is in the span of S.
(b) Likewise, (5, −2, 6) is in the span of S if there are numbers a and b such that
a · (1, 1, 0) + b · (0, 1, 1) = (5, −2, 6).
Setting up the augmented matrix for this system and row reducing produces a row of the form [0 0 | 1].
The last matrix says “0 = 1”, a contradiction. The system is inconsistent, so there are no such numbers
a and b. Therefore, (5, −2, 6) is not in the span of S.
Thus, to determine whether the vector b ∈ F^n is in the span of v1, v2, . . . , vm in F^n, form the augmented
matrix

[ ↑   ↑        ↑   ↑ ]
[ v1  v2  ···  vm  b ] .
[ ↓   ↓        ↓   ↓ ]
If the system has a solution, b is in the span, and coefficients of a linear combination of the v’s which
add up to b are given by a solution to the system. If the system has no solutions, then b is not in the span
of the v’s.
(In a general vector space where vectors may not be "numbers in slots", you have to go back to the
definition of spanning set.)
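The augmented-matrix test is easy to automate. Here's a sympy sketch (my own addition) that checks the two vectors from the example above against S = {(1, 1, 0), (0, 1, 1)}:

    from sympy import Matrix, linsolve, symbols

    v1 = Matrix([1, 1, 0])
    v2 = Matrix([0, 1, 1])
    a, b = symbols('a b')

    def in_span(target):
        """Solve a*v1 + b*v2 = target; an empty solution set means 'not in the span'."""
        system = Matrix.hstack(v1, v2, Matrix(target))
        return linsolve(system, [a, b])

    print(in_span([3, -1, -4]))   # {(3, -4)}  -> in the span
    print(in_span([5, -2, 6]))    # EmptySet   -> not in the span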
Example. Consider the following set of vectors in R^3:
{(1, 0, −1), (2, 1, 3), (1, 1, −4)}.
Show that the span of this set is all of R^3.
Given any (x, y, z) ∈ R^3, I need numbers a, b, and c such that
a · (1, 0, −1) + b · (2, 1, 3) + c · (1, 1, −4) = (x, y, z).
Writing this as a system of linear equations and solving (say, by row reduction), I find
a = (1/8)(7x − 11y − z),   b = (1/8)(x + 3y + z),   c = (1/8)(−x + 5y − z).
This shows that, given any vector (x, y, z), I can find a linear combination of the original three vectors
which equals (x, y, z).
Thus, the span of the original set of three vectors is all of R3 .
Example. Let
S = {(1, 2, 1), (1, 4, 2)} in Z_5^3.

(b) Determine whether (1, 1, 1) ∈ ⟨S⟩.
The last row of the row reduced echelon matrix says "0 = 1". This contradiction implies that the system
has no solutions. Therefore, (1, 1, 1) is not in the span of S.
In the next example, we go back to the definition of the span of a set of vectors, rather than writing
down a matrix equation.
Example. In the vector space C(R), let f(x) = sin x, g(x) = e^x, and h(x) = x^2. Show that h is not in the
span of f and g.
Suppose there are numbers a, b ∈ R such that
x^2 = a sin x + b e^x for all x ∈ R.
Set x = 0. We get
0^2 = a sin 0 + b e^0, so 0 = b.
The equation becomes
x^2 = a sin x.
Set x = π/2. We get
(π/2)^2 = a sin(π/2), so a = π^2/4.
The equation is now
x^2 = (π^2/4) sin x.
Finally, set x = 5π/2. We get
(5π/2)^2 = (π^2/4) sin(5π/2), or 25π^2/4 = π^2/4, so 25 = 1.
This contradiction shows there are no numbers a, b ∈ R which make the initial equation true. Hence, h
is not in the span of f and g.
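You can let a computer algebra system carry out the same "plug in points" argument. The sympy sketch below (my addition) solves for a and b using x = 0 and x = π/2, then shows that the would-be identity fails at x = 5π/2:

    from sympy import symbols, sin, exp, pi, solve, simplify

    a, b, x = symbols('a b x')
    expr = a * sin(x) + b * exp(x) - x**2   # we want this to be 0 for all x

    # Solve using the two sample points x = 0 and x = pi/2.
    sol = solve([expr.subs(x, 0), expr.subs(x, pi / 2)], [a, b])
    print(sol)   # {a: pi**2/4, b: 0}

    # Check the would-be identity at a third point, x = 5*pi/2.
    residual = expr.subs(sol).subs(x, 5 * pi / 2)
    print(simplify(residual))   # -6*pi**2  (nonzero, so no such a and b exist)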
Another definition of the span of a set
There is another way to define the span of a set of vectors. The construction is a little more subtle
than the one we gave earlier. It uses the intersection of a (possibly) infinite collection of subspaces, which
we know by an earlier result is a subspace.
Let V be a vector space over a field F, and let S be a set of vectors in V. Let
S̃ = ⋂ {W | S ⊂ W and W is a subspace of V }.
That is, to form S̃ we intersect all subspaces of V which contain S. Note that there is at least one
subspace that contains S, namely V itself. We will show that S̃ is just the span of S, as constructed earlier:
The set of all linear combinations of elements of S.
(By the way, "S̃" is just temporary notation I'm using for this discussion. After this, I'll just use ⟨S⟩
to denote the span of S.)
Theorem. S̃ = ⟨S⟩.
Proof. First, the span ⟨S⟩ is a subspace of V which contains S. So ⟨S⟩ is one of the subspaces being
intersected to construct S̃, and hence
S̃ = ⋂ {W | S ⊂ W and W is a subspace of V } ⊂ ⟨S⟩.
To complete the proof, I'll show that ⟨S⟩ ⊂ S̃. Take an element of ⟨S⟩, namely a linear combination of
elements of S:
a1 s1 + a2 s2 + · · · + an sn   where ai ∈ F and si ∈ S.
Let W be a subspace of V which contains S. Then s1, s2, . . . , sn ∈ W, and since W is a subspace,
a1 s1 + a2 s2 + · · · + an sn ∈ W.
Thus, a1 s1 + a2 s2 + · · · + an sn is contained in every subspace which contains S — that is, it is contained
in every subspace being intersected to construct S̃. Hence, it is contained in S̃. This proves that ⟨S⟩ ⊂ S̃.
Therefore, S̃ = ⟨S⟩.
If you want to construct an object in math using some “building blocks”, there are often two general
ways to do so. The first way is to start with the building blocks, and combine them to build the object
“from the inside out”. This direct approach is what we did in the definition of hSi as all linear combinations
of elements of S. It requires that you know how to put the blocks together to do the “building”.
Another approach is to take all objects of the “right kind” which contain the building blocks and find the
“smallest”. You find the “smallest” such object by intersecting all the objects of the “right kind” containing
the building blocks. In this approach, you will usually have to intersect an infinite number of sets. And you
will need to show that the intersection of sets of the “right kind” is still a set of the “right kind”. This is
the approach we used in intersecting all subspaces containing S.
Of course, if you’ve done things correctly, the two approaches give the same object at the end, as they
did here. Both approaches are useful in mathematics.
Definition. Let V be a vector space over a field F. A set S of vectors in V is linearly independent if for
every finite set of vectors v1, . . . , vn ∈ S and scalars a1, . . . , an ∈ F,
a1 v1 + · · · + an vn = 0 implies a1 = · · · = an = 0.
A set of vectors which is not linearly independent is linearly dependent. (I’ll usually say “independent”
and “dependent” for short.) Thus, a set of vectors S is dependent if there are vectors v1 , . . . , vn ∈ S and
numbers a1 , . . . , an ∈ F , not all of which are 0, such that
a1 v1 + · · · + an vn = 0.
In words, the definition says that if a linear combination of any finite set of vectors in S equals the zero
vector, then all the coefficients in the linear combination must be 0. I’ll refer to such a linear combination
as a trivial linear combination.
On the other hand, a linear combination of vectors is nontrivial if at least one of the coefficients
is nonzero. (“At least one” doesn’t mean “all” — a nontrivial linear combination can have some zero
coefficients, as long as at least one is nonzero.)
Thus, we can also say that a set of vectors is independent if there is no nontrivial linear combination
among finitely many of the vectors which is equal to 0. And a set of vectors is dependent if there is some
nontrivial linear combination among finitely many of the vectors which is equal to 0.
Let’s see a pictorial example of a dependent set. Consider the following vectors u, v, and w in R2 .
[Figure: three vectors u, v, and w in R^2, drawn as arrows from the origin.]

I'll show how to get a nontrivial linear combination of the vectors that is equal to the zero vector.
Project w onto the lines of u and v.

[Figure: w projected onto the lines of u and v; the projections are au and bv.]
The projections are multiples au of u and bv of v. Since w is the diagonal of the parallelogram whose
sides are au and bv, we have
w = au + bv, so au + bv − w = 0.
This is a nontrivial linear combination of u, v and w which is equal to the zero vector, so {u, v, w} is
dependent.
In fact, it’s true that any 3 vectors in R2 are dependent, and this pictorial example should make this
reasonable. More generally, if F is a field then any n vectors in F m are dependent if n > m. We’ll prove
this below.
Example. If F is a field, the standard basis vectors are
e1 = (1, 0, 0, . . . , 0)
e2 = (0, 1, 0, . . . , 0)
⋮
en = (0, 0, 0, . . . , 1)
Show that {e1, e2, . . . , en} is independent.
Suppose a1, a2, . . . , an ∈ F and a1 e1 + a2 e2 + · · · + an en = 0. Each term in the sum is
a1 e1 = (a1, 0, 0, . . . , 0)
a2 e2 = (0, a2, 0, . . . , 0)
⋮
an en = (0, 0, 0, . . . , an)
So
a1 e1 + a2 e2 + · · · + an en = (a1, a2, . . . , an).
Since by assumption a1 e1 + a2 e2 + · · · + an en = 0, I get
(a1, a2, . . . , an) = (0, 0, . . . , 0), and hence a1 = a2 = · · · = an = 0. Therefore, {e1, e2, . . . , en} is independent.
In this case, you can probably juggle numbers in your head to see that
This shows that the vectors are dependent. There are infinitely many pairs of numbers a and b that
work. In examples to follow, I’ll show how to find numbers systematically in cases where the arithmetic isn’t
so easy.
Example. Suppose u, v, w, and x are vectors in a vector space. Prove that the set {u − v, v − w, w − x, x− u}
is dependent.
Notice that in the four vectors in {u − v, v − w, w − x, x − u}, each of u, v, w, and x occurs once with a
plus sign and once with a minus sign. So
(u − v) + (v − w) + (w − x) + (x − u) = 0.
This is a dependence relation, so the set is dependent.
If you can’t see an “easy” linear combination of a set of vectors that equals 0, you may have to determine
independence or dependence by solving a system of equations.
Example. Consider the following sets of vectors in R3 . If the set is independent, prove it. If the set is
dependent, find a nontrivial linear combination of the vectors which is equal to 0.
(a) {(2, 0, −3), (1, 1, 1), (1, 7, 2)}.
(b) {(1, 2, −1), (4, 1, 3), (−10, 1, −11)}.
(a) Write a linear combination of the vectors and set it equal to 0:

    [ 2]       [1]       [1]   [0]
a · [ 0] + b · [1] + c · [7] = [0] .
    [−3]       [1]       [2]   [0]

Row reducing the corresponding system shows that the only solution is
a = 0, b = 0, c = 0.
Hence, the set is independent.

(b) Write a linear combination of the vectors and set it equal to 0:

    [ 1]       [4]       [−10]   [0]
a · [ 2] + b · [1] + c · [  1] = [0] .
    [−1]       [3]       [−11]   [0]

Row reducing the corresponding system gives the equations
a + 2c = 0,   b − 3c = 0.
Thus, a = −2c and b = 3c. I can get a nontrivial solution by setting c to any nonzero number. I'll use
c = 1. This gives a = −2 and b = 3. So

       [ 1]       [4]       [−10]   [0]
(−2) · [ 2] + 3 · [1] + 1 · [  1] = [0] .
       [−1]       [3]       [−11]   [0]
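The "solve a homogeneous system" step can be done mechanically: the coefficients of a dependence relation are exactly the nonzero vectors in the null space of the matrix whose columns are the given vectors. A sympy sketch (mine, not part of the notes):

    from sympy import Matrix

    # Columns are the vectors (1, 2, -1), (4, 1, 3), (-10, 1, -11) from part (b).
    A = Matrix([[ 1, 4, -10],
                [ 2, 1,   1],
                [-1, 3, -11]])

    coeffs = A.nullspace()[0]    # one basis vector for the null space
    print(coeffs.T)              # Matrix([[-2, 3, 1]]): the coefficients -2, 3, 1
    print((A * coeffs).T)        # Matrix([[0, 0, 0]]): the relation really gives 0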
Example. Consider the set {(4, 1, 2), (3, 3, 0), (0, 1, 1)} of vectors in Z_5^3.
If the set is independent, prove it. If the set is dependent, find a nontrivial linear combination of the
vectors which is equal to 0.
Write

    [4]       [3]       [0]   [0]
a · [1] + b · [3] + c · [1] = [0] .
    [2]       [0]       [1]   [0]

This gives the matrix equation

[4 3 0] [a]   [0]
[1 3 1] [b] = [0] .
[2 0 1] [c]   [0]
Row reduce the augmented matrix to solve the system, doing the arithmetic in Z_5 (the row operations used
are r1 → 4r1, r3 → r3 + 3r1, r2 → r2 + 4r1, r1 → r1 + 3r2, and r3 → r3 + 4r2):

[4 3 0 0]     [1 0 3 0]
[1 3 1 0]  →  [0 1 1 0]
[2 0 1 0]     [0 0 0 0]
This gives the equations
a + 3c = 0, b + c = 0.
Thus, a = 2c and b = 4c. Set c = 1. This gives a = 2 and b = 4. Hence, the set is dependent, and
    [4]       [3]       [0]   [0]
2 · [1] + 4 · [3] + 1 · [1] = [0] .
    [2]       [0]       [1]   [0]

Example. Consider the set {(1, 0, 1, 2), (1, 2, 2, 1), (0, 1, 2, 1)} of vectors in Z_3^4.
If the set is independent, prove it. If the set is dependent, find a nontrivial linear combination of the
vectors equal to 0.
Write

    [1]       [1]       [0]   [0]
    [0]       [2]       [1]   [0]
a · [1] + b · [2] + c · [2] = [0] .
    [2]       [1]       [1]   [0]

This gives the matrix equation

[1 1 0]       [0]
[0 2 1] [a]   [0]
[1 2 2] [b] = [0] .
[2 1 1] [c]   [0]
Row reduce the augmented matrix to solve the system, doing the arithmetic in Z_3:

[1 1 0 0]     [1 0 1 0]
[0 2 1 0]     [0 1 2 0]
[1 2 2 0]  →  [0 0 0 0]
[2 1 1 0]     [0 0 0 0]

This gives the equations a + c = 0 and b + 2c = 0, so a = 2c and b = c. Setting c = 1 gives a = 2 and
b = 1. Hence, the set is dependent, and

    [1]       [1]       [0]   [0]
    [0]       [2]       [1]   [0]
2 · [1] + 1 · [2] + 1 · [2] = [0] .
    [2]       [1]       [1]   [0]
It’s important to understand this general setup, and not just memorize the special case of vectors in F n ,
as shown in the last few examples. Remember that vectors don’t have to look like things like “(−3, 5, 7, 0)”
(“numbers in slots”). Consider the next example, for instance.
Example. R[x] is a vector space over the reals. Show that the set {1, x, x^2, . . .} is independent.
Suppose
a0 + a1 x + a2 x^2 + · · · + an x^n = 0.
That is,
a0 + a1 x + a2 x^2 + · · · + an x^n = 0 + 0 · x + 0 · x^2 + · · · + 0 · x^n.
Two polynomials are equal if and only if their corresponding coefficients are equal. Hence, a0 = a1 =
· · · = an = 0. Therefore, {1, x, x^2, . . .} is independent.
In some cases, you can tell by inspection that a set is dependent. I noted earlier that a set containing
the zero vector must be dependent. Here’s another easy case.
Proposition. Let F be a field. If n > m, then any set of n vectors in F^m is dependent.
Proof. Suppose v1, v2, . . . , vn are n vectors in F^m, and n > m. Write
a1 v1 + a2 v2 + · · · + an vn = 0.
In matrix form, this is

[ ↑   ↑        ↑  ] [a1]   [0]
[ v1  v2  ···  vn ] [a2] = [0] .
[ ↓   ↓        ↓  ] [ ⋮ ]  [⋮]
                    [an]   [0]

Note that the augmented matrix of this system has m rows and n + 1 columns, and n > m.
The row-reduced echelon form can have at most one leading coefficient in each row, so there are at most
m leading coefficients. These correspond to the main variables in the solution. Since there are n variables
and n > m, there must be some parameter variables. By setting any parameter variables equal to nonzero
numbers, I get a nontrivial solution for a1 , a2 , . . . an . This implies that {v1 , v2 , . . . vn } is dependent.
Proposition. Let F be a field, and let v1, v2, . . . , vn ∈ F^n. Then {v1, v2, . . . , vn} is independent if and only
if the matrix with the v's as columns is invertible.
Proof. Suppose first that {v1, v2, . . . , vn} is independent, and consider the system

[ ↑   ↑        ↑  ] [a1]   [0]
[ v1  v2  ···  vn ] [a2] = [0] .
[ ↓   ↓        ↓  ] [ ⋮ ]  [⋮]
                    [an]   [0]

In equation form, this says
a1 v1 + a2 v2 + · · · + an vn = 0.
By independence, a1 = a2 = · · · = an = 0. Thus, the system above has only the zero vector 0 as a
solution. An earlier theorem on invertibility shows that this means the matrix of v's is invertible.
Conversely, suppose the following matrix is invertible:

    [ ↑   ↑        ↑  ]
A = [ v1  v2  ···  vn ] .
    [ ↓   ↓        ↓  ]

Suppose that
a1 v1 + a2 v2 + · · · + an vn = 0.
Write this as a matrix equation and solve it:

[ ↑   ↑        ↑  ] [a1]   [0]
[ v1  v2  ···  vn ] [a2] = [0]
[ ↓   ↓        ↓  ] [ ⋮ ]  [⋮]
                    [an]   [0]

      [a1]   [0]
A ·   [a2] = [0]
      [ ⋮ ]  [⋮]
      [an]   [0]

          [a1]            [0]
A^{-1}A · [a2] = A^{-1} · [0]
          [ ⋮ ]           [⋮]
          [an]            [0]

[a1]   [0]
[a2] = [0]
[ ⋮ ]  [⋮]
[an]   [0]

Hence, a1 = a2 = · · · = an = 0, and {v1, v2, . . . , vn} is independent.
Note that this proposition requires that you have n vectors in F n — the number of vectors must match
the dimension of the space.
The result can also be stated in contrapositive form: The set of vectors is dependent if and only if the
matrix having the vectors as columns is not invertible. I’ll use this form in the next example.
The set is dependent when A is not invertible, and A is not invertible when its determinant is equal to
0. Now
det A = 30(x − 1)(x − 8).
Thus, det A = 0 for x = 1 and x = 8. For those values of x, the original set is dependent.
The next proposition says that an independent set can be thought of as a set without "redundancy", in
the sense that you can't build any one of the vectors out of the others.
Proposition. Let V be a vector space over a field F, and let S ⊂ V. S is dependent if and only if some
v ∈ S can be expressed as a linear combination of vectors in S other than v.
Proof. Suppose v ∈ S can be written as a linear combination of vectors in S other than v:
v = a1 v1 + · · · + an vn .
Then v − a1 v1 − · · · − an vn = 0 is a nontrivial linear combination of vectors in S which equals 0 (the
coefficient of v is 1), so S is dependent.
Conversely, suppose S is dependent, so there are vectors v1, v2, . . . , vn ∈ S and scalars a1, a2, . . . , an ∈ F,
not all 0, such that
a1 v1 + a2 v2 + · · · + an vn = 0.
Assume (relabeling the vectors if necessary) that a1 ≠ 0. Then
a1 v1 = −a2 v2 − · · · − an vn
a1^{-1} a1 v1 = a1^{-1} (−a2 v2 − · · · − an vn )
v1 = −a1^{-1} a2 v2 − · · · − a1^{-1} an vn
Thus, v1 is a linear combination of vectors in S other than v1.
Theorem. Let f1, f2, . . . , fn be real-valued functions which are differentiable at least n − 1 times, and
suppose that for some point c the Wronskian W(f1, f2, . . . , fn)(c) is nonzero. Then {f1, f2, . . . , fn} is
independent.
Proof. Suppose a1, a2, . . . , an are scalars such that
a1 f1(x) + a2 f2(x) + · · · + an fn(x) = 0 for all x.
I have to show all the a's are 0.
This equation is an identity in x, so I may differentiate it repeatedly to get n equations:
a1 f1(x) + a2 f2(x) + · · · + an fn(x) = 0
a1 f1'(x) + a2 f2'(x) + · · · + an fn'(x) = 0
⋮
a1 f1^(n−1)(x) + a2 f2^(n−1)(x) + · · · + an fn^(n−1)(x) = 0
Plug in x = c:
[ f1(c)         f2(c)         ···  fn(c)        ] [a1]   [0]
[ f1'(c)        f2'(c)        ···  fn'(c)       ] [a2]   [0]
[ f1''(c)       f2''(c)       ···  fn''(c)      ] [ ⋮ ] = [⋮] .
[   ⋮              ⋮                  ⋮         ]
[ f1^(n−1)(c)   f2^(n−1)(c)   ···  fn^(n−1)(c)  ] [an]   [0]

Let

    [ f1(c)         f2(c)         ···  fn(c)        ]
    [ f1'(c)        f2'(c)        ···  fn'(c)       ]
A = [ f1''(c)       f2''(c)       ···  fn''(c)      ] .
    [   ⋮              ⋮                  ⋮         ]
    [ f1^(n−1)(c)   f2^(n−1)(c)   ···  fn^(n−1)(c)  ]
The determinant of this matrix is the Wronskian W (f1 , f2 , . . . fn )(c), which by assumption is nonzero.
Since the determinant is nonzero, the matrix is invertible. So
      [a1]   [0]
A ·   [a2] = [0]
      [ ⋮ ]  [⋮]
      [an]   [0]

          [a1]            [0]
A^{-1}A · [a2] = A^{-1} · [0]
          [ ⋮ ]           [⋮]
          [an]            [0]

[a1]   [0]
[a2] = [0]
[ ⋮ ]  [⋮]
[an]   [0]

Hence, a1 = a2 = · · · = an = 0, and {f1, f2, . . . , fn} is independent.
Example. Show that the set of functions {x, x^3, x^5} is independent.
Compute the Wronskian:

                 | x    x^3    x^5   |
W(x, x^3, x^5) = | 1    3x^2   5x^4  | = 16x^6 .
                 | 0    6x     20x^3 |
I can find values of x for which the Wronskian is nonzero: for example, if x = 1, then W(x, x^3, x^5) =
16 ≠ 0. Hence, {x, x^3, x^5} is independent.
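sympy has a built-in wronskian function, so this computation is easy to check (snippet mine, not from the notes):

    from sympy import symbols, wronskian, simplify

    x = symbols('x')
    W = wronskian([x, x**3, x**5], x)
    print(simplify(W))       # 16*x**6
    print(W.subs(x, 1))      # 16, which is nonzero, so {x, x^3, x^5} is independent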
The next example shows that the converse of the last theorem is false: You can have a set of independent
functions whose Wronskian is always 0 (so there’s no point where the Wronskian is nonzero).
Example. C^1(R) denotes the vector space over R consisting of differentiable functions R → R. Let

f(x) = x^2,    g(x) = {  x^2   if x ≥ 0
                      { −x^2   if x < 0.

Show that {f, g} is independent in C^1(R), but W(f, g)(x) = 0 for all x ∈ R.
Note: You can check that g is differentiable at 0, and g'(0) = 0.
For independence, suppose that a, b ∈ R and af(x) + bg(x) = 0 for all x ∈ R. Plugging in x = 1, I get
af(1) + bg(1) = 0, i.e. a + b = 0.
Plugging in x = −1, I get
af(−1) + bg(−1) = 0, i.e. a − b = 0.
Adding the two equations gives 2a = 0, so a = 0, and then b = 0. Hence, {f, g} is independent.
For the Wronskian: if x ≥ 0, then g(x) = x^2 and g'(x) = 2x, so

W(f, g)(x) = | x^2   x^2 | = x^2 · 2x − x^2 · 2x = 0.
             | 2x    2x  |

If x < 0, then g(x) = −x^2 and g'(x) = −2x, so

W(f, g)(x) = | x^2   −x^2 | = x^2 · (−2x) − (−x^2) · 2x = 0.
             | 2x    −2x  |

Thus, W(f, g)(x) = 0 for all x ∈ R.
The standard basis vectors in R^2 are (1, 0) and (0, 1). They can be pictured as arrows of length 1 pointing
along the positive x-axis and the positive y-axis.

[Figure: the standard basis vectors (1, 0) and (0, 1) in R^2.]
The standard basis vectors in R3 are (1, 0, 0), (0, 1, 0), and (0, 0, 1). They can be pictured as arrows of
length 1 pointing along the positive x-axis, the positive y-axis, and the positive z-axis.
[Figure: the standard basis vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) in R^3, drawn along the coordinate axes.]
Since we’re calling this a “basis”, we’d better check that the name is justified!
Proposition. The standard basis is a basis for F n .
Proof. First, we show that the standard basis spans F^n. Let (a1, a2, a3, . . . , an) ∈ F^n. I must write this
vector as a linear combination of the standard basis vectors. It's easy:

  a1 · (1, 0, 0, . . . , 0)
+ a2 · (0, 1, 0, . . . , 0)
+ a3 · (0, 0, 1, . . . , 0)
    ⋮
+ an · (0, 0, 0, . . . , 1)
 = (a1, a2, a3, . . . , an)
For independence, suppose
a1 e1 + a2 e2 + · · · + an en = 0.
The computation we did to prove that the set spans shows that the left side is just (a1, a2, a3, . . . , an), so
(a1, a2, a3, . . . , an) = (0, 0, 0, . . . , 0).
Hence, a1 = a2 = · · · = an = 0, and the standard basis is independent. Therefore, it is a basis for F^n.
Example. Consider the vectors (1, 1, 0), (0, 1, 1), and (1, 0, 1) in R^3, and let A be the matrix with these
vectors as columns. You can check that A row reduces to the identity. Since A row reduces to the identity,
it is invertible, and there are a number of conditions which are equivalent to A being invertible.
First, since A is invertible the following system has a unique solution for every (a, b, c):
[1 0 1] [x]   [a]
[1 1 0] [y] = [b] .
[0 1 1] [z]   [c]
In other words, any vector (a, b, c) ∈ R3 can be written as a linear combination of the given vectors.
This proves that the given vectors span R3 .
Second, since A is invertible, the following system has only x = 0, y = 0, z = 0 as a solution:
[1 0 1] [x]   [0]
[1 1 0] [y] = [0] .
[0 1 1] [z]   [0]

In other words, the only linear combination of the given vectors which equals the zero vector is the trivial
one, so the given vectors are independent. Hence, they form a basis for R^3.
We'll generalize the computations we did in the last example.

Proposition. Let F be a field, let v1, v2, . . . , vn be vectors in F^n, and let A be the n × n matrix with the
v's as columns. The following statements are equivalent:
1. A is invertible.
2. det A ≠ 0.
3. The system Ax = 0 has only x = (x1, x2, . . . , xn) = (0, 0, . . . , 0) as a solution.
4. {v1, v2, . . . , vn} is a basis for F^n.
Proof. The equivalence of conditions 1, 2, and 3 comes from our earlier work on invertibility, so I'll show
that they are equivalent to condition 4.
First, suppose {v1, v2, . . . , vn} is a basis for F^n; in particular, the set is independent. Suppose
x = (x1, x2, . . . , xn) is a solution of the system Ax = 0. In equation form, this says
x1 v1 + x2 v2 + · · · + xn vn = (0, 0, . . . , 0).
Since {v1 , v2 , . . . , vn } is independent, I have x1 = x2 = · · · = xn = 0. This shows that the system above
has only the zero vector as a solution. An earlier result on invertibility shows that A must be invertible.
Conversely, suppose the following matrix is invertible:
    [ ↑   ↑        ↑  ]
A = [ v1  v2  ···  vn ] .
    [ ↓   ↓        ↓  ]

To show that {v1, v2, . . . , vn} is independent, suppose
x1 v1 + x2 v2 + · · · + xn vn = (0, 0, . . . , 0).
In matrix form, this is

[ ↑   ↑        ↑  ] [x1]   [0]
[ v1  v2  ···  vn ] [x2] = [0] .
[ ↓   ↓        ↓  ] [ ⋮ ]  [⋮]
                    [xn]   [0]
Since A is invertible, the only solution to this system is x1 = x2 = · · · = xn = 0. This shows that
{v1 , v2 , . . . , vn } is independent.
To show that {v1 , v2 , . . . , vn } spans, let (b1 , b2 , . . . bn ) ∈ F n . Consider the system
[ ↑   ↑        ↑  ] [x1]   [b1]
[ v1  v2  ···  vn ] [x2] = [b2] .
[ ↓   ↓        ↓  ] [ ⋮ ]  [ ⋮]
                    [xn]   [bn]

Since A is invertible, this system has a (unique) solution (x1, x2, . . . , xn). That is,
x1 v1 + x2 v2 + · · · + xn vn = (b1, b2, . . . , bn).
Hence, {v1, v2, . . . , vn} spans F^n. Since the set is independent and spans, it is a basis for F^n.
A basis for V is a spanning set for V , so every vector in V can be written as a linear combination of
basis elements. The next result says that such a linear combination is unique.
Proposition. Let B be a basis for a vector space V . Every v ∈ V can be written in exactly one way as
v = a1 v1 + a2 v2 + · · · + an vn , ai ∈ F, v1 , . . . vn ∈ B.
Proof. Let v ∈ V . Since B spans V , there are scalars a1 , a2 , . . . , an and vectors v1 , . . . vn ∈ B such that
v = a1 v1 + a2 v2 + · · · + an vn .
Suppose that there is another way to do this: There are scalars b1 , b2 , . . . , bm and vectors w1 , . . . wm ∈ B
such that
v = b1 w1 + b2 w2 + · · · + bm wm .
First, note that I can assume that the same set of vectors are involved in both linear combinations —
that is, the v’s and w’s are the same set of vectors. For if not, I can instead use the vectors in the union
S = {v1 , . . . vn } ∪ {w1 , . . . wm }.
I can rewrite both of the original linear combinations as linear combinations of vectors in S, using 0 as
the coefficient of any vector which doesn’t occur in a given combination. Then both linear combinations for
v use the same vectors.
I’ll assume this has been done and just assume that {v1 , v2 , . . . vn } is the set of vectors. Thus, I have
two linear combinations
v = a1 v1 + a2 v2 + · · · + an vn
v = b1 v1 + b2 v2 + · · · + bn vn
Then
a1 v1 + a2 v2 + · · · + an vn = b1 v1 + b2 v2 + · · · + bn vn .
Hence,
(a1 − b1 )v1 + (a2 − b2 )v2 + · · · + (an − bn )vn = 0.
Since {v1 , v2 , . . . , vn } is independent,
a1 − b1 = 0, a2 − b2 = 0, . . . , an − bn = 0.
Therefore,
a1 = b 1 , a2 = b 2 , . . . , an = b n .
That is, the two linear combinations were actually the same. This proves that there’s only one way to
write v as a linear combination of vectors in B.
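Finding the (unique) coefficients for a given vector relative to a basis of F^n amounts to solving a linear system. A sympy sketch (my own, using the basis {(1, 1, 0), (0, 1, 1), (1, 0, 1)} from the earlier R^3 example and a sample vector I picked):

    from sympy import Matrix

    # Basis vectors as the columns of A.
    A = Matrix([[1, 0, 1],
                [1, 1, 0],
                [0, 1, 1]])

    v = Matrix([3, 5, 4])

    coords = A.solve(v)      # the unique coefficients, since A is invertible
    print(coords.T)          # Matrix([[2, 3, 1]])
    print((A * coords).T)    # Matrix([[3, 5, 4]]): 2*(1,1,0) + 3*(0,1,1) + 1*(1,0,1) = (3,5,4)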
I want to show that two bases for a vector space must have the same number of elements. I need some
preliminary results, which are important in their own right.
Lemma. Let F be a field, and let A be an m × n matrix over F, where m < n. Then the homogeneous
system Ax = 0 has a nontrivial solution.
Proof. Write

    [ a11  a12  ···  a1n ]
    [ a21  a22  ···  a2n ]
A = [  ⋮    ⋮          ⋮ ] .
    [ am1  am2  ···  amn ]

The condition m < n means that the following system has more variables than equations:

a11 x1 + a12 x2 + · · · + a1n xn = 0
a21 x1 + a22 x2 + · · · + a2n xn = 0
⋮
am1 x1 + am2 x2 + · · · + amn xn = 0

If A row reduces to a row reduced echelon matrix R, then R can have at most m leading coefficients.
Therefore, some of the variables x1, x2, . . . , xn will be free variables (parameters); if I assign nonzero values
to the free variables (e.g. by setting all of them equal to 1), the resulting solution will be nontrivial.
Theorem. Let V be a vector space over a field F, and suppose {v1, v2, . . . , vn} is a basis for V.
(a) Any subset of V containing more than n vectors is dependent.
(b) No subset of V containing fewer than n vectors spans V.
Proof. (a) Suppose {w1, w2, . . . , wm} is a subset of V, and that m > n. I want to show that {w1, w2, . . . , wm}
is dependent.
Write each w as a linear combination of the v's:

w1 = a11 v1 + a12 v2 + · · · + a1n vn
w2 = a21 v1 + a22 v2 + · · · + a2n vn
⋮
wm = am1 v1 + am2 v2 + · · · + amn vn

Since m > n, the matrix of a's has more columns than rows. Therefore, the following system has a
nontrivial solution x1 = b1, x2 = b2, . . . , xm = bm:

a11 x1 + a21 x2 + · · · + am1 xm = 0
a12 x1 + a22 x2 + · · · + am2 xm = 0
⋮
a1n x1 + a2n x2 + · · · + amn xm = 0

That is, not all the b's are 0, but substituting x1 = b1, . . . , xm = bm makes all of these equations hold.
But then

[ ↑   ↑        ↑  ] [b1]   [ ↑   ↑        ↑  ] [ a11  a21  ···  am1 ] [b1]
[ w1  w2  ···  wm ] [b2] = [ v1  v2  ···  vn ] [ a12  a22  ···  am2 ] [b2] .
[ ↓   ↓        ↓  ] [ ⋮ ]  [ ↓   ↓        ↓  ] [  ⋮    ⋮         ⋮  ] [ ⋮ ]
                    [bm]                       [ a1n  a2n  ···  amn ] [bm]
Therefore,

[ ↑   ↑        ↑  ] [b1]   [0]
[ w1  w2  ···  wm ] [b2] = [0] .
[ ↓   ↓        ↓  ] [ ⋮ ]  [⋮]
                    [bm]   [0]
In equation form,
b1 w1 + b2 w2 + · · · + bm wm = 0.
This is a nontrivial linear combination of the w’s which adds up to 0, so the w’s are dependent.
(b) Suppose that {w1 , w2 , . . . , wm } is a set of vectors in V and m < n. I want to show that {w1 , w2 , . . . , wm }
does not span V .
Suppose on the contrary that the w’s span V . Then each v can be written as a linear combination of
the w’s:
v1 = a11 w1 + a12 w2 + · · · + a1m wm
v2 = a21 w1 + a22 w2 + · · · + a2m wm
⋮
vn = an1 w1 + an2 w2 + · · · + anm wm

In matrix form, this is

[ ↑   ↑        ↑  ]   [ ↑   ↑        ↑  ] [ a11  a21  ···  an1 ]
[ v1  v2  ···  vn ] = [ w1  w2  ···  wm ] [ a12  a22  ···  an2 ] .
[ ↓   ↓        ↓  ]   [ ↓   ↓        ↓  ] [  ⋮    ⋮         ⋮  ]
                                          [ a1m  a2m  ···  anm ]

Since n > m, the coefficient matrix of a's has more columns than rows. Hence, the following system has a
nontrivial solution x1 = b1, x2 = b2, . . . , xn = bn:

a11 x1 + a21 x2 + · · · + an1 xn = 0
a12 x1 + a22 x2 + · · · + an2 xn = 0
⋮
a1m x1 + a2m x2 + · · · + anm xn = 0

Thus,

[ a11  a21  ···  an1 ] [b1]   [0]
[ a12  a22  ···  an2 ] [b2]   [0]
[  ⋮    ⋮         ⋮  ] [ ⋮ ] = [⋮] .
[ a1m  a2m  ···  anm ] [bn]   [0]
Multiplying the v and w equation on the right by the b-vector, and using the fact that the matrix of a's
times the b-vector is the zero vector (the equation above), gives

[ ↑   ↑        ↑  ] [b1]   [0]
[ v1  v2  ···  vn ] [b2] = [0] .
[ ↓   ↓        ↓  ] [ ⋮ ]  [⋮]
                    [bn]   [0]
In equation form, this is
b1 v1 + b2 v2 + · · · + bn vn = 0.
Since not all the b’s are 0, this is a nontrivial linear combination of the v’s which adds up to 0 —
contradicting the independence of the v’s.
This contradiction means that the w’s can’t span after all.
Corollary. Let F be a field. Then any set of more than n vectors in F^n is dependent, and no set of fewer
than n vectors in F^n spans F^n.
Corollary. If {v1 , . . . , vn } is a basis for a vector space V , then every basis for V has n elements.
Proof. If {w1 , . . . , wm } is another basis for V , then m can’t be less than n or {w1 , . . . , wm } couldn’t span.
Likewise, m can’t be greater than n or {w1 , . . . , wm } couldn’t be independent. Therefore, m = n.
Example. Let R[x] denote the R-vector space of polynomials with coefficients in R.
Show that {1, x, x2 , x3 , . . .} is a basis for R[x].
First, a polynomial has the form an xn + an−1 xn−1 + · · · + a1 x + a0 . This is a linear combination of
elements of {1, x, x2 , x3 , . . .}. Hence, the set spans R[x].
To show that the set is independent, suppose there are numbers a0 , a1 , . . . an ∈ R such that
a0 + a1 x + · · · + an xn = 0.
This equation is an identity — it’s true for all x ∈ R. Setting x = 0, I get a0 = 0. Then plugging a0 = 0
back in gives
a1 x + · · · + an xn = 0.
Since this is an identity, I can differentiate both sides to obtain
a1 + 2a2 x + · · · + n an x^{n−1} = 0.
Setting x = 0 gives a1 = 0. Continuing in this way (differentiating and then setting x = 0), I get
a2 = 0, . . . , an = 0. Hence, the set is independent, and it is a basis for R[x].
The next result shows that, in principle, you can construct a basis by:
(a) Starting with an independent set and adding vectors, or
(b) Starting with a spanning set and removing vectors.
Part (a) means that if S is an independent set, then there is a basis T such that S ⊂ T . (If S was a
basis to begin with, then S = T .) Part (b) means that if S is a spanning set, then there is a basis T such
that T ⊂ S.
I’m only proving the result in the case where V has finite dimension n, but it is true for vector spaces
of any dimension.
Theorem. Let V be a vector space over a field F.
(a) If S is an independent subset of V, then there is a basis for V which contains S.
(b) If S is a spanning set for V, then there is a basis for V which is contained in S.
Proof.
(a) Let {v1 , . . . , vm } be independent. If this set spans V , it’s a basis, and I’m done. Otherwise, there is a
vector v ∈ V which is not in the span of {v1 , . . . , vm }.
I claim that {v, v1, . . . , vm} is independent. Suppose
av + a1 v1 + · · · + am vm = 0.
If a ≠ 0, I can solve this equation for v:
v = −(1/a)(a1 v1 + · · · + am vm ).
Since v has been expressed as a linear combination of the vk ’s, it’s in the span of the vk ’s, contrary to
assumption. Therefore, this case is ruled out.
8
The only other possibility is a = 0. Then a1 v1 + · · · + am vm = 0, so independence of the vk ’s implies
a1 = · · · = am = 0. Therefore, {v, v1 , . . . , vm } is independent.
I can continue adding vectors in this way until I get a set which is independent and spans — a basis.
The process must terminate, since no independent set in V can have more than n elements.
(b) Suppose {v1 , . . . , vm } spans V . I want to show that some subset of {v1 , . . . , vm } is a basis.
If {v1 , . . . , vm } is independent, it’s a basis, and I’m done. Otherwise, there is a nontrivial linear combi-
nation
a1 v1 + · · · + am vm = 0.
Assume without loss of generality that a1 ≠ 0. Then
v1 = −(1/a1)(a2 v2 + · · · + am vm ).
Since v1 is a linear combination of the other v's, I can remove it and still have a set which spans V; that
is, V = ⟨v2, . . . , vm⟩.
I continue throwing out vectors in this way until I reach a set which spans and is independent — a basis.
The process must terminate, because no set containing fewer than n vectors can span V .
It’s possible to carry out the “adding vectors” and “removing vectors” procedures in some specific cases.
The algorithms are related to those for finding bases for the row space and column space of a matrix,
which I’ll discuss later.
Suppose you know a basis should have n elements, and you have a set S with n elements (“the right
number”). To show S is a basis, you only need to check either that it is independent or that it spans — not
both. I’ll justify this statement, then show by example how you can use it. I need a preliminary result.
Proposition. Let V be a finite dimensional vector space over a field F , and let W be a subspace of V . If
dim W = dim V , then V = W .
Proof. Suppose dim W = dim V = n, but V 6= W . I’ll show that this leads to a contradiction.
Let {x1 , x2 , . . . , xn } be a basis for W . Suppose this is not a basis for V . Since it’s an independent set,
the previous result shows that I can add vectors y1 , y2 , . . . ym to make a basis for V :
{x1 , x2 , . . . , xn , y1 , y2 , . . . ym }.
But this is a basis for V with more than n elements, which is impossible.
Therefore, {x1, x2, . . . , xn} is also a basis for V. Let x ∈ V. Since {x1, x2, . . . , xn} spans V, I can write
x as a linear combination of the elements of {x1, x2, . . . , xn}:
x = a1 x1 + a2 x2 + · · · + an xn,   ai ∈ F.
Since x1, x2, . . . , xn ∈ W and W is a subspace, it follows that x ∈ W. Thus, V ⊂ W, and since W ⊂ V,
this gives V = W, contradicting the assumption that V ≠ W. This contradiction shows that V = W.

Corollary. Let V be an n-dimensional vector space over a field F, and let S be a set of n vectors in V.
(a) If S is independent, then S is a basis for V.
(b) If S spans V, then S is a basis for V.
Proof. (a) Suppose S is independent. Consider W , the span of S. Then S is independent and spans W ,
so S is a basis for W . Since S has n elements, dim W = n. But W ⊂ V and dim V = n. By the preceding
result, V = W .
Hence, S spans V , and S is a basis for V .
(b) Suppose S spans V . Suppose S is not independent. By an earlier result, I can remove some elements of
S to get a set T which is a basis for V . But now I have a basis T for V with fewer than n elements (since I
removed elements from S, which had n elements).
This is a contradiction, and hence S must be independent.
Example. (a) Determine whether {(1, 2, 1), (2, 1, 1), (2, 0, 1)} is a basis for the Z_3 vector space Z_3^3.
(b) Determine whether {(1, 1, 0), (1, 2, 2), (2, 2, 0)} is a basis for the Z_3 vector space Z_3^3.
(a) Form the matrix with the vectors as columns and row reduce:

[1 2 2]    [1 0 0]
[2 1 0] →  [0 1 0]
[1 1 1]    [0 0 1]
Since the matrix row reduces to the identity, it is invertible. Since the matrix is invertible, the vectors
are independent. Since we have 3 vectors in a 3-dimensional vector space, the Corollary says that the set is
a basis.
(b) Form the matrix with the vectors as columns and row reduce:

[1 1 2]    [1 0 2]
[1 2 2] →  [0 1 0]
[0 2 0]    [0 0 0]
The matrix did not row reduce to the identity, so it is not invertible. Since the matrix is not invertible,
the vectors aren’t independent. Hence, the vectors are not a basis.
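If you want to check computations like these with software (nothing in these notes depends on it), here is a short sketch in Python using the sympy library. A matrix over Z3 is invertible exactly when its determinant is nonzero mod 3, so it is enough to compute the two determinants and reduce mod 3.

from sympy import Matrix

# Columns are the candidate basis vectors from (a) and (b).
A = Matrix([[1, 2, 2],
            [2, 1, 0],
            [1, 1, 1]])
B = Matrix([[1, 1, 2],
            [1, 2, 2],
            [0, 2, 0]])

# Nonzero mod 3 means invertible over Z3.
print(A.det() % 3)   # 2, so the set in (a) is a basis
print(B.det() % 3)   # 0, so the set in (b) is not a basis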
Definition. Let A be an m × n matrix over a field F . The row space of A is the subspace of F n spanned by the rows of A.

For example, suppose a matrix has row vectors (1, 0), (0, 1), and (0, 0). The row space is the subspace of R2 spanned by these vectors. Since the first two vectors are the standard basis vectors for R2 , the row space is R2 .
Lemma. Let A be a matrix with entries in a field. If E is an elementary row operation, then E(A) has the
same row space as A.
Proof. If E is an operation of the form ri ↔ rj , then E(A) and A have the same rows (except for order),
so it’s clear that their row vectors have the same span. Hence, the matrices have the same row space.
If E is an operation of the form ri → ari where a 6= 0, then A and E(A) agree except in the i-th row.
We have
a1 r1 + · · · + ai ri + · · · + am rm = a1 r1 + · · · + ai a−1 (ari ) + · · · + am rm ,
Note that the vectors r1 , . . . , ari , . . . , rm are the rows of E(A). So this equation says any linear
combination of the rows r1 , . . . , rm of A is a linear combination of the rows of E(A). This means that the
row space of A is contained in the row space of E(A).
Going the other way, a linear combination of the rows r1 , . . . , ari , . . . , rm of E(A) looks like this:
b1 r1 + · · · + bi (ari ) + · · · + bm rm .
But this is a linear combination of the rows r1 , . . . , rm of A, so the row space of E(A) is contained in
the row space of A. Hence, A and E(A) have the same row space.
Finally, suppose E is a row operation of the form ri → ri + arj , where a ∈ F . Then

a1 r1 + · · · + ai ri + · · · + aj rj + · · · + am rm = a1 r1 + · · · + ai (ri + arj ) + · · · + (aj − aai )rj + · · · + am rm .

This shows that the row space of A is contained in the row space of E(A).

Conversely,

b1 r1 + · · · + bi (ri + arj ) + · · · + bj rj + · · · + bm rm = b1 r1 + · · · + bi ri + · · · + (bj + abi )rj + · · · + bm rm ,

which is a linear combination of the rows of A. Hence, the row space of E(A) is contained in the row space of A.
Since row operations preserve row space, row equivalent matrices have the same row space. In particular,
a matrix and its row reduced echelon form have the same row space.
The next proposition describes some of the components of a vector in the row space of a row-reduced
echelon matrix R. Such a vector is a linear combination of the nonzero rows of R.
Proposition. Let R = {rij } be a row reduced echelon matrix over a field with nonzero rows r1 , . . . , rp .
Suppose the leading entries of R occur at (1, j1 ), (2, j2 ), . . . , (p, jp ), where j1 < j2 < · · · < jp . If

v = a1 r1 + · · · + ap rp ,

then the jk -th component of v is vjk = ak .

Proof. The jk -th component of v is

vjk = a1 r1jk + a2 r2jk + · · · + ap rpjk .

But the only nonzero element in column jk is the leading entry rkjk = 1. Therefore, the only nonzero term in the sum is ak rkjk = ak .
This result looks a bit technical, but it becomes obvious if you consider an example. Here’s a row
reduced echelon matrix over R:
0 1 2 0 −1 0
0 0 0 1 2 0
.
0 0 0 0 0 1
0 0 0 0 0 0
Here’s a vector in the row space, a linear combination of the nonzero rows:

v = a · (0, 1, 2, 0, −1, 0) + b · (0, 0, 0, 1, 2, 0) + c · (0, 0, 0, 0, 0, 1) = (0, a, 2a, b, −a + 2b, c).
The leading entries occur in columns j1 = 2, j2 = 4, and j3 = 6. The 2nd , 4th , and 6th components of
the vector are
v2 = a, v4 = b, v6 = c.
You can see from the picture why this happens. The coefficients a, b, c multiply the leading entries. The
leading entries are all 1’s, and they’re the only nonzero elements in their columns. So in the components of
the vector corresponding to those columns, you get a, b, and c.
Corollary. The nonzero rows of a row reduced echelon matrix over a field are independent.
Proof. Suppose R is a row reduced echelon matrix with nonzero rows r1 , . . . , rp . Suppose the leading
entries of R occur at (1, j1 ), (2, j2 ), . . ., where j1 < j2 < · · ·. Suppose
0 = a1 r1 + · · · + ap rp .
The proposition implies that ak = vjk = 0 for all k. Therefore, {ri } are independent.
Corollary. The nonzero rows of a row reduced echelon matrix over a field form a basis for the row space of
the matrix.
Proof. The nonzero rows span the row space, and are independent, by the preceding corollary.
Algorithm. Let V be a finite-dimensional vector space, and let v1 , . . . , vm be vectors in V . Find a basis
for W = hv1 , . . . , vm i, the subspace spanned by the vi .
Let M be the matrix whose i-th row is vi . The row space of M is W . Let R be a row-reduced echelon
matrix which is row equivalent to M . Then R and M have the same row space W , and the nonzero rows of
R form a basis for W .
Example. Consider the vectors v1 = (1, 0, 1, 1), v2 = (−2, 1, 1, 0), and v3 = (7, −2, 1, 3) in R4 . Find a basis
for the subspace hv1 , v2 , v3 i spanned by the vectors.
Construct a matrix with the vi as its rows and row reduce:
1 0 1 1 1 0 1 1
−2 1 1 0 → 0 1 3 2
7 −2 1 3 0 0 0 0
The vectors (1, 0, 1, 1) and (0, 1, 3, 2) form a basis for hv1 , v2 , v3 i.
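Here is the same computation as a sympy sketch (not something the notes rely on): the nonzero rows of the row reduced echelon form give a basis for the row space.

from sympy import Matrix

M = Matrix([[1, 0, 1, 1],
            [-2, 1, 1, 0],
            [7, -2, 1, 3]])

R, pivots = M.rref()     # row reduced echelon form and the pivot columns
print(R)
# The nonzero rows of R, namely (1, 0, 1, 1) and (0, 1, 3, 2),
# form a basis for the row space of M.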
Example. Determine the dimension of the subspace of R3 spanned by (1, 2, −1), (1, 1, 1), and (2, −2, 1).
Form a matrix using the vectors as the rows and row reduce:
1 2 −1 1 0 0
1 1 1 → 0 1 0
2 −2 1 0 0 1
The subspace has dimension 3, since the row reduced echelon matrix has 3 nonzero rows.
Definition. The rank of a matrix over a field is the dimension of its row space.
Example. Find the rank of the following matrix over Z5 :
1 4 2 1
3 3 1 2.
0 1 0 4
Row reducing over Z5 (the operation r2 → r2 + 2r1 makes the second row equal to the third, and the rest is routine), the matrix reduces to

1 0 2 0
0 1 0 4
0 0 0 0

There are two nonzero rows, so the rank is 2.

Next, here is an observation about how matrix multiplication interacts with rows. If M is a matrix with rows r1 , . . . , rn , then

[ a1 a2 · · · an ] · M = a1 r1 + a2 r2 + · · · + an rn .
If instead of a single row vector on the left I have an entire matrix, here’s what I get:
Hence, the rows of the product are linear combinations of the rows r1 , r2 , . . . rn .
Proposition. Let M and N be matrices over a field F which are compatible for multiplication. Then
rank(M N ) ≤ rank N.
Proof. The preceding discussion shows that the rows of M N are linear combinations of the rows of N .
Therefore, the rows of M N are all contained in the row space of N .
The row space of N is a subspace, so it’s closed under taking linear combinations of vectors. Hence,
any linear combination of the rows of M N is in the row space of N . Therefore, the row space of M N is
contained in the row space of N .
From this, it follows that the dimension of the row space of M N is less than or equal to the dimension
of the row space of N — that is, rank(M N ) ≤ rank N .
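As a quick numerical sanity check of this inequality (a sketch using Python with numpy; the matrices below are just random examples of my own), you can compare ranks directly:

import numpy as np

rng = np.random.default_rng(0)
M = rng.integers(-3, 4, size=(4, 5))
N = rng.integers(-3, 4, size=(5, 6))

# The rank of a product never exceeds the rank of either factor.
print(np.linalg.matrix_rank(M @ N) <= np.linalg.matrix_rank(N))   # True
print(np.linalg.matrix_rank(M @ N) <= np.linalg.matrix_rank(M))   # True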
I already have one algorithm for testing whether a set of vectors in F n is independent. That algorithm
involves constructing a matrix with the vectors as the columns, then row reducing. The algorithm will also
produce a linear combination of the vectors which adds up to the zero vector if the set is dependent.
If all you care about is whether or not a set of vectors in F n is independent — i.e. you don’t care about
a possible dependence relation — the results on rank can be used to give an alternative algorithm. In this
approach, you construct a matrix with the given vectors as the rows.
Let M be the matrix whose i-th row is vi . Let R be a row reduced echelon matrix which is row equivalent
to M . If R has m nonzero rows, then {v1 , . . . , vm } is independent. Otherwise, the set is dependent.
If R has p nonzero rows, then R and M have rank p. (They have the same rank, because they have the
same row space.) Suppose p = m. Since {vi } spans, some subset of {vi } is a basis. However, a basis must
contain p = m elements. Therefore, {vi } must be independent.
Any independent subset of the row space must contain ≤ p elements. Hence, if m > p, {vi } must be
dependent.
Example. Determine whether the vectors v1 = (1, 0, 1, 1), v2 = (−2, 1, 1, 0), and v3 = (7, −2, 1, 3) in R4 are
independent.
Form a matrix with the vectors as the rows and row reduce:
1 0 1 1 1 0 1 1
−2 1 1 0 → 0 1 3 2
7 −2 1 3 0 0 0 0
The row reduced echelon matrix has only two nonzero rows. Hence, the vectors are dependent.
I already know that every matrix can be row reduced to a row reduced echelon matrix. The next result
completes the discussion by showing that the row reduced echelon form is unique.
Proposition. Every matrix over a field can be row reduced to a unique row reduced echelon matrix.
Proof. Suppose M row reduces to R, a row reduced echelon matrix with nonzero rows r1 , . . . , rp . Suppose
the leading coefficients of R occur at (1, j1 ), (2, j2 ), . . ., where j1 < j2 < · · ·.
Let W be the row space of R and let v = (v1 , . . . , vn ) ∈ W . Since r1 , . . . , rp span the row space W , we
have
v = a1 r1 + · · · + ap rp .
Claim: The first nonzero component of v must occur in column jk , for some k = 1, 2, . . ..
Suppose ak is the first ai which is nonzero. Since the ai ri terms before ak rk are zero, we have
v = ak rk + · · · + ap rp .
The first nonzero element of rk is a 1 at (k, jk ). The first nonzero element in rk+1 , . . . , rp lies to the
right of column jk . Thus, vj = 0 for j < jk , and vjk = ak . Evidently, this is the first nonzero component of
v. This proves the claim.
This establishes that if a row reduced echelon matrix R′ is row equivalent to M , its leading coefficients
must lie in the same columns as those of R. For the rows of R′ are elements of W , and the claim applies.
Next, I’ll show that the nonzero rows of R′ are the same as the nonzero rows of R.
Consider, for instance, the first nonzero rows of R and R′ . Their first nonzero components are 1’s lying
in column j1 . Moreover, both r1 and r1′ have zeros in columns j2 , j3 , . . . .
Suppose r1 ≠ r1′ . Then r1 − r1′ is a nonzero vector in W whose first nonzero component is not in column
j1 , j2 , . . . , which is a contradiction.
The same argument applies to show that rk = rk′ for all k. Therefore, R = R′ .
In my discussion of bases, I showed that every independent set is a subset of a basis. To put it another
way, you can add vectors to an independent set to get a basis.
Here’s how to find specific vectors to add to an independent set to get a basis.
Example. Extend the independent set {(2, −4, 1, 0, 8), (−1, 2, −1, −1, −4), (2, −4, 1, 1, 7)} to a basis of R5 .

Form the matrix with the vectors as rows and row reduce:

2 −4 1 0 8        1 −2 0 0 3
−1 2 −1 −1 −4  →  0 0 1 0 2
2 −4 1 1 7        0 0 0 1 −1
Since there are three nonzero rows and the original set had three vectors, the original set of vectors is
indeed independent.
By examining the row reduced echelon form, I see that the vectors (0, 1, 0, 0, 0) and (0, 0, 0, 0, 1) will not
be linear combinations of the others. Reason: A nonzero linear combination of the rows of the row reduced
echelon form must have a nonzero entry in at least one of the first, third, or fourth columns, since those are
the columns containing the leading entries.
In other words, I’m choosing standard basis vectors with 1’s in positions not occupied by leading
entries in the row reduced echelon form. Therefore, I can add (0, 1, 0, 0, 0) and (0, 0, 0, 0, 1) to the set and
get a new independent set:
{(2, −4, 1, 0, 8), (−1, 2, −1, −1, −4), (2, −4, 1, 1, 7), (0, 1, 0, 0, 0), (0, 0, 0, 0, 1)} .
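Here is a sketch of the same procedure in sympy (not part of the notes). The pivot columns of the row reduced echelon form tell you which positions are already occupied, and you add the standard basis vectors for the remaining positions.

from sympy import Matrix, eye

vectors = [(2, -4, 1, 0, 8), (-1, 2, -1, -1, -4), (2, -4, 1, 1, 7)]
M = Matrix(vectors)

R, pivots = M.rref()          # pivots == (0, 2, 3): columns 1, 3, 4
n = M.cols
# Add the standard basis vectors whose 1 sits in a non-pivot column.
extra = [eye(n).row(j) for j in range(n) if j not in pivots]
basis = [M.row(i) for i in range(M.rows)] + extra
print(extra)                  # e_2 and e_5, as in the example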
Definition. Let A be an m × n matrix over a field F . The column space of A is the subspace of F m spanned by the columns of A.

For example, suppose a matrix has columns (1, 0, 0) and (0, 1, 0). The column space is the subspace of R3 spanned by these vectors. Thus, the column space consists of all vectors of the form (a, b, 0), where a, b ∈ R.
We’ve seen how to find a basis for the row space of a matrix. We’ll now give an algorithm for finding a
basis for the column space.
First, here’s a reminder about matrix multiplication. If A is an m × n matrix and v ∈ F n , then you can
think of the multiplication Av as multiplying the columns of A by the components of v:
This means that if ci is the i-th column of A and v = (a1 , . . . , an ), the product Av is a linear combination
of the columns of A:
a1
↑ ↑ ↑ a2
c1 c2 · · · cn . = a1 c1 + a2 c2 + · · · + an cn .
..
↓ ↓ ↓
an
Proposition. Let A be a matrix, and let R be the row reduced echelon matrix which is row equivalent to
A. Suppose the leading entries of R occur in columns j1 , . . . , jp , where j1 < · · · < jp , and let ci denote the
i-th column of A. Then {cj1 , . . . , cjp } is independent.
Proof. Suppose that
aj1 cj1 + · · · + ajp cjp = 0, for ai ∈ F.
Form the vector v = (vi ), where

vi = ai if i ∈ {j1 , . . . , jp } and vi = 0 if i ∉ {j1 , . . . , jp }.
However, since R is in row reduced echelon form, c′jk is a vector with 1 in the k-th row and 0’s elsewhere.
Hence, {cj1 , . . . , cjp } is independent, and aj1 = · · · = ajp = 0.
The proof provides an algorithm for finding a basis for the column space of a matrix. Specifically,
row reduce the matrix A to a row reduced echelon matrix R. If the leading entries of R occur in columns
j1 , . . . , jp , then consider the columns cj1 , . . . , cjp of A. These columns form a basis for the column space of
A.
Example. Find a basis for the column space of the real matrix
1 −2 3 1 1
2 1 0 3 1
.
0 −5 6 −1 1
7 1 3 10 4
The leading entries occur in columns 1 and 2. Therefore, (1, 2, 0, 7) and (−2, 1, −5, 1) form a basis for
the column space of A.
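In sympy (a sketch; the notes do not rely on it), the pivot columns reported by rref let you pick out the right columns of the original matrix:

from sympy import Matrix

A = Matrix([[1, -2, 3, 1, 1],
            [2, 1, 0, 3, 1],
            [0, -5, 6, -1, 1],
            [7, 1, 3, 10, 4]])

_, pivots = A.rref()                 # pivots == (0, 1): columns 1 and 2
basis = [A.col(j) for j in pivots]   # columns of the *original* matrix A
print([v.T for v in basis])          # (1, 2, 0, 7) and (-2, 1, -5, 1)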
Note that if A and B are row equivalent, they don’t necessarily have the same column space. For
example,
1 2 1 → 1 2 1
.
1 2 1 r2 → r2 − r1 0 0 0
However, all the elements of the column space of the second matrix have their second component equal
to 0; this is obviously not true of elements of the column space of the first matrix.
Example. Find a basis for the column space of the following matrix over Z3 :
0 1 1 0
A = 1 2 1 0.
2 1 2 1
Row reduce over Z3 :

0 1 1 0      1 0 2 0
1 2 1 0   →  0 1 1 0
2 1 2 1      0 0 0 1

The leading entries occur in columns 1, 2, and 4, so the corresponding columns of A, namely (0, 1, 2), (1, 2, 1), and (0, 0, 1), form a basis for the column space.
I showed earlier that you can add vectors to an independent set to get a basis. The column space basis
algorithm shows how to remove vectors from a spanning set to get a basis.
Example. Find a subset of the following set of vectors which forms a basis for R3 .
1 −1 1 4
2 , 1 , 1 , −1
1 −1 1 2
The leading entries occur in columns 1, 2, and 4. Therefore, the corresponding columns of the original
matrix are independent, and form a basis for R3 :
1 −1 4
2 , 1 , −1 .
1 −1 2
Definition. Let A be a matrix. The column rank of A is the dimension of the column space of A.
This is really just a temporary definition, since we’ll show that the column rank is the same as the rank
we defined earlier (the dimension of the row space).
Theorem. Let A be a matrix over a field. Then the column rank of A equals the rank of A.

Proof. Let R be the row reduced echelon matrix which is row equivalent to A. Suppose the leading entries
of R occur in columns j1 , . . . , jp , where j1 < · · · < jp , and let ci denote the i-th column of A. By the
preceding lemma, {cj1 , . . . , cjp } is independent. There is one vector in this set for each leading entry, and
the number of leading entries equals the row rank. Therefore,

column rank(A) ≥ rank(A).

Now consider AT . This is A with the rows and columns swapped, so the column space of AT is the row space of A, and the row space of AT is the column space of A. Applying the inequality above to AT gives

rank(A) = column rank(AT ) ≥ rank(AT ) = column rank(A).

Therefore,

column rank(A) = rank(A).
Proposition. Let A, B, P and Q be matrices, where P and Q are invertible. Suppose A = P BQ. Then
rank A = rank B.
Proof. I showed earlier that rank(M N ) ≤ rank N . This was row rank; a similar proof shows that the column rank of M N is at most the column rank of M . Since row rank and column rank are the same, rank(M N ) ≤ rank M .

Now

rank A = rank(P BQ) ≤ rank(BQ) ≤ rank B.

But B = P −1 AQ−1 , so repeating the computation gives rank B ≤ rank A. Therefore, rank A = rank B.
Definition. The null space (or kernel) of a matrix A is the set of vectors x such that Ax = 0. The
dimension of the null space of A is called the nullity of A, and is denoted nullity(A).
The null space is the same as the solution space of the system of equations Ax = 0. I showed earlier
that if A is an m × n matrix, then the solution space is a subspace of F n . Thus, the null space of a matrix
is a subspace of F n .
(a)

 3 −1 1       1        0
          ·   2    =
−1  1 1      −1        0

That is, (1, 2, −1) is in the null space of the matrix on the left.
Algorithm. Let A be an m × n matrix over a field F . Find a basis for the null space of A, that is, for the solution space of the homogeneous system

Ax = 0.

First, row reduce A to row reduced echelon form.
In the row reduced echelon form, suppose that {xi1 , xi2 , . . . , xip } are the variables corresponding to the
leading entries, and suppose that {xj1 , xj2 , . . . , xjq } are the free variables. Note that p + q = n.
Put the solution in parametric form, writing the leading entry variables {xi1 , xi2 , . . . , xip } in terms of the free variables (parameters) {xj1 , xj2 , . . . , xjq }:

xi1 = fi1 (xj1 , . . . , xjq ), xi2 = fi2 (xj1 , . . . , xjq ), . . . , xip = fip (xj1 , . . . , xjq ).

Plug these expressions into the general solution vector x = (x1 , x2 , . . . , xn ): substitute fi1 (xj1 , . . . , xjq ) for xi1 , then fi2 (xj1 , . . . , xjq ) for xi2 , and so on, and leave the free-variable components xj1 , . . . , xjq alone. Schematically, the result looks like this:
x = (x1 , x2 , . . . , xn ) = xj1 · u1 + xj2 · u2 + · · · + xjq · uq .

Here each uℓ is a fixed vector in F n : its jℓ -th component is 1, its other free-variable components are 0, and its leading-entry components are the coefficients (the “∗’s”) that remain after factoring xj1 , xj2 , . . . , xjq out of the f -terms.
In the last expression, the vectors which are being multiplied by xj1 , xj2 , . . . , xjq form a basis for the
null space.
First, the vectors span the null space, because the equation above has expressed an arbitrary vector in
the null space as a linear combination of the vectors.
Second, the vectors are independent. Suppose the linear combination above is equal to the zero vector
(0, 0, . . . 0):
xj1 · u1 + xj2 · u2 + · · · + xjq · uq = (0, 0, . . . , 0).

Look at the components in positions j1 , j2 , . . . , jq (the free-variable positions). In position jℓ , the vector uℓ has a 1 and the other u’s have 0’s, so the jℓ -th component of the left side is just xjℓ .
We see that xj1 = xj2 = · · · = xjq = 0.
This description is probably hard to understand with all the subscripts flying around, but I think the
examples which follow will make it clear.
Before giving an example, here’s an important result that comes out of the algorithm.
Theorem. Let A be an m × n matrix over a field. Then

n = rank A + nullity A.
Proof. In the algorithm above, p, the number of leading entry variables, is the rank of A. And q, the number
of free variables, is the same as the number of vectors in the basis for the null space. That is, q = nullity(A).
Finally, I observed earlier that p + q = n. Thus, n = rank A + nullity A.
This theorem is a special case of the First Isomorphism Theorem, which you’d see in a course in
abstract algebra.
Example. Find the nullity and a basis for the null space of the real matrix
1 2 0 3
1 2 1 −2 .
2 4 1 1
Let’s follow the steps in the algorithm. First, row reduce the matrix to row-reduced echelon form:
1 2 0 3 1 2 0 3
1 2 1 −2 → 0 0 1 −5
2 4 1 1 0 0 0 0
I’ll use w, x, y, and z as my solution variables. Thinking of the last matrix as representing equations
for a homogeneous system, I have
w + 2x + 3z = 0, or w = −2x − 3z,
y − 5z = 0, or y = 5z.
I’ve expressed the leading entry variables in terms of the free variables. Now I substitute for w and y
in the general solution vector (w, x, y, z):
w −2x − 3z −2 −3
x x 1 0
= = x· +z· .
y 5z 0 5
z z 0 1
After substituting, I broke the resulting vector up into pieces corresponding to each of the free variables
x and z.
The equation above shows that every vector (w, x, y, z) in the null space can be written as a linear
combination of (−2, 1, 0, 0) and (−3, 0, 5, 1). Thus, these two vectors span the null space. They’re also
independent: Suppose
−2 −3 0
1 0 0
x· +z· = .
0 5 0
0 1 0
Then
−2x − 3z 0
x 0
= .
5z 0
z 0
Looking at the second and fourth components, you can see that x = 0 and z = 0.
Hence, {(−2, 1, 0, 0), (−3, 0, 5, 1)} is a basis for the null space. The nullity is 2.
Notice also that the rank is 2, the number of columns is 4, and 4 = 2 + 2, which confirms the preceding
theorem.
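For comparison, here is the same example as a sympy sketch (not part of the notes): nullspace() returns exactly a basis of this kind, one vector per free variable, and rank() lets you confirm that rank plus nullity equals the number of columns.

from sympy import Matrix

A = Matrix([[1, 2, 0, 3],
            [1, 2, 1, -2],
            [2, 4, 1, 1]])

for v in A.nullspace():          # basis for the null space
    print(v.T)                   # (-2, 1, 0, 0) and (-3, 0, 5, 1)

print(A.rank(), len(A.nullspace()), A.cols)   # 2, 2, 4 and 2 + 2 = 4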
Example. Consider the following matrix over Z3 :
1 1 0 2
A = 2 2 1 2.
1 1 1 0
Find bases for the row space, column space, and null space.
Row reduce the matrix:
1 1 0 2 1 1 0 2
2 2 1 2 → 0 0 1 1
1 1 1 0 0 0 0 0
{(1, 1, 0, 2), (0, 0, 1, 1)} is a basis for the row space.
The leading entries occur in columns 1 and 3. Taking the first and third columns of the original matrix,
I find that {(1, 2, 1), (0, 1, 1)} is a basis for the column space.
Using a, b, c, and d as variables, I find that the row reduced matrix gives the equations
a + b + 2d = 0, or a = 2b + d,
c + d = 0, or c = 2d.
Thus,
a 2b + d 2 1
b b 1 0
= = b · + d · .
c 2d 0 2
d d 0 1
Therefore, {(2, 1, 0, 0), (1, 0, 2, 1)} is a basis for the null space.
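Since sympy’s built-in routines work over the rationals rather than Z3, here is a small sketch that simply confirms the claimed null space vectors: multiplying by A and reducing mod 3 should give the zero vector.

from sympy import Matrix

A = Matrix([[1, 1, 0, 2],
            [2, 2, 1, 2],
            [1, 1, 1, 0]])

for v in (Matrix([2, 1, 0, 0]), Matrix([1, 0, 2, 1])):
    print((A * v).applyfunc(lambda x: x % 3).T)   # both print (0, 0, 0)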
Since you can think of f as taking a 2-dimensional vector as its input and producing a 3-dimensional
vector as its output, you could write
But I’ll suppress some of the angle brackets when there’s no danger of confusion.
Example. Define f : R2 → R3 by

f ((x, y)) = (x + 2y, x − y, −2x + 3y).

I’ll show that f is a linear transformation the hard way. First, I need two 2-dimensional vectors:
u = (u1 , u2 ), v = (v1 , v2 ).
I must show that
f (ku + v) = kf (u) + f (v).
I’ll compute the left side and the right side, then show that they’re equal. Here’s the left side:

f (ku + v) = f ((ku1 + v1 , ku2 + v2 )) = ((ku1 + v1 ) + 2(ku2 + v2 ), (ku1 + v1 ) − (ku2 + v2 ), −2(ku1 + v1 ) + 3(ku2 + v2 )).

Here’s the right side:
kf (u)+f (v) = kf ((u1 , u2 ))+f ((v1 , v2 )) = k(u1 +2u2 , u1 −u2 , −2u1 +3u2 )+(v1 +2v2 , v1 −v2 , −2v1 +3v2 ) =
(k(u1 + 2u2 ) + (v1 + 2v2 ), k(u1 − u2 ) + (v1 − v2 ), k(−2u1 + 3u2 ) + (−2v1 + 3v2 )).
Therefore, f (ku + v) = kf (u) + f (v), so f is a linear transformation.
This was a pretty disgusting computation, and it would be a shame to have to go through this every
time. I’ll come up with a better way of recognizing linear transformations shortly.
lim (h → 0) of |f (a + h) − f (a) − Df (a)(h)| / |h| = 0.
Since f produces outputs in Rm , you can think of f as being built out of m component functions.
Suppose that f = (f1 , f2 , . . . , fm ).
It turns out that the matrix of Df (a) (relative to the standard bases on Rn and Rm ) is the m × n matrix
whose (i, j)th entry is
Dj fi = ∂fi /∂xj .
This matrix is called the Jacobian matrix of f at a.
For example, suppose f : R2 → R2 is given by
f (x, y) = (x2 y 3 , x2 − y 5 ).
Then
2xy 3 3x2 y 2
Df (x, y) = .
2x −5y 4
The next lemma gives an easy way of constructing — or recognizing — linear transformations.
Theorem. Let F be a field, and let A be an n × m matrix over F . The function f : F m → F n given by
f (u) = A · u for u ∈ F m
is a linear transformation.
Proof. This is pretty easy given the rules for matrix arithmetic. Let u, v ∈ F m and let k ∈ F . Then

f (ku + v) = A · (ku + v) = k(A · u) + A · v = kf (u) + f (v).
Example. Define f : R2 → R3 by

f ((x, y)) = (x + 2y, x − y, −2x + 3y).
I’ll show that f is a linear transformation the easy way. Observe that
1 2
x
f ((x, y)) = 1 −1 .
y
−2 3
f is given by multiplication by a matrix of numbers, exactly as in the lemma. ((x, y) is taking the place
of u.) So the lemma implies that f is a linear transformation.
Lemma. Let F be a field, and let V and W be vector spaces over F . Suppose that f : V → W is a linear
transformation. Then:
a. f (~0) = ~0.
b. f (−v) = −f (v) for all v ∈ V .
Proof. (a) Put u = ~0, v = ~0, and k = 1 in the defining equation for a linear transformation. Then

f (~0) = f (1 · ~0 + ~0) = 1 · f (~0) + f (~0) = f (~0) + f (~0).

Subtracting f (~0) from both sides gives f (~0) = ~0.
(b) I know that −v = (−1) · v, so

f (−v) = f ((−1) · v + ~0) = (−1) · f (v) + f (~0) = −f (v).
The lemma gives a quick way of showing a function is not a linear transformation.
Example. Define g : R2 → R2 by
g(x, y) = (x + 1, y + 2).
Then
g(0, 0) = (1, 2) ≠ (0, 0).
Since g does not take the zero vector to the zero vector, it is not a linear transformation.
Be careful! If f (~0) = ~0, you can’t conclude that f is a linear transformation. For example, I showed
that the function f (x, y) = (x2 , y 2 , xy) is not a linear transformation from R2 to R3 . But f (0, 0) = (0, 0, 0),
so it does take the zero vector to the zero vector.
Next, I want to prove the result I mentioned earlier: Every linear transformation on a finite-dimensional
vector space can be represented by matrix multiplication. I’ll begin by reviewing some notation.
Definition. The standard basis vectors for F m are

e1 = (1, 0, . . . , 0), e2 = (0, 1, . . . , 0), . . . , em = (0, 0, . . . , 1).

Thus, ei is an m-dimensional vector with a 1 in the ith position and 0’s elsewhere. For instance, in R3 ,

e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1).

I showed earlier that {e1 , e2 , . . . , em } is a basis for F m . This implies the following result.
Lemma. Every vector u ∈ F m can be written uniquely as
u = u1 e1 + u2 e2 + · · · + um em ,
where u1 , u2 , . . . , um ∈ F .
Example. In R3 , any vector can be written this way: (a, b, c) = a · e1 + b · e2 + c · e3 .

Theorem. Let F be a field, and let f : F m → F n be a linear transformation. Then there is an n × m matrix A such that

f (u) = A · u for all u ∈ F m .
Proof. Regard the standard basis vectors e1 , e2 , . . . , em as m-dimensional column vectors. Then f (e1 ),
f (e2 ), . . . , f (em ) are n-dimensional column vectors, because f produces outputs in F n . Take these m
n-dimensional column vectors f (e1 ), f (e2 ), . . . , f (em ) and build a matrix:
↑ ↑ ↑
A = f (e1 ) f (e2 ) · · · f (em ) .
↓ ↓ ↓
I claim that A is the matrix I want. To see this, take a vector u ∈ Rm and write it in component form:
u1
u2
u = (u1 , u2 , . . . , um ) =
... .
um
u = u1 e1 + u2 e2 + · · · + um em .
Then I can use the fact that f is a linear transformation — so f of a sum is the sum of the f ’s, and constants (like the ui ’s) can be pulled out — to write

f (u) = f (u1 e1 + u2 e2 + · · · + um em ) = u1 f (e1 ) + u2 f (e2 ) + · · · + um f (em ).

On the other hand,

A · u = [ f (e1 ) f (e2 ) · · · f (em ) ] · (u1 , u2 , . . . , um ) = u1 f (e1 ) + u2 f (e2 ) + · · · + um f (em ).
To get the last equality, think about how matrix multiplication works.
Therefore, f (u) = A · u.
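Here is a sketch of the construction in the proof, using Python with numpy and the example map f ((x, y)) = (x + 2y, x − y, −2x + 3y) from earlier (the function name and variable names below are mine, not the notes’):

import numpy as np

def f(v):
    x, y = v
    return np.array([x + 2*y, x - y, -2*x + 3*y])

# Build A by feeding the standard basis vectors into f and using the
# outputs as the columns.
A = np.column_stack([f(e) for e in np.eye(2)])
print(A)                          # [[ 1  2], [ 1 -1], [-2  3]]

u = np.array([3.0, -5.0])
print(np.allclose(A @ u, f(u)))   # True: f(u) = A u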
Linear transformations and matrices are not quite identical. If a linear transformation is like a person,
then a matrix for the transformation is like a picture of the person — the point being that there can be many
different pictures of the same person. You get different “pictures” of a linear transformation by changing
coordinates — something I’ll discuss later.
Example. Define f : R2 → R3 by

f ((x, y)) = (x + 2y, x − y, −2x + 3y).

I already know that f is a linear transformation, and I found its matrix by inspection. Here’s how it would work using the theorem. I feed the standard basis vectors into f :

f ((1, 0)) = (1, 1, −2), f ((0, 1)) = (2, −1, 3).

Using these as the columns gives the same matrix as before:

1 2
1 −1
−2 3
You can combine linear transformations to obtain other linear transformations. First, I’ll consider sums
and scalar multiplication.
Definition. Let f, g : V → W be linear transformations of vector spaces over a field F , and let k ∈ F .
a. The sum f + g of f and g is the function V → W which is defined by

(f + g)(v) = f (v) + g(v) for v ∈ V.

b. The scalar multiple k · f is the function V → W which is defined by

(k · f )(v) = k · f (v) for v ∈ V.
Lemma. Let f, g : V → W be linear transformations of vector spaces over a field F , and let k ∈ F .
a. f + g is a linear transformation.
b. k · f is a linear transformation.
Proof. I’ll prove the first part by way of example and leave the proof of the second part to you.
Let u, v ∈ V and let k ∈ F . Then

(f + g)(ku + v) = f (ku + v) + g(ku + v) = kf (u) + f (v) + kg(u) + g(v) = k(f + g)(u) + (f + g)(v).

Hence, f + g is a linear transformation.
If f : X → Y and g : Y → Z, the composite g ◦ f : X → Z is defined by (g ◦ f )(x) = g(f (x)).

(Picture: f maps X into Y , g maps Y into Z, and the composite g ◦ f goes directly from X to Z.)
The ◦ between the g and f does not mean multiplication, but it’s so easy to confuse that I’ll usually
just write “g(f (x))” when I want to compose functions.
Note that things go from left to right in the picture, but that they go right to left in “g(f (x))”. The
effect is to do f first, then g.
Lemma. Let f : U → V and g : V → W be linear transformations of vector spaces over a field F . Then
g ◦ f : U → W is a linear transformation.
Proof. Let u1 , u2 ∈ U and let k ∈ F . Then

(g ◦ f )(ku1 + u2 ) = g(f (ku1 + u2 )) = g(kf (u1 ) + f (u2 )) = kg(f (u1 )) + g(f (u2 )) = k(g ◦ f )(u1 ) + (g ◦ f )(u2 ).
Suppose f and g are linear transformations

f : Rm −→ Rn , g : Rn −→ Rp .

f and g can be represented by matrices; I’ll use [f ] for the matrix of f and [g] for the matrix of g, so that f (u) = [f ] · u and g(v) = [g] · v.
The matrix for g ◦ f is [g] · [f ], the product of the matrices for f and g.
Example. Suppose
That is,
(g ◦ f )(x, y) = (3x + 9y, 2x − 7y, −3x − 24y).
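Since the original f and g are not shown here, the following sketch uses made-up matrices F and G just to illustrate the point: the matrix of the composite is the product of the matrices.

import numpy as np

F = np.array([[1, 2],
              [0, 1]])        # matrix of some f : R^2 -> R^2 (hypothetical)
G = np.array([[3, 1],
              [2, -1],
              [0, 4]])        # matrix of some g : R^2 -> R^3 (hypothetical)

u = np.array([5, -2])
# Applying f, then g, is the same as multiplying by the product G F.
print(np.allclose(G @ (F @ u), (G @ F) @ u))   # True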
Example. The idea of composing transformations can be extended to affine transformations. For the
sake of this example, you can think of an affine transformation as a linear transformation plus a translation
(a constant). This provides a powerful way of doing various geometric constructions.
For example, I wanted to write a program to generate self-similar fractals. There are many self-similar
fractals, but I was interested in those that can be constructed in the following way. Start with an initiator,
which is a collection of segments in the plane. For example, this initiator is a square:
Next, I need a generator. It’s a collection of segments which start at the point (0, 0) and end at the
point (1, 0). Here’s a generator shaped like a square hump:
(1/3,1/3) (2/3,1/3)
The construction proceeds in stages. Start with the initiator and replace each segment with a scaled
copy of the generator. (There is an issue here of which way you “flip” the generator when you copy it, but
I’ll ignore this for simplicity.) Here’s what you get by replacing the segments of the square with copies of
the square hump:
Now keep going. Take the current figure and replace all of its segments with copies of the generator.
And so on. Here’s what you get after around 4 steps:
Roughly, self-similarity means that if you enlarge a piece of the figure, the enlarged piece looks like
the original. If you imagine carrying out infinitely many steps of the construction above, you’d get a figure
which would look “the same” no matter how much you enlarged it — which is a very crude definition of a
fractal. If you’re interested in this stuff, you should look at Benoit Mandelbrot’s classic book ([1]) — it has
great pictures!
What does this have to do with transformations? The idea is that to replace a segment of the current
figure with a scaled copy of the generator, you need to stretch the generator, rotate it, then translate it.
Here’s a picture with a different initiator and generator:
Stretching by a factor of k amounts to multiplying by the matrix
k 0
.
0 k
By thinking of the operations as linear or affine transformations, it is very easy to write down the
formula.
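For instance, here is a sketch (my own, with made-up function and variable names) of the map that carries the unit segment from (0, 0) to (1, 0) onto the segment from a point p to a point q: scale by the length of q − p, rotate by its angle, then translate by p.

import numpy as np

def segment_map(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    d = q - p
    theta = np.arctan2(d[1], d[0])        # angle of the target segment
    s = np.hypot(d[0], d[1])              # its length
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    # Affine map: scale, rotate, then translate.
    return lambda x: s * (R @ np.asarray(x, float)) + p

T = segment_map((1, 1), (3, 2))
print(T((0, 0)), T((1, 0)))    # the endpoints land on (1, 1) and (3, 2)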
AB · (1, 0, . . . , 0)T = (1, 0, . . . , 0)T , . . . , AB · (0, 0, . . . , 1)T = (0, 0, . . . , 1)T .

That is, AB · ei = ei for each standard basis vector ei .
Put the vectors in the equations above into the columns of a matrix. They form the identity:
AB · I = I, so AB = I.
Example. Define f : R2 → R2 by f ((x, y)) = (x + 3y, x + 4y). Find the inverse transformation.

The matrix of f is
1 3
.
1 4
Therefore, the matrix of f −1 is
4 −3
.
−1 1
Hence, the inverse transformation is

f −1 ((x, y)) = (4x − 3y, −x + y).

As a check,

f −1 (f (x, y)) = f −1 (x + 3y, x + 4y) = (4(x + 3y) − 3(x + 4y), −(x + 3y) + (x + 4y)) = (x, y).
Example. Let R2 [x] denote the set of polynomials with real coefficients of degree 2 or less. Thus,
x2 + 3x + 2, −7x2 , 0, 42x − 5.
You can represent polynomials in R2 [x] by vectors. For example, here is how to represent x2 + 3x + 2
as a vector:
( 2 , 3 , 1 )
↑ ↑ ↑
2 + 3x + x2
I’m writing the coefficients with the powers increasing to make it easy to extend this to higher powers.
For example,
( 7 , −1 , 4 , 5 )
↑ ↑ ↑ ↑
7 − x + 4x2 + 5x3
Now let D denote differentiation with respect to x. If p is a polynomial of degree 2 or less, so is its derivative Dp, and differentiation satisfies D(kp + q) = k · Dp + Dq. This means that D is a linear transformation R2 [x] → R2 [x]. What is its matrix?
To find the matrix of a linear transformation (relative to the standard basis), apply the transformation
to the standard basis vectors. Use the results as the columns of your matrix.
In vector form, R2 [x] is just R3 , so the standard basis vectors are

(1, 0, 0) ↔ 1, (0, 1, 0) ↔ x, (0, 0, 1) ↔ x2 .

As I apply D, I’ll translate to polynomial notation to make it easier for you to follow:

D(1) = 0 ↔ (0, 0, 0), D(x) = 1 ↔ (1, 0, 0), D(x2 ) = 2x ↔ (0, 2, 0).

Using these results as the columns, the matrix of D is

0 1 0
0 0 2
0 0 0
Example. Construct a linear transformation which maps the unit square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 to the
parallelogram in R3 determined by the vectors (2, −3, 6) and (1, −1, 1).
The idea is to send vectors for the square’s sides — namely, (1, 0) and (0, 1) — to the target vectors
(2, −3, 6) and (1, −1, 1). If I do this with a linear transformation, the rest of the square will go with them. Thus, use the linear transformation

f ((x, y)) = x · (2, −3, 6) + y · (1, −1, 1),

whose matrix has the target vectors as its columns:

2 1
−3 −1
6 1
[1] Benoit Mandelbrot, The fractal geometry of nature. New York: W. H. Freeman and Company, 1983.
[ISBN 0-7167-1186-9]
Let V be a finite-dimensional vector space over a field F , and let B = {v1 , . . . , vn } be a basis for V . Every v ∈ V can be written uniquely in the form

v = a1 v1 + · · · + an vn , vi ∈ B, ai ∈ F.
(Let me remind you of why this is true. Since a basis spans, every v ∈ V can be written in this way.
On the other hand, if a1 v1 + · · · + an vn = a′1 v1 + · · · + a′n vn are two ways of writing a given vector, then
(a1 − a′1 )v1 + · · · (an − a′n )vn = 0, and by independence a1 − a′1 = 0, . . . , an − a′n = 0 — that is, a1 = a′1 ,
. . . , an = a′n . So the representation of a vector in this way is unique.)
Consider the situation where B is a finite ordered basis — that is, fix a numbering v1 , . . . , vn of the
elements of B. If v = a1 v1 + · · · + an vn , the ordered list of coefficients (a1 , . . . , an ) is uniquely associated
with v. The {ai } are the components of v with respect to the (ordered) basis B; I will use the notation
v = (a1 , . . . , an )B .
It is easy to confuse a vector with the representation of the vector in terms of its components relative
to a basis. This confusion arises because the most familiar representation of a vector is
as an ordinary n-tuple in Rn :
Rn = {(a1 , . . . , an ) | ai ∈ R}.
This amounts to identifying the elements of Rn with their representation relative to the standard basis
e1 = (1, 0, 0, . . . , 0),
e2 = (0, 1, 0, . . . , 0),
..
.
en = (0, 0, 0, . . . , 1).
Example. (a) Show that

B = {(1, 0, 1), (2, −1, 1), (3, 1, 0)}

is a basis for R3 .
These are three vectors in R3 , which has dimension 3. Hence, it suffices to check that they’re indepen-
dent. Form the matrix with the elements of B as its rows and row reduce:
1 0 1 1 0 0
2 −1 1 → 0 1 0
3 1 0 0 0 1
The vectors are independent. Three independent vectors in R3 must form a basis.
(b) Find the components of (15, −1, 2) relative to B.
I must find numbers a, b, and c such that
15 1 2 3
−1 = a · 0 + b · −1 + c · 1 .
2 1 1 0
This is equivalent to the matrix equation
1 2 3 a 15
0 −1 1 b = −1 .
1 1 0 c 2
Set up the matrix for the system and row reduce to solve:
1 2 3 15 1 0 0 −2
0 −1 1 −1 → 0 1 0 4
1 1 0 2 0 0 1 3
Therefore, a = −2, b = 4, and c = 3, so (15, −1, 2) = (−2, 4, 3)B .

(c) Find the vector whose components relative to B are (7, −2, 2)B .

The matrix equation from part (b) says

1 2 3
0 −1 1   vB = vstd .
1 1 0
In (b), I knew vstd and I wanted vB ; this time it’s the other way around. So I simply put (7, −2, 2)B
into the vB spot and multiply:
1 2 3 7 9
0 −1 1 −2 = 4 .
1 1 0 2 5
In general, if B is a basis for F n and M is the matrix whose columns are the elements of B (written in terms of the standard basis), then

M vB = vstd .
I’ll write [B → std] for M , and call it a translation matrix. Again, [B → std] translates vectors
written in terms of B to vectors written in terms of the standard basis.
The inverse of a square matrix M is a matrix M −1 such that M M −1 = M −1 M = I, where I is the
identity matrix. If I multiply the last equation on the left by M −1 , I get
M −1 M vB = M −1 vstd , or vB = M −1 vstd .
In other words, M −1 translates vectors from the standard basis to B:

[std → B] = [B → std]−1 .
In the example above, left multiplication by the following matrix translates vectors from B to the
standard basis:
1 2 3
[B → std] = 0 −1 1 .
1 1 0
The inverse of this matrix is

                              −1/4   3/4   5/4
[std → B] = [B → std]−1 =      1/4  −3/4  −1/4 .
                               1/4   1/4  −1/4
Left multiplication by this matrix translates vectors from the standard basis to B.
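Here is the whole translation setup for this example as a sympy sketch (not part of the notes):

from sympy import Matrix

B_to_std = Matrix([[1, 2, 3],
                   [0, -1, 1],
                   [1, 1, 0]])          # columns are the basis vectors of B

std_to_B = B_to_std.inv()
print(std_to_B)                          # the matrix of quarters above

print(B_to_std * Matrix([7, -2, 2]))     # (7, -2, 2)_B in standard terms: (9, 4, 5)
print(std_to_B * Matrix([15, -1, 2]))    # (15, -1, 2) in B terms: (-2, 4, 3)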
Example. (Translating vectors from one basis to another) The translation analogy is a useful one,
since it makes it easy to see how to set up arbitrary changes of basis.
For example, suppose
1 2 1
B′ = −1 , 0 , 1
2 1 1
Remember that the product [standard → B][B′ → standard] is read from right to left! Thus, the
composite operation [standard → B][B′ → standard] translates a B′ vector to a standard vector, and then
translates the resulting standard vector to a B vector. Moreover, I have matrices which perform each of the
right-hand operations.
This matrix translates vectors from B′ to the standard basis:
1 2 1
−1 0 1 .
2 1 1
Let f : V → W be a linear transformation, where V and W are finite-dimensional vector spaces over a field F . Let B = {v1 , . . . , vn } be an ordered basis for V and let C = {w1 , . . . , wm } be an ordered basis for W . For each j, write

f (vj ) = a1j w1 + a2j w2 + · · · + amj wm , aij ∈ F.

The numbers {aij } are uniquely determined by f . The m × n matrix A = (aij ) is the matrix of f
relative to the ordered bases B and C. I’ll use [f ]B,C to denote this matrix. Here’s how to find it.
• To find [f ]B,C , take an element vj in the basis B, apply f to vj , and express the result as a linear
combination of elements of C The coefficients in the linear combination make up the j th column of
[f ]B,C .
(Schematically, [f ]B,C = [ f (v1 ) f (v2 ) · · · f (vn ) ], with each column written in terms of C and the inputs vj running through B.)
I’ll use std to denote the standard basis for F n .
Then
6 −13
[f ]B,C = .
11 π
Read the description of [f ]B,C preceding this example and verify that [f ]B,C was constructed by following
the steps in the description.
Find [f ]std,std .
Apply f to the elements of the standard basis for R2 , and write the results in terms of the standard
basis for R3 :
f (1, 0) = (1, 3, −1) = 1 · (1, 0, 0) + 3 · (0, 1, 0) + (−1) · (0, 0, 1),
f (0, 1) = (2, 0, 5) = 2 · (1, 0, 0) + 0 · (0, 1, 0) + 5 · (0, 0, 1).
Take the coefficients in the linear combinations and use them to make the columns of the matrix:
1 2
[f ]std,std = 3 0 .
−1 5
Find [f ]std,B .
Apply f to the elements of the standard basis for R2 , and write the results in terms of B:
vj = (0, . . . , 0, 1, 0, . . . , 0), with the 1 in the j-th position.
Then A · vj picks out the j-th column of A:

A · vj = (a1j , a2j , . . . , amj ).
This is correct, since f (vj ) = Σi aij wi , and the representation of this vector in terms of the basis C = {w1 , . . . , wm } is

(a1j , a2j , . . . , amj ).
The matrix of a linear transformation is like a snapshot of a person — there are many pictures of a
person, but only one person. Likewise, a given linear transformation can be represented by matrices with
respect to many choices of bases for the domain and range.
In the last example, finding [f ]std,std turned out to be easy, whereas finding the matrix of f relative to
other bases is more difficult. Here’s how to use change of basis matrices to make things simpler.
Suppose you have bases B and C and you want [f ]B,C .
1. Find [f ]std,std . Usually, you can find this from the definition.
2. Find the change of basis matrices [B → std] and [C → std]. (Take the basis elements written in terms of
the standard bases and use them as the columns of the matrices.)
3. Find [std → C] = [C → std]−1 .
4. Then
[f ]B,C = [std → C][f ]std,std [B → std].
Do you see why this works? Reading from right to left, an input vector written in terms of B is translated
to the standard basis by [B → std]. Next, [f ]std,std takes the standard vector, applies f , and writes the output
as a standard vector. Finally, [std → C] takes the standard vector output and translates it to a C vector.
I’ll illustrate this in the next example.
Example. Define f : R2 → R3 by
f ((x, y)) = (x + y, 2x − y, x − y). In matrix form, f ((x, y)) is the product of the matrix

1  1
2 −1
1 −1

with the column vector (x, y).
The matrix above is the matrix of f relative to the standard bases of R2 and R3 .
Next, consider the following bases for R2 and R3 , respectively:
2 1
B= , ,
1 1
1 1 −1
C = 1,2, 0 .
1 1 −2
Hence, the inverse matrix translates vectors from the standard basis to C:
−1
1 1 −1 4 −1 −2
[std → C] = 1 2 0 = −2 1 1 .
1 1 −2 1 0 −1
Therefore,
4 −1 −2 1 1 7 7
2 1
[f ]B,C = −2 1 1 2 −1 = −2 −3 .
1 1
1 0 −1 1 −1 2 2
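The same computation as a sympy sketch (again, just a check, not part of the notes):

from sympy import Matrix

f_std    = Matrix([[1, 1], [2, -1], [1, -1]])           # [f]_{std,std}
B_to_std = Matrix([[2, 1], [1, 1]])                      # columns: the basis B
C_to_std = Matrix([[1, 1, -1], [1, 2, 0], [1, 1, -2]])   # columns: the basis C

f_BC = C_to_std.inv() * f_std * B_to_std
print(f_BC)     # Matrix([[7, 7], [-2, -3], [2, 2]])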
The numbers are simple enough that I could figure out the linear combination by inspection.
Apply T :
3 1 0 1 0
T =T 3· + (−5) · =3·T + (−5) · T =
−5 0 1 0 1
1 1 3 −5 −2
3· + (−5) · = + = .
1 −1 3 5 8
Therefore,
3 1 1 1 3 −1
= = .
−5 2 1 −1 −5 4 B
Hence,

     3       2  1     −1        2
T         =        ·        =
    −5      −1  2      4  B     9  B
Example. Suppose T : R2 → R2 is given by
T (x, y) = (4x − 2y, x + y).
Let
5 1
B= , .
3 1
(a) Find [T ]std,std .
4 −2
[T ]std,std = .
1 1
(b) Find [T ]B,B .
First,
5 1
[B → std] = .
3 1
Hence,
−1
−1 5 1 1 1 −1
[std → B] = [B → std] = = .
3 1 2 −3 5
Then
1 1 −1 4 −2 5 1 3 0
[T ]B,B = [std → B][T ]std,std [B → std] = = .
2 −3 5 1 1 3 1 −1 2
(c) Compute [T ((4, 3)B )]B .
This means: Apply T to the vector (4, 3)B and write the result in terms of B.
3 0 4 12
[T ((4, 3)B )]B = [T ]B,B · (4, 3)B = = .
−1 2 3 2 B
i is a special symbol, sometimes called the complex or imaginary unit. It behaves somewhat like a
variable, but it has a special multiplication property we’ll see below.
The word “imaginary” is traditional, but not very good: It might lead you to believe that there’s
something “fake” about complex numbers, as opposed to the real numbers. In fact, the complex numbers
are a number system like the real numbers or the integers, and there is nothing “fake” about them: I’ll
sketch a construction of the complex numbers using matrices below.
You can picture a complex number a+ bi as a point in the x-y-plane; we think of “a” as the x-coordinate
and bi (or b) as the y-coordinate.
i · i = −1.
This means that i2 = −1, so i is described as “the square root of −1”. You might object that −1 doesn’t
have a square root. What is true is that no real number is a square root of −1, but the complex numbers
are a different number system.
You can divide by (nonzero) complex numbers by “multiplying the top and bottom by the conjugate”:
(a + bi)/(c + di) = [(a + bi)/(c + di)] · [(c − di)/(c − di)] = [(ac + bd) + (−ad + bc)i]/(c2 + d2 ).
With these operations, the set of complex numbers forms a field.
At this point, you might be a bit suspicious — how can we just make up a bunch of symbols and
operations and call the result “the complex numbers”? Here’s a sketch which describes how you can “build”
the complex numbers using matrices. The complex number a + bi will correspond to the real 2 × 2 matrix
a b
.
−b a
Setting a = 0 and b = 1, we see that the complex number i should correspond to the matrix
0 1
.
−1 0
The operations will be matrix addition and multiplication. Let’s try them out:
a b c d a+c b+d
+ = ,
−b a −d c −b − d a + c
a b c d ac − bd ad + bc
= .
−b a −d c −ad − bc ac − bd
Notice that we’re getting the same expressions we gave above for addition and multiplication of complex
numbers. Thus, if you’re comfortable with matrices and matrix arithmetic, you can think of “a+bi” notation
as shorthand for the matrix equivalents above. In an abstract algebra course, you’ll probably see another
construction of the complex numbers using quotient rings. In linear algebra, we’ll just use the “a + bi”
form of complex numbers, as it is the simplest for computations.
To continue with our discussion of complex number arithmetic, I’ll note here that the conjugate of a complex number is obtained by flipping the sign of the imaginary part. The conjugate of a + bi is written with a bar over it, or sometimes as (a + bi)∗ . Thus,

______
a + bi = a − bi.
The norm of a complex number is

|a + bi| = √(a2 + b2 ).
Note that a complex number times its conjugate is the square of its norm:

(a + bi)(a − bi) = a2 + b2 = |a + bi|2 .
When a complex number is written in the form a + bi, it’s said to be in rectangular form. There is
another form for complex numbers that is useful: The polar form reiθ . In this form, r and θ have the same
meanings that they do in polar coordinates.
DeMoivre’s formula relates the polar and rectangular forms:

eiθ = cos θ + i sin θ.

This key result can be proven, for example, by expanding both sides in power series. Using this, we get

reiθ = r cos θ + (r sin θ)i.
Example. Convert 3 + 4i to polar form.
3 + 4i = |3 + 4i| · (3 + 4i)/|3 + 4i| = 5 · (3/5 + (4/5)i).

Let θ = sin−1 (4/5) (or cos−1 (3/5)). Then

3 + 4i = 5(cos θ + i sin θ).
In the examples that follow, we’ll use the polar and complex exponential forms of complex numbers to
simplify algebraic computations, derive trig identities, and compute integrals.
√
Example. (A trick with Demoivre’s formula) Find (1 + i 3)8 .
It would be tedious to try to multiply this out. Instead, I’ll try to write the expression in terms of
cos θ + i sin θ for a good choice of θ.
(1 + i√3)8 = [2 · (1/2 + i√3/2)]8 = 256 (cos(π/3) + i sin(π/3))8 = 256 (eπi/3 )8 = 256 e8πi/3 =

256 (cos(8π/3) + i sin(8π/3)) = 256 (cos(2π/3) + i sin(2π/3)) = 256 (−1/2 + i√3/2) = −128 + 128i√3.
Example. Let
2 −3 1
A = 1 −2 1 ∈ M (3, R).
1 −3 2
You can use row and column operations to simplify the computation of det(A − λI):
2−λ −3 1 2−λ −3 1
1 −2 − λ 1 = 0 1−λ −1 + λ
1 −3 2−λ 1 −3 2−λ
r2 → r2 − r3
2−λ −3 −2
= 0 1−λ 0 .
1 −3 −1 − λ
c3 → c3 + c2
(Adding a multiple of a row or a column to a row or column, respectively, does not change the determinant.) Now expand by cofactors of the second row:

det(A − λI) = (1 − λ) · [(2 − λ)(−1 − λ) − (−2)(1)] = (1 − λ)(λ2 − λ) = −λ(λ − 1)2 .

The eigenvalues are λ = 0 and λ = 1 (a double root).
Example. A matrix A ∈ Mn (F ) is upper triangular if Aij = 0 for i > j. Thus, the entries below the main
diagonal are zero. (Lower triangular matrices are defined in an analogous way.)
The eigenvalues of a triangular matrix
λ1 ∗ ··· ∗
0 λ2 ··· ∗
A=
... .. .. ..
. . .
0 0 · · · λn
are just the diagonal entries λ1 , . . . , λn . (You can prove this by induction on n.)
Remark. To find the eigenvalues of a matrix, you need to find the roots of the characteristic polynomial.
There are formulas for finding the roots of polynomials of degree ≤ 4. (For example, the quadratic
formula gives the roots of a quadratic equation ax2 + bx + c = 0.) However, Abel showed in the early part
of the 19-th century that the general quintic is not solvable by radicals. (For example, 2x5 − 5x4 + 5 is
not solvable by radicals over Q.) In the real world, the computation of eigenvalues often requires numerical
approximation.
If λ is an eigenvalue of A, then det(A − λI) = 0. Hence, the n × n matrix A − λI is not invertible. It
follows that A−λI must row reduce to a row reduced echelon matrix R with fewer than n leading coefficients.
Thus, the system Rx = 0 has at least one free variable, and hence has more than one solution. In particular,
Rx = 0 — and therefore, (A − λI)x = 0 — has at least one nonzero solution.
Definition. Let A ∈ Mn (F ), and let λ be an eigenvalue of A. An eigenvector (or a characteristic vector)
of A for λ is a nonzero vector v ∈ F n such that
Av = λv.
Equivalently,
(A − λI)v = 0.
Example. Let
2 −3 1
A = 1 −2 1 ∈ M (3, R).
1 −3 2
The eigenvalues are λ = 0, λ = 1 (double).
First, I’ll find an eigenvector for λ = 0.
2 −3 1
A − 0 · I = 1 −2 1 .
1 −3 2
Row reducing,

2 −3 1      1 0 −1
1 −2 1   →  0 1 −1
1 −3 2      0 0  0

This says
a − c = 0
b − c = 0
Therefore, a = c, b = c, and the eigenvector is
a c 1
b = c = c1.
c c 1
Notice that this is the usual algorithm for finding a basis for the solution space of a homogeneous system
(or the null space of a matrix).
I can set c to any nonzero number. For example, c = 1 gives the eigenvector (1, 1, 1). Notice that there
are infinitely many eigenvectors for this eigenvalue, but all of these eigenvectors are multiples of (1, 1, 1).
Likewise,
1 −3 1 1 −3 1
A − I = 1 −3 1 → 0 0 0
1 −3 1 0 0 0
Hence, the eigenvectors are
a 3b − c 3 −1
b = b = b1 + c 0 .
c c 0 1
Taking b = 1, c = 0 gives (3, 1, 0); taking b = 0, c = 1 gives (−1, 0, 1). This eigenvalue gives rise to two
independent eigenvectors.
Note, however, that a double root of the characteristic polynomial need not give rise to two independent
eigenvectors.
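Here is the whole computation as a sympy sketch (not part of the notes); eigenvects() returns each eigenvalue with its algebraic multiplicity and a list of independent eigenvectors.

from sympy import Matrix

A = Matrix([[2, -3, 1],
            [1, -2, 1],
            [1, -3, 2]])

for eigval, mult, vecs in A.eigenvects():
    print(eigval, mult, [v.T for v in vecs])
# 0 has one eigenvector, a multiple of (1, 1, 1); 1 has multiplicity 2 and
# two independent eigenvectors, multiples of (3, 1, 0) and (-1, 0, 1).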
Definition. Matrices A, B ∈ M (n, F ) are similar if there is an invertible matrix P ∈ M (n, F ) such that
P AP −1 = B.
Lemma. Similar matrices have the same characteristic polynomial (and hence the same eigenvalues).
Proof.
P AP −1 − λI = P AP −1 − λP IP −1 = P (A − λI)P −1 .
Therefore, the matrices A − λI and P AP −1 − λI are similar. Hence, they have the same determinant.
The determinant of A − λI is the characteristic polynomial of A and the determinant of P AP −1 − λI is the
characteristic polynomial of P AP −1 .
Definition. Let T : V → V be a linear transformation, where V is a finite-dimensional vector space. The
characteristic polynomial of T is the characteristic polynomial of a matrix of T relative to a basis B of
V.
The preceding lemma shows that this is independent of the choice of basis. For if B and C are bases for
V , then
[T ]C,C = [B → C][T ]B,B [C → B] = [B → C][T ]B,B [B → C]−1 .
Therefore, [T ]C,C and [T ]B,B are similar, so they have the same characteristic polynomial.
This shows that it makes sense to speak of the eigenvalues and eigenvectors of a linear transformation
T.
Definition. A matrix A ∈ Mn (F ) is diagonalizable if A has n independent eigenvectors — that is, if there
is a basis for F n consisting of eigenvectors of A.
Proposition. A ∈ Mn (F ) is diagonalizable if and only if it is similar to a diagonal matrix.
Proof. Let {v1 , . . . , vn } be n independent eigenvectors for A corresponding to eigenvalues {λ1 , . . . , λn }. Let
T be the linear transformation corresponding to A:
T (v) = Av.
Since Avi = λi vi for all i, the matrix of T relative to the basis B = {v1 , . . . , vn } is
λ1 0 · · · 0
0 λ2 · · · 0
[T ]B,B =
... .. .. .
. .
0 0 · · · λn
Now A is the matrix of T relative to the standard basis, so

A = [T ]std,std = [B → std] [T ]B,B [std → B].
The matrix [B → std] is obtained by building a matrix using the v1 , . . . , vn as the columns. Then
[std → B] = [B → std]−1 .
Hence,
λ1 0 · · · 0
0 λ2 · · · 0
.
.. .. .. = [B → std]−1 · A · [B → std].
. .
0 0 · · · λn
Conversely, if D is diagonal, P is invertible, and D = P −1 AP , the columns c1 , . . . , cn of P are indepen-
dent eigenvectors for A. In fact, if
λ1 0 · · · 0
0 λ2 · · · 0
D= ... .. .. ,
. .
0 0 · · · λn
then P D = AP says
λ1 0 ··· 0
↑ ↑ ↑ 0 λ2 ··· 0 ↑ ↑ ↑
c1 c2 · · · cn
... .. ..
= A c 1 c2 · · · cn .
↓ ↓ ↓ . . ↓ ↓ ↓
0 0 · · · λn
Hence, λi ci = Aci .
Example. Let

2 −3 1
1 −2 1
1 −3 2

be the matrix A from the earlier example. There, I showed that A has 3 independent eigenvectors (1, 1, 1), (3, 1, 0), (−1, 0, 1).
Therefore, A is diagonalizable.
To find a diagonalizing matrix, build a matrix using the eigenvectors as the columns:
1 3 −1
P = 1 1 0 .
1 0 1
You can check by finding P −1 and doing the multiplication that you get a diagonal matrix:
−1
1 3 −1 2 −3 1 1 3 −1
P −1 AP = 1 1 0 1 −2 1 1 1 0 =
1 0 1 1 −3 2 1 0 1
−1 3 −1 2 −3 1 1 3 −1 0 0 0
1 −2 1 1 −2 1 1 1 0 = 0 1 0.
1 −3 2 1 −3 2 1 0 1 0 0 1
Of course, I knew this was the answer! I should get a diagonal matrix with the eigenvalues on the main
diagonal, in the same order that I put the corresponding eigenvectors into P .
You can put the eigenvectors in as the columns of P in any order: A different order will give a diagonal
matrix with the eigenvalues on the main diagonal in a different order.
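Here is a sympy sketch of the diagonalization above (not part of the notes); diagonalize() packages the same computation.

from sympy import Matrix

A = Matrix([[2, -3, 1],
            [1, -2, 1],
            [1, -3, 2]])
P = Matrix([[1, 3, -1],
            [1, 1, 0],
            [1, 0, 1]])        # eigenvectors as columns

print(P.inv() * A * P)         # diag(0, 1, 1)

P2, D = A.diagonalize()        # sympy's own choice of P and D
print(D)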
Example. Let
4 −4 −5
A = 1 0 −3 ∈ M (3, R).
0 0 2
Find the eigenvalues and, for each eigenvalue, a complete set of eigenvectors. If A is diagonalizable, find
a matrix P such that P −1 AP is a diagonal matrix.
            4−x  −4   −5
|A − xI| =   1   −x   −3
             0    0  2−x

Expanding along the third row,

|A − xI| = (2 − x)[(4 − x)(−x) − (−4)(1)] = (2 − x)(x2 − 4x + 4) = −(x − 2)3 .
The eigenvalue is x = 2.
Now
2 −4 −5 1 −2 0
A − 2I = 1 −2 −3 → 0 0 1.
0 0 0 0 0 0
Thinking of this as the coefficient matrix of a homogeneous linear system with variables a, b, and c, I
obtain the equations
a − 2b = 0, c = 0.
Then a = 2b, so
a 2b 2
b = b = b · 1.
c 0 0
(2, 1, 0) is an eigenvector. Since there’s only one independent eigenvector — as opposed to 3 — the
matrix A is not diagonalizable.
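A sympy sketch confirms this (not part of the notes): the eigenvalue 2 has algebraic multiplicity 3 but only one independent eigenvector.

from sympy import Matrix

A = Matrix([[4, -4, -5],
            [1, 0, -3],
            [0, 0, 2]])

print(A.eigenvects())          # [(2, 3, [Matrix([[2], [1], [0]])])]
print(A.is_diagonalizable())   # False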
Now

        −4  3 −5      1 0 1/2
B − I = 12 −8 14  →   0 1 −1
        10 −7 12      0 0  0
Thinking of this as the coefficient matrix of a homogeneous linear system with variables a, b, and c, I
obtain the equations
a + (1/2)c = 0, b − c = 0.
Set c = 2. This gives a = −1 and b = 2. Thus, the only eigenvectors are the nonzero multiples of
(−1, 2, 2). Since there is only one independent eigenvector, B is not diagonalizable.
Theorem. Let A ∈ Mn (F ), and let v1 , . . . , vn be eigenvectors of A corresponding to distinct eigenvalues λ1 , . . . , λn . Then {v1 , . . . , vn } is independent.

Proof. Suppose to the contrary that {v1 , . . . , vn } is dependent. Let p be the smallest number such that the subset {v1 , . . . , vp } is dependent. Then there is a nontrivial linear relation

a1 v1 + · · · + ap vp = 0.

By minimality of p, ap ≠ 0 (otherwise {v1 , . . . , vp−1 } would already be dependent), so I can solve for vp :

vp = b1 v1 + · · · + bp−1 vp−1 , where bi = −ai /ap .

Apply A to both sides and use Avi = λi vi :

λp vp = b1 λ1 v1 + · · · + bp−1 λp−1 vp−1 .

On the other hand, multiplying the previous equation by λp gives

λp vp = b1 λp v1 + · · · + bp−1 λp vp−1 .

Subtracting,

0 = b1 (λp − λ1 )v1 + · · · + bp−1 (λp − λp−1 )vp−1 .
Since the eigenvalues are distinct, the terms λp − λi are nonzero. Hence, this is a linear relation in
v1 , . . . , vp−1 which contradicts minimality of p — unless b1 = · · · = bp−1 = 0.
In this case, vp = 0, which contradicts the fact that vp is an eigenvector. Therefore, the original set
must in fact be independent.
Example. Let A be an n × n real matrix. The complex eigenvalues of A always come in conjugate pairs
a + bi and a − bi.
Moreover, if v is an eigenvector for λ = a + bi, then the conjugate v ∗ is an eigenvector for λ∗ = a − bi.
For suppose Av = λv. Taking complex conjugates, I get
A∗ v ∗ = λ∗ v ∗ , Av ∗ = λ∗ v ∗ .
I knew that the second row (2, 2 − i) must be a multiple of the first row, because I know the system has
nontrivial solutions. So I don’t have to work out what multiple it is; I can just zero out the second row on
general principles.
This only works for 2 × 2 matrices, and only for those which are A − λI’s in eigenvector computations.
Next, there’s no point in going all the way to row reduced echelon form. I just need some nonzero vector
(a, b) such that
−1 − i −1 a 0
= .
0 0 b 0
That is, I want
(−1 − i)a + (−1)b = 0.
I can find an a and b that work by swapping −1 − i and −1, and negating one of them. For example,
take a = 1 (−1 negated) and b = −1 − i. This checks:
Notice that you get a diagonal matrix with the eigenvalues on the main diagonal, in the same order in
which you listed the eigenvectors.
Example. For the following matrix, find the eigenvalues over C, and for each eigenvalue, a complete set of
independent eigenvectors.
Find a diagonalizing matrix and the corresponding diagonal matrix.
−2 0 5
A= 0 2 0.
−5 0 4
            −2−x   0    5
|A − xI| =    0   2−x   0
             −5    0   4−x

Expanding along the second row,

|A − xI| = (2 − x)[(−2 − x)(4 − x) − (5)(−5)] = (2 − x)[(x + 2)(x − 4) + 25] = (2 − x)(x2 − 2x + 17).
Now
x2 − 2x + 17 = 0
(x − 1)2 + 16 = 0
(x − 1)2 = −16
x − 1 = ±4i
x = 1 ± 4i
The eigenvalues are x = 2 and x = 1 ± 4i.
For x = 2, I have
−4 0 5 1 0 0
A − 2I = 0 0 0 → 0 0 1.
−5 0 2 0 0 0
With variables a, b, and c, the corresponding homogeneous system is a = 0 and c = 0. This gives the
solution vector
a 0 0
b = b = b · 1.
c 0 0
Taking b = 1, I obtain the eigenvector (0, 1, 0).
For x = 1 + 4i, I have
−3 − 4i 0 5 −5 0 3 − 4i
0 1 − 4i 0 → 0 1 0
−5 0 3 − 4i −5 0 3 − 4i
I multiplied the first row by 3 − 4i, then divided it by 5. This made it the same as the third row.
I divided the second row by 1 − 4i.
(I knew that the first and third rows had to be multiples, since they’re clearly independent of the second
row. Thus, if they weren’t multiples, the three rows would be independent, the eigenvector matrix would be
invertible, and there would be no eigenvectors [which must be nonzero].)
Now I can wipe out the third row by subtracting the first:
−5 0 3 − 4i −5 0 3 − 4i
0 1 0 → 0 1 0 .
−5 0 3 − 4i 0 0 0
There will only be one parameter (c), so there will only be one independent eigenvector. To get one,
switch the “−5” and “3 − 4i” and negate the “−5” to get “5”. This gives a = 3 − 4i, b = 0, and c = 5. You
can see that these values for a and c work:

−5 · (3 − 4i) + (3 − 4i) · 5 = 0.
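Here is the complex case as a sympy sketch (not part of the notes); sympy is happy to work over C.

from sympy import Matrix

A = Matrix([[-2, 0, 5],
            [ 0, 2, 0],
            [-5, 0, 4]])

for eigval, mult, vecs in A.eigenvects():
    print(eigval, [v.T for v in vecs])
# eigenvalue 2 with an eigenvector that is a multiple of (0, 1, 0), and
# eigenvalues 1 + 4*I and 1 - 4*I with conjugate eigenvectors, multiples
# of (3 - 4*I, 0, 5) and (3 + 4*I, 0, 5).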
Now

T vi = Σj δij T vj , so Σj (Aji I − δij T ) vj = 0.

Hence,

0 = (adj B)ki Σj Bij vj = Σj (adj B)ki Bij vj .
This equation holds for all i and all k, so I’ll still get 0 if I sum on i. So I’ll sum on i, then interchange
the order of summation:

0 = Σi Σj (adj B)ki Bij vj ,

0 = Σj ( Σi (adj B)ki Bij ) vj .

Now Σi (adj B)ki Bij is the (k, j)-th entry of (adj B) · B = |B| I. Hence,

0 = Σj |B| δkj vj = |B| vk .
Corollary. The minimal polynomial divides the characteristic polynomial.
A constant coefficient linear homogeneous differential equation is an equation of the form

an y (n) + · · · + a2 y ′′ + a1 y ′ + a0 y = 0.
Here y is a function of x, and an , . . . , a0 are constants. Linear means the equation is a sum of the
derivatives of y, each multiplied by x stuff. (In this case, the x stuff is constant.) Homogeneous means that
the right side is 0 — there’s no term involving only x.
It’s convenient to let D = d/dx stand for the operation of differentiating with respect to x. (Note that D = d/dx is the operation of differentiation, whereas Dy = dy/dx is the derivative.) In this notation, D2 computes the second derivative, D3 computes the third derivative, and so on. The equation above becomes
(an Dn + · · · a2 D2 + a1 D + a0 )y = 0.
Example. The following equations are linear homogeneous equations with constant coefficients:
y ′′′ + 3y ′′ + 3y ′ + y = 0,
y ′′ − 5y ′ − 6y = 0,
(D − 1)(D − 2)(D − π)y = 0.
A solution to the equation is a function y = f (x) which satisfies the equation. Equivalently, if you think
of an Dn + · · · a2 D2 + a1 D + a0 as a linear transformation, it is an element of the kernel of the transformation.
The general solution is a linear combination of the elements of a basis for the kernel, with the
coefficients being arbitrary constants.
The form of the equation makes it reasonable that a solution should be a function whose derivatives are
constant multiples of itself. emx is such a function:
(d/dx) emx = m emx , (d2 /dx2 ) emx = m2 emx , . . . , (dn /dxn ) emx = mn emx .
Plug emx into
y (n) + bn−1 y (n−1) + · · · + b2 y ′′ + b1 y ′ + b0 y = 0.
The result:
mn emx + bn−1 mn−1 emx + · · · + b2 m2 emx + b1 memx + b0 emx = 0.
Factor out emx and cancel it. This leaves
mn + bn−1 mn−1 + · · · + b2 m2 + b1 m + b0 = 0.
Thus, emx is a solution to the original equation exactly when m is a root of this polynomial. The
polynomial is called the characteristic polynomial; as the derivation showed, it’s obtained by building a
polynomial using the coefficients of the original differential equation.
Example. Solve y ′′ − 5y ′ − 6y = 0.
The characteristic polynomial is m2 − 5m − 6; solving m2 − 5m − 6 = 0 yields (m − 6)(m + 1) = 0, so
m = 6 or m = −1. The general solution is
y = c1 e6x + c2 e−x .
You can check this by plugging back in. Here are the derivatives:

y ′ = 6c1 e6x − c2 e−x , y ′′ = 36c1 e6x + c2 e−x .

Therefore,

y ′′ − 5y ′ − 6y = (36 − 30 − 6)c1 e6x + (1 + 5 − 6)c2 e−x = 0.
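sympy’s dsolve will reproduce this kind of general solution if you want a check (a sketch, not part of the notes):

from sympy import Function, Eq, dsolve, symbols

x = symbols('x')
y = Function('y')

print(dsolve(Eq(y(x).diff(x, 2) - 5*y(x).diff(x) - 6*y(x), 0), y(x)))
# Eq(y(x), C1*exp(-x) + C2*exp(6*x))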
What happens if there are repeated roots? Look at the equation y ′′ − 2y ′ + y = 0. The characteristic
equation is x2 − 2x + 1 = 0, which has x = 1 as a double root. It is true that ex is a solution, but it would
be incorrect to write y = c1 ex + c2 ex .
The terms c1 ex and c2 ex are redundant — you could combine them to get y = (c1 + c2 )ex = c3 ex . To
put it another way, as functions, ex and ex are linearly dependent.
It is reasonable to suppose that for a second order equation you should have two different solutions. ex
is one; how can you find another?
The idea is to guess the form that such a solution might take. Guess:
y = f (x)ex ,

where f is a function to be determined. Plugging this into y ′′ − 2y ′ + y = 0, everything cancels except f ′′ (x)ex = 0, so f ′′ = 0 and f (x) = c1 + c2 x. This gives

y = c1 ex + c2 xex .
In fact, this is the general solution — notice the two arbitrary constants. The functions ex and xex are
independent solutions to the original equation.
In general, if m is a repeated root of multiplicity k in the characteristic polynomial, you get terms emx , xemx , . . . , xk−1 emx in the general solution.
Example. Solve d3 y/dx3 − 3 d2 y/dx2 + 3 dy/dx − y = 0.
The characteristic equation is m3 − 3m2 + 3m − 1 = 0, which has m = 1 as a root with multiplicity 3.
The general solution is
y = c1 ex + c2 xex + c3 x2 ex .
Example. Solve (D2 − 1)(D2 − D − 2)y = 0.

The characteristic equation is (m2 − 1)(m2 − m − 2) = 0, or (m − 1)(m + 1)2 (m − 2) = 0. The roots are 1, 2, and −1 (double). The general solution is

y = c1 ex + c2 e2x + c3 e−x + c4 xe−x .
Note: You can write the terms in the solution in any order you please. Nor does it matter which “c”
goes with which term, since they are arbitrary constants.
Example. (Linear systems) Suppose x and y are functions of t. Consider the system of differential
equations
dx/dt = x′ = x + 4y, dy/dt = y ′ = 2x + 3y.
I want to solve for x and y in terms of t.
Solve the second equation for x:
x = (1/2)(y ′ − 3y).
Differentiate:
x′ = (1/2)(y ′′ − 3y ′ ).
Plug the expressions for x and x′ into the first equation:
(1/2)(y ′′ − 3y ′ ) = (1/2)(y ′ − 3y) + 4y.
Simplify:
y ′′ − 4y ′ − 5y = 0.
The characteristic equation is m2 − 4m − 5 = 0, or (m − 5)(m + 1) = 0. The roots are m = 5 and
m = −1. Therefore,
y = c1 e5t + c2 e−t .
Now y ′ = 5c1 e5t − c2 e−t , so
x = (1/2)(y ′ − 3y) = (1/2)[(5c1 e5t − c2 e−t ) − 3(c1 e5t + c2 e−t )] = c1 e5t − 2c2 e−t .
There are other ways of solving linear systems, but for small systems brute force works reasonably well!
Now suppose the characteristic equation has a complex root a + bi. From basic algebra, complex roots
of real polynomials come in conjugate pairs: a + bi and a − bi. It’s reasonable to expect solutions
c1 e(a+bi)x and c2 e(a−bi)x .
However, these are complex solutions, and you should have real solutions to the original real differential
equation. I’ll use the complex exponential formula
eiθ = cos θ + i sin θ, θ ∈ R.
You can derive this formula by considering the Taylor series for eiθ , cos θ, and sin θ.
Now
c1 e(a+bi)x + c2 e(a−bi)x = c1 eax eibx + c2 eax e−ibx =
eax (c1 (cos bx + i sin bx) + c2 (cos bx − i sin bx)) = eax ((c1 + c2 ) cos bx + i(c1 − c2 ) sin bx) .
Let c3 = c1 + c2 and c4 = i(c1 − c2 ). Observe that c1 and c2 can be solved for in terms of c3 and c4 , so
no generality is lost with this substitution. Then
c1 e(a+bi)x + c2 e(a−bi)x = eax (c3 cos bx + c4 sin bx).
Each pair of conjugate complex roots a±bi in the characteristic equation generates a pair of independent
solutions of this form.
Example. Solve y ′′ + y = 0.
The characteristic equation m2 + 1 = 0 has roots ±i. The solution is
y = c1 cos x + c2 sin x.
Example. Solve (D^2 + 4)^2 y = 0.
The characteristic equation (m^2 + 4)^2 = 0 has repeated complex roots: m = ±2i (each double). The
solution is
y = c_1 cos 2x + c_2 sin 2x + c_3 x cos 2x + c_4 x sin 2x.
(What's the solution to (D^2 + 2D + 5)^2 y = 0?)
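(Not in the original notes: if you want to double-check a solution like this symbolically, sympy — an assumption, it isn't used in these notes — can apply the operator directly. Here I verify that x cos 2x really is annihilated by (D^2 + 4)^2.)

import sympy as sp

x = sp.symbols('x')
y = x * sp.cos(2*x)                      # one of the claimed solutions
op = lambda f: sp.diff(f, x, 2) + 4*f    # the operator D^2 + 4
print(sp.simplify(op(op(y))))            # 0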
This is a constant coefficient linear homogeneous system. Thus, the coefficients aij are constant,
and you can see that the equations are linear in the variables x1 , . . . , xn and their derivatives. The reason
for the term “homogeneous” will be clear when I’ve written the system in matrix form.
The primes on x′1 , . . . , x′n denote differentiation with respect to an independent variable t. The problem
is to solve for x1 , . . . , xn in terms of t.
Write the system in matrix form as
\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
Equivalently,
~x′ = A~x.
(A nonhomogeneous system would look like ~x′ = A~x + ~b.)
It’s possible to solve such a system if you know the eigenvalues (and possibly the eigenvectors) for the
coefficient matrix
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.
First, I’ll do an example which shows that you can solve small linear systems by brute force.
dx_1/dt = x_1' = −29x_1 − 48x_2,
dx_2/dt = x_2' = 16x_1 + 27x_2.
The idea is to solve for x1 and x2 in terms of t.
One approach is to use brute force. Solve the first equation for x2 , then differentiate to find x′2 :
x_2 = (1/48)(−x_1' − 29x_1),    x_2' = (1/48)(−x_1'' − 29x_1').
Plug these into the second equation:
(1/48)(−x_1'' − 29x_1') = 16x_1 + 27 · (1/48)(−x_1' − 29x_1),    which simplifies to    x_1'' + 2x_1' − 15x_1 = 0.
This is a constant coefficient linear homogeneous equation in x_1. The characteristic equation is m^2 + 2m − 15 = 0. The roots are m = −5 and m = 3. Therefore,
x_1 = c_1 e^{−5t} + c_2 e^{3t}.
x_2 = (1/48)(−x_1' − 29x_1) = (1/48)[−(−5c_1 e^{−5t} + 3c_2 e^{3t}) − 29(c_1 e^{−5t} + c_2 e^{3t})] = −(1/2)c_1 e^{−5t} − (2/3)c_2 e^{3t}.
The procedure works, but it’s clear that the computations would be pretty horrible for larger systems.
To describe a better approach, look at the coefficient matrix:
A = \begin{pmatrix} −29 & −48 \\ 16 & 27 \end{pmatrix}.
Compute its characteristic polynomial:
|A − λI| = \begin{vmatrix} −29 − λ & −48 \\ 16 & 27 − λ \end{vmatrix} = (λ + 29)(λ − 27) + 16 · 48 = λ^2 + 2λ − 15.
This is the same polynomial that appeared in the example. Since λ2 + 2λ − 15 = (λ + 5)(λ − 3), the
eigenvalues are λ = −5 and λ = 3.
Thus, you don’t need to go through the process of eliminating x2 and isolating x1 . You know that
x1 = c1 e−5t + c2 e3t
once you know the eigenvalues of the coefficient matrix. You can now finish the problem as above by plugging
x1 back in to solve for x2 .
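(A numerical aside, assuming numpy: the eigenvalues and characteristic polynomial of the coefficient matrix can be checked directly.)

import numpy as np

A = np.array([[-29.0, -48.0],
              [ 16.0,  27.0]])
print(np.linalg.eigvals(A))   # approximately -5 and 3
print(np.poly(A))             # characteristic polynomial coefficients [ 1.,  2., -15.]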
This is better than brute force, but it’s still cumbersome if the system has more than two variables.
I can improve things further by making use of eigenvectors as well as eigenvalues. Consider the system
~x′ = A~x.
Suppose λ is an eigenvalue of A with eigenvector v, so Av = λv.
I claim that ~x = ce^{λt} v is a solution to the equation, where c is a constant. To see this, plug it in:
~x' = cλe^{λt} v,    while    A~x = ce^{λt} Av = cλe^{λt} v.
The two sides agree, so ~x = ce^{λt} v is a solution.
To obtain the general solution to ~x′ = A~x, you should have “one arbitrary constant for each differenti-
ation”. In this case, you’d expect n arbitrary constants. This discussion should make the following result
plausible.
• Suppose the matrix A has n independent eigenvectors v1 , . . . , vn with corresponding eigenvalues λ1 ,
. . . , λn . Then the general solution to ~x′ = A~x is
~x = c_1 e^{λ_1 t} v_1 + · · · + c_n e^{λ_n t} v_n.
Example. Solve
dx_1/dt = x_1' = −29x_1 − 48x_2,
dx_2/dt = x_2' = 16x_1 + 27x_2.
This is the system from the brute force example. The eigenvalues of the coefficient matrix are λ = −5 and λ = 3. Row reducing A + 5I shows that (2, −1) is an eigenvector for λ = −5, and row reducing A − 3I shows that (3, −2) is an eigenvector for λ = 3. Hence the general solution is
~x = c_1 e^{−5t} \begin{pmatrix} 2 \\ −1 \end{pmatrix} + c_2 e^{3t} \begin{pmatrix} 3 \\ −2 \end{pmatrix},
which agrees with the brute force answer.
Example. Find the general solution (x(t), y(t)) to the linear system
dx/dt = x + y,
dy/dt = 6x + 2y.
The matrix form is
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 6 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.
Let
A = \begin{pmatrix} 1 & 1 \\ 6 & 2 \end{pmatrix}.
det(A − xI) = \begin{vmatrix} 1 − x & 1 \\ 6 & 2 − x \end{vmatrix} = x^2 − 3x − 4 = (x − 4)(x + 1).
The eigenvalues are x = 4 and x = −1.
For x = 4, I have
A − 4I = \begin{pmatrix} −3 & 1 \\ 6 & −2 \end{pmatrix} → \begin{pmatrix} 3 & −1 \\ 0 & 0 \end{pmatrix}.
If (a, b) is an eigenvector, then
3a − b = 0, b = 3a.
So
\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a \\ 3a \end{pmatrix} = a \begin{pmatrix} 1 \\ 3 \end{pmatrix}.
(1, 3) is an eigenvector.
For x = −1, I have
A + I = \begin{pmatrix} 2 & 1 \\ 6 & 3 \end{pmatrix} → \begin{pmatrix} 2 & 1 \\ 0 & 0 \end{pmatrix}.
If (a, b) is an eigenvector, then
2a + b = 0, b = −2a.
So
\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a \\ −2a \end{pmatrix} = a \begin{pmatrix} 1 \\ −2 \end{pmatrix}.
(1, −2) is an eigenvector.
The solution is
\begin{pmatrix} x \\ y \end{pmatrix} = c_1 e^{4t} \begin{pmatrix} 1 \\ 3 \end{pmatrix} + c_2 e^{−t} \begin{pmatrix} 1 \\ −2 \end{pmatrix}.
Example. Solve the system ~x' = A~x, where
A = \begin{pmatrix} 5 & 5 \\ −4 & −3 \end{pmatrix}.
The characteristic polynomial is
|A − λI| = \begin{vmatrix} 5 − λ & 5 \\ −4 & −3 − λ \end{vmatrix} = λ^2 − 2λ + 5.
The eigenvalues are λ = 1 ± 2i. You can check that the eigenvectors are:
λ = 1 − 2i :  b · \begin{pmatrix} −2 + i \\ 2 \end{pmatrix},    λ = 1 + 2i :  b · \begin{pmatrix} −2 − i \\ 2 \end{pmatrix}.
Observe that the eigenvectors are conjugates of one another. This is always true when you have a
complex eigenvalue.
The eigenvector method gives the following complex solution:
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = c_1 e^{(1−2i)t} \begin{pmatrix} −2 + i \\ 2 \end{pmatrix} + c_2 e^{(1+2i)t} \begin{pmatrix} −2 − i \\ 2 \end{pmatrix} =
e^t \begin{pmatrix} (−2(c_1 + c_2) + i(c_1 − c_2)) cos 2t + ((c_1 + c_2) + 2i(c_1 − c_2)) sin 2t \\ 2(c_1 + c_2) cos 2t − 2i(c_1 − c_2) sin 2t \end{pmatrix}.
Note that the constants occur in the combinations c1 + c2 and i(c1 − c2 ). Something like this will always
happen in the complex case. Set d1 = c1 + c2 and d2 = i(c1 − c2 ). The solution is
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = e^t \begin{pmatrix} (−2d_1 + d_2) cos 2t + (d_1 + 2d_2) sin 2t \\ 2d_1 cos 2t − 2d_2 sin 2t \end{pmatrix}.
In fact, if you’re given initial conditions for x1 and x2 , the new constants d1 and d2 will turn out to be
real numbers.
You can get a picture of the solution curves for a system ~x' = f(~x) even if you can't solve it, by sketching
the direction field. Suppose you have a two-variable linear system
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.
For example, consider
x' = x − y,    y' = x + y.
Thus, from the second line of the table, I’d draw the vector (−1, −1) starting at the point (1, 0).
Here's a sketch of the vectors:
While it’s possible to plot fields this way, it’s very tedious. You can use software to plot fields quickly.
Here is the same field as plotted by Mathematica:
The first picture shows the field as it would be if you plotted it by hand. As you can see, the vectors
overlap each other, making the picture a bit ugly. The second picture is the way Mathematica draws the
field by default: The vectors’ lengths are scaled so that the vectors don’t overlap. In subsequent examples,
I’ll adopt the second alternative when I display a direction field picture.
The arrows in the pictures show the direction of increasing t on the solution curves. You can see from
these pictures that the solution curves for this system appear to spiral out from the origin.
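(Not from the notes: if Mathematica isn't handy, matplotlib's quiver — assuming numpy and matplotlib are installed — draws the same field. Normalizing the vectors mimics the scaled picture described above.)

import numpy as np
import matplotlib.pyplot as plt

# Direction field for x' = x - y, y' = x + y
x, y = np.meshgrid(np.linspace(-2, 2, 15), np.linspace(-2, 2, 15))
u, v = x - y, x + y
norm = np.hypot(u, v)
norm[norm == 0] = 1.0                  # avoid dividing by zero at the origin
plt.quiver(x, y, u/norm, v/norm)       # unit vectors, like the scaled plot
plt.xlabel("x"); plt.ylabel("y")
plt.show()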
Example. (A compartment model) Two tanks hold 50 gallons of liquid each. The first tank starts with
25 pounds of dissolved salt, while the second starts with pure water. Pure water flows into the first tank at
3 gallons per minute; the well-stirred mixture flows into tank 2 at 4 gallons per minute. The mixture in tank
2 is pumped back into tank 1 at 1 gallon per minute, and also drains out at 3 gallons per minute. Find the
amount of salt in each tank after t minutes.
Let x be the number of pounds of salt dissolved in the first tank at time t and let y be the number of
pounds of salt dissolved in the second tank at time t. The rate equations are
dx/dt = (3 gal/min)(0 lbs/gal) + (1 gal/min)(y/50 lbs/gal) − (4 gal/min)(x/50 lbs/gal),
dy/dt = (4 gal/min)(x/50 lbs/gal) − (1 gal/min)(y/50 lbs/gal) − (3 gal/min)(y/50 lbs/gal).
Simplify:
x′ = −0.08x + 0.02y, y ′ = 0.08x − 0.08y.
Next, find the characteristic polynomial:
\begin{vmatrix} −0.08 − λ & 0.02 \\ 0.08 & −0.08 − λ \end{vmatrix} = λ^2 + 0.16λ + 0.0048 = (λ + 0.04)(λ + 0.12).
The eigenvalues are λ = −0.04, λ = −0.12.
Consider λ = −0.04:
A + 0.04I = \begin{pmatrix} −0.04 & 0.02 \\ 0.08 & −0.04 \end{pmatrix} → \begin{pmatrix} 1 & −1/2 \\ 0 & 0 \end{pmatrix}.
This says a − (1/2)b = 0, so a = (1/2)b. Therefore,
\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} (1/2)b \\ b \end{pmatrix} = b \begin{pmatrix} 1/2 \\ 1 \end{pmatrix}.
Taking b = 2, (1, 2) is an eigenvector.
Now consider λ = −0.12:
A + 0.12I = \begin{pmatrix} 0.04 & 0.02 \\ 0.08 & 0.04 \end{pmatrix} → \begin{pmatrix} 1 & 1/2 \\ 0 & 0 \end{pmatrix}.
This says a + (1/2)b = 0, so a = −(1/2)b. Therefore,
\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} −(1/2)b \\ b \end{pmatrix} = b \begin{pmatrix} −1/2 \\ 1 \end{pmatrix}.
Taking b = 2, (−1, 2) is an eigenvector.
The direction field for the system is shown in the first picture. In the second picture, I’ve sketched in
some solution curves.
The solution curve picture is referred to as the phase portrait.
The eigenvectors (1, 2) and (−1, 2) have slopes 2 and −2, respectively. These appear as the two lines
(linear solutions).
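(A numerical aside, assuming numpy and scipy: integrate the simplified system with the stated initial data — 25 pounds in tank 1, 0 in tank 2 — and check the eigenvalues.)

import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[-0.08,  0.02],
              [ 0.08, -0.08]])
sol = solve_ivp(lambda t, v: A @ v, (0, 60), [25.0, 0.0],
                t_eval=[0, 30, 60], rtol=1e-9)
print(sol.y)                     # pounds of salt in each tank at t = 0, 30, 60
print(np.linalg.eigvals(A))      # -0.04 and -0.12 (in some order)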
Recall that the real and imaginary parts of a complex number can be recovered using conjugation:
re(a + bi) = a = (1/2)((a + bi) + (a − bi)) = (1/2)((a + bi) + (a + bi)^*),
im(a + bi) = b = (1/2i)((a + bi) − (a − bi)) = (1/2i)((a + bi) − (a + bi)^*).
I'll apply this to e^{λt}~v, using the fact that
(e^{λt}~v)^* = e^{λ^* t} ~v^*.
Then
re(e^{λt}~v) = (1/2)(e^{λt}~v + e^{λ^* t}~v^*),
im(e^{λt}~v) = (1/2i)(e^{λt}~v − e^{λ^* t}~v^*).
The point is that since the terms on the right are independent solutions, so are the terms on the left.
The terms on the left, however, are real solutions. Here is what this means.
• If a linear system has a pair of complex conjugate eigenvalues, find the eigenvector solution for one
of them (the “eλt~v ” above). Then take the real and imaginary parts to obtain two independent real
solutions.
Example. Solve the system from the direction field example above:
x' = x − y,    y' = x + y.
Set
A = \begin{pmatrix} 1 & −1 \\ 1 & 1 \end{pmatrix}.
The eigenvalues are λ = 1 ± i.
Consider λ = 1 + i:
A − (1 + i)I = \begin{pmatrix} −i & −1 \\ 1 & −i \end{pmatrix} → \begin{pmatrix} 1 & −i \\ 0 & 0 \end{pmatrix}.
The last matrix says a − bi = 0, so a = bi. The eigenvectors are
\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} bi \\ b \end{pmatrix} = b \begin{pmatrix} i \\ 1 \end{pmatrix}.
Take b = 1. This yields the eigenvector (i, 1).
Write down the complex solution
e^{(1+i)t} \begin{pmatrix} i \\ 1 \end{pmatrix} = e^t e^{it} \begin{pmatrix} i \\ 1 \end{pmatrix} = e^t (cos t + i sin t) \begin{pmatrix} i \\ 1 \end{pmatrix} = e^t \begin{pmatrix} − sin t + i cos t \\ cos t + i sin t \end{pmatrix}.
Taking real and imaginary parts gives two independent real solutions, so the general solution is
~x = c_1 e^t \begin{pmatrix} − sin t \\ cos t \end{pmatrix} + c_2 e^t \begin{pmatrix} cos t \\ sin t \end{pmatrix}.
The eigenvector method produces a solution to a (constant coefficient homogeneous) linear system
whenever there are “enough eigenvectors”. There might not be “enough eigenvectors” if the characteristic
polynomial has repeated roots.
I’ll consider the case of repeated roots with multiplicity two or three (i.e. double or triple roots). The
general case can be handled by using the exponential of a matrix.
Consider the following linear system:
~x ′ = A~x.
Suppose λ is an eigenvalue of A of multiplicity 2, and ~v is an eigenvector for λ. eλt~v is one solution; I
want to find a second independent solution.
Recall that the constant coefficient equation (D − 3)2 y = 0 had independent solutions e3x and xe3x .
By analogy, it’s reasonable to guess a solution of the form
~x = te^{λt} ~w.
Here ~w is a constant vector.
Plug the guess into ~x' = A~x:
~x' = te^{λt} λ~w + e^{λt} ~w = A(te^{λt} ~w) = te^{λt} A~w.
Equating the coefficients of te^{λt} and e^{λt} on the two sides gives
A~w = λ~w    and    ~w = 0.
While it’s true that teλt · 0 = 0 is a solution, it’s not a very useful solution. I’ll try again, this time using
~x = te^{λt} ~w_1 + e^{λt} ~w_2.
Then
~x' = te^{λt} λ~w_1 + e^{λt} ~w_1 + λe^{λt} ~w_2.
Note that
A~x = te^{λt} A~w_1 + e^{λt} A~w_2.
Hence,
te^{λt} λ~w_1 + e^{λt} ~w_1 + λe^{λt} ~w_2 = te^{λt} A~w_1 + e^{λt} A~w_2.
Equate coefficients of te^{λt} and e^{λt}:
A~w_1 = λ~w_1,    so (A − λI)~w_1 = 0,
A~w_2 = ~w_1 + λ~w_2,    so (A − λI)~w_2 = ~w_1.
In other words, ~w_1 is an eigenvector, and ~w_2 is a vector which is mapped by A − λI to the eigenvector.
~w_2 is called a generalized eigenvector.
Example. Solve
~x' = \begin{pmatrix} −3 & −8 \\ 2 & 5 \end{pmatrix} ~x.
The characteristic polynomial is
\begin{vmatrix} −3 − λ & −8 \\ 2 & 5 − λ \end{vmatrix} = (λ + 3)(λ − 5) + 16 = λ^2 − 2λ + 1 = (λ − 1)^2.
Therefore, λ = 1 is an eigenvalue of multiplicity 2.
Now
A − I = \begin{pmatrix} −4 & −8 \\ 2 & 4 \end{pmatrix} → \begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix}.
The last matrix says a + 2b = 0, or a = −2b. Therefore,
\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} −2b \\ b \end{pmatrix} = b \begin{pmatrix} −2 \\ 1 \end{pmatrix}.
Take b = 1, so ~v = (−2, 1) is an eigenvector.
Write ~w = (c, d). The equation (A − I)~w = ~v becomes
\begin{pmatrix} −4 & −8 \\ 2 & 4 \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} −2 \\ 1 \end{pmatrix}.
Row reduce:
\begin{pmatrix} −4 & −8 & −2 \\ 2 & 4 & 1 \end{pmatrix} → \begin{pmatrix} 1 & 2 & 1/2 \\ 0 & 0 & 0 \end{pmatrix}.
The last matrix says that c + 2d = 1/2, so c = −2d + 1/2. In this situation, I may take d = 0; doing so
produces ~w = (1/2, 0).
This work generates the solution
te^t \begin{pmatrix} −2 \\ 1 \end{pmatrix} + e^t \begin{pmatrix} 1/2 \\ 0 \end{pmatrix}.
The general solution is
~x = c_1 e^t \begin{pmatrix} −2 \\ 1 \end{pmatrix} + c_2 \left( te^t \begin{pmatrix} −2 \\ 1 \end{pmatrix} + e^t \begin{pmatrix} 1/2 \\ 0 \end{pmatrix} \right).
The first picture shows the direction field; the second shows the phase portrait, with some typical
solution curves. This kind of phase portrait is called an improper node.
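(Unofficial check with sympy, assuming it's available: plug the generalized-eigenvector solution back into ~x' = A~x.)

import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[-3, -8], [2, 5]])
w1 = sp.Matrix([-2, 1])                   # eigenvector for lambda = 1
w2 = sp.Matrix([sp.Rational(1, 2), 0])    # generalized eigenvector
x = t*sp.exp(t)*w1 + sp.exp(t)*w2         # the second solution found above
print(sp.simplify(sp.diff(x, t) - A*x))   # zero vector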
Example. Solve ~x' = A~x, where
A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & −1 & 2 \end{pmatrix}.
Since A is lower triangular, the eigenvalues are the diagonal entries: λ = 2 and λ = 1 (double). For λ = 2, row reducing A − 2I gives the eigenvector (0, 0, 1).
For λ = 1,
A − I = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 2 & −1 & 1 \end{pmatrix} → \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & −1 \\ 0 & 0 & 0 \end{pmatrix}.
The last matrix implies that a = 0 and b = c, so the eigenvectors are
\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 0 \\ c \\ c \end{pmatrix} = c \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}.
Since there is only one independent eigenvector ~v = (0, 1, 1) for λ = 1, I need a generalized eigenvector ~w satisfying
(A − I)~w = ~v.
That is,
\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 2 & −1 & 1 \end{pmatrix} \begin{pmatrix} a' \\ b' \\ c' \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}.
Solving this system yields a' = 1, b' = c' + 1. I can take c' = 0, so b' = 1, and
~w = \begin{pmatrix} a' \\ b' \\ c' \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}.
The solution is
~x = c_1 e^{2t} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + c_2 e^t \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} + c_3 \left( te^t \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} + e^t \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \right).
I'll give a brief description of the situation for an eigenvalue λ of multiplicity 3. First, if there are three
independent eigenvectors ~u, ~v, ~w, the solution is
c_1 e^{λt} ~u + c_2 e^{λt} ~v + c_3 e^{λt} ~w.
If there is only one independent eigenvector ~u, find a generalized eigenvector ~v by solving
(A − λI)~v = ~u.
A second solution is
te^{λt} ~u + e^{λt} ~v.
Next, obtain another generalized eigenvector ~w by solving
(A − λI)~w = ~v;
a third solution is (t^2/2)e^{λt} ~u + te^{λt} ~v + e^{λt} ~w.
Finally, if there are exactly two independent eigenvectors ~u and ~v, they give two solutions e^{λt} ~u and e^{λt} ~v; a third comes from a generalized eigenvector ~w satisfying
(A − λI)~w = a~u + b~v
for suitable constants a and b (chosen so the equation is solvable), namely te^{λt}(a~u + b~v) + e^{λt} ~w.
Recall that the solution to the exponential growth equation
dy/dt = ky
is given by y = c_0 e^{kt}.
A constant coefficient linear system has a similar form, but we have vectors and a matrix instead of
scalars:
y ′ = Ay.
Thus, if A is an n × n real matrix, then y = (y1 , y2 , . . . , yn ).
It’s natural to ask whether you can solve a constant coefficient linear system using some kind of expo-
nential, as with the exponential growth equation.
If a solution to the system is to have the same form as the growth equation solution, it should look like
y = eAt y0 .
But “eAt ” seems to be e raised to a matrix power! How does that make any sense? It turns out that
the matrix exponential eAt can be defined in a reasonable way.
From calculus, the Taylor series for e^z is
e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!}.
Define the matrix exponential by substituting At for z:
e^{At} = \sum_{n=0}^{\infty} \frac{t^n A^n}{n!}.
The powers An make sense, since A is a square matrix. It is possible to show that this series converges
for all t and every matrix A.
As a consequence, I can differentiate the series term-by-term:
\frac{d}{dt} e^{At} = \sum_{n=0}^{\infty} n \frac{t^{n−1} A^n}{n!} = \sum_{n=1}^{\infty} \frac{t^{n−1} A^n}{(n − 1)!} = A \sum_{n=1}^{\infty} \frac{t^{n−1} A^{n−1}}{(n − 1)!} = A \sum_{m=0}^{\infty} \frac{t^m A^m}{m!} = Ae^{At}.
This shows that eAt solves the differential equation y ′ = Ay. The initial condition vector y(0) = y0
yields the particular solution
y = eAt y0 .
This works, because e0·A = I (by setting t = 0 in the power series).
Another familiar property of ordinary exponentials holds for the matrix exponential: If A and B com-
mute (that is, AB = BA), then
eA eB = eA+B .
You can prove this by multiplying the power series for the exponentials on the left. (eA is just eAt with
t = 1.)
Example. Compute e^{At} if
A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}.
The powers are
A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix},    A^2 = \begin{pmatrix} 4 & 0 \\ 0 & 9 \end{pmatrix},  . . . ,  A^n = \begin{pmatrix} 2^n & 0 \\ 0 & 3^n \end{pmatrix}.
Therefore,
e^{At} = \sum_{n=0}^{\infty} \frac{t^n}{n!} \begin{pmatrix} 2^n & 0 \\ 0 & 3^n \end{pmatrix} = \begin{pmatrix} \sum_{n=0}^{\infty} \frac{(2t)^n}{n!} & 0 \\ 0 & \sum_{n=0}^{\infty} \frac{(3t)^n}{n!} \end{pmatrix} = \begin{pmatrix} e^{2t} & 0 \\ 0 & e^{3t} \end{pmatrix}.
You can compute the exponential of an arbitrary diagonal matrix in the same way:
For example, if
A = \begin{pmatrix} −3 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 1.73 \end{pmatrix},    then    e^{At} = \begin{pmatrix} e^{−3t} & 0 & 0 \\ 0 & e^{4t} & 0 \\ 0 & 0 & e^{1.73t} \end{pmatrix}.
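(Not in the notes: scipy's expm computes matrix exponentials numerically, which makes a handy check. This assumes scipy is installed.)

import numpy as np
from scipy.linalg import expm

t = 0.5
A = np.diag([-3.0, 4.0, 1.73])
print(expm(A*t))                         # e^(At)
print(np.diag(np.exp(np.diag(A)*t)))     # same thing, built from scalar exponentials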
Using this idea, we can compute eAt when A is diagonalizable. First, I’ll give examples where we can
compute eAt : First, using a “lucky” pattern, and second, using our knowledge of how to solve a system of
differential equations.
Example. Compute e^{At} if
A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}.
Computing powers, A^2 = \begin{pmatrix} 1 & 4 \\ 0 & 1 \end{pmatrix}, A^3 = \begin{pmatrix} 1 & 6 \\ 0 & 1 \end{pmatrix}, and in general A^n = \begin{pmatrix} 1 & 2n \\ 0 & 1 \end{pmatrix}. Hence,
e^{At} = \sum_{n=0}^{\infty} \frac{t^n}{n!} \begin{pmatrix} 1 & 2n \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} \sum_{n=0}^{\infty} \frac{t^n}{n!} & \sum_{n=0}^{\infty} \frac{2nt^n}{n!} \\ 0 & \sum_{n=0}^{\infty} \frac{t^n}{n!} \end{pmatrix} = \begin{pmatrix} e^t & 2te^t \\ 0 & e^t \end{pmatrix}.
Here's where the last equality came from:
\sum_{n=0}^{\infty} \frac{t^n}{n!} = e^t,
\sum_{n=0}^{\infty} \frac{2nt^n}{n!} = 2t \sum_{n=1}^{\infty} \frac{t^{n−1}}{(n − 1)!} = 2t \sum_{m=0}^{\infty} \frac{t^m}{m!} = 2te^t.
In the last example, we got lucky in being able to recognize a pattern in the terms of the power series.
Here is what happens if we aren’t so lucky.
Example. Compute e^{At} if
A = \begin{pmatrix} 3 & −10 \\ 1 & −4 \end{pmatrix}.
If you compute powers of A as in the last two examples, there is no evident pattern:
A^2 = \begin{pmatrix} −1 & 10 \\ −1 & 6 \end{pmatrix},    A^3 = \begin{pmatrix} 7 & −30 \\ 3 & −14 \end{pmatrix},    A^4 = \begin{pmatrix} −9 & 50 \\ −5 & 26 \end{pmatrix}, . . . .
It looks like it would be difficult to compute the matrix exponential using the power series.
I’ll use the fact that eAt is the solution to a linear system. The system’s coefficient matrix is A, so the
system is
x′ = 3x − 10y
y ′ = x − 4y
You can solve this system by hand. For instance, the first equation gives
y = −(1/10)x' + (3/10)x,    so    y' = −(1/10)x'' + (3/10)x'.
x′′ + x′ − 2x = 0.
The solution is
x = c1 et + c2 e−2t .
Plugging this into the expression for y above and doing some ugly algebra gives
y = (1/5)c_1 e^t + (1/2)c_2 e^{−2t}.
Next, remember that if B is a 2 × 2 matrix,
B \begin{pmatrix} 1 \\ 0 \end{pmatrix} = first column of B    and    B \begin{pmatrix} 0 \\ 1 \end{pmatrix} = second column of B.
The solution vector is
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} c_1 e^t + c_2 e^{−2t} \\ (1/5)c_1 e^t + (1/2)c_2 e^{−2t} \end{pmatrix}.
Since e^{At}(1, 0) is the solution with x(0) = 1, y(0) = 0, and e^{At}(0, 1) is the solution with x(0) = 0, y(0) = 1, solving for c_1 and c_2 in each case gives the columns of e^{At}:
e^{At} = \frac{1}{3} \begin{pmatrix} 5e^t − 2e^{−2t} & −10e^t + 10e^{−2t} \\ e^t − e^{−2t} & −2e^t + 5e^{−2t} \end{pmatrix}.
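(A numerical cross-check, assuming scipy: compare the closed form above with scipy's expm at a sample t.)

import numpy as np
from scipy.linalg import expm

A = np.array([[3.0, -10.0],
              [1.0,  -4.0]])
t = 0.7
formula = (1/3) * np.array(
    [[5*np.exp(t) - 2*np.exp(-2*t), -10*np.exp(t) + 10*np.exp(-2*t)],
     [  np.exp(t) -   np.exp(-2*t),  -2*np.exp(t) +  5*np.exp(-2*t)]])
print(np.allclose(expm(A*t), formula))   # True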
We noted earlier that we can compute eAt fairly easily in case A is diagonalizable. Recall that an n × n
matrix A is diagonalizable if it has n independent eigenvectors. (This is true, for example, if A has n
distinct eigenvalues.)
Suppose A is diagonalizable with independent eigenvectors v1 , . . . , vn and corresponding eigenvalues
λ1 , . . . , λn . Let S be the matrix whose columns are the eigenvectors:
↑ ↑ ↑
S = v1 v2 · · · vn .
↓ ↓ ↓
Then
S^{−1} A S = \begin{pmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & λ_n \end{pmatrix} = D.
We saw earlier how to compute the exponential for the diagonal matrix D:
eλ1 t 0 ··· 0
0 eλ2 t ··· 0
eDt ...
= .. .. .
. .
0 0 · · · eλn t
Note that
(S^{−1}AS)^2 = S^{−1}AS S^{−1}AS = S^{−1}A^2 S.
Notice how each "SS^{−1}" pair cancels. Continuing in this way — you can give a formal proof using
induction — we have (S^{−1}AS)^n = S^{−1}A^n S. Therefore,
e^{Dt} = \sum_{n=0}^{\infty} \frac{t^n (S^{−1}AS)^n}{n!} = S^{−1} \left( \sum_{n=0}^{\infty} \frac{t^n A^n}{n!} \right) S = S^{−1} e^{At} S.
Then
S e^{Dt} S^{−1} = S S^{−1} e^{At} S S^{−1},
S e^{Dt} S^{−1} = e^{At}.
Notice that S and S −1 have “switched places” from the original diagonalization equation.
Hence,
e^{At} = S \begin{pmatrix} e^{λ_1 t} & 0 & \cdots & 0 \\ 0 & e^{λ_2 t} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & e^{λ_n t} \end{pmatrix} S^{−1}.
Thus, if A is diagonalizable, find the eigenvalues and use them to construct the diagonal matrix with
the exponentials in the middle. Find a set of independent eigenvectors and use them to construct S and
S −1 . Putting everything into the equation above gives eAt .
Example. Compute e^{At} if
A = \begin{pmatrix} 3 & 5 \\ 1 & −1 \end{pmatrix}.
The characteristic polynomial is (x − 4)(x + 2) and the eigenvalues are λ = 4, λ = −2. Since there are
two different eigenvalues and A is a 2 × 2 matrix, A is diagonalizable. The corresponding eigenvectors are (5, 1)
and (−1, 1). Thus,
S = \begin{pmatrix} 5 & −1 \\ 1 & 1 \end{pmatrix},    S^{−1} = \frac{1}{6} \begin{pmatrix} 1 & 1 \\ −1 & 5 \end{pmatrix}.
Hence,
e^{At} = S \begin{pmatrix} e^{4t} & 0 \\ 0 & e^{−2t} \end{pmatrix} S^{−1} = \frac{1}{6} \begin{pmatrix} 5e^{4t} + e^{−2t} & 5e^{4t} − 5e^{−2t} \\ e^{4t} − e^{−2t} & e^{4t} + 5e^{−2t} \end{pmatrix}.
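(Another unofficial scipy check: build e^{At} from the diagonalization and compare with expm.)

import numpy as np
from scipy.linalg import expm

A = np.array([[3.0, 5.0],
              [1.0, -1.0]])
S = np.array([[5.0, -1.0],
              [1.0,  1.0]])               # columns are the eigenvectors (5,1), (-1,1)
t = 0.3
via_diag = S @ np.diag([np.exp(4*t), np.exp(-2*t)]) @ np.linalg.inv(S)
print(np.allclose(via_diag, expm(A*t)))   # True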
Example. Compute e^{At} if
A = \begin{pmatrix} 5 & −6 & −6 \\ −1 & 4 & 2 \\ 3 & −6 & −4 \end{pmatrix}.
The characteristic polynomial is (x − 1)(x − 2)^2 and the eigenvalues are λ = 1 and λ = 2 (a double
root). The corresponding eigenvectors are (3, −1, 3) for λ = 1, and (2, 1, 0) and (2, 0, 1) for λ = 2. Since I
have 3 independent eigenvectors, the matrix is diagonalizable.
I have
S = \begin{pmatrix} 3 & 2 & 2 \\ −1 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix},    S^{−1} = \begin{pmatrix} −1 & 2 & 2 \\ −1 & 3 & 2 \\ 3 & −6 & −5 \end{pmatrix}.
From this, it follows that
e^{At} = S \begin{pmatrix} e^t & 0 & 0 \\ 0 & e^{2t} & 0 \\ 0 & 0 & e^{2t} \end{pmatrix} S^{−1} = \begin{pmatrix} −3e^t + 4e^{2t} & 6e^t − 6e^{2t} & 6e^t − 6e^{2t} \\ e^t − e^{2t} & −2e^t + 3e^{2t} & −2e^t + 2e^{2t} \\ −3e^t + 3e^{2t} & 6e^t − 6e^{2t} & 6e^t − 5e^{2t} \end{pmatrix}.
Here’s a quick check you can use when computing eAt . Plugging t = 0 into eAt gives e0 = I, the identity
matrix. For instance, in the last example, if you set t = 0 in the right side, it checks:
1 0 0
0 1 0.
0 0 1
However, this check isn't foolproof — just because you get I by setting t = 0 doesn't mean your answer
is right. On the other hand, if you don't get I, your answer is surely wrong!
A better check that is a little more work is to compute the derivative of eAt , and then set t = 0. You
should get A. To see this, note that
\frac{d}{dt} e^{At} = Ae^{At}.
Evaluating both sides of this equation at t = 0 gives \frac{d}{dt} e^{At} \big|_{t=0} = A. Since the theory of differential
dt t=0
equations tells us that the solution to an initial value problem of this kind is unique, if your answer passes
these checks then it is eAt . I think it’s good to do the first (easier) check, even if you don’t do the second.
If you try this in the previous example, you’ll find that the second check works as well.
Unfortunately, not every matrix is diagonalizable. How do we compute eAt for an arbitrary real matrix?
One approach is to use the Jordan canonical form for a matrix, but this would require a discussion
of canonical forms, a large subject in itself.
Note that any method for finding e^{At} requires finding the eigenvalues of A, which is, in general, a difficult
problem. (There are methods from numerical analysis for approximating the eigenvalues of a matrix.)
I’ll describe an iterative algorithm for computing eAt that only requires that one know the eigenvalues
of A. There are various such algorithms for computing the matrix exponential; this one, which is due to
Richard Williamson [1], seems to me to be the easiest for hand computation. It’s also possible to implement
this method using a computer algebra system like maxima or Mathematica.
Let A be an n × n matrix. Let {λ1 , λ2 , . . . , λn } be a list of the eigenvalues, with multiple eigenvalues
repeated according to their multiplicity.
The last phrase means that if the characteristic polynomial is (x − 1)3 (x − 5), the eigenvalue 1 is listed
3 times. So your list of eigenvalues might be {1, 1, 1, 5}. But you can list them in any order; if you wanted
to show off, you could make your list {1, 5, 1, 1}. It will probably make the computations easier and less
error-prone if you list the eigenvalues in some “nice” way (so either {1, 1, 1, 5} or {5, 1, 1, 1}).
Let
a_1 = e^{λ_1 t},
a_k = \int_0^t e^{λ_k (t−u)} a_{k−1}(u) du,    k = 2, . . . , n,
B_1 = I,
B_k = (A − λ_{k−1} I) · B_{k−1},    k = 2, . . . , n.
Then
e^{At} = a_1 B_1 + a_2 B_2 + · · · + a_n B_n.
Remark. If you’ve seen convolutions before, you might recognize that the expression for ak is a convolu-
tion:
a_k = e^{λ_k t} ⋆ a_{k−1}(t).
In general, the convolution of f and g is
(f ⋆ g)(t) = \int_0^t f(t − u) g(u) du.
If you haven’t seen this before, don’t worry: you do not need to know this! The important thing (which
gives the definition of ak ) is the integral on the right side.
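(Here is a small sketch of the algorithm in Python, using sympy for the integrals; sympy is an assumption, not something the notes rely on. It reproduces the diagonal 2 × 2 example computed earlier.)

import sympy as sp

def williamson_exp(A, eigenvalues):
    # e^(At) from a list of eigenvalues, repeated according to multiplicity
    t, u = sp.symbols('t u')
    n = A.shape[0]
    a = sp.exp(eigenvalues[0]*t)          # a_1
    B = sp.eye(n)                         # B_1
    total = a*B
    for k in range(1, n):
        # a_k = integral_0^t e^(lambda_k (t-u)) a_(k-1)(u) du
        a = sp.integrate(sp.exp(eigenvalues[k]*(t - u)) * a.subs(t, u), (u, 0, t))
        B = (A - eigenvalues[k-1]*sp.eye(n)) * B
        total += a*B
    return sp.simplify(total)

print(williamson_exp(sp.Matrix([[2, 0], [0, 3]]), [2, 3]))   # diag(e^(2t), e^(3t))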
To prove that this algorithm works, I’ll show that the expression on the right satisfies the differential
equation x′ = Ax. To do this, I’ll need two facts about the characteristic polynomial p(x).
1. (x − λ1 )(x − λ2 ) · · · (x − λn ) = p(x).
2. (Cayley-Hamilton Theorem) p(A) = 0.
Observe that if p(x) is the characteristic polynomial, then using the first fact and the definition of the
B’s,
p(x) = (x − λ1 )(x − λ2 ) · · · (x − λn )
p(A) = (A − λ1 I)(A − λ2 I) · · · (A − λn I)
= I(A − λ1 I)(A − λ2 I) · · · (A − λn I)
= B1 (A − λ1 I)(A − λ2 I) · · · (A − λn I)
= B2 (A − λ2 I) · · · (A − λn I)
..
.
= Bn (A − λn I)
By the Cayley-Hamilton Theorem,
Bn (A − λn I) = 0. (∗)
I will use this fact in the proof below. First, let's see an example of the Cayley-Hamilton theorem. Let
A = \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}.
The characteristic polynomial is p(x) = x^2 − 3x − 4, and
p(A) = A^2 − 3A − 4I = \begin{pmatrix} 10 & 9 \\ 6 & 7 \end{pmatrix} − \begin{pmatrix} 6 & 9 \\ 6 & 3 \end{pmatrix} − \begin{pmatrix} 4 & 0 \\ 0 & 4 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
Now I'll show that the algorithm produces e^{At}. Differentiating the integral formula for a_k gives
a′k = λk ak + ak−1 .
Therefore,
(a1 B1 + a2 B2 + . . . + an Bn )′ =
λ1 a1 B1 +
λ2 a2 B2 + a1 B2 +
λ3 a3 B3 + a2 B3 +
..
.
λn an Bn + an−1 Bn .
Expand the a_{i−1} B_i terms using B_i = (A − λ_{i−1} I)B_{i−1}, i.e. a_{i−1} B_i = a_{i−1} A B_{i−1} − λ_{i−1} a_{i−1} B_{i−1}:
λ1 a1 B1 +
λ2 a2 B2 + a1 AB1 − λ1 a1 B1 +
λ3 a3 B3 + a2 AB2 − λ2 a2 B2 +
..
.
λn an Bn + an−1 ABn−1 − λn−1 an−1 Bn−1 =
λn an Bn + A(a1 B1 + a2 B2 + . . . + an−1 Bn−1 ) =
λn an Bn − Aan Bn + A(a1 B1 + a2 B2 + . . . + an Bn ) =
−an (A − λn I)Bn + A(a1 B1 + a2 B2 + . . . + an Bn ) =
−an · 0 + A(a1 B1 + a2 B2 + . . . + an Bn ) =
A(a1 B1 + a2 B2 + . . . + an Bn )
(The result (*) proved above was used in the next-to-the-last equality.) Combining the results above,
I’ve shown that
(a1 B1 + a2 B2 + . . . + an Bn )′ = A(a1 B1 + a2 B2 + . . . + an Bn ).
This shows that M = a1 B1 + a2 B2 + . . . + an Bn satisfies M ′ = AM .
Using the power series expansion, I have e^{−At} A = Ae^{−At}. So
\frac{d}{dt}\left( e^{−At} M \right) = −Ae^{−At} M + e^{−At} M' = −Ae^{−At} M + e^{−At} AM = −Ae^{−At} M + Ae^{−At} M = 0.
(Remember that matrix multiplication is not commutative in general!) It follows that e^{−At} M is a
constant matrix.
Set t = 0. Since a_2 = · · · = a_n = 0, it follows that M(0) = I. In addition, e^{−A·0} = I. Therefore,
e^{−At} M = I, and hence M = e^{At}.
Example. Use the matrix exponential to solve
y' = \begin{pmatrix} 3 & −1 \\ 1 & 1 \end{pmatrix} y,    y(0) = \begin{pmatrix} 3 \\ 4 \end{pmatrix}.
The characteristic polynomial is (x − 2)^2, so λ = 2 is a double eigenvalue and the list is {2, 2}.
First, I'll compute the a_k's:
a_1 = e^{2t},
a_2 = e^{2t} ⋆ a_1(t) = \int_0^t e^{2(t−u)} e^{2u} du = e^{2t} \int_0^t du = te^{2t}.
Next, B_1 = I and B_2 = A − 2I = \begin{pmatrix} 1 & −1 \\ 1 & −1 \end{pmatrix}. Therefore,
e^{At} = e^{2t} I + te^{2t} \begin{pmatrix} 1 & −1 \\ 1 & −1 \end{pmatrix} = e^{2t} \begin{pmatrix} 1 + t & −t \\ t & 1 − t \end{pmatrix},
and the solution is
y = e^{At} \begin{pmatrix} 3 \\ 4 \end{pmatrix} = e^{2t} \begin{pmatrix} 3 − t \\ 4 − t \end{pmatrix}.
You can get the general solution by replacing (3, 4) with (c1 , c2 ).
Example. Find e^{At} if
A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ −1 & −1 & 2 \end{pmatrix}.
Since A is lower triangular, the eigenvalue list is {1, 1, 2}. Compute the a_k's:
a_1 = e^t,
a_2 = \int_0^t e^{t−u} e^u du = te^t,
a_3 = \int_0^t e^{2(t−u)} ue^u du = −te^t − e^t + e^{2t}.
The B_k's are B_1 = I, B_2 = A − I, and B_3 = (A − I)^2, and then e^{At} = a_1 B_1 + a_2 B_2 + a_3 B_3.
Example. Solve y' = \begin{pmatrix} 2 & −5 \\ 2 & −4 \end{pmatrix} y.
This example will demonstrate how the algorithm for e^{At} works when the eigenvalues are complex.
The characteristic polynomial is x^2 + 2x + 2. The eigenvalues are λ = −1 ± i. I will list them as
{−1 + i, −1 − i}.
First, I'll compute the a_k's. a_1 = e^{(−1+i)t}, and
a_2 = \int_0^t e^{(−1+i)(t−u)} e^{(−1−i)u} du = e^{(−1+i)t} \int_0^t e^{(1−i)u} e^{(−1−i)u} du =
e^{(−1+i)t} \int_0^t e^{−2iu} du = e^{(−1+i)t} · \frac{i}{2} \left( e^{−2it} − 1 \right) = \frac{i}{2} e^{(−1−i)t} − \frac{i}{2} e^{(−1+i)t}.
Next, I’ll compute the Bk ’s. B1 = I, and
B_2 = A − (−1 + i)I = \begin{pmatrix} 3 − i & −5 \\ 2 & −3 − i \end{pmatrix}.
Therefore,
e^{At} = e^{(−1+i)t} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \left( \frac{i}{2} e^{(−1−i)t} − \frac{i}{2} e^{(−1+i)t} \right) \begin{pmatrix} 3 − i & −5 \\ 2 & −3 − i \end{pmatrix}.
I want a real solution, so I'll use DeMoivre's Formula to simplify. Expanding the complex exponentials into sines and cosines and collecting terms gives
e^{At} = e^{−t} \begin{pmatrix} cos t + 3 sin t & −5 sin t \\ 2 sin t & cos t − 3 sin t \end{pmatrix}.
Notice that all the i’s have dropped out! This reflects the obvious fact that the exponential of a real
matrix must be a real matrix.
Finally, the general solution to the original system is
y = e^{−t} \begin{pmatrix} cos t + 3 sin t & −5 sin t \\ 2 sin t & cos t − 3 sin t \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}.
Example. Solve the following system using both the matrix exponential and the eigenvector methods.
y' = \begin{pmatrix} 2 & −1 \\ 1 & 2 \end{pmatrix} y.
The characteristic polynomial is x^2 − 4x + 5, so the eigenvalues are λ = 2 ± i. For λ = 2 + i, row reducing A − (2 + i)I shows that (1, −i) is an eigenvector. The corresponding solution is
e^{(2+i)t} \begin{pmatrix} 1 \\ −i \end{pmatrix} = e^{2t} \begin{pmatrix} cos t + i sin t \\ sin t − i cos t \end{pmatrix}.
Taking real and imaginary parts gives the independent real solutions e^{2t}(cos t, sin t) and e^{2t}(sin t, − cos t).
For the matrix exponential method, the eigenvalue list is {2 + i, 2 − i}; the algorithm gives a_1 = e^{(2+i)t}, a_2 = e^{2t} sin t, B_1 = I, and B_2 = A − (2 + i)I. Therefore,
e^{At} = e^{(2+i)t} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + e^{2t} sin t \begin{pmatrix} −i & −1 \\ 1 & −i \end{pmatrix} = e^{2t} \begin{pmatrix} cos t & − sin t \\ sin t & cos t \end{pmatrix}.
The solution is
y = e^{2t} \begin{pmatrix} cos t & − sin t \\ sin t & cos t \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}.
Taking into account some of the algebra I didn’t show for the matrix exponential, I think the eigenvector
approach is easier.
Example. Solve the following system using both the matrix exponential and the (generalized) eigenvector
methods.
y' = \begin{pmatrix} 5 & −8 \\ 2 & −3 \end{pmatrix} y.
y = (y1 , y2 ) is the solution vector.
I’ll do this first using the generalized eigenvector method, then using the matrix exponential.
The characteristic polynomial is x^2 − 2x + 1. The eigenvalue is λ = 1 (double).
A − I = \begin{pmatrix} 4 & −8 \\ 2 & −4 \end{pmatrix}.
Ignore the first row, and divide the second row by 2, obtaining the vector (1, −2). I want (a, b) such
that (1)a + (−2)b = 0. Swap 1 and −2 and negate the −2: I get (a, b) = (2, 1). This is an eigenvector for
λ = 1.
Since I only have one eigenvector, I need a generalized eigenvector. This means I need (a', b') such that
\begin{pmatrix} 4 & −8 \\ 2 & −4 \end{pmatrix} \begin{pmatrix} a' \\ b' \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}.
Solving, 2a' − 4b' = 1; take b' = 0, a' = 1/2, so (1/2, 0) is a generalized eigenvector. By this method the general solution is
y = c_1 e^t \begin{pmatrix} 2 \\ 1 \end{pmatrix} + c_2 \left( te^t \begin{pmatrix} 2 \\ 1 \end{pmatrix} + e^t \begin{pmatrix} 1/2 \\ 0 \end{pmatrix} \right).
Now use the matrix exponential: the eigenvalue list is {1, 1}, so a_1 = e^t, a_2 = te^t, B_1 = I, and B_2 = A − I.
Therefore,
e^{At} = e^t \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + te^t \begin{pmatrix} 4 & −8 \\ 2 & −4 \end{pmatrix} = \begin{pmatrix} e^t + 4te^t & −8te^t \\ 2te^t & e^t − 4te^t \end{pmatrix}.
The solution is
y = \begin{pmatrix} e^t + 4te^t & −8te^t \\ 2te^t & e^t − 4te^t \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}.
In this case, finding the solution using the matrix exponential may be a little bit easier.
[1] Richard Williamson, Introduction to differential equations. Englewood Cliffs, NJ: Prentice-Hall, 1986.
(b)
hx, ay + zi = hay + z, xi = a hy, xi + hz, xi = ahy, xi + hz, xi = a hx, yi + hx, zi .
If a ∈ R, then a = a, and so hx, ay + zi = a hx, yi + hx, zi.
(c) As in the proof of (b), I have hy, xi = hy, xi, so
Remarks. Why include complex conjugation in the symmetry axiom? Suppose the symmetry axiom had
read
hx, yi = hy, xi .
Then
0 < hix, ixi = i hx, ixi = i hix, xi = i · i hx, xi = − hx, xi .
This contradicts hx, xi > 0. That is, I can’t have both pure symmetry and positive definiteness.
Example. Suppose u, v, and w are vectors in a real inner product space V . Suppose
hv, vi = 4, hw, wi = 3.
1
(a) Compute hu + 3w, v − 5wi.
(b) Compute hv + w, v − wi.
(a) Using the linearity and symmetry properties, I have
hu + 3w, v − 5wi = hu, v − 5wi + h3w, v − 5wi = hu, vi − hu, 5wi + h3w, vi − h3w, 5wi =
(b)
hv + w, v − wi = hv, vi − hv, wi + hw, vi − hw, wi = hv, vi − hw, wi = 4 − 3 = 1.
(a1 , . . . , an ) · (b1 , . . . , bn ) = a1 b1 + · · · + an bn .
It's easy to verify that the axioms for an inner product hold. For example, suppose (a_1, . . . , a_n) ≠ ~0.
Then at least one of a_1, . . . , a_n is nonzero, so (a_1, . . . , a_n) · (a_1, . . . , a_n) = a_1^2 + · · · + a_n^2 > 0.
I can use an inner product to define lengths and angles. Thus, an inner product introduces (metric)
geometry into vector spaces.
Definition. Let V be an inner product space, and let x, y ∈ V. The norm (or length) of x is ‖x‖ = ⟨x, x⟩^{1/2}, and the angle θ between x and y is defined by
cos θ = ⟨x, y⟩ / (‖x‖ ‖y‖).
Remark. The definition of the angle between x and y wouldn't make sense if the expression ⟨x, y⟩ / (‖x‖ ‖y‖) was
greater than 1 or less than −1, since I'm asserting that it's the cosine of an angle.
In fact, the Cauchy-Schwarz inequality (which I'll prove below) will show that
−1 ≤ ⟨x, y⟩ / (‖x‖ ‖y‖) ≤ 1.
2
(b) kaxk = |a|kxk. (“|a|” denotes the absolute value of a.)
Proof. (a) Squaring kxk = hx, xi1/2 gives kxk2 = hx, xi.
(c) x 6= 0 implies hx, xi > 0, and hence kxk > 0. Conversely, if x = 0, then h0, 0i = 0, so kxk = 0.
0 ≤ hax − by, ax − byi = a2 hx, xi − 2ab hx, yi + b2 hy, yi = a2 kxk2 − 2ab hx, yi + b2 kyk2 .
The trick is to pick “nice” values for a and b. I will set a = kyk and b = kxk. (A rationale for this is
that I want the expression kxkkyk to appear in the inequality.)
I get
kyk2 kxk2 − 2kykkxk hx, yi + kxk2 kyk2 ≥ 0
2kxk2 kyk2 − 2kxkkyk hx, yi ≥ 0
2kxk2 kyk2 ≥ 2kxkkyk hx, yi
kxk2 kyk2 ≥ kxkkyk hx, yi
kxkkyk ≥ hx, yi .
In the last inequality, x and y are arbitrary vectors. So the inequality is still true if x is replaced by
−x. If I replace x with −x, then k − xk = kxk and h−x, yi = − hx, yi, and the inequality becomes
kxkkyk ≥ − hx, yi .
Since kxkkyk is greater than or equal to both hx, yi and − hx, yi, I have
kxkkyk ≥ | hx, yi |.
(e)
kx + yk2 = hx + y, x + yi = kxk2 + 2 hx, yi + kyk2 ≤ kxk2 + 2kxkkyk + kyk2 = (kxk + kyk)2 .
3
Example. R^3 is an inner product space using the standard dot product of vectors. The cosine of the angle
between (2, −2, 1) and (6, −8, 24) is
cos θ = \frac{(2, −2, 1) · (6, −8, 24)}{‖(2, −2, 1)‖ ‖(6, −8, 24)‖} = \frac{52}{3 · 26} = \frac{2}{3}.
Example. Let C[0, 1] denote the real vector space of continuous functions on the interval [0, 1]. Define an
inner product on C[0, 1] by
⟨f, g⟩ = \int_0^1 f(x) g(x) dx.
Given that this is a real inner product, I may apply the preceding proposition to produce some useful
results. For example, the Cauchy-Schwarz inequality says that
\left( \int_0^1 f(x)^2 dx \right)^{1/2} \left( \int_0^1 g(x)^2 dx \right)^{1/2} ≥ \left| \int_0^1 f(x) g(x) dx \right|.
hvi , vj i = δij .
Note that the n × n matrix whose (i, j)-th component is δij is the n × n identity matrix.
4
Example. Consider the following set of vectors in R^2:
{ (3/5, 4/5), (−4/5, 3/5) }.
I have
(3/5, 4/5) · (3/5, 4/5) = 9/25 + 16/25 = 1,
(3/5, 4/5) · (−4/5, 3/5) = −12/25 + 12/25 = 0,
(−4/5, 3/5) · (−4/5, 3/5) = 16/25 + 9/25 = 1.
It follows that the set is orthonormal relative to the dot product on R^2.
Example. Let C[0, 2π] denote the complex-valued continuous functions on [0, 2π]. Define an inner product
by
⟨f, g⟩ = \frac{1}{2π} \int_0^{2π} f(x) \overline{g(x)} dx.
Let m, n ∈ Z. Then
\frac{1}{2π} \int_0^{2π} e^{imx} e^{−inx} dx = δ_{mn}.
It follows that the following set is orthonormal in C[0, 2π] relative to this inner product:
\left\{ \frac{1}{\sqrt{2π}} e^{imx} : m = . . . , −1, 0, 1, . . . \right\}.
Proposition. Let {vi } be an orthogonal set of vectors, vi 6= 0 for all i. Then {vi } is independent.
Proof. Suppose
a1 vi1 + a2 vi2 + · · · + an vin = ~0.
Take the inner product of both sides with vi1 :
5
Proposition. Let {vi } be an orthonormal basis for V , and let v ∈ V . Then
X
v= hv, vi i vi .
i
Note: In fact, the sum above is a finite sum — that is, only finitely many terms are nonzero.
Proof. Since {v_i} is a basis, there are elements a_j ∈ F and v_{i_1}, . . . , v_{i_n} ∈ {v_i} such that
v = a_1 v_{i_1} + · · · + a_n v_{i_n}.
Take the inner product of both sides with v_{i_1}. As in the proof of the last proposition, all the inner product terms on the right vanish, except that
⟨v_{i_1}, v_{i_1}⟩ = 1 by orthonormality. Thus,
⟨v, v_{i_1}⟩ = a_1.
Taking the inner product of both sides of the original equation with v_{i_2}, . . . , v_{i_n} shows, in the same way, that ⟨v, v_{i_j}⟩ = a_j for each j.
Example. Let C[0, 2π] denote the complex inner product space of complex-valued continuous functions on
[0, 2π], where the inner product is defined by
2π
1
Z
hf, gi = f (x)g(x) dx.
2π 0
6
Suppose I try to compute the “components” of f (x) = x relative to this orthonormal set by taking inner
products — that is, using the approach of the preceding example.
For m = 0,
\frac{1}{\sqrt{2π}} \int_0^{2π} x dx = π \sqrt{2π}.
Suppose m ≠ 0. Then
\frac{1}{\sqrt{2π}} \int_0^{2π} x e^{−mix} dx = \frac{1}{\sqrt{2π}} \left[ \frac{i}{m} x e^{−mix} + \frac{1}{m^2} e^{−mix} \right]_0^{2π} = \frac{i\sqrt{2π}}{m}.
There are infinitely many nonzero components! Of course, the reason this does not contradict the earlier
result is that f (x) = x may not lie in the span of S. S is orthonormal, hence independent, but it is not a
basis for C[0, 2π].
In fact, since emix = cos mx + i sin mx, a finite linear combination of elements of S must be periodic.
It is still reasonable to ask whether (or in what sense) f(x) = x can be represented by the infinite
sum
π \sqrt{2π} + \sum_{m=1}^{\infty} \left( \frac{i\sqrt{2π}}{m} e^{mix} − \frac{i\sqrt{2π}}{m} e^{−mix} \right).
For example, it is reasonable to ask whether the series converges uniformly to f at each point of [0, 2π].
For example, it is reasonable to ask whether the series converges uniformly to f at each point of [0, 2π].
The answers to these kinds of questions would require an excursion into the theory of Fourier series.
Since it’s so easy to find the components of a vector relative to an orthonormal basis, it’s of interest to
have an algorithm which converts a given basis to an orthonormal one.
The Gram-Schmidt algorithm converts a basis to an orthonormal basis by “straightening out” the
vectors one by one.
[Figure: v_1, v_2, the projection proj_{v_1} v_2, and the perpendicular component v_2 − proj_{v_1} v_2.]
The picture shows the first step in the straightening process. Given vectors v1 and v2 , I want to replace
v2 with a vector perpendicular to v1 . I can do this by taking the component of v2 perpendicular to v1 , which
is
v_2 − \frac{⟨v_1, v_2⟩}{⟨v_1, v_1⟩} v_1.
Lemma. (Gram-Schmidt algorithm) Let {v1 , . . . , vk } be a set of nonzero vectors in an inner product
space V . Suppose v1 , . . . , vk−1 are pairwise orthogonal. Let
v_k' = v_k − \sum_{i=1}^{k−1} \frac{⟨v_i, v_k⟩}{⟨v_i, v_i⟩} v_i.
Then v_k' is orthogonal to each of v_1, . . . , v_{k−1}.
Proof. Let j ∈ {1, . . . , k − 1}. Then
⟨v_j, v_k'⟩ = ⟨v_j, v_k⟩ − \sum_{i=1}^{k−1} \frac{⟨v_i, v_k⟩}{⟨v_i, v_i⟩} ⟨v_j, v_i⟩.
Now ⟨v_j, v_i⟩ = 0 for i ≠ j because the set is orthogonal. Hence, the right side collapses to
⟨v_j, v_k⟩ − \frac{⟨v_j, v_k⟩}{⟨v_j, v_j⟩} ⟨v_j, v_j⟩ = ⟨v_j, v_k⟩ − ⟨v_j, v_k⟩ = 0.
Suppose that I start with an independent set {v1 , . . . , vn }. Apply the Gram-Schmidt procedure to the
set, beginning with v1′ = v1 . This produces an orthogonal set {v1′ , . . . , vn′ }. In fact, {v1′ , . . . , vn′ } is a nonzero
orthogonal set, so it is independent as well.
To see that each vk′ is nonzero, suppose
k−1
X hvi , vk i
0 = vk′ = vk − vi .
i=1
hvi , vi i
Then
k−1
X hvi , vk i
vk = vi .
i=1
hvi , vi i
This contradicts the independence of {vi }, because vk is expressed as the linear combination of v1 ,
. . . vk−1 .
In general, if the algorithm is applied iteratively to a set of vectors, the span is preserved at each stage.
That is,
hv1 , . . . , vk i = hv1′ , . . . , vk′ i.
This is true at the start, since v1 = v1′ . Assume inductively that
8
Example. (Gram-Schmidt) Apply Gram-Schmidt to the following set of vectors in R3 (relative to the
usual dot product):
(3, 0, 4),    (−1, 0, 7),    (2, 9, 11).
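(The worked computation isn't reproduced here, but the following numpy sketch — an aside, not part of the notes — carries out the straightening for these three vectors; it produces (3, 0, 4), (−4, 0, 3), (0, 9, 0).)

import numpy as np

def gram_schmidt(vectors):
    # Orthogonalize, straightening one vector at a time (classical Gram-Schmidt).
    ortho = []
    for v in vectors:
        w = v.astype(float).copy()
        for u in ortho:
            w -= (u @ v) / (u @ u) * u    # subtract the component of v along u
        ortho.append(w)
    return ortho

vs = [np.array([3, 0, 4]), np.array([-1, 0, 7]), np.array([2, 9, 11])]
for w in gram_schmidt(vs):
    print(w)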
Example. (Gram-Schmidt) Find an orthonormal basis (relative to the usual dot product) for the subspace
of R4 spanned by the vectors
(1, 1, 0, 13) − \frac{(1, 1, 0, 13) · (1, 0, 2, 2)}{(1, 0, 2, 2) · (1, 0, 2, 2)} (1, 0, 2, 2) − \frac{(1, 1, 0, 13) · (8, 1, −4, 0)}{(8, 1, −4, 0) · (8, 1, −4, 0)} (8, 1, −4, 0) =
(1, 1, 0, 13) − \frac{27}{9} (1, 0, 2, 2) − \frac{9}{81} (8, 1, −4, 0) = (1, 1, 0, 13) − (3, 0, 6, 6) − \left( \frac{8}{9}, \frac{1}{9}, −\frac{4}{9}, 0 \right) = \left( −\frac{26}{9}, \frac{8}{9}, −\frac{50}{9}, 7 \right).
If at any point you wind up with a vector with fractions, it’s a good idea to clear the fractions before
continuing. Since multiplying a vector by a number doesn’t change its direction, it remains perpendicular
to the vectors already constructed.
Thus, I'll multiply the last vector by 9 and use (−26, 8, −50, 63) instead. Finally, divide each of the three orthogonal vectors by its length to get an orthonormal basis.
(Recall the convention that a vector is a column vector:
(v_1, v_2, . . . , v_n) = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix},    (v_1, v_2, . . . , v_n)^T = [ v_1  v_2  · · ·  v_n ].)
Lemma. Let A be an invertible n × n real matrix. Then
⟨x, y⟩ = x^T A^T A y
defines an inner product on R^n.
10
Finally,
hx, xi = xT AT Ax = (Ax)T (Ax).
Now Ax is an n × 1 vector — I'll label its components this way:
Ax = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.
Then
⟨x, x⟩ = (Ax)^T (Ax) = [ u_1  u_2  · · ·  u_n ] \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} = u_1^2 + u_2^2 + · · · + u_n^2 ≥ 0.
That is, the inner product of a vector with itself is a nonnegative number. All that remains is to show
that if the inner product of a vector with itself is 0, then the vector is ~0.
Using the notation above, suppose
⟨x, x⟩ = u_1^2 + u_2^2 + · · · + u_n^2 = 0.
Then u1 = u2 = · · · = un = 0, because a nonzero u would produce a positive number on the right side
of the equation.
So
Ax = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = ~0.
Finally, I'll use the fact that A is invertible:
x = A^{−1}(Ax) = A^{−1} ~0 = ~0.
This proves that the function is positive definite, so it’s an inner product.
Example. The previous lemma provides lots of examples of inner products on Rn besides the usual dot
product. All I have to do is take an invertible matrix A and form AT A, defining the inner product as above.
For example, this 2 × 2 real matrix is invertible:
A = \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}.
Now
A^T A = \begin{pmatrix} 29 & 12 \\ 12 & 5 \end{pmatrix}.
(Notice that AT A will always be symmetric.) The inner product defined by this matrix is
⟨(x_1, x_2), (y_1, y_2)⟩ = [ x_1  x_2 ] \begin{pmatrix} 29 & 12 \\ 12 & 5 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = 29x_1 y_1 + 12x_2 y_1 + 12x_1 y_2 + 5x_2 y_2.
Definition. A matrix A in M (n, R) is orthogonal if AAT = I.
Proposition. Let A be an orthogonal matrix.
(a) det(A) = ±1.
(b) AAT = I = AT A — in other words, AT = A−1 .
(c) The rows of A form an orthonormal set. The columns of A form an orthonormal set.
(d) A preserves dot products — and hence, lengths and angles — in the sense that
(Ax) · (Ay) = x · y.
AAT = I
A−1 AAT = A−1 I
IAT = A−1 I
AT = A−1
(d) The ordinary dot product of vectors x = (x1 , x2 , . . . xn ) and y = (y1 , y2 , . . . yn ) can be written as a matrix
multiplication:
x · y = [ x_1  x_2  · · ·  x_n ] \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = x^T y.
(Remember the convention that vectors are column vectors.)
Suppose A is orthogonal. Then
(Ax) · (Ay) = (Ax)^T (Ay) = x^T A^T A y = x^T I y = x^T y = x · y.
In other words, orthogonal matrices preserve dot products. It follows that orthogonal matrices will also
preserve lengths of vectors and angles between vectors, because these are defined in terms of dot products.
Example. Find real numbers a and b such that the following matrix is orthogonal:
A = \begin{pmatrix} a & 0.6 \\ b & 0.8 \end{pmatrix}.
12
Since the columns of A must form an orthonormal set, I must have
0.6a + 0.8b = 0.
The easy way to get a solution is to swap 0.6 and 0.8 and negate one of them; thus, a = −0.8 and
b = 0.6.
Since ‖(−0.8, 0.6)‖ = 1, I'm done. (If the a and b I chose had made ‖(a, b)‖ ≠ 1, then I'd simply divide
(a, b) by its length.)
Example. Orthogonal 2 × 2 matrices represent rotations of the plane about the origin or reflections
across a line through the origin.
Rotations are represented by matrices
\begin{pmatrix} cos θ & − sin θ \\ sin θ & cos θ \end{pmatrix}.
You can check that this works by considering the effect of multiplying the standard basis vectors (1, 0)
and (0, 1) by this matrix.
Multiplying a vector by the following matrix product reflects the vector across the line L that makes an
angle θ with the x-axis:
\begin{pmatrix} cos θ & − sin θ \\ sin θ & cos θ \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix} \begin{pmatrix} cos θ & sin θ \\ − sin θ & cos θ \end{pmatrix}.
Reading from right to left, the first matrix rotates everything by −θ radians, so L coincides with the
x-axis. The second matrix reflects everything across the x-axis. The third matrix rotates everything by θ
radians. Hence, a given vector is rotated by −θ and reflected across the x-axis, after which the reflected
vector is rotated by θ. The net effect is to reflect across L.
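(A quick numpy illustration, not from the notes: build the reflection across the line at angle θ as the product above, and check that it is orthogonal and fixes a vector on L.)

import numpy as np

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = np.pi/6
flip_x = np.array([[1.0, 0.0], [0.0, -1.0]])
R = rotation(theta) @ flip_x @ rotation(-theta)      # reflect across L
print(np.allclose(R @ R.T, np.eye(2)))               # True: R is orthogonal
print(R @ np.array([np.cos(theta), np.sin(theta)]))  # a vector on L is unchanged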
Many transformation problems can be easily accomplished by doing transformations to reduce a general
problem to a special case.
2. Translations. A translation R2 → R2 has the form
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix} + \begin{pmatrix} e \\ f \end{pmatrix}.
e and f are numbers. This translation just translates everything by the vector he, f i.
A translation by a nonzero vector is not a linear map, because linear maps must send the zero vector
to the zero vector. However, translations are very useful in performing coordinate transformations. I’ll
introduce the following terminology for the composite of a linear transformation and a translation.
Definition. Let A be a real m × n matrix. An affine map is a function f : Rn → Rm of the form
f (x) = Ax + b, where x ∈ Rn and b ∈ Rm .
Example. Find an affine map which carries the unit square 0 ≤ u ≤ 1, 0 ≤ v ≤ 1 to the parallelogram in
the x-y plane with vertices A(2, −1), B(5, 3), C(4, −6), D(7, −2).
I’m going to make a rough sketch of the parallelogram first. I want to know the orientation of the
vertices — e.g. that B and C are next to A, and that D is opposite A.
I’m going to do the transformation in two steps. First, I’ll take the square to a parallelogram which is
the right size and shape, but which has a corner at the origin. Next, I’ll move the parallelogram so it’s at
the right place.
2
The vectors from A to B and to C are \vec{AB} = ⟨3, 4⟩ and \vec{AC} = ⟨2, −5⟩. I saw above that if I construct a
matrix with ⟨3, 4⟩ as the first column and ⟨2, −5⟩ as the second column, then it will multiply ⟨1, 0⟩ to ⟨3, 4⟩
and ⟨0, 1⟩ to ⟨2, −5⟩:
\begin{pmatrix} 3 & 2 \\ 4 & −5 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 4 \end{pmatrix},    \begin{pmatrix} 3 & 2 \\ 4 & −5 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ −5 \end{pmatrix}.
So the following transformation takes the unit square to a parallelogram of the right shape:
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 & 2 \\ 4 & −5 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}.
This transformation takes (0, 0) to (0, 0); I want (0, 0) to go to the point A (which I used as the base
point for my two vectors). To fix this, just translate by A:
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 & 2 \\ 4 & −5 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} + \begin{pmatrix} 2 \\ −1 \end{pmatrix}.
To see this, just check that the unit vectors ⟨1, 0⟩, ⟨0, 1⟩ go to the right places:
\begin{pmatrix} cos θ & − sin θ \\ sin θ & cos θ \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} cos θ \\ sin θ \end{pmatrix},    \begin{pmatrix} cos θ & − sin θ \\ sin θ & cos θ \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} − sin θ \\ cos θ \end{pmatrix}.
You can see from the picture that h1, 0i and h0, 1i have both been rotated by an angle θ. Other vectors
can be built out of these two vectors, so other vectors are rotated by θ as well.
4. Reflections. This is how to do a reflection across the x-axis:
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} u \\ −v \end{pmatrix}.
It is clear that this reflects things across the x-axis, because it simply negates the second component.
3
What about reflection across a line L making an angle θ with the x-axis? It's messy to do using analytic
geometry, but very easy using matrices. Simply do a rotation through −θ, which carries L to the x-axis.
Next, reflect across the x-axis. Finally, do a rotation through θ, which carries the x-axis back to L. Each
of these transformations can be accomplished by matrix multiplication; just multiply the three matrices to
do reflection across L.
Translations, rotations, and reflections are examples of rigid motions. They preserve distances between
points, as well as areas.
[Figure: the line L at angle π/3, rotated down to the x-axis, reflected, and rotated back.]
Thus,
A = \begin{pmatrix} cos(π/3) & − sin(π/3) \\ sin(π/3) & cos(π/3) \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix} \begin{pmatrix} cos(π/3) & sin(π/3) \\ − sin(π/3) & cos(π/3) \end{pmatrix} = \begin{pmatrix} −1/2 & \sqrt{3}/2 \\ \sqrt{3}/2 & 1/2 \end{pmatrix}.
(c) If z_1, z_2 ∈ C, then
\overline{z_1 · z_2} = \overline{z_1} · \overline{z_2}.
The proofs are easy; just write out the complex numbers (e.g. z_1 = a + bi and z_2 = c + di) and compute.
The conjugate of a matrix A is the matrix \overline{A} obtained by conjugating each element: That is,
(\overline{A})_{ij} = \overline{A_{ij}}.
\overline{kA + B} = \overline{k} · \overline{A} + \overline{B}    and    \overline{AB} = \overline{A} · \overline{B}.
You can prove these results by looking at individual elements of the matrices and using the properties
of conjugation of numbers given above.
Definition. If A is a complex matrix, A^* is the conjugate transpose of A:
A^* = \overline{A}^T.
Note that the conjugation and transposition can be done in either order: That is, \overline{A^T} = (\overline{A})^T. To see
this, consider the (i, j)-th element of the matrices:
Example. If
A = \begin{pmatrix} 1 + 2i & 2 − i & 3i \\ 4 & −2 + 7i & 6 + 6i \end{pmatrix},    then    A^* = \begin{pmatrix} 1 − 2i & 4 \\ 2 + i & −2 − 7i \\ −3i & 6 − 6i \end{pmatrix}.
Since the complex conjugate of a real number is the real number, if B is a real matrix, then B ∗ = B T .
Remark. Most people call A∗ the adjoint of A — though, unfortunately, the word “adjoint” has already
been used for the transpose of the matrix of cofactors in the determinant formula for A−1 . (Sometimes
1
people try to get around this by using the term “classical adjoint” to refer to the transpose of the matrix
of cofactors.) In modern mathematics, the word “adjoint” refers to a property of A∗ that I’ll prove below.
This property generalizes to other things which you might see in more advanced courses.
The ( )∗ operation is sometimes called the Hermitian — but this has always sounded ugly to me, so
I won’t use this terminology.
Since this is an introduction to linear algebra, I’ll usually refer to A∗ as the conjugate transpose,
which at least has the virtue of saying what the thing is.
Proposition. Let U and V be complex matrices, and let k ∈ C.
(a) (U ∗ )∗ = U .
(b) (kU + V)^* = \overline{k} U^* + V^*.
(c) (U V )∗ = V ∗ U ∗ .
(d) If u, v ∈ C^n, their dot product is given by
u · v = v^* u.
Recall that the dot product of u and v is
u · v = u_1 \overline{v_1} + u_2 \overline{v_2} + · · · + u_n \overline{v_n}.
Notice that you take the complex conjugates of the components of v before multiplying!
This can be expressed as the matrix multiplication
u · v = [ \overline{v_1}  \overline{v_2}  · · ·  \overline{v_n} ] \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} = v^* u.
2
It’s a common notational abuse to write the number “−4 + 13i” instead of writing it as a 1 × 1 matrix
“[−4 + 13i]”.
(b)
(1 − 8i)a + (2 + 3i)b = 0.
I can get a solution (a, b) by switching the numbers 1 − 8i and 2 + 3i and negating one of them:
(a, b) = (2 + 3i, −1 + 8i).
There are two points about the equation u · v = v ∗ u which might be confusing. First, why is it necessary
to conjugate and transpose v? The reason for the conjugation goes back to the need for inner products to
be positive definite (so u · u is a nonnegative real number).
The reason for the transpose is that I’m using the convention that vectors are column vectors. So if u
and v are n-dimensional column vectors and I want the product to be a number — i.e. a 1 × 1 matrix — I
have to multiply an n-dimensional row vector (1 × n) and an n-dimensional column vector (n × 1). To get
the row vector, I have to transpose the column vector.
Finally, why do u and v switch places in going from the left side to the right side? The reason you write
v ∗ u instead of u∗ v is because inner products are defined to be linear in the first variable. If you use u∗ v you
get a product which is linear in the second variable.
Of course, none of this makes any difference if you’re dealing with real numbers. So if x and y are
vectors in Rn , you can write
x · y = xT y or x · y = y T x.
3
(c) The columns of a unitary matrix form an orthonormal set.
Proof. (a)
(U x) · (U y) = (U y)∗ (U x) = y ∗ U ∗ U x = y ∗ Ix = y ∗ x = x · y.
Since U preserves inner products, it also preserves lengths of vectors, and the angles between them. For
example,
kxk2 = x · x = (U x) · (U x) = kU xk2 , so kxk = kU xk.
kU xk = kλxk = |λ|kxk.
Here ck T is the complex conjugate of the kth column ck , transposed to make it a row vector. If you look
at the dot products of the rows of U ∗ and the columns of U , and note that the result is I, you see that the
equation above exactly expresses the fact that the columns of U are orthonormal.
For example, take the first row c1 T . Its product with the columns c1 , c2 , and so on give the first row of
the identity matrix, so
c1 · c1 = 1, c1 · c2 = 0, . . . , c1 · cn = 0.
This says that c1 has length 1 and is perpendicular to the other columns. Similar statements hold for
c2 , . . . , cn .
(a, b) · (1 + 2i, 1 − i) = 0,
[ 1 − 2i  1 + i ] \begin{pmatrix} a \\ b \end{pmatrix} = 0.
This gives
(1 − 2i)a + (1 + i)b = 0.
4
I may take a = 1 + i and b = −1 + 2i. Then
‖(1 + i, −1 + 2i)‖ = \sqrt{7}.
So I need to divide each of a and b by \sqrt{7} to get a unit vector. Thus,
(c, d) = \left( \frac{1}{\sqrt{7}} (1 + i), \frac{1}{\sqrt{7}} (−1 + 2i) \right).
Au · v = u · A∗ v.
Proof.
u · A∗ v = (A∗ v)∗ u = v ∗ (A∗ )∗ u = v ∗ Au = Au · v.
Remark. If (·, ·) is any inner product on a vector space V and T : V → V is a linear transformation, the
adjoint T^* of T is the linear transformation which satisfies
(T(u), v) = (u, T^*(v))    for all u, v ∈ V.
(This definition assumes that there is such a transformation.) This explains why, in the special case
of the complex inner product, the matrix A∗ is called the adjoint. It also explains the term self-adjoint in
the next definition.
Corollary. (Adjointness) let A ∈ M (n, R) and let u, v ∈ Rn . Then
Au · v = u · AT v.
Proof. This follows from adjointness in the complex case, because A∗ = AT for a real matrix.
Definition. A complex matrix A is Hermitian (or self-adjoint) if A^* = A.
Note that a Hermitian matrix is automatically square.
For real matrices, A∗ = AT , and the definition above is just the definition of a symmetric matrix.
It is no accident that the diagonal entries are real numbers — see the result that follows.
Here’s a table of the correspondences between the real and complex cases:
5
Proposition. Let A be a Hermitian matrix.
(a) The diagonal elements of A are real numbers, and elements on opposite sides of the main diagonal are
conjugates.
Proof. (a) Since A = A^*, I have A_{ij} = \overline{A_{ji}}. This shows that elements on opposite sides of the main diagonal
are conjugates.
Taking i = j, I have
A_{ii} = \overline{A_{ii}}.
But a complex number is equal to its conjugate if and only if it's a real number, so A_{ii} is real.
Therefore, λ = λ — but a number that equals its complex conjugate must be real.
(c) Suppose µ is an eigenvalue of A with eigenvector u and λ is an eigenvalue of A with eigenvector v. Then
Example. Let
A = \begin{pmatrix} 1 & 2 − i \\ 2 + i & −3 \end{pmatrix}.
Show that the eigenvalues are real, and that eigenvectors for different eigenvalues are orthogonal.
The characteristic polynomial is x^2 + 2x − 8 = (x + 4)(x − 2), so the eigenvalues are −4 and 2, which are real.
For −4, row reducing A + 4I shows that (2 − i, −5) is an eigenvector.
For 2, the eigenvector matrix is
A − 2I = \begin{pmatrix} −1 & 2 − i \\ 2 + i & −5 \end{pmatrix}.
(2 − i, 1) is an eigenvector.
Note that
(2 − i, −5) · (2 − i, 1) = (2 + i)(2 − i) + (1)(−5) = 5 − 5 = 0.
Thus, the eigenvectors are orthogonal.
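(An unofficial numpy check: eigh is designed for Hermitian matrices, and it returns real eigenvalues and orthonormal eigenvectors.)

import numpy as np

A = np.array([[1, 2 - 1j],
              [2 + 1j, -3]])
vals, vecs = np.linalg.eigh(A)
print(vals)                              # real: approximately [-4.,  2.]
print(np.vdot(vecs[:, 0], vecs[:, 1]))   # approximately 0 (complex dot product)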
6
Since real symmetric matrices are Hermitian, the previous results apply to them as well. I’ll restate the
previous result for the case of a symmetric matrix.
Corollary. Let A be a symmetric matrix.
(a) The elements on opposite sides of the main diagonal are equal.
From (a), a diagonalizing matrix and the corresponding diagonal matrix are
P = \begin{pmatrix} 2 & 3 \\ −3 & 2 \end{pmatrix}    and    D = \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}.
Now P^{−1}AP = D, so
A = PDP^{−1} = \begin{pmatrix} 2 & 3 \\ −3 & 2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix} \frac{1}{13} \begin{pmatrix} 2 & −3 \\ 3 & 2 \end{pmatrix} = \frac{1}{13} \begin{pmatrix} 31 & 12 \\ 12 & 21 \end{pmatrix}.
Example. Let A be the 2 × 2 Hermitian matrix
A = \begin{pmatrix} p & q + ri \\ q − ri & s \end{pmatrix},    p, q, r, s ∈ R.
Compute the characteristic polynomial of A, and show directly that the eigenvalues must be real numbers.
|A − xI| = \begin{vmatrix} p − x & q + ri \\ q − ri & s − x \end{vmatrix} = (x − p)(x − s) − (q + ri)(q − ri) = x^2 − (p + s)x + [ps − (q^2 + r^2)].
The discriminant is
(p + s)^2 − 4(1)[ps − (q^2 + r^2)] = (p^2 + 2ps + s^2) − 4ps + 4(q^2 + r^2) = (p^2 − 2ps + s^2) + 4(q^2 + r^2) = (p − s)^2 + 4(q^2 + r^2).
Since this is a sum of squares, it can’t be negative. Hence, the roots of the characteristic polynomial —
the eigenvalues — must be real numbers.
A~v = λ~v .
Let
↑ ↑ ↑
U = ~v −
→2 . . . −
u u→n .
↓ ↓ ↓
Since the columns are orthonormal, U is unitary. Then
λ junk
0
U ∗ AU = [std → B] · A · [B → std] =
... .
B
0
The issue here is why the first column of the last matrix is what it is. To see this, notice that [std →
B] · A · [B → std] is the matrix [T ]B,B of the linear transformation T (~x) = A~x relative to B. But
Theorem. (Spectral Theorem) If A is a Hermitian matrix, there is a unitary matrix U and a diagonal matrix D such that
U^* A U = D.
(Note that since U is unitary, U^* = U^{−1}.)
Proof. Find a unitary matrix U such that U ∗ AU = T , where T is upper triangular. Then since A∗ = A,
(U ∗ AU )∗ = T ∗ , U ∗ A∗ U = T ∗ , U ∗ AU = T ∗ .
But then T = T ∗ . T is upper triangular, T ∗ (the conjugate transpose) is lower triangular, so T must
be diagonal.
Corollary. (The Principal Axis Theorem) If A is a real symmetric matrix, there is an orthogonal matrix
O and a diagonal matrix D such that
O T AO = D.
(Note that since O is orthogonal, O T = O −1 .)
Proof. Real symmetric matrices are Hermitian and real orthogonal matrices are unitary, so the result follows
from the Spectral Theorem.
I showed earlier that for a Hermitian matrix (or in the real case, a symmetric matrix), eigenvectors
corresponding to different eigenvalues are perpendicular. Consequently, if I have an n × n Hermitian matrix
(or in the real case, an n × n symmetric matrix) with n different eigenvalues, the corresponding eigenvectors
form an orthogonal basis. I can get an orthonormal basis — and hence, a unitary diagonalizing matrix (or
in the real case, an orthogonal diagonalizing matrix) — by simply dividing each vector by its length.
Things are a little more complicated if I have fewer than n eigenvalues. The Spectral Theorem guarantees
that I’ll have n independent eigenvectors, but some eigenvalues will have several eigenvectors. In this case,
I’d need to use Gram-Schmidt on the eigenvectors for each eigenvalue to get an orthogonal set of eigenvectors
for each eigenvalue. Eigenvectors corresponding to different eigenvalues are still perpendicular by the result
cited earlier, so the orthogonal sets for the eigenvalues fit together to form an orthogonal basis. As before, I
get an orthonormal basis by dividing each vector by its length.
To keep the computations simple, I’ll stick to the first case (n different eigenvalues) in the examples
below.
Example. Let
A = \begin{pmatrix} −1 & 2 & 0 \\ 2 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.
Find an orthogonal matrix O which diagonalizes A. Find O^{−1} and the corresponding diagonal matrix.
The characteristic polynomial is
(3 − x)[(x − 2)(x + 1) − (2)(2)] = −(x − 3)^2 (x + 2).
The eigenvalues are x = 3 and x = −2.
For x = 3, the eigenvector matrix is
A − 3I = \begin{pmatrix} −4 & 2 & 0 \\ 2 & −1 & 0 \\ 0 & 0 & 0 \end{pmatrix} → \begin{pmatrix} 2 & −1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
This gives the independent eigenvectors (1, 2, 0) and (0, 0, 1). Dividing them by their lengths, I get
\frac{1}{\sqrt{5}} (1, 2, 0)    and    (0, 0, 1).
For x = −2, the eigenvector matrix is
A + 2I = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix} → \begin{pmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
This gives the independent eigenvector (−2, 1, 0). Dividing it by its length, I get \frac{1}{\sqrt{5}} (−2, 1, 0).
5
Thus, the orthogonal diagonalizing matrix is
1 2
√ 0 −√
5 5
2 1
O= .
√ 0 √
5 5
0 1 0
Then
O^{−1} = O^T = \begin{pmatrix} 1/\sqrt{5} & 2/\sqrt{5} & 0 \\ 0 & 0 & 1 \\ −2/\sqrt{5} & 1/\sqrt{5} & 0 \end{pmatrix}.
The diagonal matrix is
O^T A O = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & −2 \end{pmatrix}.
Example. Let
A = \begin{pmatrix} 3 & 1 − 2i \\ 1 + 2i & −1 \end{pmatrix}.
Find a unitary matrix U which diagonalizes A.
Since A is Hermitian, the Spectral Theorem applies.
The characteristic polynomial is
det(A − λI) = det \begin{pmatrix} 3 − λ & 1 − 2i \\ 1 + 2i & −1 − λ \end{pmatrix} = λ^2 − 2λ − 8 = (λ − 4)(λ + 2).
The eigenvalues are λ = 4 and λ = −2.
For λ = 4,
A − 4I = \begin{pmatrix} −1 & 1 − 2i \\ 1 + 2i & −5 \end{pmatrix}.
(Since this is a 2 × 2 A − λI matrix, I know that the second row must be a multiple of the first.)
An eigenvector (a, b) must satisfy
(−1)a + (1 − 2i)b = 0.
I can get a solution by swapping the −1 and the 1 − 2i and negating the −1 to give 1: (a, b) = (1 − 2i, 1)
is an eigenvector for λ = 4.
For λ = −2, partial row reduction gives
A + 2I = \begin{pmatrix} 5 & 1 − 2i \\ 1 + 2i & 1 \end{pmatrix} → \begin{pmatrix} 5 & 1 − 2i \\ 0 & 0 \end{pmatrix}.
Using the same technique as I used for λ = 4, I see that (a, b) = (1 − 2i, −5) is an eigenvector for λ = −2.
The result I proved earlier says that these eigenvectors are automatically perpendicular. Check by
taking their complex dot product:
(1 − 2i, 1) · (1 − 2i, −5) = (1 − 2i)(1 + 2i) + (1)(−5) = 5 − 5 = 0.
Find the lengths of the eigenvectors:
‖(1 − 2i, 1)‖ = \sqrt{6},    ‖(1 − 2i, −5)‖ = \sqrt{30}.
The normalized eigenvectors are
\frac{1}{\sqrt{6}} (1 − 2i, 1),    \frac{1}{\sqrt{30}} (1 − 2i, −5),
and these are the columns of the unitary diagonalizing matrix U.
Fourier series arise in solving partial differential equations such as the one-dimensional heat equation
u_t = a^2 u_{xx}.
The solution to the one-dimensional heat equation with an arbitrary initial distribution is an infinite
sum
\sum_{n=1}^{\infty} b_n \exp\left( −\frac{n^2 π^2 a^2 t}{L^2} \right) \sin \frac{πnx}{L}.
Consider integrable functions f and g defined on the interval −L ≤ x ≤ L, where L > 0. Define the
inner product of f and g by
⟨f, g⟩ = \frac{1}{L} \int_{−L}^{L} f(x) g(x) dx.
Example. Verify the linearity and symmetry axioms for an inner product for
⟨f, g⟩ = \frac{1}{L} \int_{−L}^{L} f(x) g(x) dx.
For linearity,
⟨af_1 + f_2, g⟩ = \frac{1}{L} \int_{−L}^{L} (af_1(x) + f_2(x)) g(x) dx =
a · \frac{1}{L} \int_{−L}^{L} f_1(x) g(x) dx + \frac{1}{L} \int_{−L}^{L} f_2(x) g(x) dx = a ⟨f_1, g⟩ + ⟨f_2, g⟩.
Symmetry is easy:
⟨f, g⟩ = \frac{1}{L} \int_{−L}^{L} f(x) g(x) dx = \frac{1}{L} \int_{−L}^{L} g(x) f(x) dx = ⟨g, f⟩.
Note that this should be called “inner product” (with quotes), since it isn’t positive definite. Since
[f (x)]2 ≥ 0, it follows that
⟨f, f⟩ = \frac{1}{L} \int_{−L}^{L} f(x) · f(x) dx = \frac{1}{L} \int_{−L}^{L} [f(x)]^2 dx ≥ 0.
1
But you could have a function which was not identically zero — for example, a function which was 0 at every point
of −L ≤ x ≤ L except one, and nonzero at that single point — such that ⟨f, f⟩ = 0.
We can get around this problem by confining ourselves to continuous functions. Unfortunately, many
of the functions we’d like to apply Fourier series to aren’t continuous.
Example. I’ll illustrate the first formula with some numerical examples.
If I take different cosines, I should get 0. I'll try
\frac{1}{L} \int_{−L}^{L} \cos \frac{3πx}{L} \cos \frac{7πx}{L} dx.
Use the identity
1
cos a cos b = (cos(a + b) + cos(a − b)) .
2
I get
3πx 7πx 1 10πx 4πx
cos cos = cos + cos .
L L 2 L L
So
\frac{1}{L} \int_{−L}^{L} \cos \frac{3πx}{L} \cos \frac{7πx}{L} dx = \frac{1}{L} \int_{−L}^{L} \frac{1}{2} \left( \cos \frac{10πx}{L} + \cos \frac{4πx}{L} \right) dx =
\frac{1}{2L} \left[ \frac{L}{10π} \sin \frac{10πx}{L} + \frac{L}{4π} \sin \frac{4πx}{L} \right]_{−L}^{L} = 0.
(Remember that the sine of a multiple of π is 0!)
If I do the integral with both cosines the same, I use the double angle formula:
\frac{1}{L} \int_{−L}^{L} \cos \frac{3πx}{L} \cos \frac{3πx}{L} dx = \frac{1}{2L} \int_{−L}^{L} \left( 1 + \cos \frac{6πx}{L} \right) dx =
\frac{1}{2L} \left[ x + \frac{L}{6π} \sin \frac{6πx}{L} \right]_{−L}^{L} = 1.
There is nothing essentially different in the derivation of the general formulas.
In terms of the (almost) inner product defined above, the orthogonality relations are:
⟨cos(πjx/L), cos(πkx/L)⟩ = 0 if j ≠ k, and 1 if j = k;
⟨cos(πjx/L), sin(πkx/L)⟩ = 0;
⟨sin(πjx/L), sin(πkx/L)⟩ = 0 if j ≠ k, and 1 if j = k.
In other words, the sine and cosine functions form an orthonormal set (again, allowing that we don't
quite have an inner product here).
As a constant function, I'll use 1/\sqrt{2}. You can verify that this is perpendicular to the sines and cosines
above, and that it has length 1.
For vectors in an inner product space, you can get the components of a vector by taking the inner product
of the vector with elements of an orthonormal basis. For example, if {u_1, u_2, . . . , u_n} is an orthonormal
basis, then a vector x can be written as a linear combination of the u's by taking inner products of x with
the u's:
x = ⟨x, u_1⟩u_1 + ⟨x, u_2⟩u_2 + · · · + ⟨x, u_n⟩u_n.
By analogy, I might try to expand a function in terms of the sines and cosines, and the constant function 1/\sqrt{2}, by taking the inner
product of f with each of them. Doing so, I get
a_n = \frac{1}{L} \int_{−L}^{L} f(x) \cos \frac{πnx}{L} dx,
b_n = \frac{1}{L} \int_{−L}^{L} f(x) \sin \frac{πnx}{L} dx.
The cosine term in the series is a_n \cos \frac{πnx}{L}; the sine term is b_n \sin \frac{πnx}{L}.
If I set n = 0 in the a_n formula, then since \cos \frac{π·0·x}{L} = 1, I'd get
a_0 = \frac{1}{L} \int_{−L}^{L} f(x) dx.
So for the constant function 1/\sqrt{2}, the coefficient is
\frac{1}{L} \int_{−L}^{L} f(x) · \frac{1}{\sqrt{2}} dx = \frac{1}{\sqrt{2}} · \frac{1}{L} \int_{−L}^{L} f(x) dx = \frac{1}{\sqrt{2}} a_0.
But my constant function is 1/\sqrt{2}, so the constant term in the series will be
\frac{1}{\sqrt{2}} a_0 · \frac{1}{\sqrt{2}} = \frac{1}{2} a_0.
The result is the Fourier series
f(x) ∼ \frac{1}{2} a_0 + \sum_{n=1}^{\infty} \left( a_n \cos \frac{πnx}{L} + b_n \sin \frac{πnx}{L} \right).
You can interpret it as an expression for f as an infinite "linear combination" of the orthonormal sine
and cosine functions.
There are several issues here. For one thing, this is an infinite sum, not the usual finite linear combination
which expresses a vector in terms of a basis. Questions of convergence always arise with infinite sums.
Moreover, even if the infinite sum converges, why should it converge to f (x)?
In addition, we’re only doing this by analogy with our results on inner product spaces, because this
“inner product” only satisfies two of the inner product axioms.
3
It’s important to know that these issues exist, but their treatment will be deferred to an advanced course
in analysis.
Before doing some examples, here are some notes about computations.
You can often use the complex exponential (DeMoivre's formula) to simplify computations:
\exp \frac{πinx}{L} = \cos \frac{πnx}{L} + i \sin \frac{πnx}{L}.
Specifically, let
c_n = \frac{1}{L} \int_{−L}^{L} f(x) \exp \frac{πinx}{L} dx.
Then
a_n = re c_n,    b_n = im c_n.
This allows me to compute a single integral, then find the real and imaginary parts of the result to get
the sine and cosine coefficients.
On some occasions, symmetry can be used to obtain the values of some of the coefficients.
1. If a function is even — that is, if f (x) = f (−x) for all x, so the graph is symmetric about the y-axis
— then bn = 0 for all n.
2. If a function is odd — that is, if f (−x) = −f (x) for all x, so the graph is symmetric about the origin
— then an = 0 for all n.
This makes sense, since the cosine functions are even and the sine functions are odd.
Here's a final remark before I get to the examples. A sum of periodic functions is periodic. Since sine
and cosine are periodic, you can only expect to represent periodic functions by a Fourier series. Therefore,
outside the interval −L ≤ x ≤ L, you must "repeat" f(x) in periodic fashion (so f(x) = f(x + 2L) for all x).
For example, consider the function f (x) = x, −1 ≤ x < 1. L = 1, so I must “repeat” the function every
two units. Picture:
The Fourier expansion of f (x) only converges to y = x on the interval −1 < x < 1. Outside of that
interval, it converges to the periodic function in the picture (except perhaps at the jump discontinuities).
[Graph: the square wave of period 2, equal to −1 on −1 ≤ x < 0 and 1 on 0 ≤ x < 1.]
Example. Find the Fourier expansion of
f(x) = −1 if −1 ≤ x < 0,  1 if 0 ≤ x < 1,    and f(x) = f(x + 2) for all x.
Here L = 1. Compute
c_n = \int_{−1}^{0} (−1) e^{πinx} dx + \int_{0}^{1} e^{πinx} dx = \frac{i}{πn} \left[ e^{πinx} \right]_{−1}^{0} − \frac{i}{πn} \left[ e^{πinx} \right]_{0}^{1} = \frac{i}{πn} \left( 1 − e^{−πin} \right) − \frac{i}{πn} \left( e^{πin} − 1 \right).
Now
e−πni = cos πn − i sin πn = cos πn = (−1)n
eπni = cos πn + i sin πn = cos πn = (−1)n
So
c_n = \frac{2i}{πn} (1 − (−1)^n).
Now
a_n = re c_n = 0,    and    b_n = im c_n = \frac{2}{πn} (1 − (−1)^n).
So the Fourier expansion is
f(x) ∼ \sum_{n=1}^{\infty} \frac{2}{πn} (1 − (−1)^n) \sin πnx.
Here are the graphs of the first, fifth, and tenth partial sums:
Notice the “ringing” at the points of discontinuity, and how the graphs are approaching the original
square wave.
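(If you want to reproduce pictures like these yourself — an aside, assuming numpy and matplotlib — plot a few partial sums of the series just computed.)

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 2000)

def partial_sum(x, N):
    # sum_{n=1}^{N} (2/(pi n)) (1 - (-1)^n) sin(pi n x)
    s = np.zeros_like(x)
    for n in range(1, N + 1):
        s += (2/(np.pi*n)) * (1 - (-1)**n) * np.sin(np.pi*n*x)
    return s

for N in (1, 5, 10):
    plt.plot(x, partial_sum(x, N), label=f"N = {N}")
plt.legend()
plt.show()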
Note that if x = 0, all the terms of the series are 0, and the series converges to 0. In fact, under
reasonable conditions — specifically, if f is of bounded variation in an interval around a point c — the
Fourier series will converge at c to the average of the left- and right-hand limits at the point. In this case, the
left-hand limit is −1, the right-hand limit is 1, and their average is 0. On the other hand, f(0) was defined
to be 1.
[Graph of the function, repeated with period 2.]
Example. Find the Fourier expansion of
f(x) = 1 if −1 ≤ x < 0,  (x − 1)^2 if 0 ≤ x < 1,    and f(x) = f(x + 2) for all x.
Here L = 1. First,
a_0 = \int_{−1}^{1} f(x) dx = \int_{−1}^{0} 1 dx + \int_{0}^{1} (x − 1)^2 dx = [x]_{−1}^{0} + \frac{1}{3} \left[ (x − 1)^3 \right]_{0}^{1} = \frac{4}{3}.
Next, compute the higher order coefficients:
c_n = \int_{−1}^{0} e^{πinx} dx + \int_{0}^{1} (x − 1)^2 e^{πinx} dx =
\left[ −\frac{i}{πn} e^{πinx} \right]_{−1}^{0} + \left[ \left( −\frac{i}{πn} (x − 1)^2 + \frac{2}{π^2 n^2} (x − 1) + \frac{2i}{π^3 n^3} \right) e^{πinx} \right]_{0}^{1} =
−\frac{i}{πn} \left( 1 − e^{−πni} \right) + \frac{2i}{π^3 n^3} e^{πin} + \frac{i}{πn} + \frac{2}{π^2 n^2} − \frac{2i}{π^3 n^3}.
Therefore,
c_n = −\frac{i}{πn} (1 − (−1)^n) + \frac{2i}{π^3 n^3} (−1)^n + \frac{i}{πn} + \frac{2}{π^2 n^2} − \frac{2i}{π^3 n^3} =
\frac{i}{πn} (−1)^n + \frac{2i}{π^3 n^3} (−1)^n + \frac{2}{π^2 n^2} − \frac{2i}{π^3 n^3}.
Now take real and imaginary parts:
a_n = re c_n = \frac{2}{π^2 n^2},    b_n = im c_n = \frac{1}{πn} (−1)^n − \frac{2}{π^3 n^3} (1 − (−1)^n).
Example. Find the Fourier expansion of
f(x) = x + π if −π ≤ x < 0,  0 if 0 ≤ x < π,    and f(x) = f(x + 2π) for all x.
Here L = π. First,
a_0 = \frac{1}{π} \int_{−π}^{π} f(x) dx = \frac{1}{π} \int_{−π}^{0} (x + π) dx + \frac{1}{π} \int_{0}^{π} 0 dx = \frac{1}{π} \left[ \frac{1}{2} x^2 + πx \right]_{−π}^{0} = \frac{π}{2}.
Next, compute the higher order terms. As in the computation of a_0, I only need the integral from −π to
0, since the function equals 0 from 0 to π:
c_n = \frac{1}{π} \int_{−π}^{π} f(x) e^{πinx/π} dx = \frac{1}{π} \int_{−π}^{0} (x + π) e^{inx} dx = \frac{1}{π} \left[ \frac{1}{in} (x + π) e^{inx} + \frac{1}{n^2} e^{inx} \right]_{−π}^{0} =
\frac{1}{π} \left( \frac{1}{in} · π + \frac{1}{n^2} − \frac{1}{in} · 0 · e^{−πin} − \frac{1}{n^2} e^{−πin} \right) = \frac{1}{in} + \frac{1}{πn^2} − \frac{1}{πn^2} e^{−πin} =
−\frac{1}{n} i + \frac{1}{πn^2} − \frac{1}{πn^2} (−1)^n.
Thus,
a_n = \frac{1}{πn^2} − \frac{1}{πn^2} (−1)^n    and    b_n = −\frac{1}{n}.
The series is
f(x) ∼ \frac{π}{4} + \sum_{n=1}^{\infty} \left( \left( \frac{1}{πn^2} − \frac{1}{πn^2} (−1)^n \right) \cos nx − \frac{1}{n} \sin nx \right).