Linear Algebra note
T.R. RAMADAS
1. Introduction
1.1. Preliminaries. You are supposed to know about the set Q of rational
numbers, the set R of real numbers and the set C of complex numbers. These
are all fields (the only ones we will be concerned with for the most part).
Note the inclusions
Q⊂R⊂C
From this point on we deal with finite dimensional vector spaces and maps
between them, freely moving back and forth between abstract (=“basis-free”)
and matrix arguments.
(4) Given a linear map from a finite-dimensional vector space to itself, we
will define the notions of determinant, eigenvalue, eigenvector, (generalised)
eigenspace, and (generalised) eigenspace decomposition. We define the char-
acteristic and minimal polynomials.
(5) We define real/complex inner product spaces, symmetric/hermitian lin-
ear maps. We prove the spectral theorem in “operator” and “matrix ver-
sions.”
(6) Rotations/Reflections in three dimensions
(7) Quotients, duals, tensors.
1.3. Please note the following conventions. Until further notice, our
definitions, arguments, and theorems will refer to the real numbers. In fact
everything will continue to hold if we replace the real numbers with the ratio-
nals or with the complex numbers. Some authors deal with such a situation
by working with a field k which is declared to be any one of the above. I
choose to take the more informal route.
In the context of real vector spaces, we will use “scalar” and “real number” interchangeably. In general, if we replace R by an arbitrary field k (in particular by Q or C), by “scalar” we will mean an element of k. When it is important to specify the field, we will similarly talk of “a k-vector space V ” or a vector space “over k”.
R×V →V
(λ, v) ↦ λ v “scalar multiplication”
such that
(1) Addition of vectors is associative and commutative. There is a
unique vector (“the zero vector”) 0V (or simply 0 when confusion
is unlikely) such that v + 0V = 0V + v = v for every vector v. Ev-
ery vector v has its unique “additive inverse”, denoted −v such that
v + (−v) = (−v) + v = 0V .
(2) Obvious identities hold, which relate to the interplay between addi-
tion and scalar multiplication:
λ (v + w) = λ v + λ w
(λ + µ) v = λ v + µ v (A)
1 v = v
λ (µ v) = (λµ) v (B)
Exercises:
(1) (a) The composition of two linear maps is linear. (b) The inverse of a
bijective linear map is linear, so a bijective linear map is automatically an
isomorphism.
(2) Suppose given a vector space V and a subspace W (i.e., a subset closed
under addition and scalar multiplication). Then 0V ∈ W and for every
w ∈ W , −w ∈ W . Thus W is a vector space and the inclusion W ↪ V is a
linear, injective map.
(3) Given a linear map T ∶ V → Ṽ , its kernel is the subset of V defined as:
ker(T ) ≡ {v ∈ V ∣T (v) = 0Ṽ }
Prove (a) ker(T ) is a subspace (b) T is injective iff ker(T ) = {0V } (i.e., the
kernel is the singleton set containing (only) the zero vector of V .)
(4) The image of a linear map T ∶ V → Ṽ ,
T (V ) ≡ image(T ) ≡ {ṽ ∈ Ṽ ∣∃v ∈ V such that ṽ = T (v)} = {T (v)∣v ∈ V }
is a subspace of Ṽ . If ker(T ) = {0V }, we have an isomorphism V →
image(T ).
In terms that may be more familiar to you, V is the domain of T , Ṽ is the
codomain, and T (V ) = image(T ) is the range of T .
Definition 3.3. Given a vector space V and two subspaces W1 and W2 , the
sum W1 + W2 is the subset {w1 + w2 ∣w1 ∈ W1 , w2 ∈ W2 }.
Exercise: (a) Given any family {Vα } of subspaces of a vector space V the
intersection ⋂ Vα is a subspace. (b) The sum W1 + W2 is a subspace. In fact,
it is the intersection of all subspaces V ′ with Wi ⊂ V ′ ⊂ V (i = 1, 2):
W1 + W2 = \bigcap_{W_i \subset V' \subset V} V'
(2) Given two vector spaces W̃1 and W̃2 , define addition and scalar multiplication on the Cartesian product W̃1 × W̃2 component-wise:
(w̃1 , w̃2 ) + (w̃1′ , w̃2′ ) = (w̃1 + w̃1′ , w̃2 + w̃2′ ),   λ (w̃1 , w̃2 ) = (λ w̃1 , λ w̃2 )
Show that this makes W̃1 × W̃2 into a vector space, with zero vector (0W̃1 , 0W̃2 ) and −(w̃1 , w̃2 ) = (−w̃1 , −w̃2 ).
(3) Suppose V = W1 ⊕W2 . Consider the cartesian product W1 ×W2 , endowed
with the structure of a vector space as in (2). Prove that the map W1 ×W2 →
V given by
(w1 , w2 ) ↦ w1 + w2
is a bijective linear map and therefore an isomorphism.
(4) Returning to (2), let ι1 ∶ W̃1 → W̃1 × W̃2 and ι2 ∶ W̃2 → W̃1 × W̃2 denote
the injective linear maps defined below:
ι1 (w̃1 ) = (w̃1 , 0W̃2 )
ι2 (w̃2 ) = (0W̃1 , w̃2 )
(5) Let V be a vector space and P ∶ V → V a linear map such that P ○ P = P . Prove that V = ker(P ) ⊕ image(P ). Verify that the direct sum decomposition gives a simple description of P :
P (w1 + w2 ) = w2
if w1 ∈ ker(P ) and w2 ∈ image(P ).
(6) Let V be a vector space and I ∶ V → V a linear map such that
I 2 ≡ I ○ I = IV
By considering the map
P = (1/2) (IV − I)
prove that V = V1 ⊕ V−1 , where
V±1 = {v ∈ V ∣ Iv = ±v}.
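The decomposition in Exercise (6) is easy to test numerically. The sketch below is my own illustration (not part of the original exercises): it takes the involution I to be the reflection that swaps the two coordinates of R2, forms P = (1/2)(IV − I), and checks that P is a projection whose kernel and image are exactly V+1 and V−1.

import numpy as np

# Hypothetical involution on R^2: reflection across the line y = x, so I_mat @ I_mat = identity.
I_mat = np.array([[0., 1.],
                  [1., 0.]])
Id = np.eye(2)

P = 0.5 * (Id - I_mat)          # the map P = (1/2)(I_V - I) from the exercise
assert np.allclose(P @ P, P)    # P is a projection: P^2 = P

v = np.array([3., -1.])         # an arbitrary vector
v_plus = (Id - P) @ v           # component in V_{+1} = ker(P)
v_minus = P @ v                 # component in V_{-1} = image(P)

assert np.allclose(I_mat @ v_plus, v_plus)      # I v = +v on V_{+1}
assert np.allclose(I_mat @ v_minus, -v_minus)   # I v = -v on V_{-1}
assert np.allclose(v_plus + v_minus, v)         # v decomposes as a sum of the two pieces
print(v_plus, v_minus)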
Remark 3.5. The point of Exercises (1)-(4) is that if V is the direct sum
of two subspaces W1 and W2 , one can identify it with the Cartesian product
W1 × W2 . In this context V is sometimes called the “internal” direct sum of
the subspaces W1 and W2 . Conversely, if we start with two vector spaces W̃1
and W̃2 their Cartesian product is the direct sum of two subspaces which
can be identified with the two “factors” of the Cartesian product. In this
context W̃1 × W̃2 is sometimes called the “external” direct sum of W̃1 and
W̃2 .
(1) A singleton set, denoted R0 . The sole element, denoted 0, is the zero
vector. Addition, scalar multiplication, and additive inverse are all defined in the obvious way: 0 + 0 = 0, λ 0 = 0, −0 = 0.
(2) The set R1 . This is just the set of real numbers R itself, regarded as
a real vector space. Addition and scalar multiplication are defined in the
obvious way. The zero vector is 0 and the additive inverse of λ is −λ.
(3) For n ≥ 2, define Rn to be the set of ordered n-tuples (x1 , . . . , xn ) of real
numbers. Addition and scalar multiplication are defined coordinate-wise:
(x1 , . . . , xn ) + (x′1 , . . . , x′n ) = (x1 + x′1 , . . . , xn + x′n )
λ (x1 , . . . , xn ) = (λx1 , . . . , λxn )
Clearly (0, . . . , 0) is the zero vector and −(x1 , . . . , xn ) = (−x1 , . . . , −xn ).
(4) Let S be an arbitrary set. Then the set of functions from S → R is a
vector space. (This set is often denoted RS , but we will use this notation
sparingly.) Addition and scalar multiplication are defined as follows (here f
and g are functions S → R, and x ∈ S):
(f + g)(x) = f (x) + g(x)
(λ f )(x) = λ f (x)
What is the zero vector? How is −f defined?
(5) Let S = {1, 2, . . . , n}. There is an obvious isomorphism RS → Rn :
f ↦ (f (1), . . . , f (n))
More generally, let S be a finite set with n elements, and suppose a bijection
{1, . . . , n} → S is chosen yielding an ordering (x1 , . . . , xn ) of the elements of
S. Then again we get an isomorphism RS → Rn :
f ↦ (f (x1 ), . . . , f (xn ))
(6) If V, W are vector spaces, let L(V, W ) denote the space of linear maps from V to W . (In a more algebraic context this would be denoted Hom(V, W ).)
This is itself a vector space, with operations defined as follows:
(S + T )(v) = S(v) + T (v) (sum of linear maps S, T )
(λ T )(v) = λ T (v) (multiplying a linear map T by a scalar λ)
(−T )(v) = −T (v)
The zero linear map is the one that sends all vectors in V to 0W .
Exercises:
(7) Let a < b be real numbers. Then the set C 0 [(a, b)] of continuous real-valued functions on (a, b) is a subspace of the vector space of all real-valued functions on (a, b). The set C 1 [(a, b)] of continuously differentiable3 real-valued functions on (a, b) is a subspace of C 0 [(a, b)]. The map f ↦ f ′ is a linear map C 1 [(a, b)] → C 0 [(a, b)]. What is its kernel?
What is the image?
(8) Is there a linear map I ∶ C 0 [(a, b)] → C 1 [(a, b)] such that if I(g) = f ,
then f ′ = g?
(9) Exhibit a surjective linear map C 0 [(a, b)] → Rn , for n any natural num-
ber.
(10) Consider the map Sum ∶ R2 → R defined as follows:
Sum((x1 , x2 )) = x1 + x2
Check that this is a linear surjective map. Determine the kernel, and draw
a sketch as it would appear on a sheet of graph paper. Determine all linear
maps4 RInv ∶ R → R2 such that
Sum(RInv(t)) = t (S)
Hint: a linear map from R to any vector space is characterised by what
it does to 1. Let RInv(1) = (a, b), and characterise the choices of vector
(a, b) ∈ R2 that are compatible with the equation (S) above.
For each possible RInv (i.e., admissible choice of (a, b)) sketch the image
image(RInv). Do you see a pattern?
(11) Let l, m be natural numbers. Exhibit a linear isomorphism Rl × Rm →
Rl+m .
Warning regarding notation: From now on we will usually drop the “⋅” signifying scalar multiplication; λv will mean λ ⋅ v.
5.1. Linear dependence and bases. Let V be a vector space, and let S
be a nonempty subset of vectors.
Definition 5.1. The span of the set S is the set (denoted < S >) of vectors
v that can be written as a sum5
v = λ 1 v1 + ⋅ ⋅ ⋅ + λ m vm
where λi are scalars and vi ∈ S. (This representation is not necessarily
unique; in particular, m could depend on v.)
The span < S > is a subspace and the intersection of all subspaces that
contain S.
Definition 5.2. We will say that S generates 6 V if V =< S >, i.e., if every
vector v can be written as a sum
v = λ 1 v1 + ⋅ ⋅ ⋅ + λ m vm
where λi are scalars and the vi belong to the set S.
Definition 5.3. Let V be a vector space, and let S be a nonempty subset of
nonzero vectors. We will say that S is a linearly independent set of vectors
if the following holds: For any m ≥ 1 and m distinct vectors v1 , . . . , vm in S,
λ 1 v1 + ⋅ ⋅ ⋅ + λ m vm
is a nonzero vector unless all the λi vanish.
The usual definition of a linearly independent set does not assume that
the vectors are each nonzero, but this is an immediate consequence of the
definition.
Note that any singleton set {v1 }, with v1 a nonzero vector, is linearly in-
dependent. Clearly any nonempty subset of a linearly independent set is
linearly independent. Sometimes the empty set of vectors is declared to be
linearly independent.
Exercise: Let V be a vector space, and let S be a nonempty finite subset
of nonzero vectors. Let φ ∶ RS → V be the linear map
φ(f ) = \sum_{v \in S} f (v)\, v
Then (a) S generates V iff φ is surjective, and (b) S is linearly independent
iff φ is injective.
5 We say that “v is a linear combination of the vectors v1 , . . . , vm ”.
6 In fact, in the vector space context, it is more usual to say S “spans” V .
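For V = Rm the exercise above becomes a rank computation: listing the vectors of S as the columns of a matrix realises φ, surjectivity of φ means the columns span Rm , and injectivity means they are linearly independent. A small numpy sketch (the particular vectors are invented for illustration):

import numpy as np

def spans_R_m(vectors):
    """S generates R^m iff the matrix with the vectors of S as columns has rank m."""
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == M.shape[0]

def linearly_independent(vectors):
    """S is linearly independent iff the rank equals the number of vectors."""
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == M.shape[1]

S = [np.array([1., 0., 1.]), np.array([0., 1., 1.]), np.array([1., 1., 2.])]
print(spans_R_m(S), linearly_independent(S))   # False False: the third vector is the sum of the first two

T = [np.array([1., 0., 0.]), np.array([1., 1., 0.]), np.array([1., 1., 1.])]
print(spans_R_m(T), linearly_independent(T))   # True True: T is a basis of R^3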
We can do better; to recover the above result from the next one start with
S̃ = {v1 }, where v1 is any one of the elements of S.
Theorem 5.6. Let V be a vector space, and let S be a nonempty finite subset
of nonzero vectors that generates V . Let S̃ ⊂ S be a linearly independent
subset. Then V is generated by a linearly independent subset of S that
contains S̃.
Proof. If all the vectors in S ∖ S̃ belong to < S̃ > then clearly < S̃ >=< S >= V
and we are done.
Else, there exists a vector ṽ1 ∈ S∖ < S̃ >. In this case, S̃1 ≡ S̃ ∪ {ṽ1 } is a
linearly independent set. To see this, apply Lemma 5.4. If < S̃1 > = V we
are done, if not we iterate the process with a ṽ2 ∈ S∖ < S̃1 >, and so on.
When this process terminates, which it must since S is a finite set, we obtain
a linearly independent subset of S that generates V and contains S̃.
Definition 5.7. Let V be a vector space. A set B of vectors is called a basis
of V if B is linearly independent and generates V .
5.2. Dimension. This theory can be developed in two ways. One way (advocated, for example, in Axler, S.: Linear Algebra Done Right) is developed in §5.3. I prefer, as Artin does, to invoke a basic result from matrix theory, which will be proved in the next section. This is a fact that you are probably familiar with:
Given n2 linear homogeneous equations in n1 unknowns, with n1 > n2 (i.e.,
if “there are more unknowns than equations”), there is a nontrivial solution.
More formally:
Theorem 5.8. Suppose n1 > n2 . Then any linear map T ∶ Rn1 → Rn2 has a
nonzero kernel.
Proof. We leave the proofs of (1) and (2) as exercises. As for (3), note that if S is any generating set of V , its image under T ,
T (S) ≡ {T (v) ∣ v ∈ S}
generates V̂ . (This follows from the surjectivity of T .) So V̂ is finitely generated. Let {e⃗′1 , . . . , e⃗′m } be a basis of V̂ . Choose for each e⃗′i a vector v⃗i ∈ V such that T (v⃗i ) = e⃗′i . (For later use, we call this construction “lifting a basis to a linearly independent set”.) I claim that the set {v⃗1 , . . . , v⃗m } is linearly independent. For,
λ1 v⃗1 + ⋅ ⋅ ⋅ + λm v⃗m = 0 ⟹ T (λ1 v⃗1 + ⋅ ⋅ ⋅ + λm v⃗m ) = 0 ⟹ λ1 e⃗′1 + ⋅ ⋅ ⋅ + λm e⃗′m = 0 ⟹ λi = 0 ∀i
This shows that dim V ≥ m = dim V̂ . In case this is an equality, the set {v⃗1 , . . . , v⃗m } is a basis of V , and T is clearly an isomorphism.
As for (4), choose a basis {v⃗1 , . . . , v⃗l } for ker(T ), and extend it to a basis {v⃗1 , . . . , v⃗l , v⃗l+1 , . . . , v⃗n } of V , where n = dim V . Let V̂ ′ ⊂ V be the span of the linearly independent set {v⃗l+1 , . . . , v⃗n }. Clearly V̂ ′ has dimension n − l. On the other hand T ∣V̂ ′ ∶ V̂ ′ → image(T ) is an isomorphism9 (why?), so dim image(T ) = n − l and dim V = dim ker(T ) + dim image(T ).
To understand the above proof, keep the following example in mind: consider
the map T ∶ R3 → R3 , given by
T (x, y, z) = (x − y, y − z, z − x)
Then
ker(T ) = {(t, t, t)∣t ∈ R}
image(T ) = {(u, v, w)∣u + v + w = 0}
9 This is a recurrent construction and argument, so please make sure you understand it.
Then a possible basis of ker(T ) consists of the single vector v⃗1 = (1, 1, 1),
and one can extend this to a basis of R3 in many ways. Let us choose the
two vectors
v⃗2 = (1, 0, 0), v⃗3 = (0, 1, 0)
Then V̂ ′ (the span of v⃗2 and v⃗3 ) is the xy-plane. The map T ∣V̂ ′ is
V̂ ′ ∋ (x, y, 0) ↦ (x − y, y, −x) ∈ image(T ) = {(u, v, w)∣u + v + w = 0}
This map is clearly bijective. You can check that (1, 0, −1) (the image of v⃗2 )
and (−1, 1, 0) (the image of v⃗3 ) together form a basis for image(T ).
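For the map T (x, y, z) = (x − y, y − z, z − x) just discussed, the rank-nullity count can be confirmed numerically. The following is an added illustration, not part of the notes:

import numpy as np

# Matrix of T(x, y, z) = (x - y, y - z, z - x) in the standard basis.
T = np.array([[ 1., -1.,  0.],
              [ 0.,  1., -1.],
              [-1.,  0.,  1.]])

rank = np.linalg.matrix_rank(T)       # dim image(T)
nullity = T.shape[1] - rank           # dim ker(T), by rank-nullity

print(rank, nullity)                                     # 2 1
print(np.allclose(T @ np.array([1., 1., 1.]), 0))        # (1,1,1) spans the kernel: True
print(T @ np.array([1., 0., 0.]), T @ np.array([0., 1., 0.]))  # images of v2, v3: (1,0,-1) and (-1,1,0)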
Remark 5.13. Given a linear map T ∶ V → W , with V finite-dimensional,
the nullity of T is the dimension of its kernel, and rank of T the dimension
of its image:
nullity of T = dim ker(T ),   rank of T = dim image(T )
Theorem 5.12(4) is often referred to as the rank-nullity theorem.
Exercise: Once we have the above proof, we can define dimension without appealing to Theorem 5.8. Your task: use the rank-nullity Theorem to give a proof of Theorem 5.8.
Exercise (Right inverse): Suppose given a linear map  ∶ V → W . A
right inverse B̂ ∶ W → V is a linear map such that  ○ B̂ = IW , where IW is
the identity map of W : IW (w) = w, w ∈ W . Prove that if a right inverse B̂
exists, then (B̂ is injective and) Â is surjective. Suppose from now on that Â is indeed surjective. Prove that
(1) if V is finite-dimensional, so is W ,
(2) if W is finite-dimensional, a right inverse exists, and if V is finite-
dimensional, then dim W ≤ dim V .
In particular, if V, W are finite-dimensional, a linear map  ∶ V → W is
surjective iff a right inverse exists.
Exercise (Left inverse): Suppose given a linear map  ∶ V → W . A left
inverse Ĉ ∶ W → V is a linear map such that Ĉ ○ Â = IV . Prove that if a
left inverse Ĉ exists, then (Ĉ is surjective and) Â is injective. Suppose from now on that Â is indeed injective. Prove that
(1) if W is finite-dimensional, so is V ,
(2) if W is finite-dimensional, a left inverse exists, and dim W ≥ dim V .
In particular, if V, W are finite-dimensional, a linear map  ∶ V → W is
injective iff a left inverse exists.
Exercise (Inverse): Let V be a finite-dimensional vector space. Show that a linear map Â ∶ V → V is surjective iff it is injective iff it is bijective. (Hint:
use the rank-nullity theorem.)
5.4. Sets, ordered sets, and all that. Note that in the above exposition,
we have spoken of the following properties of a set S of vectors in a vector
space:
(1) the span < S > of S,
(2) linear independence of S, and
(3) whether the set S is a basis.
It is more usual to consider ordered sets, and Artin does discuss this matter.
You may find the following remarks helpful. For simplicity let us assume
that S is a finite set, with n elements.
(1) An ordering is a bijective map from the set of integers {1, 2, . . . , n} to
the set S.
(2) As a matter of notation, to specify a set one lists its elements within curly brackets; for example, {a, l, p} is a set of three letters. Endowed with their natural order, this would be the ordered set (a, l, p), with the rounded brackets signaling that this is an ordered set, with the ordering:
1 ↦ a, 2 ↦ l, 3 ↦ p.
(3) The normal practice is to define span, linear dependence, and basis as applied to ordered sets. One then immediately proves that these notions are unchanged under a change of ordering.
(4) (This might require some thought.) Specifying a set of vectors S is equivalent to specifying a linear map RS → V . Specifying an ordered set of n vectors is equivalent to specifying a linear map Rn → V . In both cases, the span is the image, linear independence is equivalent to the map being injective, and the set is a basis iff the map is an isomorphism.
(5) Particularly annoying for the pedantic is the fact that the normal way
of naming elements in a set with n elements is by listing them: x1 , x2 , . . . , xn .
So the set gets denoted {x1 , x2 , . . . , xn }. The list privileges the order (x1 , x2 , . . . ).
How does one obtain other orderings? Each permutation σ of {1, 2, . . . , n}
specifies an order: (xσ(1) , xσ(2) , . . . , xσ(n) ).
(6) In practical situations, especially in numerical computations, sets are
usually presented as lists, and come with a natural order. Algorithms that
work with sets presented as lists will result in outputs that depend on the
order.
6. Matrices
In part (3) of the Exercise below, we define the notion of the transpose of
a matrix. The transpose of a row vector is a “column vector”. There are obvious isomorphisms Rn → M1×n → Mn×1 , given by
(v1 , . . . , vn ) ⟼ [ v1 ⋯ vn ] ⟼ \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}
with the first arrow landing in M1×n and the second (the transpose) landing in Mn×1 .
(Since only one index changes, we can drop the other when we write row/column
vectors.) The natural number n is the length of the row/column vector.
The following problems are mostly very routine, but introduce very impor-
tant concepts and definitions.
Exercises:
(1) Check that Mm×n is indeed a vector space.
(2) Let eij denote the matrix with all entries zero except the ij th , which is
1. For example, if m = 2, n = 2, there are four such matrices:
e_{11} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad e_{12} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad e_{21} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad e_{22} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
Artin calls these “matrix units”; the terminology is not standard. Show
that the set of matrix units forms a basis for Mm×n , which therefore has
dimension mn.
(3) Given an m × n matrix
A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}
its transpose, denoted Atr , is the n × m matrix defined as follows:
A^{tr} = \begin{bmatrix} a_{11} & \cdots & a_{m1} \\ \vdots & & \vdots \\ a_{1n} & \cdots & a_{mn} \end{bmatrix}
In other words, the (ji)th entry of Atr is the (ij)th entry of A.
Check that A ↦ Atr is an isomorphism Mm×n → Mn×m .
Note that the number of columns of the “first” matrix B is the number of
rows of the “second” matrix A, both being equal to m. Then the matrix
product (or simply “product”) BA is defined to be the l × n matrix C,
C = BA = \begin{bmatrix} c_{11} & \cdots & c_{1n} \\ \vdots & & \vdots \\ c_{l1} & \cdots & c_{ln} \end{bmatrix}
with entries given by:
c_{ij} = \sum_{k=1}^{m} b_{ik} a_{kj} ,   i = 1, . . . , l, j = 1, . . . , n .
In other words,
\begin{bmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & & \vdots \\ b_{l1} & \cdots & b_{lm} \end{bmatrix} \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} \sum_{k=1}^{m} b_{1k} a_{k1} & \cdots & \sum_{k=1}^{m} b_{1k} a_{kn} \\ \vdots & & \vdots \\ \sum_{k=1}^{m} b_{lk} a_{k1} & \cdots & \sum_{k=1}^{m} b_{lk} a_{kn} \end{bmatrix}
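The entry formula c_ij = Σ_k b_ik a_kj translates directly into code. The sketch below (my own illustration) compares a naive triple loop with numpy's built-in matrix product:

import numpy as np

def matmul_naive(B, A):
    """Product C = BA of an l x m matrix B with an m x n matrix A, entry by entry."""
    l, m = B.shape
    m2, n = A.shape
    assert m == m2, "number of columns of B must equal number of rows of A"
    C = np.zeros((l, n))
    for i in range(l):
        for j in range(n):
            C[i, j] = sum(B[i, k] * A[k, j] for k in range(m))  # c_ij = sum_k b_ik a_kj
    return C

B = np.array([[1., 2., 0.],
              [0., 1., 3.]])        # 2 x 3
A = np.array([[1., 0.],
              [2., 1.],
              [0., 4.]])            # 3 x 2
print(matmul_naive(B, A))
print(np.allclose(matmul_naive(B, A), B @ A))   # True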
Note that these scalars are determined by Â and in turn determine Â. Consider the column vector (of length n)
\vec v = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} .
Remark 6.1. These are equalities of real numbers, as opposed to the equa-
tions
Â(e⃗j ) = \sum_{i=1}^{m} a_{ij}\, e⃗′_i ,   j = 1, . . . , n
which are equalities between vectors, and serve to define (see below) the
matrix A associated to the linear map Â. Note that in the first equation the
summation is over j (the column index of A) and in the second over i (the
row index of A).
The definition of the determinant when n > 2 is subtle, as we will see later.
7. Row reduction
(2) Multiply the first equation by a21 /a11 and subtract the resulting equa-
tion from the second. On the “matrix side” this is implemented by left-
multiplying (i.e., multiplying from the left) both sides in this order:
\begin{bmatrix} 1 & 0 \\ -\tfrac{a_{21}}{a_{11}} & 1 \end{bmatrix} \curvearrowright \left\{ \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} \hat v_1 \\ \hat v_2 \end{bmatrix} \right\}
\left\{ \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \hat v_1 \\ \hat v_2 \end{bmatrix} \right\}
We have introduced the identity matrix on the right to make a point.
Namely, the two column vectors are spectators in the computations, which
are all being done with 2×2 matrices. Carrying out the matrix computations
on both sides, we get
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \frac{1}{\det A} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} \begin{bmatrix} \hat v_1 \\ \hat v_2 \end{bmatrix}
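The same elimination can be carried out in numpy: left-multiplying by elementary matrices reduces A to I2 , and the accumulated product of those elementary matrices is then the inverse (1/det A)[[a22 , −a12 ], [−a21 , a11 ]]. A sketch with invented entries, assuming a11 ≠ 0:

import numpy as np

A = np.array([[2., 1.],
              [4., 5.]])            # a11 = 2 is nonzero

E1 = np.array([[1., 0.],
               [-A[1, 0] / A[0, 0], 1.]])       # subtract (a21/a11) x row 1 from row 2
A1 = E1 @ A                                     # [[2, 1], [0, 3]]

E2 = np.array([[1., -A1[0, 1] / A1[1, 1]],
               [0., 1.]])                       # clear the entry above the second pivot
E3 = np.diag([1. / A[0, 0], 1. / A1[1, 1]])     # rescale both rows so the pivots become 1

assert np.allclose(E3 @ E2 @ E1 @ A, np.eye(2))     # row reduction reaches I_2
adj = np.array([[A[1, 1], -A[0, 1]],
                [-A[1, 0], A[0, 0]]]) / np.linalg.det(A)
assert np.allclose(E3 @ E2 @ E1, adj)               # the accumulated product is A^{-1}
print(adj)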
A v⃗ = v⃗′
A = \sum_{i=1,\dots,m;\; j=1,\dots,n} a_{ij}\, e_{ij}
(For example, if m = n = 2 and A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} we have
a_{11} e_{11} + a_{12} e_{12} + a_{21} e_{21} + a_{22} e_{22} = a_{11} \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + a_{12} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + a_{21} \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + a_{22} \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = A .)
We next introduce certain simple invertible square matrices, which will play
an important part in the subsequent discussion. These elementary matrices
come in three types, and can be characterised by their shapes as well as their
distinct actions on column/row vectors. (We deal with m×m matrices, with
m fixed, and we will consider only actions on column vectors.)
(1) Elementary matrices of the first kind: Im + a eij , where i ≠ j and a is a scalar. Such a matrix differs from the identity matrix only in the (ij)th entry, which is a; depending on whether i > j or i < j this entry sits below or above the diagonal:
\begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & a & & \ddots & \\ & & & & 1 \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} 1 & & & a & \\ & \ddots & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix}
In either case the action on a column vector adds a times the j th entry to the ith entry and leaves the other entries unchanged:
(I_m + a\, e_{ij}) \begin{bmatrix} v_1 \\ \vdots \\ v_j \\ \vdots \\ v_i \\ \vdots \\ v_m \end{bmatrix} = \begin{bmatrix} v_1 \\ \vdots \\ v_j \\ \vdots \\ v_i + a v_j \\ \vdots \\ v_m \end{bmatrix}
(2) Elementary matrices of the second kind: Im + eij + eji − eii − ejj , where
i ≠ j. Such a matrix has the shape:
\begin{bmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & 0 & & 1 & & \\ & & & \ddots & & & \\ & & 1 & & 0 & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{bmatrix}
(3) Elementary matrices of the third kind: Im + (c − 1) eii , for an index i and a scalar c ≠ 0. Such a matrix has the shape:
\begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & c & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix}
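A short numpy sketch of the three kinds, built out of the matrix units e_ij from the earlier exercise, together with their actions on a column vector (my own illustration; note that the indices in the code are 0-based):

import numpy as np

def matrix_unit(m, i, j):
    """The m x m matrix unit e_ij (0-based indices)."""
    E = np.zeros((m, m))
    E[i, j] = 1.0
    return E

m = 4
first  = np.eye(m) + 2.5 * matrix_unit(m, 2, 0)                     # I + a e_ij, i != j
second = np.eye(m) + matrix_unit(m, 1, 3) + matrix_unit(m, 3, 1) \
         - matrix_unit(m, 1, 1) - matrix_unit(m, 3, 3)              # I + e_ij + e_ji - e_ii - e_jj
third  = np.eye(m) + (7.0 - 1.0) * matrix_unit(m, 2, 2)             # I + (c-1) e_ii, with c = 7

v = np.array([1., 2., 3., 4.])
print(first @ v)    # [1, 2, 5.5, 4] : entry 2 gains a = 2.5 times entry 0
print(second @ v)   # [1, 4, 3, 2]   : entries 1 and 3 interchanged
print(third @ v)    # [1, 2, 21, 4]  : entry 2 multiplied by c = 7

# Each kind is invertible; e.g. the inverse of the first kind is I - a e_ij.
assert np.allclose(np.linalg.inv(first), np.eye(m) - 2.5 * matrix_unit(m, 2, 0))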
● all the entries in the rows below the ith row are zero, or
● there is a pivot in the (i + 1)th row to the right of the pivot in the ith row.
In the above example, the pivots are in red, as are the entries above the
pivots. The ∗ entries can be arbitrary real numbers. Another way of visual-
ising a matrix in row-echelon form: start with a m′ ×n matrix (with m′ ≤ m)
of the shape:
\begin{bmatrix}
0 & 0 & 1 & * & 0 & 0 & * & * & 0 & 0 & \cdots & * \\
0 & 0 & 0 & 0 & 1 & 0 & * & * & 0 & 0 & \cdots & * \\
0 & 0 & 0 & 0 & 0 & 1 & * & * & 0 & 0 & \cdots & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & \cdots & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots & *
\end{bmatrix}
and add m − m′ rows at the bottom with all entries zero.
Exercise: If m = n, and A is invertible and in row-echelon form, show that
A = In .
7.4. Row reduction to row-echelon form. The main result is the fol-
lowing:
Theorem 7.2. Any m × n matrix A can be brought into a matrix A′ in row-
echelon form by a sequence of elementary row operations. In other words,
there exists a sequence (not unique) of elementary matrices Ek , . . . , E1 such
that
A′ ≡ Ek Ek−1 . . . E1 A
is in row-echelon form.
For the proof see Artin. As Artin says, the row-echelon form A′ is uniquely
determined by A, but he does not prove this, and we will not use the fact.
Corollary 7.3. An n × n matrix is invertible iff it is a product of elementary matrices.
The kernel of Â can also be described explicitly, but this is subtler. We leave the proof as an exercise.
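Theorem 7.2 is constructive: pick a pivot, clear the entries above and below it, and repeat. The following is a minimal illustrative implementation of such a reduction (it is not the proof in Artin, and the pivoting strategy is my own choice):

import numpy as np

def row_echelon(A, tol=1e-12):
    """Return a (reduced) row-echelon form of A using elementary row operations."""
    A = A.astype(float).copy()
    m, n = A.shape
    pivot_row = 0
    for col in range(n):
        if pivot_row >= m:
            break
        # find a row with the largest entry in this column (partial pivoting)
        r = pivot_row + np.argmax(np.abs(A[pivot_row:, col]))
        if abs(A[r, col]) < tol:
            continue                                # no pivot in this column
        A[[pivot_row, r]] = A[[r, pivot_row]]       # elementary matrix of the second kind
        A[pivot_row] /= A[pivot_row, col]           # elementary matrix of the third kind
        for i in range(m):                          # elementary matrices of the first kind
            if i != pivot_row:
                A[i] -= A[i, col] * A[pivot_row]
        pivot_row += 1
    return A

A = np.array([[0., 2., 4., 2.],
              [1., 1., 1., 1.],
              [2., 4., 6., 4.]])
print(row_echelon(A))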
is a solution.
8. Determinants
det [λ] = λ
(The matrix A−ij , and others like it, are referred to as submatrices of A.)
(A better way of remembering the signs than what was done in the previous
draft of these notes is to note – as Artin does – that the signs in the sum
alternate.) Our definition is “by expansion by minors of the first column”.
We do not define the closely related notions of a minor/cofactor, because
we have no occasion to use them.
We will find it notationally and conceptually convenient to view a n × n
matrix as one column of n row vectors, each of length n.
\begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R_n \end{bmatrix}
where
\vec R_p = [\, a_{p1} \;\cdots\; a_{pn} \,], \quad p = 1, . . . , n
Theorem 8.2. The determinant function det ∶ Mn×n → R satisfies the
following properties:
(1) det In = 1.
(2) det is separately linear in the rows. That is, for any 1 ≤ j ≤ n,
\det \begin{bmatrix} \vec R_1 \\ \vdots \\ \lambda \vec R_j + \lambda' \vec R_j' \\ \vdots \\ \vec R_n \end{bmatrix} = \lambda \det \begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R_j \\ \vdots \\ \vec R_n \end{bmatrix} + \lambda' \det \begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R_j' \\ \vdots \\ \vec R_n \end{bmatrix}
(3) if n > 1 and two adjacent rows are equal, then the determinant is zero.
That is,
\det \begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R \\ \vec R \\ \vdots \\ \vec R_n \end{bmatrix} = 0
Proof. This is clear for n = 1, 2. (We could start the induction with n = 1,
but we have to face the headache of interpreting (3) for 1 × 1 matrices.) We
proceed by induction. For example, consider part (3) of the Theorem. If two adjacent rows are equal, R⃗j = R⃗j+1 = R⃗, and the theorem has been verified for matrices of size up to (n − 1) × (n − 1), the determinant equals
a11 det A−11 − a21 det A−21 + ⋅ ⋅ ⋅ ± {aj1 det A−j1 − a(j+1)1 det A−(j+1)1 } ± ⋅ ⋅ ⋅ ± an1 det A−n1
Now the submatrices A−k1 are of size (n − 1) × (n − 1) and have two equal adjacent rows as long as k ≠ j and k ≠ j + 1. By induction, det A−k1 = 0 for such k. This
leaves us with two terms
±{aj1 det A−j1 − a(j+1)1 det A−(j+1)1 } = ±a{det A−j1 − det A−(j+1)1 }
where a ≡ aj1 = a(j+1)1 . You can check that the matrices A−j1 and A−(j+1)1
are equal. This finishes the inductive step. We leave the proofs of parts (1)
and (2) of the Theorem as exercises for the reader.
We prove next:
Proposition 8.3. If any function d ∶ Mn×n → R satisfies the properties
(1)-(3) of the theorem, it also satisfies the following additional properties:
(1) If a multiple of one row is added to an adjacent row, the value of d is
unchanged.
(2) If two adjacent rows are interchanged d changes sign.
(3) If any two rows are equal, d vanishes.
(4) If a multiple of one row is added to another row, the value of d is
unchanged.
(5) If two rows are interchanged d changes sign.
(6) If any row is zero13, d vanishes.
(7) On elementary n×n matrices14 E, the function d takes values as follows:
(a) d(E) = 1 if E is of the first kind.
(b) d(E) = −1 if E of the second kind.
(c) d(E) = c if E is of the third kind: E = In + (c − 1)eii .
(2) Again, each equality below holds by part (2) of the above Theorem, with the terms that have two equal adjacent rows vanishing by part (3):
d\begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R_j \\ \vec R_{j+1} \\ \vdots \\ \vec R_n \end{bmatrix} + d\begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R_{j+1} \\ \vec R_j \\ \vdots \\ \vec R_n \end{bmatrix} = d\begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R_j + \vec R_{j+1} \\ \vec R_{j+1} \\ \vdots \\ \vec R_n \end{bmatrix} + d\begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R_{j+1} + \vec R_j \\ \vec R_j \\ \vdots \\ \vec R_n \end{bmatrix} = d\begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R_{j+1} + \vec R_j \\ \vec R_{j+1} + \vec R_j \\ \vdots \\ \vec R_n \end{bmatrix} = 0
(3) We can move the identical rows together by interchanging one of them
repeatedly with an adjacent one, each time changing the sign of d. Here is
the first step:
d\begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R \\ \vec R_{j+1} \\ \vdots \\ \vec R \\ \vdots \\ \vec R_n \end{bmatrix} = -\, d\begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R_{j+1} \\ \vec R \\ \vdots \\ \vec R \\ \vdots \\ \vec R_n \end{bmatrix}
(the copy of \vec R in the j th place is interchanged with \vec R_{j+1} ).
Eventually we have identical adjacent rows and part (3) of the Theorem applies.
(4) The equality below holds by part (2) of the above Theorem and the second term on the right vanishes by part (3) of the Proposition, already proved:
d\begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R_j + \lambda \vec R_k \\ \vdots \\ \vec R_k \\ \vdots \\ \vec R_n \end{bmatrix} = d\begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R_j \\ \vdots \\ \vec R_k \\ \vdots \\ \vec R_n \end{bmatrix} + \lambda\, d\begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R_k \\ \vdots \\ \vec R_k \\ \vdots \\ \vec R_n \end{bmatrix} = d\begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R_j \\ \vdots \\ \vec R_k \\ \vdots \\ \vec R_n \end{bmatrix}
(5) Use part (2) of the proposition. Suppose row j and row k are inter-
changed, with k > j. Move the vector in the k th row up successively past
each row till it is just below the j th row, and then move the vector in the
j th row down to the k th place. Check that an odd number of exchanges is involved. Then use part (2) of the Proposition.
(6) We have
d\begin{bmatrix} \vec R_1 \\ \vdots \\ \vec 0 \\ \vec R_j \\ \vdots \\ \vec R_n \end{bmatrix} = d\begin{bmatrix} \vec R_1 \\ \vdots \\ \vec 0 + \vec R_j \\ \vec R_j \\ \vdots \\ \vec R_n \end{bmatrix} = d\begin{bmatrix} \vec R_1 \\ \vdots \\ \vec R_j \\ \vec R_j \\ \vdots \\ \vec R_n \end{bmatrix} = 0
(9) This follows from part (6) of the Proposition and the fact (already stated in an Exercise) that if an n×n matrix A is invertible and in row-echelon form, then A = In .
Corollary 8.4. The determinant is characterised by the properties (1)-(3)
listed in Theorem 8.2.
The same reasoning, together with part (9) of the Proposition, gives:
Corollary 8.5. A square matrix A is invertible iff det A ≠ 0.
Proof. In contrast to the proof in Artin, ours will mix “abstract” and “matrix-
theoretic arguments”. We consider two cases:
(1) Either det A = 0 or det B = 0. Suppose det A = 0. Then A is not invertible, and hence neither is AB. This forces det AB = 0.
9.1. Determinant of a linear map. We have seen that the space L(V, V )
of linear maps T ∶ V → V is a vector space. It is also a ring with identity,
with multiplication being given by composition, and the identity map IdV
being the identity of the ring. In other words, S ○ IdV = IdV ○ S = S for any S ∈ L(V, V ). Multiplication is associative since S ○ (T ○ U ) = (S ○ T ) ○ U but is not commutative unless dim V = 1.
If all this sounds familiar, it is because we have seen another version in matrix language. Namely, the space of n × n matrices Mn×n is a ring with iden-
tity In , with matrix multiplication defining the product. (In this section,
we consider matrices with entries from k so that Mn×n is short-hand for
Mn×n (k).) The two rings are isomorphic, but not “canonically”, because
the isomorphism depends on a choice. We will now make this explicit be-
cause it is important to be clear about this, and for the immediate purpose
of defining the determinant of any  ∈ L(V, V ).
The choice we have to make is that of an ordered basis B = (e⃗1 , . . . , e⃗n ) for V . Once we do this, we can associate to each Â ∈ L(V, V ) a matrix A as we
15 In other words, a non-constant polynomial function (the term will be defined later) p ∶ C → C necessarily has a root, i.e., must vanish somewhere.
16This is not by definition, but a fact to be proved.
These scalars are determined by Â and in turn determine Â, just as any linear map from column vectors to column vectors is given by a unique
matrix.
Let us make the connection between A and  precise as follows.
The basis B defines an isomorphism φB ∶ Mn×1 → V :
\varphi_B \left( \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \right) = \sum_{i=1}^{n} v_i \, \vec e_i
Note that φB is an isomorphism of vector spaces, and not the isomorphism
of algebras L(V, V ) → Mn×n that we seek. We have
\varphi_B \left( A \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} \right) = \varphi_B \left( \begin{bmatrix} \sum_j a_{1j} v_j \\ \vdots \\ \sum_j a_{nj} v_j \end{bmatrix} \right) = \sum_i \sum_j a_{ij} v_j \vec e_i = \sum_j v_j \sum_i a_{ij} \vec e_i = \sum_j v_j \hat A(\vec e_j) = \hat A\Big( \sum_j v_j \vec e_j \Big) = \hat A \left( \varphi_B \left( \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} \right) \right)
If we let v denote a column vector, we can write this in more compact form:
φB (Av) = Â(φB (v))
In other words, A is the unique matrix such that
A v = \varphi_B^{-1} ○ \hat A ○ \varphi_B (v), \quad v ∈ M_{n×1}
= \varphi_B^{-1}\big( \hat A ( \varphi_B ○ \varphi_B^{-1} ○ \varphi_{B'}(v) ) \big) = A \, \varphi_B^{-1} ○ \varphi_{B'}(v)
We defined TB,B′ to be the matrix such that for any v⃗ ∈ Mn×1 , we have
\varphi_B^{-1} ○ \varphi_{B'} (\vec v) = T_{B,B'} \, \vec v
Equivalently,
\varphi_{B'}(\vec v) = \varphi_B ( T_{B,B'} \, \vec v )
If we take v⃗ = v⃗j , the column vector with 1 in the j th place, this gives:
\vec e_j' = \varphi_{B'}(\vec v_j) = \varphi_B \left( \begin{bmatrix} t_{1j} \\ \vdots \\ t_{ij} \\ \vdots \\ t_{nj} \end{bmatrix} \right) = \sum_i t_{ij} \, \varphi_B(\vec v_i) = \sum_i t_{ij} \, \vec e_i
where tij is the (ij)th entry of TB,B′ . Thus TB,B′ is the matrix that implements the “change of basis from B to B ′ ”.
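In coordinates, the change-of-basis relation says that if the columns of TB,B′ express the new basis vectors in terms of the old ones, then the matrix of the same linear map with respect to B ′ is T −1 A T . A hedged numpy sketch with an invented example:

import numpy as np

A = np.array([[2., 1.],
              [0., 3.]])                 # matrix of A_hat in the standard basis B = (e1, e2)

# New ordered basis B' = (e1, e1 + e2); its vectors are the columns of T.
T = np.array([[1., 1.],
              [0., 1.]])                 # T_{B,B'}: j-th column = coordinates of e'_j in B

A_new = np.linalg.inv(T) @ A @ T         # matrix of the same linear map w.r.t. B'
print(A_new)

# Consistency check: A_hat(e'_2) = A_hat(e1 + e2) should have B'-coordinates A_new[:, 1].
lhs = A @ T[:, 1]                        # A_hat(e'_2) in standard coordinates
rhs = T @ A_new[:, 1]                    # the same vector rebuilt from its B'-coordinates
assert np.allclose(lhs, rhs)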
17This is because of examples such as P (x) = xq −x, where q is a prime power. If the field
is the finite field Fq , the corresponding function P ∶ Fq → Fq vanishes identically. For the
same reason, when we are dealing with finite fields we cannot use the terms “nonconstant polynomial” and “polynomial with degree ≥ 1” interchangeably.
Clearly
— if A = [a] is a 1 × 1 matrix we set pA (t) = t − a. Clearly pA has degree 1,
and one root a.
— If A is a 2 × 2 matrix,
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
then pA (t) = t2 − (a11 + a22 )t + a11 a22 − a12 a21 . This has degree 2. If the field of
scalars is the reals this is a polynomial with real coefficients and may or may
not have a (real) root. If the field of scalars is complex pA (t) = (t−λ1 )(t−λ2 )
where λi are the roots. Note that pA can be written
pA (t) = t2 − tr(A)t + det A
where tr(A) is the trace of A, defined to be sum of the diagonal entries:
tr(A) = a11 + a22 .
— if n > 2, first prove (by induction on n) that given any n × n matrix B
(with (ij)th entry bij ), we have the “complete expansion of the determinant”
det B = \sum_{\sigma} \mathrm{sign}(\sigma) \, b_{\sigma(1)1} \cdots b_{\sigma(n)n}
where the sum runs over all permutations σ of the set {1, . . . , n} and sign(σ) =
±1 is the signature of σ. If you are not familiar with permutations and the
notion of a signature, note that a permutation of a set (by definition) is a
bijection of the set with itself, and prove the statement:
det B = \sum_{\sigma} \pm \, b_{\sigma(1)1} \cdots b_{\sigma(n)n} .
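The complete expansion is easy to transcribe into code: sum over all permutations σ, with sign(σ) computed by counting inversions. The sketch below (an added illustration) compares it with numpy's determinant on a small invented matrix:

import numpy as np
from itertools import permutations

def sign(sigma):
    """Signature of a permutation, computed by counting inversions."""
    inversions = sum(1 for i in range(len(sigma)) for j in range(i + 1, len(sigma))
                     if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det_by_expansion(B):
    """Complete expansion: sum over sigma of sign(sigma) * b_{sigma(1)1} ... b_{sigma(n)n}."""
    n = B.shape[0]
    total = 0.0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for j in range(n):
            term *= B[sigma[j], j]      # the entry b_{sigma(j), j}
        total += term
    return total

B = np.array([[2., 1., 0.],
              [1., 3., 4.],
              [0., 1., 5.]])
print(det_by_expansion(B), np.linalg.det(B))   # both approximately 17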
The proof is immediate. We have pÂ (λ0 ) = det (λ0 IV − Â) = 0 iff the map
λ0 IV − Â ∶ V → V is not invertible. This holds18 iff ker(λ0 IV − Â) ≠ {0V }.
This in turn means precisely that there exists a nonzero vector v0 ∈ V such
that Âv0 = λ0 v0 .
This brings us to the important
(1) The definition of Vλ0 makes sense for any λ0 . The point is that when λ0
is an eigenvalue the subspace is positive-dimensional (contains nonzero vec-
tors) and in that case, any nonzero vector belonging to Vλ0 is an eigenvector
corresponding to λ0 .
(2) Note that Vλ0 consists of eigenvectors together with the zero vector 0V .
The set of eigenvectors by itself is not a subspace.
(3) It is the eigenvector that is required to be nonzero. The eigenvalue
might well be zero.
18We are using the fact that a linear map from a finite-dimensional vector space to
itself is bijective iff it is injective.
Here is an immediate
Corollary 9.6. Let Â ∶ V → V be a linear map. If dim V = n and v⃗1 , . . . , v⃗n are eigenvectors corresponding to pairwise distinct eigenvalues λ1 , . . . , λn , then (v⃗1 , . . . , v⃗n ) is an ordered basis for V consisting of eigenvectors. The matrix A with respect to this ordered basis is
A = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \end{bmatrix}
Note that in the above Corollary, the subspace V ′ is invariant under the
linear map Â. That is, Â(⃗ v ′ ) ∈ V ′ if v⃗′ ∈ V ′ . We will see below that even if
the set {λ1 , . . . , λl } contains all eigenvalues of Â, it can happen that V ′ is a
proper subspace of V .
Definition 9.8. We say that a linear map Â ∶ V → V is diagonalisable if
V = ⊕ri=1 Vλi
where {λ1 , . . . , λr } is the set of all eigenvalues of Â.
10. Examples
A_i = \begin{bmatrix} \lambda_i & & & \\ & \lambda_i & & \\ & & \ddots & \\ & & & \lambda_i \end{bmatrix}
\qquad
A = \begin{bmatrix} A_1 & & & \\ & A_2 & & \\ & & \ddots & \\ & & & A_r \end{bmatrix}
It is easy to check that pA (t) = ∏i (t − λi )mi , and that the standard basis of
the space of column vectors is a set of eigenvectors. Specifically,
A \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} = \lambda_{i(j)} \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \qquad (\text{the entry } 1 \text{ being in the } j\text{th place})
provided \sum_{i=1}^{i(j)-1} m_i < j \le \sum_{i=1}^{i(j)} m_i . You can check that
V_{\lambda_i} = \left\{ \begin{bmatrix} 0 \\ \vdots \\ * \\ \vdots \\ * \\ \vdots \\ 0 \end{bmatrix} \;\middle|\; \text{the (possibly) nonzero entries } * \text{ occur only in the rows } j \text{ with } \sum_{l=1}^{i-1} m_l < j \le \sum_{l=1}^{i} m_l \right\}
This shows that the hypothesis of distinct eigenvalues in the statement of
Corollary 9.6 is sufficient but not necessary.
What if dim V = n > 1 and Â ∶ V → V is such that pÂ (t) = (t − λ)n ?
(3) Let us consider the simplest case n = 2. Since det (tIV − Â) = 0 when
t = λ, the map λIV − Â ∶ V → V has a nontrivial kernel. We have two
possibilities:
– (a) dim ker(λIV − Â) = 2, in which case Â = λIV , and
– (b) dim ker(λIV − Â) = 1, which case we analyse in detail now. There is a nonzero vector v⃗1 (unique up to multiplication by a nonzero scalar) such that Âv⃗1 = λv⃗1 . Complete it to a basis {v⃗1 , v⃗2 } of V . We have
Â(v⃗2 ) = a v⃗2 + b v⃗1
for some scalars a, b. The matrix of Â w.r.to the ordered basis (v⃗1 , v⃗2 ) is
A = \begin{bmatrix} \lambda & b \\ 0 & a \end{bmatrix}
The characteristic polynomial is
pÂ (t) = pA (t) = \det \begin{bmatrix} t - \lambda & -b \\ 0 & t - a \end{bmatrix} = (t − λ)(t − a)
(4) Exercise: What happens if n = 3? Prove that exactly one of the following holds:
(a) dim ker(λIV − Â) = 3. W.r.to any basis {v⃗1 , v⃗2 , v⃗3 } the matrix of Â is
A = \begin{bmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix}
and λIV − Â = 0.
(b) dim ker(λIV − Â) = 2. There exists a basis {v⃗1 , v⃗2 , v⃗3 } w.r.to which the matrix of Â is
A = \begin{bmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix}
and (λIV − Â)2 = 0, but λIV − Â ≠ 0.
(c) dim ker(λIV − Â) = 1. There exists a basis {v⃗1 , v⃗2 , v⃗3 } w.r.to which the matrix of Â is
A = \begin{bmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{bmatrix}
and (λIV − Â)3 = 0, but (λIV − Â)2 ≠ 0.
(5) For 0 ≤ θ < 2π let
A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
The corresponding linear map Â ∶ R2 → R2 “rotates a vector by angle θ in the anti-clockwise direction”. From this description it is clear that Â has no eigenvalues and eigenvectors unless θ = 0 or θ = π, i.e., A = ±I2 . This is consistent with the fact that the corresponding characteristic polynomial
pÂ (t) = t2 − 2(cos θ) t + 1
has no (real) roots since 4 cos2 θ − 4 < 0 unless cos θ = ±1.
(6) Let A be as above and consider this time the corresponding map Â ∶ C2 → C2 . This has the same characteristic polynomial, but if θ ≠ 0, π we can still consider the complex roots cos θ ± i sin θ = e±iθ . Since these are distinct, the corresponding eigenvectors will form a basis for C2 . Let us find the eigenvectors in turn. Writing out Â(v⃗) = eiθ v⃗, we get
\begin{bmatrix} \cos\theta \, v_1 - \sin\theta \, v_2 \\ \sin\theta \, v_1 + \cos\theta \, v_2 \end{bmatrix} = e^{i\theta} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}
Equating components we get
cos θ v1 − sin θ v2 = eiθ v1
sin θ v1 + cos θ v2 = eiθ v2
both of which can be solved by
\vec v = \begin{bmatrix} 1 \\ -i \end{bmatrix}
We can similarly find an eigenvector with eigenvalue e−iθ . In conclusion, we have
A \begin{bmatrix} 1 \\ \mp i \end{bmatrix} = e^{\pm i\theta} \begin{bmatrix} 1 \\ \mp i \end{bmatrix}
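These eigenvalue and eigenvector claims can be confirmed numerically; the following added sketch lets numpy diagonalise the rotation matrix over C and checks the eigenvectors (1, ∓i) found above:

import numpy as np

theta = 0.7                                     # any angle other than 0 or pi
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Over R there are no eigenvectors, but over C the eigenvalues are exp(+/- i theta).
eigvals, eigvecs = np.linalg.eig(A.astype(complex))
print(eigvals)                                   # approximately exp(+/- 0.7i)

v = np.array([1., -1j])                          # the eigenvector (1, -i) found in the text
assert np.allclose(A @ v, np.exp(1j * theta) * v)

w = np.array([1., 1j])                           # and (1, i) has eigenvalue exp(-i theta)
assert np.allclose(A @ w, np.exp(-1j * theta) * w)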
V ⊋ N̂ (V ) ⊋ N̂ 2 (V ) ⋅ ⋅ ⋅ ⊋ N̂ k0 −1 (V ) ⊋ {0V } = N̂ k0 (V ) = N̂ k0 +1 (V ) = . . .
N̂ k0 −1 (V ) ⊊ V0
N̂ (e⃗i ) = 0, i = 1, . . . , dk0 −1
N̂ (e⃗i ) ∈ < e⃗1 , . . . , e⃗dk0 −1 >, i = dk0 −1 + 1, . . . , dk0 −2
etc.
In other words, N is “strictly upper triangular”, i.e., has zeros along the diagonal and below the diagonal. Then it is an easy matter to check that
pN̂ (t) = pN (t) = tn
where λ1 , . . . , λr are the distinct eigenvalues of Â and Ṽλi is the generalised eigenspace corresponding to λi . (In other words, every v⃗ ∈ V can be written uniquely as a sum ∑ri=1 v⃗i where v⃗i ∈ Ṽλi .)
where Â2 = Â ○ Â, etc. Note that in general P (Â + B̂) ≠ P (Â) + P (B̂) and P (Â)P (B̂) ≝ P (Â) ○ P (B̂) ≠ P (Â ○ B̂). But if P = QR, it is true that P (Â) = Q(Â)R(Â) = R(Â)Q(Â).
We can now prove
We can now prove
Theorem 11.4. (Cayley-Hamilton Theorem) A linear map Â ∶ V → V “satisfies its own characteristic polynomial”. That is,
pÂ (Â) = 0
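The theorem can be tested numerically on a concrete matrix: compute the coefficients of the characteristic polynomial and evaluate the polynomial at the matrix itself. This is an added illustration (np.poly returns the coefficients of the characteristic polynomial of a square matrix):

import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 4.],
              [0., 1., 5.]])

coeffs = np.poly(A)          # coefficients of p_A(t) = det(tI - A), highest degree first
n = A.shape[0]

# Evaluate p_A(A) = A^n + c_1 A^{n-1} + ... + c_n I by Horner's scheme.
P = np.zeros_like(A)
for c in coeffs:
    P = P @ A + c * np.eye(n)

print(np.allclose(P, 0))     # True: A satisfies its own characteristic polynomial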
where (as usual) B̂ 2 = B̂ ○ B̂, etc. Prove that both sequences stabilise even-
tually. That is, there exists i0 such that for i ≥ i0 , we have
kernel(B̂) ⊂ kernel(B̂ 2 ) ⊂ . . . ⊂ kernel(B̂ i ) = kernel(B̂ i+1 ) = . . . ≝ K
which is
(1) bilinear,
⟨a1 v⃗1 + a2 v⃗2 , w⃗⟩ = a1 ⟨v⃗1 , w⃗⟩ + a2 ⟨v⃗2 , w⃗⟩, and
⟨v⃗, b1 w⃗1 + b2 w⃗2 ⟩ = b1 ⟨v⃗, w⃗1 ⟩ + b2 ⟨v⃗, w⃗2 ⟩,
which yields the desired inequality. It is also clear that in case of equality,
there is one root t0 at which we have v⃗ + t0 w
⃗ = 0.
Exercise: Given a norm ∣∣.∣∣ on a vector space V , define a function f ∶ V ×V →
R by
f (v⃗, w⃗) = (1/2) {∣∣v⃗ + w⃗∣∣2 − ∣∣v⃗∣∣2 − ∣∣w⃗∣∣2 }
Check that this is an inner product iff the parallelogram law holds:
∣∣v⃗ + w⃗∣∣2 + ∣∣v⃗ − w⃗∣∣2 = 2(∣∣v⃗∣∣2 + ∣∣w⃗∣∣2 )
Proof. We will construct an orthonormal basis starting from any given basis (v⃗1 , . . . , v⃗n ) by the Gram-Schmidt orthogonalisation process.
For i = 1, . . . , n, let Vi ⊂ V be the span of {v⃗1 , . . . , v⃗i }. We have
{0V } ⊊ V1 ⊊ ⋅ ⋅ ⋅ ⊊ Vi−1 ⊊ Vi ⊊ ⋅ ⋅ ⋅ ⊊ Vn ≡ V
Set
u⃗′1 = v⃗1 ;   u⃗1 = \frac{1}{∣∣u⃗′1 ∣∣} u⃗′1
This is an orthonormal basis for the one-dimensional subspace V1 . Suppose, by induction, that we have constructed an orthonormal basis (u⃗1 , . . . , u⃗i−1 ) for Vi−1 . I claim that
(1) v⃗i − \sum_{j=1}^{i-1} ⟨v⃗i , u⃗j ⟩ u⃗j is a nonzero vector, and
(2) ⟨v⃗i − \sum_{j=1}^{i-1} ⟨v⃗i , u⃗j ⟩ u⃗j , u⃗k ⟩ = 0, k = 1, . . . , i − 1.
The first claim is clear since v⃗i is not in the span of {u⃗1 , . . . , u⃗i−1 }. The second is a simple computation.
Now set
u⃗′i ≡ v⃗i − \sum_{j=1}^{i-1} ⟨v⃗i , u⃗j ⟩ u⃗j ;   u⃗i = \frac{1}{∣∣u⃗′i ∣∣} u⃗′i
to continue the inductive construction.
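The inductive construction in the proof translates almost line by line into code. A minimal sketch for the standard inner product on Rn (my own illustration):

import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise a linearly independent list of vectors (standard inner product)."""
    orthonormal = []
    for v in vectors:
        u = v.astype(float)
        for q in orthonormal:
            u = u - np.dot(v, q) * q    # subtract the components along the earlier u_j
        norm = np.linalg.norm(u)
        assert norm > 1e-12, "input vectors must be linearly independent"
        orthonormal.append(u / norm)    # normalise u'_i to obtain u_i
    return np.array(orthonormal)

V = [np.array([1., 1., 0.]), np.array([1., 0., 1.]), np.array([0., 1., 1.])]
U = gram_schmidt(V)
print(np.round(U @ U.T, 10))            # the identity matrix: the u_i are orthonormal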
12.6. All inner products on Rn . What are all possible inner products on
Rn ?
To answer this, it is convenient to identify Rn with Mn×1 . Let e⃗1 , . . . , e⃗n be
the standard basis. Given any inner product ⟨. , .⟩, let P denote the matrix with entries
Pij = ⟨e⃗i , e⃗j ⟩
21see the exercise immediately below.
Clearly,
⟨v⃗, w⃗⟩ = v⃗tr P w⃗
Conversely, given any symmetric matrix P the map
(v⃗, w⃗) ↦ v⃗tr P w⃗
is bilinear and symmetric. For this to define an inner product, we require that
(4) v⃗tr P v⃗ ≥ 0 ∀ v⃗, and v⃗tr P v⃗ = 0 only if v⃗ = 0 .
Then
v⃗tr P w⃗ = ⟨\sum_i v_i e⃗_i , \sum_j w_j e⃗_j ⟩ = \sum_i \sum_j v_i w_j \sum_l \sum_m b_{li} b_{mj} ⟨u⃗_l , u⃗_m ⟩ = \sum_i \sum_j \sum_l v_i b_{li} b_{lj} w_j = v⃗tr Btr B w⃗
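A numerical sanity check of this description (my own sketch, with an invented B): starting from an invertible B, the symmetric matrix P = Btr B is positive-definite and v⃗tr P w⃗ is an inner product; positivity can also be read off from the eigenvalues of P.

import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))            # almost surely invertible
P = B.T @ B                            # symmetric and positive-definite

def inner(v, w):
    return v @ P @ w                   # the candidate inner product <v, w> = v^tr P w

v = np.array([1., -2., 0.5])
w = np.array([0., 1., 1.])
print(inner(v, w), inner(w, v))            # symmetric
print(inner(v, v) > 0)                     # positive on nonzero vectors: True
print(np.all(np.linalg.eigvalsh(P) > 0))   # all eigenvalues of P are positive: True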
Proof. We have
⟨Â(v⃗1 ), v⃗2 ⟩ = ⟨v⃗1 , Â† (v⃗2 )⟩   (definition of adjoint)
            = ⟨v⃗1 , Â(v⃗2 )⟩    (since Â is self-adjoint)
(4) The left-hand side of the equation in Step (3) can be rewritten:
\sum_{i,j} \bar v_i A_{ij} v_j = \sum_j v_j \Big\{ \sum_i A_{ij} \bar v_i \Big\}
Proof. Let V1 = ⊕ri=1 Vλi . By Proposition 13.3 we know that this is an orthogonal direct sum. It suffices to show that V1 = V . Note that V1 is invariant under Â, so by the above Lemma, so is V1⊥ . Now Â∣V1⊥ is self-adjoint. So by Proposition 13.4 it must have an eigenvector v⃗′ , with a corresponding eigenvalue λ′ , as long as V1⊥ ≠ {0V }. But λ′ must be one of the λ1 , . . . , λr , which means v⃗′ ∈ V1 , and this is a contradiction.
where µ1 , . . . , µn are real (but not necessarily distinct). One can choose an orthonormal basis (u⃗1 , . . . , u⃗n ) such that
Â(u⃗i ) = µi u⃗i
Remark 13.8. It follows that
pÂ (t) = \prod_{i=1}^{r} (t − λi )^{m_i} = \prod_{i=1}^{n} (t − µi )
where mi = dim Vλi . It also follows that w.r.to the orthonormal basis (u⃗1 , . . . , u⃗n ) the matrix A of Â has the structure:
A = diag(µ1 , . . . , µn ) = \begin{bmatrix} A_1 & & & \\ & A_2 & & \\ & & \ddots & \\ & & & A_r \end{bmatrix}
where
A_i = \begin{bmatrix} \lambda_i & & & \\ & \lambda_i & & \\ & & \ddots & \\ & & & \lambda_i \end{bmatrix}
provided the basis vectors are numbered suitably, grouping together vectors that correspond to equal eigenvalues.
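In matrix terms the theorem says that a real symmetric matrix is orthogonally diagonalisable, and numpy's eigh returns exactly such an orthonormal eigenbasis. A short added check:

import numpy as np

A = np.array([[2., 1., 0.],
              [1., 2., 1.],
              [0., 1., 2.]])           # a real symmetric matrix

mu, U = np.linalg.eigh(A)              # eigenvalues (real, ascending) and orthonormal eigenvectors

print(mu)                                          # real eigenvalues mu_1 <= ... <= mu_n
print(np.allclose(U.T @ U, np.eye(3)))             # the columns of U are orthonormal: True
print(np.allclose(U.T @ A @ U, np.diag(mu)))       # U^tr A U = diag(mu_1, ..., mu_n): True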
Exercise: Given a linear map Ô ∶ V → V , check that the following are equivalent:
(1) Ô is invertible, and Ô† = Ô−1 ,
(2) Ô preserves the inner product between vectors, i.e., ⟨Ô(v⃗), Ô(w⃗)⟩ = ⟨v⃗, w⃗⟩ for any pair of vectors v⃗, w⃗,
(3) Ô preserves lengths of vectors: ∣∣v⃗∣∣ = ∣∣Ô(v⃗)∣∣, for any v⃗.
We will work on Rn , with the standard basis and inner product, and freely
identify a matrix A and the associated linear map Â. Recall that an orthog-
onal matrix O is one such that Otr O = In . By the above Exercise,
O−1 = Otr ⟺ (Ov⃗)tr Ow⃗ = v⃗tr w⃗ ⟺ ∣∣v⃗∣∣ = ∣∣O(v⃗)∣∣ ∀ v⃗, w⃗ ∈ Rn
(4) Let us briefly summarise the case when T ∈ SO(3, R). Then either
(a) T = I3 , or
(b) V is the orthogonal direct sum V+1 ⊕ V−1 , with V+1 one-dimensional, or
(c) the characteristic polynomial has roots 1, eiθ , e−iθ , with 0 < θ < π. In this case, the eigenspace V+1 is one-dimensional, and “T̂ is rotation in the plane V+1⊥ by an angle θ according to the right-hand corkscrew rule with respect to u⃗”, where u⃗ is an appropriate choice from among the two eigenvectors of unit length. To make this more precise we would need to consider orientations, which we choose not to do.
which is
(1) linear in the first variable and “antilinear” in the second variable,
that is,
⟨a1 v⃗1 + a2 v⃗2 , w⃗⟩ = a1 ⟨v⃗1 , w⃗⟩ + a2 ⟨v⃗2 , w⃗⟩, and
⟨v⃗, b1 w⃗1 + b2 w⃗2 ⟩ = \bar b_1 ⟨v⃗, w⃗1 ⟩ + \bar b_2 ⟨v⃗, w⃗2 ⟩,
where z denotes the complex conjugate of a complex number z,
(2) “conjugate-symmetric”, i.e., ⟨v⃗, w⃗⟩ = \overline{⟨w⃗, v⃗⟩} (in particular ⟨v⃗, v⃗⟩ is real for any v⃗),
(3) positive-definite, i.e., ⟨v⃗, v⃗⟩ > 0 unless v⃗ = 0V .
This is modelled on the Hermitian inner product of vectors in Cn :
v⃗ ⋅ w⃗ ≡ \sum_{i=1}^{n} v_i \bar w_i
Most of the statements we made about real inner product spaces hold for
Hermitian inner product spaces, and the proofs go through with minor mod-
ifications.
The Cauchy-Schwarz inequality holds:
∣⟨v⃗, w⃗⟩∣ ≤ ∣∣v⃗∣∣ × ∣∣w⃗∣∣ .
To see this, first note that given v⃗, w⃗, we can find a complex number α of modulus 1 such that ⟨αv⃗, w⃗⟩ is real. Now, as seen in the last Exercise, (v⃗, w⃗) ↦ Re⟨v⃗, w⃗⟩ is an inner product on VR , so
∣Re⟨αv⃗, w⃗⟩∣ ≤ ∣∣αv⃗∣∣ × ∣∣w⃗∣∣ = ∣∣v⃗∣∣ × ∣∣w⃗∣∣
But
∣Re⟨αv⃗, w⃗⟩∣ = ∣α⟨v⃗, w⃗⟩∣ = ∣α∣ ∣⟨v⃗, w⃗⟩∣ = ∣⟨v⃗, w⃗⟩∣
Exercise: The following are analogues of results we have seen in the real case.
(1) Let U = [u⃗1 , . . . , u⃗n ] be an n×n matrix, where (u⃗1 , . . . , u⃗n ) is a sequence of column vectors with complex entries. Then (u⃗1 , . . . , u⃗n ) is an orthonormal basis for Cn (with the standard Hermitian inner product) iff U is unitary.
(2) Given an invertible n×n matrix A with complex entries, we have a unique
decomposition A = QR where Q is unitary and R is upper triangular.
(3) First a definition: a hermitian n × n matrix P is said to be positive-definite if \bar{\vec v}^{\,tr} P \vec v > 0 for all nonzero v⃗ ∈ Cn . Check that
⟨v⃗, w⃗⟩ = \bar{\vec w}^{\,tr} P \vec v \; (= \sum_{i,j} \bar w_i P_{ij} v_j )
defines a Hermitian inner product on Cn iff P is positive-definite.
15.4. Spectral theorem; complex case. We now state the spectral the-
orem.
Theorem 15.2. Let Â ∶ V → V be self-adjoint. Then all its eigenvalues are real. There exists an orthonormal basis u⃗1 , . . . , u⃗n of eigenvectors.
In matrix form:
Theorem 15.3. Let A be an n × n hermitian matrix. There exists a unitary matrix U such that
(\bar U )tr A U = diag(µ1 , . . . , µn )
where the µi are real.