
Draft 1.4, Version of 13 November 2007

Notes on Linear Algebra


by Peter M Neumann (Queen’s College, Oxford)

Preface
These notes are intended as a rough guide to the course Further Linear Algebra which
is a part of the Oxford 2nd year undergraduate course in mathematics. Please do not
expect a polished account. They are lecture notes, not a carefully checked text-book.
Nevertheless, I hope they may be of some help.
The course is designed to build on 1st year Linear Algebra. The syllabus for that
course includes matrices, row reduction of matrices to echelon form, rank of a matrix,
solution of simultaneous linear equations, vector spaces and their subspaces, linear depen-
dence and independence, bases and dimension of a vector space, linear transformations
and their matrices with respect to given bases, and eigenvalues and eigenvectors of a
square matrix or of a linear transformation of a vector space to itself. In that first-
year work the coefficients of our matrices or linear equations are almost exclusively real
numbers; accordingly, the field over which our vector spaces are defined is R.
A significant change of attitude occurs now. Our first task is to examine what happens
when R is replaced by an arbitrary field F of coefficients. We find that the basic theory of
matrices, linear equations, vector spaces and their subspaces, linear transformations and
their matrices is unchanged. The theory of eigenvalues and eigenvectors does depend
on the field F , however. And when we come to the “metric” or “geometric” theory
associated with inner products, orthogonality, and the like, we find that it is natural to
return to vector spaces over R or C.
As a consequence the lecture course, and therefore also this set of notes, naturally
divides into four parts: the first is a study of vector spaces over arbitrary fields; then
we study linear transformations of a vector space to itself; third, a treatment of real or
complex inner product spaces; and finally the theory of adjoints of linear transformations
on inner product spaces.

It is a pleasure to acknowledge with warm thanks that these notes have benefitted
from comments and suggestions by Dr Jan Grabowski. Any remaining errors, infelicities
and obscurities are of course my own responsibility—I would welcome feedback.

ΠMN: Queen’s: 13.xi.2007

CONTENTS

Part I: Fields and Vector Spaces

Fields 1

Vector spaces 1

Subspaces 3

Quotient spaces 4

Dimension (Revision) 4

Linear transformations (Revision) 5

Direct sums and projection operators 6

Linear functionals and dual spaces 10

Dual transformations 13

Further exercises I 17

Part II: Some theory of a single linear transformation


on a finite-dimensional vector space 18

Determinants and traces 18

The characteristic polynomial and the minimal polynomial 19

The Primary Decomposition Theorem 23

Triangular form 26

The Cayley–Hamilton Theorem 29

Further exercises II 32

Part III: Inner Product Spaces 34

Real inner product spaces and their geometry 34

Complex inner product spaces 37

The Gram–Schmidt process 38

Bessel’s Inequality 40

The Cauchy–Schwarz Inequality 42

Isometries of inner product spaces 43

Representation of linear functionals 44

Further exercises III 45

Part IV: Adjoints of linear transformations on
finite-dimensional inner product spaces 47

Adjoints of linear transformations 47

Self-adjoint linear transformations 49

Eigenvalues and eigenvectors of self-adjoint linear transformations 50

Diagonalisability and the spectral theorem for self-adjoint linear


transformations 51

An application: quadratic forms 53

Further exercises IV 58

Part I: Fields and Vector Spaces
As has been indicated in the preface, our first aim is to re-work the linear algebra
presented in the Mods course, generalising from R to an arbitrary field as domain from
which coefficients are to be taken. Fields are treated in the companion course, Rings
and Arithmetic, but to make this course and these notes self-contained we begin with a
definition of what we mean by a field.

Axioms for Fields

A field is a set F with distinguished elements 0, 1, with a unary operation −,


and with two binary operations + and × satisfying the axioms below (the ‘axioms of
arithmetic’). Conventionally, for the image of (a, b) under the function + : F × F → F
we write a + b; for the image of (a, b) under the function × : F × F → F we write a b;
and x − y means x + (−y), x + y z means x + (y z).

The axioms of arithmetic

(1) a + (b + c) = (a + b) + c [+ is associative]
(2) a+b=b+a [+ is commutative]
(3) a+0=a
(4) a + (−a) = 0

(5) a (b c) = (a b) c [× is associative]
(6) ab = ba [× is commutative]
(7) a1 = a
(8) a 6= 0 ⇒ ∃ x ∈ F : a x = 1

(9) a (b + c) = a b + a c [× distributes over +]

(10) 0 6= 1

Note 1: All axioms are understood to have ∀ a, b, . . . ∈ F in front.

Note 2: See my Notes on Rings and Arithmetic for more discussion.

Note 3: Examples are Q, R, C, Z2 or more generally Zp , the field of integers


modulo p for any prime number p.
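These notes contain no computer code, but a tiny Python sketch may help fix ideas about the finite fields Zp; the choice p = 5 below is arbitrary and the snippet is only an illustration, not part of the course.

    # Field axiom (8) in Z_p with p = 5: every non-zero element has a multiplicative inverse.
    p = 5
    for a in range(1, p):
        inv = pow(a, -1, p)            # modular inverse (available in Python 3.8+)
        assert (a * inv) % p == 1
        print(a, "*", inv, "= 1 (mod", p, ")")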

Vector Spaces
Let F be a field. A vector space over F is a set V with distinguished element 0, with
a unary operation −, with a binary operation +, and with a function × : F × V → V
satisfying the axioms below. Conventionally: for the image of (a, b) under the function
+ : V × V → V we write a + b; for a + (−b) we write a − b; and for the image of (α, v)
under the function × : F × V → V we write α v .

Vector space axioms

(1) u + (v + w) = (u + v) + w [+ is associative]
(2) u+v =v+u [+ is commutative]
(3) u+0=u
(4) u + (−u) = 0

(5) α (β v) = (α β) v
(6) α (u + v) = α u + α v
(7) (α + β) v = α v + β v
(8) 1v = v

Note: All axioms are understood to have appropriate quantifiers ∀ u, v, . . . ∈ V


and/or ∀ α, β, . . . ∈ F in front.

Examples:
• F n is a vector space over F ;
• the polynomial ring F [x] (see Notes on Rings and Arithmetic) is a vector space
over F ;
• Mm×n (F ) is a vector space over F ;
• if K is a field and F a subfield then K is a vector space over F ;
• etc., etc., etc.

Exercise 1. Let X be any set, F any field. Define F X to be the set of all functions
X → F with the usual point-wise addition and multiplication by scalars (elements of F ).
Show that F X is a vector space over F .

Exactly as for rings, or for vector spaces over R, one can prove important “triviali-
ties”. Of course, if we could not prove them then we would add further axioms until we
had captured all properties that we require, or at least expect, of the algebra of vectors.
But the fact is that this set of axioms, feeble though it seems, is enough. For example:

Proposition: Let V be a vector space over the field F . For any v ∈ V and any
α ∈ F we have
(i) 0 v = 0;
(ii) α 0 = 0;
(iii) if α v = 0 then α = 0 or v = 0;
(iv) α (−v) = −(α v) = (−α) v ;

Proof. For v ∈ V , from Field Axiom (3) and Vector Space Axiom (7) we have

0 v = (0 + 0) v = 0 v + 0 v.

Then adding −(0 v) to both sides of this equation and using Vector Space Axioms (4)
on the left and (1), (4), (3) on the right, we get that 0 = 0 v , as required for (i). The
reader is invited to give a proof of (ii).

For (iii), suppose that α v = 0 and α 6= 0: our task is to show that v = 0. By Field
Axioms (8) and (6) there exists β ∈ F such that β α = 1. Then

v = 1 v = (β α) v = β (α v) = β 0

by Vector Space Axioms (8) and (5). But β 0 = 0 by (ii), and so v = 0, as required.
Clause (iv), like Clause (ii), is offered as an exercise:

Exercise 2: Prove Clauses (ii) and (iv) of the above proposition.

Subspaces
Let V be a vector space over a field F . A subspace of V is a subset U such that
(1) 0∈U and u + v ∈ U whenever u, v ∈ U and −u ∈ U whenever u ∈ U ;
(2) if u ∈ U , α ∈ F then α u ∈ U .

Note 1: Condition (1) says that U is an additive subgroup of V . Condition (2) is


closure under multiplication by scalars.

Note 2: We write U 6 V to mean that U is a subspace of V .

Note 3: Always {0} is a subspace; if U 6= {0} then we say that U is non-zero


or non-trivial. Likewise, V is a subspace; if U 6= V then we say that U is a proper
subspace.

Note 4: A subset U of V is a subspace if and only if U 6= ∅ and U is closed


under + (that is, u, v ∈ U ⇒ u + v ∈ U ) and under multiplication by scalars. The proof
is offered as an exercise:

Exercise 3: Suppose that U ⊆ V . Show that U 6 V if and only if


(1′ ) U 6= ∅ and u + v ∈ U whenever u, v ∈ U ;
(2) if u ∈ U , α ∈ F then α u ∈ U .

Examples: (1) Let L1 , . . . , Lm be homogeneous linear expressions Σj cij xj with
coefficients cij ∈ F , and let

U := {(x1 , . . . , xn ) ∈ F n | L1 = 0, . . . , Lm = 0}.

Then U 6 F n .

(2) Let F [n] [x] := {f ∈ F [x] | f = 0 or deg f 6 n}. Then F [n] [x] 6 F [x].
(3) Upper triangular matrices form a subspace of Mn×n (F ).

Exercise 4. With F X as in Exercise 1, for f ∈ F X define the support of f by


supp(f ) := {x ∈ X | f (x) 6= 0}. Define U := {f ∈ F X | supp(f ) is finite}. Show that
U is a subspace of F X .

Exercise 5. Let U1 , U2 , . . . be proper subspaces of a vector space V over a field F
(recall that the subspace U is said to be proper if U 6= V ).

(i) Show that V 6= U1 ∪ U2 . [Hint: what happens if U1 ⊆ U2 or U2 ⊆ U1 ?


Otherwise, take u1 ∈ U1 \ U2 , u2 ∈ U2 \ U1 , and show that u1 + u2 6∈ U1 ∪ U2 .]
(ii) Show that if V = U1 ∪U2 ∪U3 then F must be the field Z2 with just 2 elements.
[Hint: show first that we cannot have U1 ⊆ U2 ∪ U3 , nor U2 ⊆ U1 ∪ U3 ; choose
u1 ∈ U1 \ (U2 ∪ U3 ) and u2 ∈ U2 \ (U1 ∪ U3 ); observe that if λ ∈ F \ {0} then
u1 + λu2 must lie in U3 and exploit this fact.]
(iii) Show that if F is infinite (indeed, if |F | > n − 1) then V 6= U1 ∪ U2 ∪ · · · ∪ Un .

Quotient spaces
Suppose that U 6 V where V is a vector space over a field F . Define the quotient
space V /U as follows:

set := {x + U | x ∈ V } [additive cosets]


0 := U
additive inverse: −(x + U ) := (−x) + U
addition: (x + U ) + (y + U ) := (x + y) + U
multiplication by scalars: α(x + U ) := αx + U

Exercise 6 (worth doing carefully once in one’s life, but not more than once—unless
an examiner offers marks for it): Check that −, + and multiplication by scalars are
well defined, and that the vector space axioms hold in V /U .

Note: The notion of quotient space is closely analogous with the notion of quotient
of a group by a normal subgroup or of a ring by an ideal. It is not in the Part A syllabus,
nor will it play a large part in this course. Nevertheless, it is an important and useful
construct which is well worth becoming familiar with.

Dimension (Revision)
Although technically new, the following ideas and results translate so simply from
the case of vector spaces over R to vector spaces over an arbitrary field F that I propose
simply to list headers for revision:

(1) spanning sets; linear dependence and independence; bases;

(2) dimension;

(3) dim V = d ⇒ V ≅ F d ;

(4) any linearly independent set may be extended (usually in many ways) to a basis;

(5) intersection U ∩ W of subspaces; sum U + W ;

(6) dim (U + W ) = dim U + dim W − dim (U ∩ W );

Exercise 7. Prove that if V is finite-dimensional and U 6 V then

dim V = dim U + dim (V /U ).

Exercise 8. With F X as in Exercise 1, show that F X is finite-dimensional if and


only if X is finite, and then dim(F X ) = |X|.

Linear Transformations (Revision)


Let F be a field, V1 , V2 vector spaces over F . A map T : V1 → V2 is said to be
linear if

T 0 = 0, T (−x) = −T x, T (x + y) = T x + T y, and T (λ x) = λ (T x)

for all x, y ∈ V1 and all λ ∈ F .

Note 1: The definition is couched in terms which are intended to emphasise that
what should be required of a linear transformation (homomorphism of vector spaces) is
that it preserves all the ingredients, 0, +, − and multiplication by scalars, that go into
the making of a vector space. What we use in practice is the fact that T : V1 → V2 is
linear if and only if T (α x + β y) = α T x + β T y for all x, y ∈ V1 and all α, β ∈ F . The
proof is an easy exercise and is left to the reader.

Note 2: The identity I : V → V is linear; if T : V1 → V2 and S : V2 → V3 are linear


then S ◦ T : V1 → V3 is linear.

For a linear transformation T : V → W we define the kernel or null-space by Ker T :=


{x ∈ V | T x = 0}. We prove that Ker T 6 V , that Im T 6 W and we define

nullityT := dim Ker T, rankT := dim Im T.

Rank-Nullity Theorem: nullity T + rank T = dim V

Note: The Rank-Nullity Theorem is a version of the First Isomorphism Theorem
for vector spaces, which states that Im T ≅ V /Ker T .
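As a quick illustration of the Rank-Nullity Theorem in coordinates, here is a minimal Python/sympy sketch; the matrix below is an arbitrary example, not taken from the course.

    from sympy import Matrix

    # T : F^4 -> F^3 represented by a 3x4 matrix over Q; rank T + nullity T should equal dim V = 4.
    A = Matrix([[1, 2, 0, 1],
                [0, 1, 1, 1],
                [1, 3, 1, 2]])         # third row = first row + second row, so the rank is 2
    assert A.rank() + len(A.nullspace()) == A.cols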

Exercise 9. Let V be a finite dimensional vector space over a field F and let
T : V → V be a linear transformation. For λ ∈ F define Eλ := {v ∈ V | T v = λv}.

(i) Check that Eλ is a subspace of V .


(ii) Suppose that λ1 , . . . , λm are distinct. For i = 1, . . . , m let vi ∈ Eλi \ {0}.
Show that v1 , . . . , vm are linearly independent.
(iii) Suppose further that S : V → V is a linear transformation such that ST = T S .
Show that S(Eλ ) ⊆ Eλ for each λ ∈ F .

Direct sums and projection operators

The vector space V is said to be the direct sum of its subspaces U and W , and we
write V = U ⊕ W , if V = U + W and U ∩ W = {0}.

Lemma: Let U , W be subspaces of V . Then V = U ⊕ W if and only if for every


v ∈ V there exist unique vectors u ∈ U and w ∈ W such that v = u + w .

Proof. Suppose first that for every v ∈ V there exist unique vectors u ∈ U and
w ∈ W such that v = u + w . Certainly then V = U + W and what we need to prove
is that U ∩ W = {0}. So let x ∈ U ∩ W . Then x = x + 0 with x ∈ U and 0 ∈ W .
Equally, x = 0 + x with 0 ∈ U and x ∈ W . But the expression x = u + w with u ∈ U
and w ∈ W is, by assumption, unique, and it follows that x = 0. Thus U ∩ W = {0},
as required.
Now suppose that V = U ⊕ W . If v ∈ V then since V = U + W there certainly
exist vectors u ∈ U and w ∈ W such that v = u + w . The point at issue therefore is:
are u, w uniquely determined by v ? Suppose that u + w = u′ + w′ , where u, u′ ∈ U
and w, w′ ∈ W . Then u − u′ = w′ − w . This vector lies both in U and in W . By
assumption, U ∩ W = {0} and so u − u′ = w′ − w = 0. Thus u = u′ and w = w′ , so the
decomposition of a vector v as u + w with u ∈ U and w ∈ W is unique, as required.

Note: What we are discussing is sometimes (but rarely) called the “internal” direct
sum to distinguish it from the natural construction which starts from two vector spaces
V1 , V2 over the same field F and constructs a new vector space whose set is the product
set V1 × V2 and in which the vector space structure is defined componentwise—compare
the direct product of groups or of rings. This is (equally rarely) called the “external”
direct sum. These are two sides of the same coin: the external direct sum of V1 and V2
is the internal direct sum of its subspaces V1 × {0} and {0} × V2 ; while if V = U ⊕ W
then V is naturally isomorphic with the external direct sum of U and W .

Exercise 10. Let V be a finite-dimensional vector space over R and let U be a


non-trivial proper subspace. Prove that there are infinitely many different subspaces W
of V such that V = U ⊕ W . [Hint: think first what happens when V is 2-dimensional;
then generalise.] How far can this be generalised to vector spaces over other fields F ?

We come now to projection operators. Suppose that V = U ⊕ W . Define P : V → V


as follows. For v ∈ V write v = u + w where u ∈ U and w ∈ W and then define
P v := u. Strictly P depends on the ordered pair (U, W ) of summands of V , but to
keep things simple we will not build this dependence into the notation.

Observations:
(1) P is well-defined;
(2) P is linear;
(3) Im P = U , Ker P = W ;
(4) P2 =P.

Proofs. That P is well-defined is an immediate consequence of the existence and
uniqueness of the decomposition v = u + w with u ∈ U , w ∈ W .
To see that P is linear, let v1 , v2 ∈ V and α1 , α2 ∈ F (where, as always, F is the
field of scalars). Let v1 = u1 + w1 and v2 = u2 + w2 be the decompositions of v1 and v2 .
Then P v1 = u1 and P v2 = u2 . What about P (α1 v1 + α2 v2 )? Well,
α1 v1 + α2 v2 = α1 (u1 + w1 ) + α2 (u2 + w2 ) = (α1 u1 + α2 u2 ) + (α1 w1 + α2 w2 ).
Since α1 u1 + α2 u2 ∈ U and α1 w1 + α2 w2 ∈ W it follows that P (α1 v1 + α2 v2 ) =
α1 u1 + α2 u2 . Therefore
P (α1 v1 + α2 v2 ) = α1 P (v1 ) + α2 P (v2 ),
that is, P is linear.
For (3) it is clear from the definition that Im P ⊆ U ; but if u ∈ U then u = P u,
and therefore Im P = U . Similarly, it is clear that W ⊆ Ker P ; but if v ∈ Ker P and we
write v = u + w with u ∈ U and w ∈ W , then u = P v = 0, so v = w ∈ W ; therefore Ker P = W .
Finally, if v ∈ V and we write v = u + w with u ∈ U and w ∈ W then
P 2 v = P (P v) = P u = u = P v ,
and this shows that P 2 = P , as required.

Terminology: the operator (linear transformation) P is called the projection of V


onto U along W .

Note 1. Suppose that V is finite-dimensional. Choose a basis u1 , . . . , ur for
U and a basis w1 , . . . , wm for W . Then the matrix of P with respect to the basis
u1 , . . . , ur , w1 , . . . , wm of V is the block matrix

    ( Ir  0 )
    ( 0   0 ) .

Note 2. If P is the projection onto U along W then I − P , where I : V → V is


the identity transformation, is the projection onto W along U .

Note 3. If P is the projection onto U along W then u ∈ U if and only if P u = u.


The fact that if u ∈ U then P u = u is immediate from the definition of P , while if
P u = u then obviously u ∈ Im P = U .

Our next aim is to characterise projection operators algebraically. It turns out that
Observation (4) above is the key:

Terminology: An operator T such that T 2 = T is said to be idempotent.

Theorem. A linear transformation is a projection operator if and only if it is


idempotent.

Proof. We have seen already that every projection is idempotent, so the problem
is to prove that an idempotent linear transformation is a projection operator. Suppose
that P : V → V is linear and idempotent. Define
U := Im P, W := Ker P.

Our first task is to prove that V = U ⊕ W . Let v ∈ V . Then v = P v + (v − P v).
Now P v ∈ U (obviously), and P (v − P v) = P v − P 2 v = 0, so v − P v ∈ W . Thus
V = U + W . Now let x ∈ U ∩ W . Since x ∈ U there exists y ∈ V such that x = P y ,
and since x ∈ W also P x = 0. Then x = P y = P 2 y = P x = 0. Thus U ∩ W = {0},
and so V = U ⊕ W .
To finish the proof we need to convince ourselves (and others—such as examiners,
tutors and other friends) that P is the projection onto U along W . For v ∈ V write
v = u + w where u ∈ U and w ∈ W . Since u ∈ U there must exist x ∈ V such that
u = P x. Then

P v = P (u + w) = P u + P w = P 2 x + 0 = P x = u ,

and therefore P is indeed the projection onto U along W , as we predicted.
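The following Python sketch illustrates the theorem numerically; the subspaces U and W of R3 and their bases are an arbitrary choice made only for this example.

    import numpy as np

    # V = R^3, U spanned by u1, u2 and W spanned by w1; here V = U + W and U ∩ W = {0}.
    u1, u2 = np.array([1., 0., 0.]), np.array([0., 1., 1.])
    w1 = np.array([1., 1., 0.])
    M = np.column_stack([u1, u2, w1])                   # basis of V adapted to the decomposition
    P = M @ np.diag([1., 1., 0.]) @ np.linalg.inv(M)    # projection onto U along W

    assert np.allclose(P @ P, P)                                  # P is idempotent
    assert np.allclose(P @ u1, u1) and np.allclose(P @ u2, u2)    # P fixes U
    assert np.allclose(P @ w1, 0)                                 # P annihilates W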

Exercise 11. Let V be a vector space (over some field F ), and let E1 and E2 be
projections on V .

(i) Show that E1 + E2 is a projection if and only if E1 E2 + E2 E1 = 0.


(ii) Prove that if charF 6= 2 (that is if 1 + 1 6= 0 in F ) then this happens if and
only if E1 E2 = E2 E1 = 0. [Hint: Calculate E1 E2 E1 in two different ways.]
(iii) Now suppose that E1 + E2 is a projection. Assuming that charF 6= 2, find its
kernel and image in terms of those of E1 and E2 .
(iv) What can be said in (ii) and (iii) if charF = 2?

The next result is a theorem which turns out to be very useful in many situations.
It is particularly important in applications of linear algebra in Quantum Mechanics.

Theorem. Let P : V → V be the projection onto U along W and let T : V → V be


any linear transformation. Then P T = T P if and only if U and W are T -invariant
(that is T U 6 U and T W 6 W ).

Proof. Suppose first that P T = T P . If u ∈ U , so that P u = u, then T u =


T P u = P T u ∈ U , so T U 6 U . And if w ∈ W then P (T w) = T P w = T 0 = 0, and
therefore T W 6 W .
Now suppose conversely that U and W are T -invariant. Let v ∈ V and write
v = u + w with u ∈ U and w ∈ W . Then

P T v = P T (u + w) = P (T u + T w) = T u ,

since T u ∈ U and T w ∈ W . Also,

T P v = T P (u + w) = T u .

Thus P T v = T P v for all v ∈ V and therefore P T = T P , as asserted.

We turn now to direct sums of more than two subspaces. The vector space V is
said to be the direct sum of subspaces U1 , . . . , Uk if for every v ∈ V there exist unique
vectors ui ∈ Ui for 1 6 i 6 k such that v = u1 + · · · + uk . We write V = U1 ⊕ · · · ⊕ Uk .

Note 1: If k = 2 then this reduces to exactly the same concept as we have just been
studying. Moreover, if k > 2 then U1 ⊕ U2 ⊕ · · · ⊕ Uk = (· · · ((U1 ⊕ U2 ) ⊕ U3 ) ⊕ · · · ⊕ Uk ).

Note 2: If Ui ≤ V for 1 ≤ i ≤ k then V = U1 ⊕ · · · ⊕ Uk if and only if V =
U1 + U2 + · · · + Uk and Ur ∩ (Σi≠r Ui ) = {0} for 1 ≤ r ≤ k . It is NOT sufficient that
Ui ∩ Uj = {0} whenever i 6= j . Consider, for example, the 2-dimensional space F 2 of
pairs (x1 , x2 ) with x1 , x2 ∈ F . Its three subspaces

U1 := {(x, 0) | x ∈ F }, U2 := {(0, x) | x ∈ F }, U3 := {(x, x) | x ∈ F }

satisfy
U1 ∩ U2 = U1 ∩ U3 = U2 ∩ U3 = {0}
and yet it is clearly not true that F 2 is their direct sum.

Note 3: If V = U1 ⊕ U2 ⊕ · · · ⊕ Uk and Bi is a basis of Ui then B1 ∪ B2 ∪ · · · ∪ Bk
is a basis of V . In particular, dim V = Σi dim Ui . The proof, which is not deep, is
offered as an exercise:

Exercise 12. Prove that if V = U1 ⊕ U2 ⊕ · · · ⊕ Uk and Bi is a basis of Ui then


B1 ∪ B2 ∪ · · · ∪ Bk is a basis of V .

Let P1 , . . . , Pk be linear mappings V → V such that Pi2 = Pi for all i and Pi Pj = 0


whenever i 6= j . If P1 + · · · + Pk = I then {P1 , . . . , Pk } is known as a partition of the
identity on V .

Example: If P is a projection then {P, I − P } is a partition of the identity.

Theorem. If V = U1 ⊕ · · · ⊕ Uk and Pi is the projection of V onto Ui along
⊕j≠i Uj then {P1 , . . . , Pk } is a partition of the identity on V .
Conversely, if {P1 , . . . , Pk } is a partition of the identity on V and Ui := Im Pi then
V = U1 ⊕ · · · ⊕ Uk .

Proof. Suppose first that V = U1 ⊕ · · · ⊕ Uk . Let Pi be the projection of V onto
Ui along ⊕j≠i Uj . Certainly Pi2 = Pi , and if i ≠ j then Im Pj ≤ Ker Pi so Pi Pj = 0.
If v ∈ V then there are uniquely determined vectors ui ∈ Ui for 1 ≤ i ≤ k such that
v = u1 + · · · + uk . Then Pi v = ui by definition of what it means for Pi to be the projection
of V onto Ui along ⊕j≠i Uj . Therefore

I v = v = P1 v + · · · + Pk v = (P1 + · · · + Pk ) v .

Since this equation holds for all v ∈ V we have I = P1 + · · · + Pk . Thus {P1 , . . . , Pk }


is a partition of the identity.
To understand the converse, let {P1 , . . . , Pk } be a partition of the identity on V and
let Ui := Im Pi . For v ∈ V , defining ui := Pi v we have

v = I v = (P1 + · · · + Pk ) v = P1 v + · · · + Pk v = u1 + · · · + uk .

Suppose that v = w1 + · · · + wk where wi ∈ Ui for 1 6 i 6 k . Then Pi wi = wi since Pi
is a projection onto Ui . And if j 6= i then Pj wi = Pj (Pi wi ) = (Pj Pi ) wi , so Pj wi = 0
since Pj Pi = 0. Therefore

Pi v = Pi (w1 + · · · + wk ) = Pi w1 + · · · + Pi wk = wi ,

that is, wi = ui . This shows the uniqueness of the decomposition v = u1 + · · · + uk with


ui ∈ Ui and so V = U1 ⊕ · · · ⊕ Uk , and the proof is complete.
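A small numerical sketch of this correspondence (the basis of R3 below is purely illustrative): three projections built from a basis form a partition of the identity, exactly as in the theorem.

    import numpy as np

    M = np.array([[1., 1., 0.],
                  [0., 1., 1.],
                  [0., 0., 1.]])        # columns form a basis of R^3
    Minv = np.linalg.inv(M)
    # P[i] projects onto the span of the i-th column along the span of the other two.
    P = [M @ np.diag([float(i == j) for j in range(3)]) @ Minv for i in range(3)]

    assert np.allclose(sum(P), np.eye(3))                          # P1 + P2 + P3 = I
    assert all(np.allclose(P[i] @ P[i], P[i]) for i in range(3))   # each Pi is idempotent
    assert all(np.allclose(P[i] @ P[j], 0)
               for i in range(3) for j in range(3) if i != j)      # Pi Pj = 0 for i != j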

Linear functionals and dual spaces


A linear functional on the vector space V over the field F is a function f : V → F
such that
f (α1 v1 + α2 v2 ) = α1 f (v1 ) + α2 f (v2 )
for all α1 , α2 ∈ F and all v1 , v2 ∈ V .

Note: A linear functional, then, is a linear transformation V → F , where F is


construed as a 1-dimensional vector space over itself.

Example. If V = F n (column vectors) and y is a 1 × n row vector then the map


v 7→ y v is a linear functional on V .

The dual space V ′ of V is defined as follows:

Set := set of linear functionals on V


0 := zero function [v 7→ 0 for all v ∈ V ]
(−f )(v) := −(f (v))
(f1 + f2 )(v) := f1 (v) + f2 (v) [pointwise addition]
(λf )(v) := λf (v) [pointwise multiplication by scalars]

Note: It is a matter of important routine to check that the vector space axioms are
satisfied (see the exercise below). It is also important that, when invited (for example
by an examiner) to define the dual space of a vector space, you specify not only the set,
but also the operations which make that set into a vector space.

Exercise 13 (worth doing carefully once in one’s life, but not more than once—unless
an examiner offers marks for it). Check that the vector space axioms are satisfied, so
that V ′ defined as above really is a vector space over F .

Note: Some authors use V ∗ or Hom(V, F ) or HomF (V, F ) for the dual space V ′ .

Theorem. Let V be a finite-dimensional vector space over a field F . For every
basis v1 , v2 , . . . , vn of V there is a basis f1 , f2 , . . . , fn of V ′ such that
fi (vj ) = 1 if i = j and fi (vj ) = 0 if i ≠ j .
In particular, dim V ′ = dim V .
Proof. Define fi as follows. For v ∈ V we set fi (v) := αi where α1 , . . . , αn ∈ F are
such that v = α1 v1 + · · · + αn vn . This definition is acceptable because v1 , . . . , vn span
V and so such scalars α1 , . . . , αn certainly exist; moreover, since v1 , . . . , vn are linearly
independent the coefficients α1 , . . . , αn are uniquely determined by v . If w ∈ V , say
w = β1 v1 + · · · + βn vn , and λ, µ ∈ F then
    fi (λ v + µ w) = fi (λ Σj αj vj + µ Σj βj vj )
                  = fi (Σj (λ αj + µ βj ) vj )
                  = λ αi + µ βi
                  = λ fi (v) + µ fi (w) ,

and so fi ∈ V ′ . We have thus found elements f1 , . . . , fn of V ′ such that
fi (vj ) = 1 if i = j and fi (vj ) = 0 if i ≠ j .
To finish the proof we must show that they form a basis of V ′ .


To see that they are independent suppose that Σj µj fj = 0, where µ1 , . . . , µn ∈ F .
Evaluate at vi :

    0 = (Σj µj fj )(vi ) = Σj µj fj (vi ) = µi .

Thus µ1 = · · · = µn = 0 and so f1 , . . . , fn are linearly independent.


To see that they span V ′ let f ∈ V ′ and define g := Σj f (vj ) fj . Then also g ∈ V ′
and for 1 ≤ i ≤ n we have

    g(vi ) = (Σj f (vj ) fj )(vi ) = Σj f (vj ) fj (vi ) = f (vi ) .

Since f and g are linear and agree on a basis of V we have f = g , that is,
f = Σj f (vj ) fj . Thus f1 , . . . , fn is indeed a basis of V ′ , as the theorem states.
j f (v j ) fj . Thus f1 , . . . , fn is indeed a basis of V , as the theorem states.

Note. The basis f1 , f2 , . . . , fn is known as the dual basis of v1 , v2 , . . . , vn .


Clearly, it is uniquely determined by this basis of V .

Example. If V = F n (n × 1 column vectors) then we may identify V ′ with the


space of 1 × n row vectors. The canonical basis e1 , e2 , . . . , en then has dual basis
e′1 , e′2 , . . . , e′n , the canonical basis of the space of row vectors.
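In these coordinates the dual basis can be computed by a single matrix inversion (a minimal sketch; the basis of R3 below is an arbitrary example): if the vi are the columns of a matrix B , then the rows of B −1 are the dual functionals, since the (i, j) entry of B −1 B is exactly fi (vj ).

    import numpy as np

    # Basis v1, v2, v3 of R^3 as the columns of B.
    B = np.array([[1., 1., 0.],
                  [0., 1., 1.],
                  [1., 0., 1.]])
    F = np.linalg.inv(B)          # row i of F is the dual functional f_i, written as a row vector

    assert np.allclose(F @ B, np.eye(3))    # f_i(v_j) = 1 if i = j, and 0 otherwise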

Exercise 14. Let F be a field with at least 4 elements and let V be the vector
space of polynomials c0 + c1 x + c2 x2 + c3 x3 of degree 6 3 with coefficients from F .
(i) Show that for a ∈ F the map fa : V → F given by evaluation of polynomial p
at a (that is, fa (p) = p(a) ) is a linear functional.
(ii) Show that if a1 , a2 , a3 , a4 are distinct members of F then {fa1 , fa2 , fa3 , fa4 }
is a basis of V ′ , and find the basis {p1 , p2 , p3 , p4 } of V of which this is the dual
basis.

(iii) Generalise to the vector space of polynomials of degree at most n over F .

Let V be a vector space over the field F . For a subset X of V the annihilator is
defined by
X ◦ := {f ∈ V ′ | f (x) = 0 for all x ∈ X } .

Note: For any subset X the annihilator X ◦ is a subspace. For, if f1 , f2 ∈ X ◦ and


α1 , α2 ∈ F then for any x ∈ X
(α1 f1 + α2 f2 )(x) = α1 f1 (x) + α2 f2 (x) = 0 + 0 = 0,
and so α1 f1 + α2 f2 ∈ X ◦ .

Note: X ◦ = {f ∈ V ′ | X ⊆ Ker f }.

Theorem. Let V be a finite-dimensional vector space over a field F and let U be


a subspace. Then
dim U + dim U ◦ = dim V .

Proof. Let u1 , . . . , um be a basis for U and extend it to a basis u1 , . . . , um ,


um+1 , . . . , un for V . Thus dim U = m and dim V = n. Let f1 , . . . , fn be the dual
basis of V ′ . We’ll prove that fm+1 , . . . , fn is a basis of U ◦ . Certainly, if m + 1 6 j 6 n
then fj (ui ) = 0 for i ≤ m and so fj ∈ U ◦ since u1 , . . . , um span U . Now let f ∈ U ◦ .
There exist α1 , . . . , αn ∈ F such that f = Σj αj fj . Then

    f (ui ) = Σj αj fj (ui ) = αi ,

and so αi = 0 for 1 6 i 6 m, that is, f is a linear combination of fm+1 , . . . , fn . Thus


fm+1 , . . . , fn span U ◦ and so they form a basis of it. Therefore dim U ◦ = n − m, that
is, dim U + dim U ◦ = dim V .

A worked example—part of an old FHS question (and useful to know). Let V


be a finite-dimensional vector space over a field F . Show that if U1 , U2 are subspaces
then (U1 + U2 )◦ = U1◦ ∩ U2◦ and (U1 ∩ U2 )◦ = U1◦ + U2◦ .

Response. If f ∈ U1◦ ∩ U2◦ then f (u1 + u2 ) = f (u1 ) + f (u2 ) = 0 + 0 = 0 for any


u1 ∈ U1 and any u2 ∈ U2 . Therefore U1◦ ∩ U2◦ ⊆ (U1 + U2 )◦ . On the other hand,
U1 ⊆ U1 + U2 and U2 ⊆ U1 + U2 and so if f ∈ (U1 + U2 )◦ then f ∈ U1◦ ∩ U2◦ , that is
(U1 + U2 )◦ ⊆ U1◦ ∩ U2◦ . Therefore in fact (U1 + U2 )◦ = U1◦ ∩ U2◦ . Note that the assumption
that V is finite-dimensional is not needed here.
For the second part, clearly U1◦ ⊆ (U1 ∩ U2 )◦ and U2◦ ⊆ (U1 ∩ U2 )◦ and so U1◦ + U2◦ ⊆
(U1 ∩ U2 )◦ . Now we compare dimensions:
dim (U1◦ + U2◦ ) = dim U1◦ + dim U2◦ − dim (U1◦ ∩ U2◦ )
= dim U1◦ + dim U2◦ − dim (U1 + U2 )◦ [by the above]
= (dim V − dim U1 ) + (dim V − dim U2 ) − (dim V − dim (U1 + U2 ))
= dim V − (dim U1 + dim U2 − dim (U1 + U2 ))
= dim V − dim (U1 ∩ U2 )
= dim (U1 ∩ U2 )◦ .

Therefore U1◦ + U2◦ = (U1 ∩ U2 )◦ , as required.

To finish this study of dual spaces we examine the second dual, that is, the dual of
the dual. It turns out that if V is a finite-dimensional vector space then the second dual
V ′′ can be naturally identified with V itself.

Theorem. Let V be a vector space over a field F . Define Φ : V → V ′′ by


(Φ v)(f ) := f (v) for all v ∈ V and all f ∈ V ′ . Then Φ is linear and one-one [in-
jective]. If V is finite-dimensional then Φ is an isomorphism.

Proof. We check linearity as follows. For v1 , v2 ∈ V and α1 , α2 ∈ F , and for any


f ∈ V ′,
Φ(α1 v1 + α2 v2 )(f ) = f (α1 v1 + α2 v2 )
= α1 f (v1 ) + α2 f (v2 )
= α1 (Φ v1 )(f ) + α2 (Φ v2 )(f )
= (α1 (Φ v1 ) + α2 (Φ v2 ))(f ),

and so Φ(α1 v1 + α2 v2 ) = α1 (Φ v1 ) + α2 (Φ v2 ).


Now

    Ker Φ = {v ∈ V | Φ v = 0} = {v ∈ V | f (v) = 0 for all f ∈ V ′ } = {0},

since if v ≠ 0 then v may be extended to a basis of V and the corresponding coordinate
functional f satisfies f (v) = 1 ≠ 0. Therefore Φ is injective. If V is finite-dimensional then

dim Im (Φ) = dim V = dim V ′ = dim V ′′

and so Φ is also surjective, that is, it is an isomorphism.

Exercise 15. Let V be a finite-dimensional vector space over a field F . If Y ⊆ V ′


we’ll use Y ◦ to denote {v ∈ V | f (v) = 0 for all f ∈ Y }. Prove that if U 6 V (that is,
U is a subspace of V ) then (U ◦ )◦ = U . Prove also that if X ⊆ V (that is, X is a subset
of V ) then (X ◦ )◦ is the subspace hXiF spanned by X .

Dual transformations
Let V and W be vector spaces over a field F , and let T : V → W be a linear
transformation. The dual transformation T ′ : W ′ → V ′ is defined by

T ′ (f ) := f ◦ T for all f ∈ W ′ .

Note that T ′ f is often written for T ′ (f ). Thus

(T ′ f )(v) = f (T v) for all v ∈ V .

Fact. This specification does define a map T ′ : W ′ → V ′ , which, moreover, is


linear.

Proof. We need first to show that if f ∈ W ′ then T ′ f ∈ V ′ . But if f ∈ W ′ then
f : W → F and f is linear, so, since T : V → W is linear and composition of linear
maps produces a linear map, also f ◦ T : V → F is linear, that is, T ′ f ∈ V ′ . Now let
f1 , f2 ∈ W ′ and α1 , α2 ∈ F . Then for any v ∈ V we have

T ′ (α1 f1 + α2 f2 )(v) = (α1 f1 + α2 f2 )(T v)


= α1 f1 (T v) + α2 f2 (T v)
= α1 T ′ f1 (v) + α2 T ′ f2 (v)
= (α1 T ′ f1 + α2 T ′ f2 )(v)

and so T ′ (α1 f1 + α2 f2 ) = (α1 T ′ f1 + α2 T ′ f2 ), that is, T ′ : W ′ → V ′ is linear.

Observations. Let V and W be vector spaces over a field F , and let T : V → W


be a linear transformation.
(1) If V = W and T = I then also T ′ = I .
(2) If S : U → V and T : V → W then (T ◦ S)′ = S ′ ◦ T ′ .
(3) Hence if T is invertible then (T −1 )′ = (T ′ )−1 .
(4) If T1 , T2 : V → W and α1 , α2 ∈ F then (α1 T1 + α2 T2 )′ = α1 T1′ + α2 T2′ .
(5) Hence if T : V → V and f (x) ∈ F [x] then f (T )′ = f (T ′ )

Proofs. Clause (1) is immediate from the definition since I ′ f = f ◦ I = f for all
f ∈ W ′ . Clauses (2), (3), (4) are straightforward routine and should be done as an
exercise (see below). For (5) we need to know what is meant by f (T ) when f is a
polynomial with coefficients from F and T is a linear transformation V → V . It means
precisely what you would expect it to mean: non-negative integral powers of T are
defined by T 0 := I , T n+1 := T ◦ T n for n > 0, and then if f (x) = a0 + a1 x + · · · + an xn
then f (T ) is the corresponding linear combination a0 I + a1 T + · · · + an T n of the powers
of T . The fact that f (T )′ = f (T ′ ) therefore follows from (1), (2) and (4) by induction
on the degree of f .

Exercise 16. Let U , V and W be vector spaces over a field F , and let S : U → V ,
T : V → W be linear transformations. Show that (T S)′ = S ′ T ′ . Deduce that if
T : V → V is an invertible linear transformation then (T −1 )′ = (T ′ )−1 .

Exercise 17. Let V and W be vector spaces over a field F and let T1 , T2 : V → W
be linear transformations. Show that if α1 , α2 ∈ F then (α1 T1 + α2 T2 )′ = α1 T1′ + α2 T2′

We ask now what can be said about the matrix of a dual transformation with respect
to suitable bases in W ′ and V ′ .

Theorem. Suppose that V , W are finite-dimensional vector spaces over F and


T : V → W is linear. Let v1 , v2 , . . . , vm be a basis of V , and w1 , w2 , . . . , wn a basis
of W . Let A be the matrix of T with respect to these bases; let B be the matrix of the
dual transformation T ′ with respect to the dual bases of W ′ , V ′ . Then B = Atr .

Proof. Let f1 , . . . , fm and h1 , . . . , hn be the relevant dual bases in V ′ and W ′
respectively, so that fp (vi ) = 1 if p = i and 0 if p ≠ i, and hq (wj ) = 1 if q = j and 0 if q ≠ j .
Let ai,j , bp,q be the (i, j)- and (p, q)-coefficients of A and B respectively. By definition then

    T vi = Σj aj,i wj   and   T ′ hq = Σp bp,q fp .

Now

    (T ′ hq )(vi ) = hq (T vi ) = hq (Σj aj,i wj ) = aq,i .

It follows that T ′ hq = Σp aq,p fp (compare the proof on p. 11 that the dual basis spans).
Therefore bp,q = aq,p for 1 ≤ p ≤ m and 1 ≤ q ≤ n, that is, B = Atr as the theorem
states.

Corollary. For a linear transformation T : V → W of finite-dimensional vector


spaces over any field F we have rankT ′ = rankT .

For we know that a matrix and its transpose have the same rank. By studying the
kernel and image of a dual transformation we get a more “geometric” understanding of
the corollary, however:

Theorem. Let T : V → W be a linear transformation of finite-dimensional vector


spaces over a field F . Then
Ker T ′ = (Im T )◦ and Im T ′ = (Ker T )◦ .

Proof. The first equation does not depend on finite-dimensionality—it is true in


complete generality:
Ker T ′ = {f ∈ W ′ | T ′ f = 0} = {f ∈ W ′ | f ◦ T = 0}
= {f ∈ W ′ | f (Im T ) = {0}}
= (Im T )◦ .
For the second, suppose that f ∈ Im T ′ , say f = T ′ g , where g ∈ W ′ , and let u ∈ Ker T .
Then
f (u) = (T ′ g)(u) = g(T u) = g(0) = 0 .
Thus Im T ′ 6 (Ker T )◦ . Again, this is true quite generally. But to get equality we use
finite-dimensionality:
dim (Im T ′ ) = dim W ′ − dim (Ker T ′ )     [Rank-Nullity Theorem]
             = dim W − dim ((Im T )◦ )      [dim W ′ = dim W and first part]
             = dim (Im T )                  [theorem on dimension of annihilator]
             = dim V − dim (Ker T )         [Rank-Nullity Theorem]
             = dim ((Ker T )◦ )             [theorem on dimension of annihilator]

and therefore Im T ′ = (Ker T )◦ , as required.
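Both the corollary and this theorem are easy to check in coordinates; here is a minimal sympy sketch with an arbitrary matrix, using the fact (proved above) that the dual transformation is represented by the transpose.

    from sympy import Matrix

    A = Matrix([[1, 2, 3],
                [2, 4, 6],
                [0, 1, 1]])       # matrix of T with respect to chosen bases of V and W
    B = A.T                       # matrix of the dual transformation T'

    assert B.rank() == A.rank()                       # rank T' = rank T
    assert len(B.nullspace()) == A.rows - A.rank()    # dim Ker T' = dim W - rank T = dim (Im T)°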

A worked example—FHS 2000, Paper a1, Qn 1:

Let V be a finite-dimensional vector space over R and let P : V → V be a


linear transformation of V . Let V1 = Ker (P ) and V2 = Ker (IV − P ), where
IV : V → V is the identity map on V , and suppose that V = V1 ⊕ V2 . Prove
that P 2 = P .
Define the dual space V ′ of V and the dual transformation P ′ of P .
Show that (P ′ )2 = P ′ . Hence or otherwise show that V ′ = U1 ⊕ U2 where
U1 = Ker (P ′ ) and U2 = Ker (IV ′ − P ′ ).
Let E be a basis for V . Define the dual basis E ′ for V ′ and show that
it is indeed a basis. Suppose that E ⊆ V1 ∪ V2 . Show that E ′ ⊆ U1 ∪ U2 ,
and describe the matrices of P and P ′ with respect to the bases E and E ′
respectively.

Response. Note that in fact it is irrelevant that the field of coefficients is R, so we’ll
do this for vector spaces over an arbitrary field F .
For the first part let v ∈ V and write v = v1 + v2 where v1 ∈ V1 and v2 ∈ V2 . Note
that P v2 = v2 since (IV − P ) v2 = 0. Then P v = P v1 + P v2 = 0 + v2 = v2 , and
P 2 v = P v2 = v2 . Thus P 2 v = P v for all v ∈ V and so P 2 = P .
Defining dual spaces and dual transformations is bookwork treated earlier in these
notes. Then for f ∈ V ′ we have that

(P ′ )2 f = P ′ (P ′ (f )) = P ′ (f ◦ P ) = (f ◦ P ) ◦ P = f ◦ P 2 = f ◦ P = P ′ f

and so (P ′ )2 = P ′ .
The proof of the theorem on p. 7 above that idempotent operators are projections
includes a proof that V ′ = U1 ⊕ U2 where U1 = Ker (P ′ ) and U2 = Ker (IV ′ − P ′ ) [and
this bookwork is what the examiner would have expected candidates to expound here].
Definition of dual basis and proof that it really is a basis is bookwork treated earlier in
these notes (see p. 10 above). Now suppose that E ⊆ V1 ∪V2 . Thus E = {v1 , v2 , . . . , vn },
where we may suppose that v1 , . . . , vk ∈ V1 and vk+1 , . . . , vn ∈ V2 . Let f1 , . . . , fn be
the dual basis E ′ of V ′ , so that fi (vj ) = 0 if i 6= j and fi (vi ) = 1. Consider an
index i such that 1 6 i 6 k . Now (P ′ fi )(vj ) = fi (P vj ), and so if 1 6 j 6 k then
(P ′ fi )(vj ) = 0 since P vj = 0, while if k +1 6 j 6 n then (P ′ fi )(vj ) = 0 since P vj = vj
and fi (vj ) = 0. Thus (P ′ fi )(vj ) = 0 for all relevant j and therefore P ′ fi = 0, that is,
fi ∈ U1 . Similarly, if k + 1 6 i 6 n then fi ∈ U2 . That is, E ′ ⊆ U1 ∪ U2 as required.
And now it is clear that the matrix of P ′ with respect to E ′ is the same as the matrix
of P with respect to E , namely the block matrix

    ( 0  0  )
    ( 0  Ir ) ,

where r := n − k , Ir is the r × r identity matrix and the three entries 0 represent
k × k , k × r and r × k zero matrices respectively.

Further exercises I
Exercise 18. Let V be an n-dimensional vector space over a field F .

(i) Let U be an m-dimensional subspace of V and let

B(U ) := {T : V → V | T is linear and T (U ) 6 U } .

Show that B(U ) is a subspace of the space Hom(V, V ) of all linear transforma-
tions V → V and that dim B(U ) = n2 − m n + m2 .
(ii) A flag in V is an increasing sequence {0} = U0 < U1 < U2 < · · · < Uk = V of
subspaces beginning with {0} and ending with V itself. For such a flag F we
define

B(F) := {T : V → V | T is linear and T (Ui ) 6 Ui for 1 6 i 6 k } .

Show that B(F) is a subspace of Hom(V, V ) and calculate its dimension in


terms of n and the dimensions mi of the subspaces Ui .

Exercise 19. Let V be a vector space over a field F such that charF 6= 2, and let
E1 , E2 , E3 be idempotent linear transformations V → V such that E1 + E2 + E3 = I ,
where I : V → V is the identity transformation. Show that Ei Ej = 0 when i 6= j (that
is, {E1 , E2 , E3 } is a partition of the identity on V ). [Hint: recall Exercise 11 on p. 8.]
Give an example of four idempotent operators E1 , E2 , E3 , E4 on V such
that E1 + E2 + E3 + E4 = I but {E1 , E2 , E3 , E4 } is not a partition of the identity on V .

Exercise 20. Let F := Z2 , the field with just two elements 0 and 1, let

V := {f ∈ F N | supp(f ) is finite},

the subspace of F N defined in Exercise 4 (see p. 3). Thus V may be thought of as the
vector space of sequences (x0 , x1 , x2 , . . .) where each coordinate xi is 0 or 1 and all
except finitely many of the coordinates are 0. For each subset S ⊆ N define ϕS : V → F
by ϕS (f ) = Σn∈S f (n).
(i) Show that ϕS ∈ V ′ for all S ⊆ N.
(ii) Show that in fact V ′ = {ϕS | S ⊆ N}. [Hint: for each n ∈ N let en ∈ V be the
function such that en (n) = 1 and en (m) = 0 when m 6= n; then, given ϕ ∈ V ′
define S by S := {n ∈ N | ϕ(en ) = 1} and seek to show that ϕ = ϕS .]
(iii) Show that V is countable but V ′ is uncountable.

Part II: Some theory of a single linear transformation on
a finite-dimensional vector space

We turn now to studying linear transformations on a finite-dimensional vector space


to itself. The beginnings of this theory treat eigenvalues and eigenvectors, the charac-
teristic polynomial, the minimal polynomial, and the Cayley–Hamilton Theorem.
Throughout this part V is a finite-dimensional vector space over a field F and
T : V → V is linear. If v1 , . . . , vn is a basis of V then T corresponds to an n × n
matrix A over F . For, if 1 6 i 6 n then T vi can be expressed uniquely as a linear
combination of the members of the basis, and if

    T vi = Σj aji vj   (summing over j = 1, . . . , n)

then A = (aij ). Note that A is the transpose of the array of coefficients that would be
written out on the page if we did not use summation notation. That might seem a little
artificial, but this convention ensures that if S : V → V and S has matrix B then S ◦ T
has matrix A B . The point to remember is that with respect to a given basis of V there
is a one-to-one correspondence between linear transformations T : V → V and n × n
square matrices over F , where n := dim V .
For most of this section we will use linear transformations or n × n matrices over F
interchangeably, whichever is the more convenient for the job in hand. Therefore you are
advised to recall your facility for calculating with matrices. Here is an exercise to help.

Exercise 21. Suppose that n = p + q . Partition n × n matrices over a field F into
the form

    ( A  B )
    ( C  D ) ,

where A ∈ Mp×p (F ), B ∈ Mp×q (F ), C ∈ Mq×p (F ), D ∈ Mq×q (F ). Show that if X is
partitioned as

    ( X11  X12 )
    ( X21  X22 )

and Y is partitioned as

    ( Y11  Y12 )
    ( Y21  Y22 )

then

    X Y = ( X11 Y11 + X12 Y21   X11 Y12 + X12 Y22 )
          ( X21 Y11 + X22 Y21   X21 Y12 + X22 Y22 ) .

Determinants and traces


Recall from Mods the definitions of detA and traceA where A is a square matrix.
If A is the n × n matrix (aij ) then

    detA = Σρ∈Sym(n) (−1)^parity(ρ) a1,1ρ a2,2ρ · · · an,nρ   and   traceA = Σi aii ,

where parity(ρ) is 0 or 1 according to whether ρ is an even or odd permutation, and


iρ denotes the image of i under ρ. Thus detA is described as a sum of n! terms, each
of which is plus or minus the product of n coefficients of A, chosen in such a way that
there is just one from each row and just one from each column. In Mods the coefficients
aij were real (or perhaps complex) numbers. But for us they come from an arbitrary

field F (they could even come from a commutative ring). The basic properties are the
same, for example that det(AB) = (detA)(detB), that trace(AB) = trace(BA), and
that A is invertible (in the sense that there exists B such that A B = B A = I ) if and
only if detA 6= 0.
Now for our linear transformation T : V → V we define the determinant of T by
detT := detA, and we define the trace by traceT := traceA where A is the matrix of
T with respect to some basis of V . On the face of it these definitions might depend on
the particular basis that is used—in which case they would be of doubtful value. But in
fact they do not:

Observation. Determinant and trace of T depend only on T and do not depend


on the basis of V used to compute them.

Proof. We know from Mods that if B is the matrix representing T with respect to
another basis then B = U −1 A U where U is the matrix whose entries are the coefficients
needed to express the elements of one basis as linear combinations of the elements of the
other. Therefore
detB = (detU )−1 (detA) (detU ) = detA
and
traceB = trace((U −1 A) U ) = trace(U (U −1 A)) = traceA,
as required.
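A quick numerical sanity check of this observation (the matrices below are arbitrary; the sketch only illustrates the computation):

    import numpy as np

    A = np.array([[2., 1., 0.],
                  [0., 3., 1.],
                  [1., 0., 1.]])
    U = np.array([[1., 1., 0.],
                  [0., 1., 1.],
                  [0., 0., 1.]])            # an invertible change-of-basis matrix
    B = np.linalg.inv(U) @ A @ U            # matrix of the same transformation in the new basis

    assert np.isclose(np.linalg.det(B), np.linalg.det(A))
    assert np.isclose(np.trace(B), np.trace(A))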

The characteristic polynomial and the minimal polynomial


The characteristic polynomial of an n × n matrix A is defined by

cA (x) := det(xI − A) .

For our linear transformation T : V → V , the characteristic polynomial of T is defined


by cT (x) := cA (x) where A represents T with respect to some basis of V . By what has
just been shown, cT (x) is well defined—that is, it is independent of the basis used to
calculate it.

Note: If n := dim V then cT (x) is a monic polynomial (monic means leading


coefficient = 1), and it is of degree n. In fact

cT (x) = xn − c1 xn−1 + c2 xn−2 − · · · + (−1)n cn ,

where
c1 = traceT , cn = detT, etc.
Here ‘etc.’ hides a great deal of detailed information. The other coefficients cr are im-
portant but more complicated functions of A—in fact cr is the sum of the determinants
of all the so-called r × r principal submatrices of A, that is, square submatrices of A of
which the diagonals coincide with part of the diagonal of A. For example, c2 is the sum
of the determinants of all the ½n(n − 1) submatrices

    ( aii  aij )
    ( aji  ajj )

for 1 ≤ i < j ≤ n.
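A short sympy sketch of these coefficients (the 3 × 3 matrix is an arbitrary example): the coefficient of x^(n−1) in cA (x) is −traceA and the constant term is (−1)^n detA.

    from sympy import Matrix, symbols

    x = symbols('x')
    A = Matrix([[2, 1, 0],
                [0, 3, 1],
                [1, 0, 1]])
    c = A.charpoly(x).as_expr()              # det(xI - A) = x**3 - 6*x**2 + 11*x - 7

    assert c.coeff(x, 2) == -A.trace()       # c1 = trace A
    assert c.subs(x, 0) == (-1)**3 * A.det() # (-1)^n cn = (-1)^n det A, here n = 3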

Just as in the case of linear algebra over R, we define eigenvalues and eigenvectors of
T as follows: a scalar λ ∈ F is said to be an eigenvalue of T if there exists a non-zero
vector v ∈ V such that T v = λ v ; and a vector v ∈ V is said to be an eigenvector of T
if v 6= 0 and there exists a scalar λ ∈ F such that T v = λ v .

Theorem. The characteristic polynomial of a linear transformation T : V → V


has the following properties:
(1) if S = U −1 T U , where U : V → V is linear and invertible, then cS (x) = cT (x) ;
(2) a scalar λ ∈ F is an eigenvalue of T if and only if cT (λ) = 0 ;

Proof. If S = U −1 T U then cS (x) = det(xI − U −1 T U ) = det(U −1 (xI − T ) U ) =


det(xI − T ) = cT (x), and this proves (1).
For (2), λ is an eigenvalue if and only if there exists v ∈ V \ {0} such that
(λ I − T ) v = 0, that is, if and only if Ker (λ I − T ) 6= {0}. But we know that this
holds if and only if λ I − T is not invertible, that is, if and only if det(λ I − T ) = 0, so
cT (λ) = 0, as required.

We turn now to the so-called minimal polynomial of T . We saw on p. 14 how f (T )


is defined for a polynomial f ∈ F [x]: if f (x) = a0 + a1 x + · · · + ak xk ∈ F [x] then
f (T ) := a0 I + a1 T + · · · + ak T k , where I : V → V is the identity map. If the matrix
representing T with respect to a given basis of V is A, then the matrix representing
f (T ) with respect to this basis will be f (A).

Exercise 22. Show that if U : V → V is invertible and S := U −1 T U then


f (S) = U −1 f (T ) U for any polynomial f ∈ F [x].

Now we come to an important definition. A monic polynomial f ∈ F [x] \ {0} of least


degree such that f (T ) = 0 is known as the minimal polynomial of T . Similarly, for an
n × n matrix A, a monic polynomial f ∈ F [x] \ {0} of least degree such that f (A) = 0
is known as the minimal polynomial of A.
For these definitions to make sense it must be the case that there exist non-zero
polynomials f ∈ F [x] such that f (T ) = 0, or f (A) = 0 respectively. As preparation for
the proof of this we need a lemma:

Lemma. Let n := dim V . The set of all linear transformations S : V → V forms a


vector space of dimension n2 over F .

Proof. That the set of all linear transformations V → V forms a vector space should
be clear since we can add linear transformations and multiply them by scalars, and the
vector space axioms can easily be checked. The correspondence of linear transformations
with matrices is obviously a vector-space isomorphism, and the space of n × n matrices
has dimension n2 since the matrices Ep q , where Ep q has 1 as its (p, q) entry and 0
elsewhere, form a basis.

Corollary. There is a polynomial f ∈ F [x] \ {0} such that f (T ) = 0. Similarly


for n × n matrices over F .

Proof. Since the set of all linear transformations V → V forms a vector space of
dimension n2 , the powers T i for 0 ≤ i ≤ n2 (that is, I , T , T 2 , . . .), of which there are
n2 + 1, must be linearly dependent. Therefore there exist ci ∈ F for 0 ≤ i ≤ n2 , not all 0,
such that Σi ci T i = 0. So if f (x) := Σi ci xi then f (x) ∈ F [x] \ {0} and f (T ) = 0.

It follows immediately that the minimal polynomial of our linear transformation


T : V → V or of an n × n matrix A over F is well-defined.

Observation. The minimal polynomial of T or of an n × n matrix A over F is


unique.

For, if f1 , f2 ∈ F [x] are minimal polynomials of T (or of A), then f1 , f2 are monic
and of the same degree, say m. Therefore if g := f1 − f2 then either g = 0 or deg g < m.
But g(T ) = f1 (T )−f2 (T ) = 0, and so since m is the least degree of non-zero polynomials
which annihilate T , it must be that g = 0: that is, f1 = f2 .

Notation. We’ll write mT (x) (or mA (x) when A is an n × n matrix over F ) for
the minimal polynomial of T (or of A). Note that if A ∈ Mn×n (F ) and A represents T
with respect to some basis of V then mT (x) = mA (x).

Observation. If S = U −1 T U , where U : V → V is linear and invertible, then


mS (x) = mT (x).

For, if f ∈ F [x], then f (S) = U −1 f (T ) U (see Exercise 22), and so f (S) = 0 if and
only if f (T ) = 0.

Examples: • mI (x) = x − 1;

• m0 (x) = x;
 
• if A is the 3 × 3 diagonal matrix

      ( 1  0  0 )
      ( 0  2  0 )
      ( 0  0  2 )

  then mA (x) = x2 − 3x + 2.
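A small sympy check of this last example (only a sketch): (A − I)(A − 2I) = 0 while neither factor alone annihilates A; since (as is shown below) any annihilating polynomial is divisible by mA , this confirms mA (x) = (x − 1)(x − 2).

    from sympy import Matrix, eye

    A = Matrix([[1, 0, 0],
                [0, 2, 0],
                [0, 0, 2]])
    I3 = eye(3)

    assert (A - I3) * (A - 2*I3) == Matrix.zeros(3, 3)   # (x - 1)(x - 2) annihilates A
    assert (A - I3) != Matrix.zeros(3, 3)                # x - 1 does not
    assert (A - 2*I3) != Matrix.zeros(3, 3)              # x - 2 does not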

Exercise 23. Let T : V → V be a linear transformation of a finite-dimensional


vector space over a field F . Show that T is invertible if and only if the constant term
of mT (x) is non-zero.

Theorem. For f ∈ F [x], f (T ) = 0 if and only if mT (x) divides f (x) in F [x].


Similarly for A ∈ Mn×n (F ).

Proof. Let f ∈ F [x]. Since F [x] has a Division Algorithm we can find q, r ∈ F [x]
such that f (x) = q(x) mT (x)+r(x) and either r = 0 or deg r < deg mT . Now mT (T ) = 0
and so f (T ) = 0 if and only if r(T ) = 0. It follows from the minimality of deg mT that
f (T ) = 0 if and only if r = 0. That is f (T ) = 0 if and only if mT (x) divides f (x) in
F [x], as required.

Exercise 24. Let Ann(T ) := {f ∈ F [x] | f (T ) = 0}, the so-called annihilator of
T in F [x]. Show that Ann(T ) is an ideal in F [x], and that mT is a generator—that is,
Ann(T ) is the principal ideal (mT ) in F [x].

Next we examine the roots of the minimal polynomial: it turns out that they are
precisely the eigenvalues of T in F :

Theorem. For λ ∈ F , mT (λ) = 0 if and only if cT (λ) = 0. Similarly for


A ∈ Mn×n (F ).

Proof. Suppose first that cT (λ) = 0, so that λ is an eigenvalue. Let v be a non-


zero vector such that T v = λ v . A simple inductive argument shows that T n v = λn v
for all n > 0, and then forming linear combinations we see that f (T ) v = f (λ) v for
any polynomial f ∈ F [x]. In particular, mT (λ) v = mT (T ) v = 0. But v 6= 0 and so
mT (λ) = 0.
For the converse, let λ ∈ F and suppose that mT (λ) = 0. We know then that there
exists g ∈ F [x] such that mT (x) = (x − λ) g(x). Now deg g < deg m and so g(T ) 6= 0.
Therefore there exists w ∈ V such that g(T ) w 6= 0. Define v := g(T ) w . Then v 6= 0
and (T − λ I)v = (T − λ I)g(T ) w = mT (T ) w = 0, so λ is an eigenvalue of T , that is,
cT (λ) = 0.
 
Example. Let A be the 3 × 3 diagonal matrix with diagonal entries 1, 2, 2 considered
above. Then

    cA (x) = (x − 1)(x − 2)2   and   mA (x) = (x − 1)(x − 2) .

Note. In fact mT (x) and cT (x) always have the same irreducible factors in F [x].
   
Exercise 25. For each of the two 4 × 4 matrices

    ( 0  1  0  0 )        (  1   1   0   1 )
    ( 0  0  1  0 )        ( −2  −1  −1   0 )
    ( 1  0  0  0 )        (  0   0   2  −5 )
    ( 0  0  0  1 )        (  0   0   1  −2 )

find the characteristic polynomial and the minimal polynomial.

Exercise 26. [Part of a former FHS question.] (i) Let A be a 3×3 matrix whose
characteristic polynomial is x3 . Show that there are exactly three possibilities for the
minimal polynomial of A and give an example of matrices of each type.

(ii) Let V be a finite-dimensional vector space over some field F , and let T : V → V
be a linear transformation whose minimal polynomial is xk . Prove that
{0} < Ker T < Ker T 2 < · · · < Ker T k = V

and deduce that dim V > k .

The Primary Decomposition Theorem
The Primary Decomposition Theorem is a very useful result for understanding linear
transformations and square matrices. Like many of the best results in mathematics it is
more a theory than a single cut and dried theorem and I propose to present three forms
of it. The first is the basic, all-purpose model which contains the main idea; the second
gives some detail about the minimal polynomial; and the third is designed to extract a
considerable amount of detail from the prime factorisation of the minimal polynomial.
Throughout this section notation is as before: F is a field, V is a finite-dimensional
vector space over F , and T : V → V is linear. Recall that a subspace U is said to be
T -invariant if T (U ) 6 U , that is, T u ∈ U for all u ∈ U . When this is the case we write
T |U for the restriction of T to U . Thus T |U : U → U and (T |U ) u = T u for all u ∈ U .

Reminder: although two linear operators do not usually commute, if they are poly-
nomials in one and the same operator T then they certainly do commute. For, powers
of T obviously commute and therefore so do linear combinations of powers of T .

Primary Decomposition Theorem (Mark 1). Suppose that f (T ) = 0, where


f ∈ F [x]. Suppose also that f (x) = g(x) h(x), where g, h ∈ F [x] and g , h are co-
prime. Then there are T -invariant subspaces U, W of V such that V = U ⊕ W and
g(T |U ) = 0, h(T |W ) = 0.

Proof. Our problem is to find subspaces U and W of V that have the specified
properties. Those properties include that g(T ) u = 0 for all u ∈ U and h(T ) w = 0 for
all w ∈ W . Therefore we know that we must seek U inside Ker g(T ) and W inside
Ker h(T ). In fact we define

U := Ker g(T ) and W := Ker h(T )

and prove that these subspaces do what is wanted. Certainly, if u ∈ U then g(T )(T u) =
T g(T ) u = T 0 = 0. Thus if u ∈ U then T u ∈ U , so U is T -invariant. Similarly, W
is T -invariant. And the facts that g(T |U ) = 0 and h(T |W ) = 0 are immediate from the
definitions of U and of W . It remains to prove therefore that V = U ⊕ W .
From the theory of polynomials rings over a field we know that, since g , h are
coprime, there exist a, b ∈ F [x] such that

a(x) g(x) + b(x) h(x) = 1.

Then
a(T ) g(T ) + b(T ) h(T ) = I,
where I : V → V is the identity as usual. For v ∈ V define

u := b(T ) h(T ) v and w := a(T ) g(T ) v .

Then v = u + w . Moreover, g(T ) u = g(T ) b(T ) h(T ) v = b(T ) g(T ) h(T ) v = b(T ) f (T ) v = 0, and so u ∈ U .


Similarly w ∈ W . Thus V = U + W . Now let v ∈ U ∩ W . Then

v = a(T ) g(T ) v + b(T ) h(T ) v = a(T ) 0 + b(T ) 0 = 0 .

Thus U ∩ W = {0} and V = U ⊕ W as required.
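Here is a computational sketch of the theorem in the small example used earlier (the matrix and the factorisation are chosen purely for illustration): with f (x) = (x − 1)(x − 2), g(x) = x − 1 and h(x) = x − 2 we recover V = Ker g(T ) ⊕ Ker h(T ).

    from sympy import Matrix, eye

    A = Matrix([[1, 0, 0],
                [0, 2, 0],
                [0, 0, 2]])
    gA, hA = A - eye(3), A - 2*eye(3)     # g(A) and h(A); their product f(A) is zero

    U = gA.nullspace()                    # basis of U = Ker g(A)
    W = hA.nullspace()                    # basis of W = Ker h(A)

    assert gA * hA == Matrix.zeros(3, 3)
    assert len(U) + len(W) == 3                      # dim U + dim W = dim V
    assert Matrix.hstack(*(U + W)).rank() == 3       # the combined basis spans V, so V = U ⊕ W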

Exercise 27. With the notation and assumptions of the Primary Decomposition
Theorem, let P be the projection of V onto U along W . Find p(x) ∈ F [x] such that
P = p(T ).

Primary Decomposition Theorem (Mark 2). Suppose that mT (x) = g(x)h(x)


where g, h ∈ F [x] are monic and co-prime. Let U , W be as in the previous theorem.
Then mT |U = g and mT |W = h .

Proof. Define f (x) := mT |U (x) × mT |W (x). For v ∈ V write v = u + w , where


u ∈ U and w ∈ W . Then

f (T ) v = mT |U (T ) mT |W (T ) (u + w)
= mT |W (T ) mT |U (T ) u + mT |U (T ) mT |W (T ) w
= 0 + 0 = 0.

Thus f (T ) v = 0 for all v ∈ V , that is f (T ) = 0. Therefore mT (x) divides f (x) in F [x].


Since g(T |U ) = 0 we know that mT |U (x) divides g(x) in F [x]. Similarly mT |W (x)
divides h(x) in F [x]. Therefore f (x) divides g(x) h(x) in F [x], that is f (x) divides
mT (x) in F [x]. It follows that mT (x) = c f (x) for some non-zero c ∈ F . Since mT (x)
and f (x) are both monic, in fact c = 1, that is mT (x) = f (x). And now, if we write
g(x) = c1 (x) mT |U (x) and h(x) = c2 (x) mT |W (x), where c1 , c2 ∈ F [x], then we see that
c1 (x) c2 (x) = 1, so both c1 and c2 are constant polynomials. Since all of g , h, mT |U ,
mT |W are monic, in fact c1 = c2 = 1. Thus mT |U = g and mT |W = h, as required.

Example. If mT (x) = x2 − x then (as we already know) there exist U, W 6 V


such that V = U ⊕ W , T |U = IU and T |W = 0W .

Primary Decomposition Theorem (Mark 3). If

mT (x) = f1(x)^{a1} f2(x)^{a2} · · · fk(x)^{ak} ,

where f1 , f2 , . . . , fk are distinct monic irreducible polynomials over F , then

V = V1 ⊕ V 2 ⊕ · · · ⊕ V k ,

where V1 , V2 , . . . , Vk are T -invariant subspaces and the minimal polynomial of T |Vi is fi^{ai} for 1 ≤ i ≤ k .

Proof. We use induction on k . The result is trivially true if k = 1, that is, if mT (x)
is simply a power of some irreducible polynomial. Our inductive hypothesis is that if U
is a finite-dimensional vector space over F and if the minimal polynomial of S : U → U
factorises as a product of k − 1 powers of irreducible polynomials, then U decomposes
as a direct sum as described in the statement of the theorem (with S replacing T ).
So now suppose that mT (x) = f1(x)^{a1} f2(x)^{a2} · · · fk(x)^{ak} , where f1 , f2 , . . . , fk are
distinct monic irreducible polynomials over F . Let

    g(x) := ∏_{i=1}^{k−1} fi(x)^{ai}    and    h(x) := fk(x)^{ak} .

24
By the Primary Decomposition Theorem, Mark 2, V = U ⊕ W , where U , W are T -
invariant and mT |U = g , mT |W = h. Applying the induction hypothesis to U and T |U
we see that U = U1 ⊕ · · · ⊕ Uk−1 , where the subspaces Ui are T |U -invariant and the
minimal polynomial of the restriction of T |U to Ui is fi^{ai} for 1 ≤ i ≤ k − 1. But of course
this simply means that Ui is T -invariant and the minimal polynomial of the restriction
of T to Ui is fi^{ai} for 1 ≤ i ≤ k − 1. Define Vi := Ui for 1 ≤ i ≤ k − 1 and Vk := W to
complete the proof.

An important application of the Primary Decomposition Theorem is a criterion for


diagonalisability of a linear transformation or a square matrix. Our linear transformation
T is said to be diagonalisable if there is a basis of V consisting of eigenvectors of T .
Thus T is diagonalisable if and only if there is a basis of V with respect to which its
matrix is diagonal. Correspondingly therefore, a matrix A ∈ Mn×n (F ) is said to be
diagonalisable if there exists an invertible n × n matrix P over F such that P −1 A P is
diagonal.

Theorem. Our linear transformation T : V → V is diagonalisable if and only if


mT (x) may be factorised as a product of distinct linear factors in F [x].

Proof. Suppose first that mT (x) may be factorised as a product of distinct lin-
ear factors in F [x]. This means that there exist distinct scalars λ1 , . . . , λk such that
mT (x) = (x − λ1 ) · · · (x − λk ). By the Primary Decomposition Theorem (Mark 3),
V = V1 ⊕ · · · ⊕ Vk where Vi is T -invariant for 1 6 i 6 k and T |Vi − λi IVi = 0. Thus all
vectors v in Vi satisfy the equation T v = λi v . If Bi is a basis of Vi then Bi consists
of eigenvectors of T with eigenvalue λi , and so if B := B1 ∪ · · · ∪ Bk then B is a basis
of V consisting of eigenvectors. Thus T is diagonalisable.
Now suppose conversely that T is diagonalisable and let B be a basis of V consist-
ing of eigenvectors of T . Let λ1 , . . . , λk be the distinct members of F that occur as
eigenvalues for the vectors in B . Define f (x) := (x − λ1 ) · · · (x − λk ). We propose to
show that f (T ) = 0. Let v ∈ B . Then there exists i ∈ {1, . . . , k} such that T v = λi v .
Therefore (T − λi I) v = 0 and so f (T ) v = 0. Since f (T ) annihilates all members of a
basis of V its null-space (kernel) is V , that is, f (T ) = 0. It follows that mT divides
f in F [x]. That would be enough to show that mT (x) is a product of some of the
factors (x − λi ) and therefore factorises as a product of distinct linear factors in F [x].
But in fact we can go a little further—we know that every eigenvalue is a root of mT
and therefore in fact mT = f , that is, mT (x) = (x − λ1 ) · · · (x − λk ).
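
Over C the criterion can be tested mechanically: T is diagonalisable precisely when the product of (A − λ I) over the distinct eigenvalues λ is already the zero matrix. The following Python/numpy sketch does this numerically; it is only an illustration, since clustering floating-point eigenvalues by a tolerance is a heuristic rather than part of the theory, and the function name is invented for the purpose.

    import numpy as np

    def looks_diagonalisable(A, tol=1e-8):
        # group the computed eigenvalues into (numerically) distinct values
        eigvals = np.linalg.eigvals(A)
        distinct = []
        for lam in eigvals:
            if all(abs(lam - mu) > tol for mu in distinct):
                distinct.append(lam)
        # form (A - lam_1 I)(A - lam_2 I)...(A - lam_k I)
        P = np.eye(len(A), dtype=complex)
        for lam in distinct:
            P = P @ (A - lam * np.eye(len(A)))
        return np.allclose(P, 0, atol=tol)

    A = np.array([[1.0, 1.0], [0.0, 1.0]])   # minimal polynomial (x - 1)^2
    B = np.array([[0.0, 1.0], [1.0, 0.0]])   # minimal polynomial (x - 1)(x + 1)
    print(looks_diagonalisable(A), looks_diagonalisable(B))   # False True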

A worked example—Part of FHS 2001, Paper a1, Question 1:

State a criterion for the diagonalizability of a linear transformation in terms


of its minimum polynomial, and show that if two linear transformations S and
T of V are diagonalizable and ST = T S , then there is a basis of V with
respect to which both S and T have diagonal matrices.

Response. The first clause is the ‘bookwork’ we have just treated. For the second
part, let λ1 , . . . , λk be the distinct eigenvalues of T and for 1 ≤ i ≤ k define

    Ui := {v ∈ V | T v = λi v}

(the so-called λi -eigenspace of T ). Since T is diagonalisable there is a basis of V

25
consisting of eigenvectors and this means that V = U1 ⊕ · · · ⊕ Uk . This is part of what
we have just proved; note that the fact that eigenvectors for distinct eigenvalues are
linearly independent was what you proved in Exercise 4 of the ‘Mods Revision’ Sheet.
Now we prove (see that same Exercise 4) that each subspace Ui is S -invariant. For,
if v ∈ Ui then T (S v) = S(T v) = S(λi v) = λi S v , and so S v ∈ Ui . Now the minimal
polynomial mS|Ui divides mS , and mS may be factorised as a product of distinct linear
factors in F [x], so the same is true of mS|Ui . It follows that there is a basis u_{i1}, . . . , u_{i di}
of Ui consisting of eigenvectors of S . Note, however, that since these are non-zero vectors
in Ui they are automatically eigenvectors also for T . And now if

B := {ui j | 1 6 i 6 k, 1 6 j 6 di }

then B , being the union of bases of Ui for 1 6 i 6 k , is a basis of V , and it consists of


vectors that are simultaneously eigenvectors of S and of T . Thus the matrices of both
S and T with respect to the basis B of V are diagonal.
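
The argument just given is constructive, and can be carried out numerically: diagonalise T, then diagonalise the restriction of S to each eigenspace of T. The sketch below (Python with numpy) is illustrative only; for simplicity it assumes S and T are real symmetric, so that numpy's eigh applies, and it groups eigenvalues by a crude tolerance.

    import numpy as np

    def simultaneous_eigenbasis(S, T, tol=1e-8):
        vals, vecs = np.linalg.eigh(T)            # T = vecs diag(vals) vecs^tr
        cols, i, n = [], 0, len(vals)
        while i < n:
            j = i
            while j < n and abs(vals[j] - vals[i]) < tol:
                j += 1
            U = vecs[:, i:j]                      # orthonormal basis of an eigenspace of T
            _, W = np.linalg.eigh(U.T @ S @ U)    # diagonalise S restricted to that eigenspace
            cols.append(U @ W)
            i = j
        return np.hstack(cols)

    S = np.array([[2.0, 1.0], [1.0, 2.0]])
    T = np.array([[0.0, 1.0], [1.0, 0.0]])
    assert np.allclose(S @ T, T @ S)              # S and T commute
    P = simultaneous_eigenbasis(S, T)             # orthogonal, so its inverse is its transpose
    print(np.round(P.T @ S @ P, 6))               # diagonal
    print(np.round(P.T @ T @ P, 6))               # diagonal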

Exercise 28. For each of the matrices in Exercise 23, thought of as matrices with
coefficients in C, find an invertible matrix Y over C such that Y −1 XY is diagonal, or
prove that no such Y exists. Can such matrices be found, whose coefficients are real?
 
Exercise 29. Let

    A := ( 0  1  0 )
         ( 1  1  0 )
         ( 1  1  1 ).

Is A diagonalisable when considered as a matrix over the following fields: (i) C ; (ii) Q ; (iii) Z2 ; (iv) Z5 ?

Exercise 30. Let T : V → V be a linear transformation of a finite-dimensional


vector space over a field F , and T ′ : V ′ → V ′ the dual transformation. Show that T and
T ′ have the same characteristic polynomial and the same minimal polynomial. Deduce
that T is diagonalisable if and only if T ′ is diagonalisable.

Exercise 31. Let T : V → V be a linear transformation of a finite-dimensional


vector space over a field F .
(i) Let mT (x) be its minimal polynomial and let g(x) be a polynomial which is
coprime with mT (x). Show that g(T ) is invertible. [Hint: use the existence of
u(x), v(x) ∈ F [x] such that u(x)mT (x) + v(x)g(x) = 1 .]
(ii) Using the Primary Decomposition Theorem, or otherwise, deduce that V =
V1 ⊕ V2 where V1 , V2 are T -invariant subspaces (that is T (Vi ) ⊆ Vi ) such that
the restriction T1 of T to V1 is invertible and the restriction T2 of T to V2 is
nilpotent (that is, T2m = 0 for some m ∈ N).

Triangular form
There are some linear transformations T and some square matrices A that cannot
be diagonalised—for example the transformation T : F^2 → F^2 given by T : (x, y) 7→ (x + y, y), or the matrix

    A := ( 1  1 )
         ( 0  1 ).

Nevertheless, one can choose bases with respect to which the matrices are particularly simple or particularly well-adapted to show the

26
behaviour of T or of A. The most sophisticated of these give the so-called Rational
Canonical Form and the Jordan Normal Form of matrices, but these are quite a long
way beyond the scope of this course. Triangular form is a step in the direction of the
Jordan Normal Form, and although it is quite a small step, it is extremely useful.
As previously, throughout this section F is a field, V is a finite-dimensional vector
space over F , n := dim V , and T : V → V is a linear transformation.

An n × n matrix A with entries aij ∈ F is said to be upper triangular if aij = 0


when i > j . The following is almost trivial:
Observation. If A is upper triangular then cA (x) = ∏_{i=1}^{n} (x − aii), and therefore
the eigenvalues of A are its diagonal entries a11 , . . . , ann .

Our transformation T is said to be triangularisable if there is a basis of V with


respect to which its matrix is upper triangular.

Observation. The matrix of T with respect to the basis v1 , . . . , vn of V is upper


triangular if and only if each subspace hv1 , . . . , vr i (for 1 6 r 6 n) is T -invariant.

Theorem. Our transformation T is triangularisable if and only if cT (x) may be


factorised as a product of (not necessarily distinct) linear factors in F [x].

Note the comparison with the diagonalisability theorem on p. 25. There it was the
minimal polynomial that mattered, and it had to factorise completely with distinct roots
in F . Here it is the characteristic polynomial that matters, and it must factorise comp-
letely over F but can have multiple roots.

Proof. This is clear one way round: we have already seen that if the matrix A of
T is the upper triangular matrix (aij) with respect to a suitable basis then cT (x) =
det(x I − A) = ∏_{i=1}^{n} (x − aii).
For the converse we use induction on the dimension n. There is nothing to prove
if n = 0 or if n = 1. So suppose that n > 2 and that the theorem is known to hold
for linear transformations of smaller-dimensional vector spaces. Suppose that cT (x) =
∏_{i=1}^{n} (x − λi), where λ1 , . . . , λn ∈ F . Let λ be any one of the λi . For the sake of
definiteness let’s define λ := λn . Let W := Im (T − λ I) and let m := dim W . Since λ is
an eigenvalue, dim (Ker (T −λ I)) > 1 and so (by the Rank-Nullity Theorem) m 6 n−1.
Let w1 , . . . , wm be a basis of W and extend this to a basis w1 , . . . , wm , vm+1 , . . . , vn
of V . Now W is T -invariant because if w ∈ W then w = T v − λ v for some v ∈ V ,
and so T w = T^2 v − λ T v = (T − λ I) T v ∈ W . Therefore there is an m × m matrix
(aij) such that

    T wj = Σ_{i=1}^{m} aij wi    for 1 ≤ j ≤ m.

Also, since for m + 1 ≤ j ≤ n we know that T vj − λ vj ∈ W , there is an m × (n − m)
matrix (bij) such that

    T vj = λ vj + Σ_{i=1}^{m} bij wi    for m + 1 ≤ j ≤ n.

27
The matrix A of T with respect to this basis has the partitioned form

    ( A1   B1          )
    ( 0    λ I_{n−m}   ),

where A1 = (aij) (the matrix of T |W with respect to the basis w1 , . . . , wm of W ),
B1 = (bij), and the matrix in the south-west corner is an (n − m) × m zero matrix. Now

    cT (x) = det(x I − A) = det(x I − A1) (x − λ)^{n−m},

and so cT |W (x) divides cA (x) in F [x]. Therefore cT |W (x) may be written as a product
of linear factors in F [x] and so by inductive hypothesis there is a basis v1 , . . . , vm of
W with respect to which the matrix A′ of T |W is upper triangular. Then the matrix
of T with respect to the basis v1 , . . . , vm , vm+1 , . . . , vn of V is

    ( A′   B′          )
    ( 0    λ I_{n−m}   ),

for some m × (n − m) matrix B′ , and this is upper triangular, as required.

Note. In particular, if F = C then every linear transformation V → V is triangu-


larisable. For, by the so-called Fundamental Theorem of Algebra (which is much more
a theorem in Analysis than in Algebra) every polynomial with complex coefficients can
be factorised as a product of linear factors in C[x].

The proof of the theorem gives a practical method for finding a basis with respect to
which the matrix of T is triangular. We find an eigenvalue λ of T ; then (T − λI)V is a
proper T -invariant subspace of V . Find a triangularising basis there, and extend to V .

Worked example. Find a triangular form of A where


 
6 2 3
A :=  −3 −1 −1  .
−5 −2 −2

Response. We find that cA (x) = x^3 − 3x^2 + 3x − 1 = (x − 1)^3 . Now if V = F^3


(column vectors) and W := (A − I) V then W is spanned by the columns of A − I , that
is of  
5 2 3
 −3 −2 −1  .
−5 −2 −3
Restricted to this subspace W the only eigenvalue of A (that is, of the linear transfor-
mation produced by multiplication of column vectors by A) is 1, and so we consider the
space (A − I) W . This is spanned by the columns of (A − I)2 , which is
 
4 0 4
 −4 0 −4  .
−4 0 −4

Choose the vector (1, −1, −1)tr to span this one-dimensional space; then by inspec-
tion we find that W is spanned by this vector together with (0, 1, 0)tr , and so as
triangularising basis for V we can take
     
    (1, −1, −1)^tr ,    (0, 1, 0)^tr ,    (0, 0, 1)^tr .

28
And in fact, with respect to this basis the matrix becomes

    ( 1  2  3 )
    ( 0  1  2 )
    ( 0  0  1 ).
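
The arithmetic of this example is easily confirmed by machine; a minimal check in Python with numpy (illustrative only):

    import numpy as np

    A = np.array([[ 6.0,  2.0,  3.0],
                  [-3.0, -1.0, -1.0],
                  [-5.0, -2.0, -2.0]])
    P = np.array([[ 1.0, 0.0, 0.0],      # columns are the triangularising basis found above
                  [-1.0, 1.0, 0.0],
                  [-1.0, 0.0, 1.0]])
    print(np.round(np.linalg.inv(P) @ A @ P, 6))
    # [[1. 2. 3.]
    #  [0. 1. 2.]
    #  [0. 0. 1.]]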
Exercise 32. Let

    A := (  4    9 )
         ( −1   −2 ),

construed as a matrix over an arbitrary field F . Find an invertible 2 × 2 matrix P over F for which P^{−1} A P is triangular. Are there any fields F over which P can be found such that P^{−1} A P is diagonal?

Exercise 33. For each of the matrices in Exercise 24, thought of as matrices with
coefficients in C, find an invertible matrix Y over C such that Y −1 XY is upper trian-
gular. Can such matrices be found, whose coefficients are real?

The Cayley–Hamilton Theorem


We come now to one of the classic and lovely theorems of linear algebra:

Cayley–Hamilton Theorem. Let V be a finite-dimensional vector space over a


field F , and let T : V → V be a linear transformation. Then cT (T ) = 0.

Equivalently: If A ∈ Mn×n (F ) then cA (A) = 0.

Equivalently: The minimal polynomial mT (x) divides the characteristic polyno-


mial cT (x) in F [x].

Equivalently: If A ∈ Mn×n (F ) then the minimal polynomial mA (x) divides the


characteristic polynomial cA (x) in F [x].

Note. Historically this was a theorem about square matrices rather than linear
transformations. Indeed, it was the second of the four assertions above. For reasons
both historical and practical, in these notes I propose to work with n × n matrices over
F . The translation to linear transformations T : V → V (for a finite-dimensional vector
space V over F ) is, however, absolutely routine.
There are many proofs of the theorem. It lies deeper than other parts of this course,
and some of those proofs give little insight into just why the theorem holds. They merely
prove it. In order to give you some insight into why it is true I propose to begin with three
simple but suggestive observations which lead directly to an easy proof of the theorem
over any subfield of C.

Observation 1. Let A, B ∈ Mn×n (F ). If B = P −1 A P , where P is invertible in


Mn×n (F ), then cA (A) = 0 if and only if cB (B) = 0.

Proof. Let P be an invertible n × n matrix over F and let B := P −1 A P . We know


then that if f (x) := cA (x) = det(x I − A) then also f (x) = cB (x) = det(x I − B). It is
an easy calculation (see Exercise 22 on p. 20) that

f (B) = f (P −1 A P ) = P −1 f (A) P

29
and so f (B) = 0 if and only if f (A) = 0. That is, cA (A) = 0 if and only if cB (B) = 0,
as required.

This can be thought of in another way. Choose a basis for an n-dimensional vector
space V over F and let T : V → V be the linear transformation whose matrix is A with
respect to this basis. The correspondence between matrices and linear transformations
says that cA (A) = 0 if and only if cT (T ) = 0. But now if B = P −1 A P then B is
simply the matrix of the same linear transformation T with respect to a different basis,
so cB (B) = 0 if and only if cT (T ) = 0. Therefore cA (A) = 0 if and only if cB (B) = 0.

Observation 2. If A ∈ Mn×n (F ) and A is diagonalisable then cA (A) = 0.

For, by the previous observation this is true if and only if it is true for diagonal
matrices. Suppose then that, in an obvious notation, A = Diag (λ1 , λ2 , . . . , λn ). Then
cA (x) = (x−λ1 ) (x−λ2 ) · · · (x−λn ), and so cA (A) = (A−λ1 I) (A−λ2 I) · · · (A−λn I).
Each factor (A − λi I) is a diagonal matrix, and its ith diagonal entry is 0. Therefore
cA (A) is a diagonal matrix and its ith diagonal entry is 0 for every i; that is, cA (A) = 0.

As it happens, there is a strong sense in which ‘almost all’ matrices are diagonalisable,
and therefore these simple arguments have already proved the Cayley–Hamilton Theorem
for ‘most’ matrices (and therefore for ‘most’ linear transformations). We can however
take one further step and prove the theorem for triangularisable matrices.

Observation 3. If A ∈ Mn×n (F ) and A is triangularisable then cA (A) = 0.

Proof. Again, by the first observation above, we may assume that in fact A is
upper triangular. Then this observation was Qn 6 on the preliminary exercise sheet
(Mods Revision). We think of A as partitioned in the form

    ( A1   B  )
    ( 0    λn ),

where A1 is an (n − 1) × (n − 1) upper triangular matrix, B is an (n − 1) × 1 column vector, and 0
denotes the 1 × (n − 1) zero row vector. Then
det(x I − A) = det(x I − A1 ) × det(x − λn ),
that is, cA (x) = cA1 (x) (x − λn ). For any polynomial f ∈ F [x] we find that
    f (A) = ( f (A1)   C      )
            ( 0        f (λn) ),
where C is some (n − 1) × 1 column vector, and 0 again denotes the 1 × (n − 1) zero row
vector. As inductive assumption we may assume that cA1 (A1 ) = 0. Then, for suitable
column vectors C , C ′ ,
    cA (A) = cA1 (A) (A − λn I)

           = ( cA1 (A1)   C        ) ( A1 − λn I   C′ )
             ( 0          cA1 (λn) ) ( 0           0  )

           = ( 0   C        ) ( A1 − λn I   C′ )
             ( 0   cA1 (λn) ) ( 0           0  )

           = ( 0   0 )
             ( 0   0 ).

30
Corollary. The Cayley–Hamilton Theorem holds for matrices over C and for
matrices over any subfield F of C, such as Q or R.

Proof. We know (see the note on p. 28) that every matrix over C is triangularisable,
and so the result follows immediately from Observation 3.
If A ∈ Mn×n (F ), where F is a subfield of C then we can think of A as a matrix
over C. As such it is known to be annihilated by its characteristic polynomial, which is
what we wanted to show.
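
Readers who like numerical evidence can also evaluate cA (A) directly. A small Python/numpy sketch (illustrative only; np.poly returns the coefficients of the characteristic polynomial of a square array, leading coefficient first):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))                 # a "random" real matrix
    coeffs = np.poly(A)                             # coefficients of c_A(x) = det(xI - A)
    n = A.shape[0]
    cA_of_A = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))
    print(np.allclose(cA_of_A, np.zeros((n, n)), atol=1e-8))   # True, up to rounding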

It is worth a digression to take a brief look at the original source of the theorem. This
is A. Cayley, ‘A memoir on the theory of matrices’, Phil. Trans Roy. Soc. London, 148
(1858), 17–37, which is reprinted in The Collected Mathematical Papers of Arthur Cayley,
Vol. II, pp. 475–495. Hamilton’s connection with the theorem seems to have been more
tenuous and to have come from one of his theorems about his quaternions. Here is a
quotation from Cayley’s introduction to his paper:

I obtain the remarkable theorem that any matrix whatever satisfies an algeb-
raical equation of its own order, [ . . . ] viz. the determinant, formed out of the matrix
diminished by the matrix considered as a single quantity involving the matrix unity,
will be equal to zero.

And here is an extract from §§21–23 (I have tried to reproduce Cayley’s notation for
matrices faithfully, with the first row enclosed in round brackets, the rest enclosed in
vertical bars):

21. The general theorem before referred to will be best understood by a com-
plete development of a particular case. Imagine a matrix

( a, b ),
c, d

and form the determinant

    | a − M,    b     |
    | c,      d − M   | ,

the developed expression of this determinant is

    M^2 − (a + d) M^1 + (ad − bc) M^0 ;

the values of M^2 , M^1 , M^0 are

    ( a^2 + bc ,   b(a + d) ),      ( a,  b ),      ( 1,  0 ),
      c(a + d),    d^2 + bc           c,  d           0,  1

and substituting these values the determinant becomes equal to the matrix zero,
[ . . . ].

[. . .]

23. I have verified the theorem, in the next simplest case of a matrix of the
order 3 , viz. if M be such a matrix, suppose

( a, b, c ),
d, e, f
g, h, i

31
then the derived determinant vanishes, or we have

    | a − M,    b,       c     |
    | d,        e − M,   f     |  = 0,
    | g,        h,       i − M |

or expanding

    M^3 − (a+e+i) M^2 + (ei+ia+ae−fh−cg−bd) M − (aei+bfg+cdh−afh−bdi−ceg) = 0 ;

but I have not thought it necessary to undertake the labour of a formal proof of the
theorem in the general case of a matrix of any degree.

What a charming piece of 19th Century chutzpah: “I have not thought it necessary
to undertake the labour of a formal proof of the theorem in the general case of a matrix of
any degree.” It is very unlikely that the method which Cayley uses for 2×2 matrices, and
sketches for the 3 × 3 case could be practical for the n × n case: even if one could write
down explicitly the characteristic polynomial of an n × n matrix A, it seems unrealistic
to expect to write down the (i, j) coefficient of a general power Ak for all k up to n, and
then evaluate cA (A) in the way that it is possible in the 2×2 case. So the fact is, he can’t
really have had a proof. But there’s another fact: he may not have appreciated rigorous
thinking in the same way as Oxford students now do (the poor chap went to Cambridge
and missed out on the Oxford experience) but he did have a wonderful insight. He knew
that the theorem was right, and for him proof, though it would have been nice to have,
was less important than having an insight which he could use in all sorts of ways to solve
other mathematical problems.

That is as far as I am going to go with the Cayley–Hamilton Theorem. We have


a proof of it over any field F contained as a subfield of C. That proof can, once one
knows more about fields in general, be easily adapted to other fields. For a general
proof using adjoint matrices (a proof which, though pleasantly short, to me seems rather
unnatural and gives no insight into why the theorem holds) see, for example, T. S. Blyth
& E. F. Robertson, Basic Linear Algebra, p. 169 or Richard Kaye & Robert
Wilson Linear Algebra, p. 170. For a general proof using ‘rational canonical form’
see, for example, Peter J. Cameron, Introduction to Algebra, p. 154 or Charles
W. Curtis, Linear Algebra, p. 226.

Further exercises II
Exercise 35. Let V be a finite-dimensional vector space over a field F and let
T : V → V be a linear transformation.
(i) Suppose that F = C and that T 4 = T . Show that T is diagonalisable.
(ii) Now suppose that F = Z3 and that mT (x) = x4 − x. Is T diagonalisable?

Exercise 36 Let V be a finite-dimensional vector space and let S, T : V → V


be linear transformations. Let m1 and m2 denote the minimal polynomials of ST
and T S respectively. By considering relations such as T (ST)^r S = (TS)^{r+1} show that
m2 (x) = x^i m1 (x), where i ∈ {1, 0, −1}. Show that λ is an eigenvalue of ST if and only
if λ is an eigenvalue of T S .

32
Exercise 37 [FHS 1995, A1, 3]. Let V be a finite-dimensional vector space over a
field F , and let T : V → V be a linear transformation. Let v ∈ V , v 6= 0, and suppose
that V is spanned by {v, T v, T 2 v, . . . , T j v, . . .}.

(i) Show that there is an integer k > 1 such that v, T v, T 2 v, . . . , T k−1 v are linearly
independent but T k v = α0 v + α1 T v + · · · + αk−1 T k−1 v for some α0 , α1 , . . . ,
αk−1 ∈ F .
(ii) Prove that {v, T v, T 2 v, . . . , T k−1 v} is a basis for V .

Let T have minimum polynomial m(x) and characteristic polynomial c(x).

(iii) Prove that m(x) = xk − αk−1 xk−1 − · · · − α1 x − α0 .


(iv) By considering the matrix of T with respect to the basis {v, T v, T 2 v, . . . , T k−1 v},
prove (without using the Cayley–Hamilton Theorem) that m(x) = c(x).
 
Now let A be the matrix

    ( 1  1  0  1 )
    ( 0  1  0  1 )
    ( 0  0  1  1 )
    ( 0  0  0  1 ).

Show that there does not exist a column vector v ∈ R^4 such that R^4 is spanned by {v, Av, A^2 v, . . . , A^j v, . . .}.

33
Part III: Inner Product Spaces
In the first two parts of this course the focus has been on linear algebra—vector spaces,
linear transformations and matrices—over an arbitrary field F . We come now to that
part of linear algebra which is rather more geometric, the theory of inner products and
inner product spaces. Inner products are generalisations of the familiar dot product u · v
of n-vectors with real coordinates, defined by u · v = Σ_i xi yi (where u has coordinates
x1 , . . . , xn and v has coordinates y1 , . . . , yn ). Although this theory can be extended
to arbitrary fields, or at least parts of it can, its most important and most natural
manifestations are over R and C. Therefore from now on we restrict to these fields.

Real inner product spaces and their geometry


Let V be a vector space over R. An inner product on V is a function B : V × V → R
such that for all u, v, w ∈ V and all α, β ∈ R ,

(1) B(α u + β v, w) = α B(u, w) + β B(v, w)


(2) B(u, v) = B(v, u) [B is symmetric]
(3) if u 6= 0 then B(u, u) > 0 [B is positive definite]

Note: From (1) and (2) follows

(1′ ) B(u, α v + β w) = α B(u, v) + β B(u, w)

A function V × V → R satisfying (1) and (1′ ) is said to be bilinear and called a bilinear
form. Thus an inner product on a real vector space is a positive definite symmetric
bilinear form. A real inner product space is a vector space over R equipped with an
inner product.

Notation. Often we find hu, vi or hu | vi used to denote what here is B(u, v). I
propose to use the former. In an inner product space we define
    ||u|| := ⟨u, u⟩^{1/2} .

Thus ||u|| ≥ 0 and ||u|| = 0 if and only if u = 0; this is known as the length or as the
norm of the vector u.

Example 1: V = Rn and hu, vi = u · v = utr v . This example is our standard real


inner product space and we call it euclidean space.

Example 2: V = C[a, b] (continuous functions f : [a, b] → R) and


    ⟨f, g⟩ = ∫_a^b f(t) g(t) dt .

Note. From condition (1′ ) for an inner product it follows that hu, 0i = 0 for any
u ∈ V . Conversely, suppose that u ∈ V and hu, vi = 0 for all v ∈ V . Then in particular
hu, ui = 0 and so u = 0. This property of an inner product is expressed by saying that
it is a non-degenerate or non-singular bilinear form.

34
Let V be a real inner product space. If u, v ∈ V and hu, vi = 0 then we say that
u, v are orthogonal. For X ⊆ V we define
X ⊥ := {v ∈ V | hu, vi = 0 for all u ∈ X }.
For u ∈ V we write u⊥ for {u}⊥ .

Lemma. Let V be a real inner product space and let X ⊆ V . Then X ⊥ is a


subspace of V .

Proof. We have already seen in the note above that 0 ∈ X ⊥ . Also, if v, w ∈ X ⊥


and α, β ∈ R then hu, α v + β wi = α hu, vi + β hu, wi = 0 for any u ∈ X , and so
α v + β w ∈ X ⊥ . Thus X ⊥ is a subspace.

Lemma. Let V be a real inner product space and let u ∈ V . Then


V = Span(u) ⊕ u⊥ .

Proof. If u = 0 there is nothing to prove since then u⊥ = V . So suppose that


u ≠ 0. If u1 := ||u||^{−1} u then

    ||u1||^2 = ⟨u1, u1⟩ = ||u||^{−2} ⟨u, u⟩ = 1,

so ||u1|| = 1. Moreover, Span(u) = Span(u1) and u⊥ = u1⊥ . Thus we may assume without loss of
generality that ||u|| = 1. Now for v ∈ V define α := hu, vi and w := v − α u. Then
hu, wi = hu, v − α ui = hu, vi − α hu, ui = hu, vi − α = 0,
and so w ∈ u⊥ . This shows that V = Span(u) + u⊥ . If x ∈ Span(u) ∩ u⊥ then x = λ u
for some λ ∈ R but also hu, xi = 0, so 0 = hu, λ ui = λ hu, ui = λ, that is, x = 0.
Therefore also Span(u) ∩ u⊥ = {0} and so V = Span(u) ⊕ u⊥ as we claimed.

Let V be a real inner product space. Vectors u1 , u2 , . . . , uk in V are said to form


an orthogonal set if hui , uj i = 0 whenever i 6= j .

Lemma. An orthogonal set of non-zero vectors in a real inner product space is lin-
early independent.

Proof. Let V be a real inner product space and let u1 , u2 , . . . , uk be an orthogonal


set in V \ {0}. Suppose that α1 u1 + α2 u2 + · · · + αk uk = 0 where α1 , α2 , . . . , αk ∈ R.
Then for 1 ≤ i ≤ k we have

    0 = ⟨ Σ_j αj uj , ui ⟩ = Σ_j αj ⟨uj, ui⟩ = αi ⟨ui, ui⟩

by orthogonality, and so αi = 0 since ⟨ui, ui⟩ ≠ 0. Thus u1 , u2 , . . . , uk are linearly
independent.

We are particularly interested in orthogonal sets of vectors of length 1. Vectors


u1 , u2 , . . . , uk in a real inner product space V are said to form an orthonormal set if
    ⟨ui, uj⟩ = 1 if i = j ,   and   ⟨ui, uj⟩ = 0 if i ≠ j .

35
Theorem. Let V be a finite-dimensional real inner product space, let n := dim V ,
and let u ∈ V \{0}. There is an orthonormal basis v1 , v2 , . . . , vn in which v1 = ||u||−1 u.

Proof. This is trivially true if n = 0 or if n = 1. Now suppose that n > 1 and as


inductive hypothesis suppose it is true for real inner product spaces of dimension n − 1.
Define v1 := ||u||−1 u and V1 := v1⊥ . We know then that V = Span(v1 ) ⊕ V1 and, in
particular, dim V1 = n − 1. Obviously, the restriction of our inner product to V1 is an
inner product on V1 . Starting from any non-zero vector in V1 the inductive hypothesis
yields that there is an orthonormal basis v2 , . . . , vn of V1 . Then, since hv1 , vi i = 0 for
2 ≤ i ≤ n, the vectors v1 , v2 , . . . , vn form an orthonormal basis of V .

Theorem. Let V be a real inner product space. If U is a finite-dimensional sub-


space then V = U ⊕ U ⊥ .

Proof. Certainly U ∩ U ⊥ = {0} because if u ∈ U ∩ U ⊥ then ⟨u, u⟩ = 0 and so u = 0. What needs


to be proved therefore is that V = U + U ⊥ .
The restriction to U of our inner product on V is obviously an inner product on U .
Since U is finite-dimensional we know it has an orthonormal basis u1 , . . . , un , say. For
v ∈ V define xi := ⟨ui, v⟩ for 1 ≤ i ≤ n, and then define u := Σ_i xi ui and w := v − u.
Now

    ⟨uj, w⟩ = ⟨uj, v − Σ_i xi ui⟩ = ⟨uj, v⟩ − Σ_i xi ⟨uj, ui⟩ = ⟨uj, v⟩ − xj ⟨uj, uj⟩ = 0 ,

and so w ∈ U ⊥ since w is orthogonal to all elements of a basis of U . This shows that


v = u + w ∈ U + U ⊥ , and completes the proof.

Exercise 38. Let V be a real inner product space and let U be a finite-dimensional subspace of V .
Show that (U ⊥ )⊥ = U .

Theorem. Let V be a finite-dimensional real inner product space and let v1 , . . . , vn


be an orthonormal basis of V . Let u, w ∈ V and suppose that

u = x1 v1 + · · · + xn vn , w = y1 v1 + · · · + y n vn ,

where xi , yj ∈ R for all i, j . Then


(1) xi = ⟨u, vi⟩ ;
(2) ||u|| = ( Σ_i |xi|^2 )^{1/2} ;
(3) ⟨u, w⟩ = Σ_i xi yi .

Proof. First, ⟨u, vi⟩ = ⟨Σ_j xj vj , vi⟩ = Σ_j xj ⟨vj, vi⟩ = xi , since the terms in which
j ≠ i are 0 and ⟨vi, vi⟩ = 1.
Next, ||u||^2 = ⟨Σ_i xi vi , Σ_j xj vj⟩ = Σ_{i,j} xi xj ⟨vi, vj⟩ = Σ_i xi^2 and therefore
||u|| = ( Σ_i xi^2 )^{1/2} , as stated in (2).
Finally, ⟨u, w⟩ = ⟨Σ_i xi vi , Σ_j yj vj⟩ = Σ_{i,j} xi yj ⟨vi, vj⟩ = Σ_i xi yi , as stated in (3).

36
The significance of this theorem is twofold. First, it shows the value of an orthonormal
basis for computing norms and inner products. Secondly, it shows that if we use an
orthonormal basis to identify our inner product space with Rn then our abstract inner
product is identified with the familiar dot product:

Corollary. Let V be a real inner product space of dimension n. Then there is an


isomorphism (bijective linear transformation) ϕ : Rn → V such that

if ϕ((x1 , . . . , xn)^tr) = u and ϕ((y1 , . . . , yn)^tr) = v then ⟨u, v⟩ = Σ_i xi yi .

If V , W are real inner product spaces, with inner products h , iV and h , iW


respectively, we say that V is isometric with W if there exists a linear isomorphism
ϕ : V → W such that hϕ u, ϕ viW = hu, viV for all u, v ∈ V . Thus the above corollary
says that any n-dimensional real inner product space is isometric with Rn equipped with
its usual scalar (dot) product.

Exercise 39. Let V be a vector space over R, and let hu, vi1 and hu, vi2 be
inner products defined on V . Prove that if hx, xi1 = hx, xi2 for all x ∈ V , then
hu, vi1 = hu, vi2 for all u, v ∈ V .

Complex inner product spaces


Let V be a vector space over C. An inner product on V is a function B : V × V → C
such that for all u, v, w ∈ V and all α, β ∈ C
C(1) B(α u + β v, w) = α B(u, w) + β B(v, w)
C(2) B(u, v) = \overline{B(v, u)} [B is conjugate symmetric]
C(3) if u 6= 0 then B(u, u) > 0 [B is positive definite]

Note. From C(1) and C(2) follows


C(2′ ) B(u, α v + β w) = ᾱ B(u, v) + β̄ B(u, w).

We express this by saying that B is semilinear or conjugate linear in its second variable,
or that B is sesquilinear (one-and-a-half linear). A complex inner product is often called
a Hermitian form in honour of Charles Hermite (1822–1901). A complex inner product
space is a complex vector space equipped with an inner product in this sense.

Notation. As in the real case hu, vi or hu | vi are often used for inner products.
And as in the real case we define ||u|| := ⟨u, u⟩^{1/2} .

Example 1: V = Cn and hu, vi = utr v̄ . This example is our standard complex


inner product space.

Example 2: V = {f : [a, b] → C | f is continuous} and


    ⟨f, g⟩ = ∫_a^b f(t) \overline{g(t)} dt .

37
Let V be a complex inner product space. Orthogonality is defined just as in real inner
product spaces. A finite-dimensional complex inner product space has an orthonormal
basis. If U 6 V and U is finite-dimensional then V = U ⊕ U ⊥ . The proofs of these
simple facts are almost exactly the same as in the real case and are offered as exercises
for the reader:

Exercise 40. Prove


(i) that if V is a finite-dimensional complex inner product space then V has an
orthonormal basis;
(ii) that if V is a complex inner product space and U is a finite-dimensional subspace
then V = U ⊕ U ⊥ .

Theorem. Let V be a finite-dimensional complex inner product space and let


v1 , . . . , vn be an orthonormal basis of V . Let u, w ∈ V and suppose that

u = x1 v1 + · · · + xn vn , w = y1 v1 + · · · + y n vn ,

where xi , yj ∈ C for all i, j . Then


(1) xi = ⟨u, vi⟩ ;
(2) ||u|| = ( Σ_i |xi|^2 )^{1/2} ;
(3) ⟨u, w⟩ = Σ_i xi ȳi .

Exercise 41. Prove this theorem.

Isometry of complex inner product spaces is defined exactly as in the real case and
we have the following important consequence of the theorem:

Corollary. Let V be a complex inner product space of dimension n. Then V is
isometric with Cn with its standard inner product: ⟨x, y⟩ = x^tr ȳ = Σ_i xi ȳi , where x is
the column vector (x1 , . . . , xn)^tr and y is the column vector (y1 , . . . , yn)^tr .

Exercise 42. Let V be a vector space over C, and let hu, vi1 and hu, vi2 be
inner products defined on V . Is it true that if hx, xi1 = hx, xi2 for all x ∈ V , then
hu, vi1 = hu, vi2 for all u, v ∈ V ? [Compare Exercise 39.]

The Gram–Schmidt process


Although I have introduced real and complex inner product spaces separately, we
have already seen that the two theories are very similar. Indeed, they have much in
common. Since complex conjugation leaves real numbers unchanged, one could in fact,
had one so wished, have used C(1), C(2) and C(3) to define inner products in the real
case as well as the complex case. I prefer not to do that (indeed, I’m inclined to think
that that would be ill-advised) because the real world and the complex world are quite

38
distinct entities, and although they are of course intimately related to each other it seems
best to keep them separate. Nevertheless, from now on I propose to treat their theories
together.

Theorem. Let V be a real or complex inner product space and let u1 , . . . , un be


linearly independent vectors in V . Then there exists an orthonormal set v1 , . . . , vn in
V such that
Span(v1 , . . . , vk ) = Span(u1 , . . . , uk ) for 0 6 k 6 n .

Proof. Since u1 6= 0 we can define v1 := ||u1 ||−1 u1 . Then ||v1 || = 1 and Span(v1 ) =
Span(u1 ). Suppose as inductive hypothesis that 1 6 k < n and we have found an
orthonormal set v1 , . . . , vk in V such that Span(v1 , . . . , vk) = Span(u1 , . . . , uk). For
1 ≤ i ≤ k define αi := ⟨uk+1, vi⟩ and w := Σ_i αi vi . (We should think geometrically
of w as the orthogonal projection of uk+1 into the subspace spanned by v1 , . . . , vk ,
which is the same as the subspace spanned by u1 , . . . , uk .) Now w ∈ Span(v1 , . . . , vk) =
Span(u1 , . . . , uk) whereas uk+1 ∉ Span(u1 , . . . , uk). Therefore if v := uk+1 − w then
v ≠ 0 and we can define vk+1 := ||v||^{−1} v . Then ||vk+1|| = 1. Also,

    ⟨v, vi⟩ = ⟨uk+1 − w, vi⟩ = ⟨uk+1, vi⟩ − ⟨w, vi⟩ = ⟨uk+1, vi⟩ − ⟨ Σ_{j=1}^{k} αj vj , vi ⟩ = ⟨uk+1, vi⟩ − αi = 0

and it follows that also hvk+1 , vi i = 0 for 1 6 i 6 k . Thus v1 , . . . , vk , vk+1 is an


orthonormal set. Finally,

Span(v1 , . . . , vk , vk+1 ) = Span(v1 , . . . , vk , v)


= Span(u1 , . . . , uk , uk+1 − w)
= Span(u1 , . . . , uk , uk+1 ),

where the last equation comes from the fact that w ∈ Span(u1 , . . . , uk ).

Note 1. The construction in the proof is known as the Gram–Schmidt orthogonal-


isation process.

Note 2. If the Gram–Schmidt process gives the orthonormal basis v1 , . . . , vn from
a basis u1 , . . . , un and T is the transition matrix from the latter to the former then T
is positive upper triangular—that is, upper triangular with positive real diagonal entries.
For, in the course of the proof we found that vj = Σ_{i=1}^{j} βij ui where βjj = ||v||^{−1} for a
certain non-zero vector v , and therefore βjj > 0.
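
The proof translates almost line by line into a computation. Here is a short Python/numpy sketch of the Gram–Schmidt process for columns of real vectors with the dot product; it is an illustration rather than a numerically robust implementation (in floating point one would normally re-orthogonalise, or simply use a library QR routine, as in Exercise 46 below).

    import numpy as np

    def gram_schmidt(U):
        # the columns of U are u_1, ..., u_n; returns the columns v_1, ..., v_n
        V = []
        for k in range(U.shape[1]):
            v = U[:, k].astype(float).copy()
            for w in V:
                v -= np.dot(w, v) * w        # subtract the projection onto each earlier v_i
            v /= np.linalg.norm(v)           # u_1, ..., u_n independent, so v is non-zero
            V.append(v)
        return np.column_stack(V)

    U = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
    Q = gram_schmidt(U)
    print(np.round(Q.T @ Q, 6))              # the identity: the columns are orthonormal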

A worked example—FHS 1999, Paper a1, Qn 3:

Let V be a finite-dimensional vector space over R . Define what is meant by


saying that V has an inner product h , i over R .
Let {b1 , b2 , . . . , bn } be a basis for V . Prove that there exists an ortho-
normal basis {e1 , e2 , . . . , en } of V such that for each k = 1, 2, . . . , n the set

39
{e1 , e2 , . . . , ek } is a basis for the subspace spanned by {b1 , b2 , . . . , bk }. Deduce
that there are tij ∈ R with tii 6= 0 such that

b1 = t11 e1 , b2 = t12 e1 + t22 e2 , ... , bk = t1k e1 + · · · + tkk ek , ...


(for 1 ≤ k ≤ n ). Show that ⟨bi, bj⟩ = Σ_{k=1}^{n} ⟨bi, ek⟩⟨bj, ek⟩ (for 1 ≤ i, j ≤ n ).
Now let G be the n × n matrix with (i, j)th entry equal to ⟨bi, bj⟩. Show that
det G = ∏_{i=1}^{n} tii^2 and deduce that G is non-singular. Show also that det G ≤ ∏_{i=1}^{n} ⟨bi, bi⟩.

Response. The first three instructions ask for bookwork that has just been treated
above. For the fourth, note that since {e1 , e2 , . . . , en } is an orthonormal basis of V and
bj = Σ_{r=1}^{j} trj er , we have ⟨bj, ei⟩ = tij . So we calculate as follows:

    ⟨bi, bj⟩ = ⟨ Σ_r tri er , Σ_s tsj es ⟩ = Σ_{r,s} tri tsj ⟨er, es⟩ = Σ_{k=1}^{n} tki tkj ,

since ⟨er, es⟩ is 0 if r ≠ s and is 1 if r = s = k . That is, ⟨bi, bj⟩ = Σ_{k=1}^{n} ⟨bi, ek⟩⟨bj, ek⟩,
as required.

Now let G be the matrix (⟨bi, bj⟩) [which is known as the Gram matrix of the inner
product with respect to the basis {b1 , b2 , . . . , bn }], and let T be the matrix (tij). The
formula that has just been proved tells us that G = T^tr T . Therefore det G = (det T)^2 .
But T is upper triangular and therefore det T = ∏_i tii . Hence det G = ∏_{i=1}^{n} tii^2 , as
required. Since tii ≠ 0 for all relevant i we see that det G ≠ 0 and so G is non-singular.
Finally, ⟨bi, bi⟩ = ⟨ Σ_r tri er , Σ_s tsi es ⟩ = Σ_r tri^2 and so, since the coefficients are real,
tii^2 ≤ ⟨bi, bi⟩. Therefore det G = ∏_i tii^2 ≤ ∏_i ⟨bi, bi⟩, as required.
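
These identities are easy to confirm numerically. In the Python/numpy sketch below (illustrative only) the bi are the rows of a random matrix, the orthonormal ei come from a QR factorisation, and the last line checks the final inequality det G ≤ ∏⟨bi, bi⟩ of the question.

    import numpy as np

    rng = np.random.default_rng(2)
    B = rng.standard_normal((4, 4))           # rows b_1, ..., b_4: a basis of R^4
    G = B @ B.T                               # Gram matrix: G_ij = <b_i, b_j>
    Q, R = np.linalg.qr(B.T)                  # b_j = sum_i R_ij q_i with q_i orthonormal
    print(np.isclose(np.linalg.det(G), np.prod(np.diag(R) ** 2)))   # det G = prod t_ii^2
    print(np.linalg.det(G) <= np.prod(np.diag(G)))                  # det G <= prod <b_i, b_i>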

Exercise 43. Let V be the vector space of polynomials of degree ≤ 3 with real
coefficients. Define ⟨f, g⟩ := ∫_{−1}^{1} f(t) g(t) dt. Show that this is an inner product on V .
Use the Gram–Schmidt process to find an orthonormal basis for V .
Exercise 44. How does the answer to Exercise 43 change if ⟨f, g⟩ := ∫_0^1 f(t) g(t) dt?

Bessel’s Inequality
Bessel’s Inequality. Let V be a real or complex inner product space and let
v1 , . . . , vm be an orthonormal set in V . If u ∈ V then
    Σ_{i=1}^{m} |⟨u, vi⟩|^2 ≤ ||u||^2 .

Equality holds if and only if u ∈ Span(v1 , . . . , vm ).

40
Proof. In the following calculation we shall use notation appropriate to the complex
case. For u ∈ V define w := u − Σ_{i=1}^{m} ⟨u, vi⟩ vi . Then ⟨w, w⟩ ≥ 0, but also

    ⟨w, w⟩ = ⟨ u − Σ_{i=1}^{m} ⟨u, vi⟩ vi ,  u − Σ_{i=1}^{m} ⟨u, vi⟩ vi ⟩
           = ⟨u, u⟩ − ⟨ u, Σ_{i=1}^{m} ⟨u, vi⟩ vi ⟩ − ⟨ Σ_{i=1}^{m} ⟨u, vi⟩ vi , u ⟩ + ⟨ Σ_{i=1}^{m} ⟨u, vi⟩ vi , Σ_{j=1}^{m} ⟨u, vj⟩ vj ⟩
           = ⟨u, u⟩ − Σ_{i=1}^{m} \overline{⟨u, vi⟩} ⟨u, vi⟩ − Σ_{i=1}^{m} ⟨u, vi⟩ ⟨vi, u⟩ + Σ_{i,j=1}^{m} ⟨u, vi⟩ \overline{⟨u, vj⟩} ⟨vi, vj⟩ .

Since ⟨vi, u⟩ = \overline{⟨u, vi⟩} and ⟨vi, vj⟩ is 1 if i = j and 0 otherwise, we find that

    ⟨w, w⟩ = ⟨u, u⟩ − Σ_{i=1}^{m} |⟨u, vi⟩|^2 .

Therefore ⟨u, u⟩ ≥ Σ_{i=1}^{m} |⟨u, vi⟩|^2 , as the theorem states.
Suppose that equality holds. The argument shows that then w = 0 and so
u ∈ Span(v1 , . . . , vm). Conversely, if u ∈ Span(v1 , . . . , vm), so that u = Σ xi vi for
suitable scalars x1 , . . . , xm , then, as we have seen before, xi = ⟨u, vi⟩ and
⟨u, u⟩ = Σ |xi|^2 , so ⟨u, u⟩ = Σ_{i=1}^{m} |⟨u, vi⟩|^2 .

Note that in the real case absolute values are not needed on the left side of the
inequality. It states that Σ_{i=1}^{m} ⟨u, vi⟩^2 ≤ ||u||^2 . Also, complex conjugation is not needed
in the proof. Nor is it harmful, though if a friend (such as a tutor or an examiner)
specifically asks for a proof of Bessel’s Inequality for real inner product spaces then one
should expound the proof without it.

Example: V = Rn with the usual inner product; vk = ek for k = 1, 2, . . . , m ;


u = (x1 , x2 , . . . , xn )tr . In this case hu, vi i = xi and Bessel’s Inequality tells us that
    Σ_{i=1}^{m} xi^2 ≤ Σ_{i=1}^{n} xi^2 ,

which is of course obvious, but gives us some insight into what the theorem is saying in
its general and abstract setting.

Example: V is the space of continuous functions f : [0, 1] → C;

    ⟨f, g⟩ = ∫_0^1 f(t) \overline{g(t)} dt ;    vk(t) = e^{2πikt} for k ∈ Z.

If r, s ∈ Z then

    ⟨vr, vs⟩ = ∫_0^1 e^{2πi(r−s)t} dt = 1 if r = s, and 0 if r ≠ s,

so {vk | k ∈ Z} is an orthonormal set in V . For f ∈ V define

    ck := ⟨f, vk⟩ = ∫_0^1 f(t) e^{−2πikt} dt .

41
Then Bessel’s Inequality tells us that for m, n ∈ N
    Σ_{k=−m}^{n} |ck|^2 ≤ ∫_0^1 |f(t)|^2 dt,

a fact which is of immense importance in Analysis, in the theory of Fourier series in


particular.
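
This last example can also be explored numerically. In the Python/numpy sketch below (illustrative; the integrals are approximated by Riemann sums on a grid, and f(t) = t is simply a convenient test function) a symmetric range of Fourier coefficients is computed and compared with the integral of |f|^2, which is 1/3.

    import numpy as np

    t = np.linspace(0.0, 1.0, 20000, endpoint=False)
    f = t
    def c(k):
        return np.mean(f * np.exp(-2j * np.pi * k * t))    # approximates c_k
    lhs = sum(abs(c(k)) ** 2 for k in range(-20, 21))
    rhs = np.mean(abs(f) ** 2)                              # approximates 1/3
    print(lhs, rhs)                                         # roughly 0.331 <= 0.333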

The Cauchy–Schwarz Inequality


Cauchy–Schwarz Inequality. Let V be a real or complex inner product space
and let u, v ∈ V . Then
|hu, vi| 6 ||u||.||v|| .
Moreover, equality holds if and only if u, v are linearly dependent.

Proof. If v = 0 there is nothing to prove, so we may suppose that v 6= 0. Define


v1 := ||v||−1 v . Then, trivially, {v1 } is an orthonormal set in V . By Bessel’s Inequality
|⟨u, v1⟩| ≤ ||u||. But |⟨u, v1⟩| = ||v||^{−1} |⟨u, v⟩|. Since ||v|| > 0 the inequality can be
multiplied by ||v|| and we get that |⟨u, v⟩| ≤ ||u|| ||v||, as required.
If v = 0 then equality holds and u, v are linearly dependent. Otherwise equality
holds if and only if u ∈ Span(v1 ), that is, if and only if u, v are linearly dependent.

Classic alternative proof for real inner product spaces. Suppose that V is a real
inner product space and let u, v ∈ V . If v = 0 the result holds for trivial reasons, so we
suppose that v 6= 0. For x ∈ R define

    f(x) := ||u − x v||^2 = ⟨u − x v, u − x v⟩ = ||u||^2 − 2 x ⟨u, v⟩ + x^2 ||v||^2 .

Thus f(x) is a quadratic function of x and it is positive semi-definite in the sense that
f(x) ≥ 0 for all x ∈ R. Therefore its discriminant is ≤ 0, and so ⟨u, v⟩^2 ≤ ||u||^2 ||v||^2 ,
that is, |⟨u, v⟩| ≤ ||u|| ||v||, as required.

Worked example [FHS 1998, Paper a1, Qn 3].

Let V be a vector space over the complex numbers. Define what is meant by
the statement that V has an inner product over C , and explain how this can be
used to define the norm ||v|| of a vector v in V .
Prove Bessel’s Inequality, that if {u1 , . . . , un } is an orthonormal set in V and
v ∈ V then
    Σ_{i=1}^{n} |⟨ui, v⟩|^2 ≤ ||v||^2 ,

where h , i denotes the inner product. Deduce, or prove otherwise, the Cauchy–
Schwarz Inequality,

|hu, vi| 6 ||u|| ||v|| for any u, v ∈ V .

(i) Show that if a1 , . . . , an and b1 , . . . , bn are any complex numbers then


    | Σ ai bi |^2 ≤ ( Σ |ai|^2 ) ( Σ |bi|^2 ) .

42
(ii) Show that if a1 , . . . , an are strictly positive real numbers then
    ( Σ ai ) ( Σ 1/ai ) ≥ n^2 .

(iii) Show that if f, g are continuous real-valued functions on an interval [a, b] ⊆


R, then

    ( ∫_a^b |f(x)|^2 dx ) ( ∫_a^b |g(x)|^2 dx ) ≥ ( ∫_a^b f(x) g(x) dx )^2 .

Response. The first parts are ‘bookwork’ done above (but perhaps now is a good
moment for you to close this book and try to reconstruct the proofs for yourself).
For (i) take V := Cn, the space of n × 1 column vectors with its usual hermitian
inner product. Take u := (a1 , . . . , an)^tr and v := (b̄1 , . . . , b̄n)^tr in the Cauchy–Schwarz
Inequality. Since ⟨u, v⟩ = u^tr v̄ = Σ ai bi , ||u||^2 = Σ |ai|^2 and ||v||^2 = Σ |bi|^2 , this
theorem tells us that | Σ ai bi |^2 ≤ ( Σ |ai|^2 ) ( Σ |bi|^2 ), as required.
Now for (ii) replace ai and bi in (i) with ai^{1/2} and ai^{−1/2} respectively to see that

    ( Σ_{i=1}^{n} 1 )^2 ≤ ( Σ_{i=1}^{n} ai ) ( Σ_{i=1}^{n} ai^{−1} ) ,

that is, ( Σ ai ) ( Σ 1/ai ) ≥ n^2 .
For (iii) take V to be the vector space of complex-valued continuous functions on
[a, b] with inner product ⟨f, g⟩ := ∫_a^b f(t) \overline{g(t)} dt. It needs to be checked that this does
define an inner product on V [bookwork]. Then the Cauchy–Schwarz inequality says
that for any f, g ∈ V , ( ∫_a^b |f(x)|^2 dx ) ( ∫_a^b |g(x)|^2 dx ) ≥ | ∫_a^b f(x) \overline{g(x)} dx |^2 . Specialising
to the case where f , g are real-valued, we get the required inequality

    ( ∫_a^b f(x)^2 dx ) ( ∫_a^b g(x)^2 dx ) ≥ ( ∫_a^b f(x) g(x) dx )^2 .

[Comment: note that in the statement of part (iii) of the question the modulus bars
are unnecessary. Also, it seems a bit odd to define V in this last part to consist of
complex-valued functions on [a, b] when the question asks about real-valued functions,
but since the first part asks for a proof of Bessel’s Inequality and the Cauchy–Schwarz
Inequality for complex inner product spaces, that is what we have available.]

Isometries of inner product spaces


Let V be a real or complex inner product space of finite dimension n. An isometry
of V is a linear transformation P : V → V such that hP u, P vi = hu, vi for all u, v ∈ V .

Note 1. An isometry is invertible. For, if P is an isometry of V and u ∈ Ker P


then hu, ui = hP u, P ui = 0 and so u = 0. Thus Ker P = {0} and so P is injective and
since V is finite-dimensional it must also be surjective, hence invertible.

Note 2. It follows that the isometries of our inner product space V form a group:
certainly I is an isometry; if P is an isometry then P −1 is an isometry; if P1 , P2 are
isometries then so is P2 ◦ P1 .

43
Note 3. In the real case an isometry is known as an orthogonal transformation;
the group is the orthogonal group denoted O(V ). The group O(Rn ) is often denoted
O(n), sometimes O(n, R) or On (R).

Note 4. In fact, O(Rn ) = {A ∈ Mn×n (R) | A−1 = Atr }. For,

hA u, A vi = (A u)tr (A v) = utr Atr A v

and this is utr v for all u, v ∈ Rn if and only if Atr A = I .

Note 5. Thus an n × n matrix A with real entries is orthogonal if and only if the
columns of A form an orthonormal basis for Rn .

Note 6. In the complex case an isometry is known as a unitary transformation; the


group is the unitary group U(V ). The group U(Cn ) is often denoted U(n), sometimes
U(n, C) or Un (C).

Note 7. U(Cn ) = {A ∈ Mn×n (C) | A−1 = Ātr }. The proof is similar to the real
case and is offered as an exercise:

Exercise 45. Prove that U(Cn ) = {A ∈ Mn×n (C) | A−1 = Ātr }.

Note 8. Thus an n × n matrix A with complex entries is unitary if and only if the
columns of A form an orthonormal basis for Cn .

Exercise 46. Let X be a non-singular n × n matrix over R.


(i) Use the Gram–Schmidt orthogonalisation process to show that there exist n × n
matrices P , U over R such that U is upper triangular with positive entries on its
main diagonal, P is orthogonal, and X = P U .
(ii) Suppose that X = P U = Q V , where U, V are upper triangular with positive
entries on their main diagonals, and P, Q are orthogonal. How must Q be related
to P , and V to U ?
(iii) What are the corresponding results for non-singular matrices over C?
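
Part (i) of this exercise is, in matrix language, the QR factorisation, which numerical libraries provide directly; a brief Python/numpy illustration follows (note that numpy's convention does not insist on positive diagonal entries in R, so its factors may differ from those of the exercise by signs).

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    Q, R = np.linalg.qr(X)
    print(np.round(Q.T @ Q, 6))      # Q is orthogonal
    print(np.round(R, 6))            # R is upper triangular
    print(np.allclose(Q @ R, X))     # X = QR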

Observation. The group O(n) is a closed bounded subset of Mn×n (R). Similarly,
U(n) is a closed bounded subset of Mn×n (C).

The group O(n) is closed in Mn×n(R) because if Am are orthogonal matrices and
A = lim_{m→∞} Am then A^tr A = (lim Am)^tr (lim Am) = lim (Am^tr Am) = lim I = I , which shows
that A is orthogonal. It is bounded because if A = (aij) ∈ O(n) then for each i we have
Σ_j aij^2 = 1 and so |aij| ≤ 1 for all relevant i, j . The proof for U(n) is almost the same.

Representation of linear functionals on an inner product space


We come now to a very useful result often known as the Riesz Representation Lemma
(although that name more properly belongs to a theorem in functional analysis about

44
infinite dimensional spaces—so-called Hilbert spaces). This will be the foundation for
our treatment of adjoint transformations in the fourth and final part of these notes.

Theorem. Let V be a finite-dimensional real or complex inner product space. For


every f ∈ V ′ there exists vf ∈ V such that

f (u) = hu, vf i for all u ∈ V .

Moreover, vf is unique.

Proof. Let f ∈ V ′ . If f = 0 then we take vf := 0 as we clearly must. So suppose
now that f ≠ 0. Where should we seek vf ? Well, let U := Ker f . Since f : V → F
(where F is R or C) is linear and non-trivial it is surjective and so dim U = dim V − 1
by the Rank-Nullity Theorem. We certainly want that ⟨u, vf⟩ = 0 for all u ∈ U , that
is, we should seek vf in U ⊥ . Now U ⊥ is 1-dimensional. We can choose v ∈ U ⊥ such
that ||v|| = 1, and then we’ll want vf = λ v for a suitable scalar λ. And λ can be
determined from the requirement that f(v) = ⟨v, vf⟩ = ⟨v, λ v⟩ = λ̄ ⟨v, v⟩ = λ̄. Thus
we define vf := \overline{f(v)} v (but note that in the real case complex conjugation, though
harmless, is not needed). We check: if u ∈ V then, since V = Span(v) ⊕ U , there is
a unique vector w ∈ U and there is a unique scalar α such that u = α v + w ; then
f(u) = α f(v) + f(w) = α λ̄, while

    ⟨u, vf⟩ = ⟨α v + w, λ v⟩ = α ⟨v, λ v⟩ + ⟨w, λ v⟩ = α λ̄ ⟨v, v⟩ = α λ̄ .

Thus f(u) = ⟨u, vf⟩, as required.


For uniqueness note that if v1 , v2 ∈ V are such that f (u) = ⟨u, v1⟩ = ⟨u, v2⟩ for all
u ∈ V then hu, v0 i = 0 for all u ∈ V where v0 := v1 − v2 . In particular, hv0 , v0 i = 0
and so v0 = 0, that is v1 = v2 . Therefore the vector vf corresponding to the linear
functional f is unique.

Note. The map f 7→ vf from V ′ to V is linear in the real case, semilinear in the
complex case:

Exercise 47. Prove this. That is, show that if F = R then in the theorem above
vαf +βg = α vf + β vg for all f, g ∈ V ′ and all α, β ∈ R, and that if F = C then
vαf +βg = ᾱ vf + β̄ vg for all f, g ∈ V ′ and all α, β ∈ C.

Further exercises III


Exercise 48 [A former FHS question (corrected)]. Let V be a finite-dimensional
inner-product space over R. Prove that if w1 , . . . , wm is an orthonormal set of vectors
in V then

    ||v||^2 ≥ Σ_{i=1}^{m} (v, wi)^2   for all v ∈ V .   [Bessel’s inequality]

Now suppose that u1 , . . . , ul are unit vectors in V . Prove that


    ||v||^2 = Σ_{i=1}^{l} (v, ui)^2   for all v ∈ V

45
if and only if u1 , . . . , ul is an orthonormal basis for V .

Exercise 49. Prove that if z1 , . . . , zn ∈ C then | Σ_{i=1}^{n} zi |^2 ≤ n Σ_{i=1}^{n} |zi|^2 . When
does equality hold?

Exercise 50 [Part of FHS 2005, AC2, 2]. Prove that for any positive real number a
    ( Σ_{i=1}^{k} a^i ) ( Σ_{i=1}^{k} a^{−i} ) ≥ k^2 .

When does equality hold?

46
Part IV: Adjoints of linear transformations
on finite-dimensional inner product spaces

In this fourth and final part of these notes we study linear transformations of finite-
dimensional real or complex inner product spaces. What we are aiming for is the so-called
Spectral Theorem for self-adjoint transformations.

Adjoints of linear transformations


We begin with the definition of adjoints.

Theorem. Let V be a finite-dimensional inner product space. For each linear trans-
formation T : V → V there is a unique linear transformation T ∗ : V → V such that

hT u, vi = hu, T ∗ vi for all u, v ∈ V .

Proof. For each fixed v ∈ V the map u 7→ hT u, vi is a linear functional on V


because

hT (α1 u1 + α2 u2 ), vi = hα1 T u1 + α2 T u2 , vi = α1 hT u1 , vi + α2 hT u2 , vi.

By the representation theorem (see p. 45) there exists w ∈ V such that

hT u, vi = hu, wi for all u ∈ V .

This vector w depends on and is uniquely determined by v so we write it as T ∗ v , where


T ∗ : V → V . Thus to prove the theorem we need to prove that T ∗ is linear. Well,
let v1 , v2 ∈ V and let α1 , α2 ∈ F , where F is the field of scalars (R or C). Then for
any u ∈ V , using the definition of T ∗ and the bilinearity or sesquilinearity of the inner
product, we find that

hu, T ∗ (α1 v1 + α2 v2 )i = hT u, α1 v1 + α2 v2 i
= α1 hT u, v1 i + α2 hT u, v2 i
= α1 hu, T ∗ v1 i + α2 hu, T ∗ v2 i
= h u, α1 T ∗ v1 + α2 T ∗ v2 i.

We have seen before that if hu, w1 i = hu, w2 i for all u ∈ V then w1 = w2 . It follows
that
T ∗ (α1 v1 + α2 v2 ) = α1 T ∗ v1 + α2 T ∗ v2 ,
and therefore T ∗ is linear, as required.

Example. Let V := Rn with its usual inner product hu, vi = utr v . Suppose
T : V → V is given by T : v 7→ Av where A ∈ Mn×n (R). Then T ∗ : v 7→ Atr v . For,

hT u, vi = hA u, vi = (A u)tr v = (utr Atr )v = utr (Atr v) = hu, Atr vi.

47
Example. Let V := Cn with its usual inner product hu, vi = utr v̄ . Suppose
T : V → V , T : v 7→ Av where A ∈ Mn×n (C) . Then T ∗ : v 7→ Ā tr v . The proof
is very similar to that worked through in the real case.
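
A quick numerical sanity check of this example, using the convention ⟨u, v⟩ = u^tr v̄ of these notes (Python with numpy; illustrative only, with the helper name ip invented for the purpose):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    u = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    v = rng.standard_normal(3) + 1j * rng.standard_normal(3)

    def ip(x, y):
        return x @ np.conj(y)                 # <x, y> = x^tr times the conjugate of y

    A_star = np.conj(A).T                     # the conjugate transpose of A
    print(np.isclose(ip(A @ u, v), ip(u, A_star @ v)))    # True: <Tu, v> = <u, T*v>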

Theorem. Let S : V → V , T : V → V be linear transformations of the real or


complex inner product space V . Then:

(i) (S + T )∗ = S ∗ + T ∗ ; (ii) (α T )∗ = ᾱ T ∗ ; (iii) (S T )∗ = T ∗ S ∗ ; (iv) S ∗∗ = S .

Note: Complex conjugation of α in (ii) is unnecessary, but harmless, in the real


case.

Proof. We deal with (ii) and (iii), leaving (i) and (iv) as exercises. For (ii), for all
u, v ∈ V ,

h(α T ) u, vi = α hT u, vi = α hu, T ∗ vi = hu, ᾱ T ∗ vi = hu, (ᾱ T ∗ ) vi,

and therefore (by uniqueness) (α T )∗ = ᾱ T ∗ .


For (iii), for all u, v ∈ V ,

h(S T ) u, vi = hT u, S ∗ vi = hu, T ∗ (S ∗ v)i = hu, (T ∗ S ∗ ) vi,

and therefore (S T )∗ = T ∗ S ∗ , as required.

Exercise 51. Show that if S : V → V , T : V → V are linear transformations of


the real or complex inner product space V then: (S + T )∗ = S ∗ + T ∗ and S ∗∗ = S .

Exercise 52. Show that if T : V → V is a linear transformation of a real or


complex inner product space V and f ∈ F [x] (where F is R or C appropriately)
then f(T)∗ = f(T∗) in the real case and f(T)∗ = f̄(T∗) in the complex case (where if
f(x) = Σ ai x^i then f̄(x) = Σ āi x^i ).

Theorem. Let V be a finite-dimensional real or complex inner product space, let


T : V → V be a linear transformation, let {e1 , e2 , . . . , en } be an orthonormal basis of
V , and let A, A∗ be the matrices of T and T ∗ respectively with respect to this basis.
Then A∗ = Ātr .
Note. Conjugation is of course not needed in the real case.

Proof. The matrix A is (aij), say, where the coefficients are defined by the equations
T ej = Σ_i aij ei . Similarly, if A∗ = (bij) then the coefficients bij are defined by the
equations T∗ ej = Σ_i bij ei . Now ⟨T ep, eq⟩ = ⟨ep, T∗ eq⟩ for all relevant p, q by definition
of the adjoint. But

    ⟨T ep, eq⟩ = ⟨ Σ_i aip ei , eq ⟩ = aqp ,
while
    ⟨ep, T∗ eq⟩ = ⟨ ep , Σ_i biq ei ⟩ = b̄pq .

Therefore b̄pq = aqp , that is, bpq = āqp and A∗ = Ātr , as the theorem states.

48
Next we investigate the kernel and the image of an adjoint transformation.

Theorem. Let V be a finite-dimensional real or complex inner product space and let
T : V → V be a linear transformation. Then Ker T ∗ = (Im T )⊥ and Im T ∗ = (Ker T )⊥ .

Proof. Recall that if v ∈ V then hu, vi = 0 for all u ∈ V if and only if v = 0.


Therefore
Ker T ∗ = {v ∈ V | T ∗ v = 0}
= {v ∈ V | hu, T ∗ vi = 0 for all u ∈ V }
= {v ∈ V | hT u, vi = 0 for all u ∈ V }
= {v ∈ V | hw, vi = 0 for all w ∈ Im T }
= (Im T )⊥ .

Now if v ∈ Im T ∗ then v = T ∗ w for some w ∈ V and so if u ∈ Ker T then

hu, vi = hu, T ∗ wi = hT u, wi = h0, wi = 0.

This shows that Im T ∗ 6 (Ker T )⊥ . Compare dimensions: if n := dim V then

dim Im T ∗ = n − dim (Ker T ∗)      [Rank-Nullity theorem]
           = n − dim ((Im T)⊥)      [since Ker T ∗ = (Im T)⊥]
           = dim (Im T)             [theorem on p. 36]
           = n − dim (Ker T)        [Rank-Nullity theorem]
           = dim ((Ker T)⊥)         [theorem on p. 36]

and therefore Im T ∗ = (Ker T )⊥ , and the proof of the theorem is complete.

Exercise 53 [Part of an old FHS question]. Consider the vector space of real-
valued polynomials of degree ≤ 1 in a real variable t, equipped with the inner product
⟨f, g⟩ = ∫_0^1 f(t) g(t) dt. Let D be the operation of differentiation with respect to t. Find
the adjoint D∗ .

Exercise 54 [Compare FHS 2005, AC1, Qn 2]. In the previous exercise, how does
D∗ change if the inner product is changed to ∫_{−1}^{1} f(t) g(t) dt?

Self-adjoint linear transformations


The transformation T : V → V , where V is a finite-dimensional real or complex
inner product space, is said to be self-adjoint if T ∗ = T . So T is self-adjoint if and only
if hT u, vi = hu, T vi for all u, v ∈ V .

Important example. If S : V → V is any linear transformation then S S ∗ and


S∗S are self-adjoint. For (S S ∗ )∗ = (S ∗ )∗ S ∗ = S S ∗ and (S ∗ S)∗ = S ∗ (S ∗ )∗ = S ∗ S by
clauses (iii), (iv) of the theorem on p. 48.

Lemma. Let V be a finite-dimensional real or complex inner product space. If


T : V → V is self-adjoint and U is a T -invariant subspace (that is, recall, T U 6 U )
then also U ⊥ is T -invariant.

49
Proof. Let U be a T -invariant subspace of V and let v ∈ U ⊥ . Then for any u ∈ U

hu, T vi = hT u, vi [since T is self-adjoint]


=0 [since T u ∈ U and v ∈ U ⊥ ].

Thus T v ∈ U ⊥ and so U ⊥ is T -invariant.

Exercise 55 [Part of FHS 2003, A1, Qn 4]. Let V be a 3-dimensional complex


inner product space, let {e1 , e2 , e3 } be an orthonormal basis for V , and let S : V → V
be a linear transformation such that

S(e1 ) = e1 , S(e1 + e2 ) = 2e1 + 2e2 , S(e1 + e3 ) = 0 .

Is S self-adjoint? Justify your answer.

Exercise 56 [Part of FHS 1990, A1, Qn 4]. Let V be the vector space of all n × n
real matrices with the usual addition and scalar multiplication. For A, B in V , let
hA, Bi = Trace(AB tr ), where B tr denotes the transpose of B .
(i) Show that this defines an inner product on V .
(ii) Let P be an invertible n×n matrix and let θ : V → V be the linear transformation
given by θ(A) = P −1 AP . Find the adjoint θ∗ of θ .
(iii) Prove that θ is self-adjoint if and only if P is either symmetric or skew-symmetric
(P is skew-symmetric if P tr = −P ).

Eigenvalues and eigenvectors of self-adjoint linear transformations


Theorem. Let V be a finite-dimensional real or complex inner product space and
let T : V → V be a self-adjoint linear transformation. Then all eigenvalues of T are
real—that is, cT (x) may be factorised as a product of linear factors in R[x].

Proof. Let A be the matrix of T with respect to some orthonormal basis of V . We


know that the matrix of T ∗ with respect to that same basis is Ātr and since T = T ∗
it follows that A = Ātr . Now let λ be an eigenvalue of T , that is, a root of cT (x) or,
which is the same thing, of cA (x). Of course, by the so-called Fundamental Theorem
of Algebra, even if V is a real inner product space we’ll know only that λ ∈ C. But
then there exists v ∈ Cn \ {0} such that A v = λ v . Now we compute the standard inner
product of v and A v :

hv, A vi = v tr Ā v̄ = v tr Atr v̄ = (A v)tr v̄ = λ v tr v̄ = λ ||v||2 .

On the other hand,

hv, A vi = hv, λ vi = v tr λ̄ v̄ = λ̄ v tr v̄ = λ̄ ||v||2 .

Therefore λ ||v||2 = λ̄ ||v||2 and since ||v||2 > 0 we must have λ = λ̄, that is, λ is real.
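Numerical illustration. A quick check in Python (assuming NumPy; the matrix below is a random example): the eigenvalues of a conjugate-symmetric matrix, computed by a general-purpose routine that knows nothing about symmetry, come out real up to rounding error.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 4
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    A = M + np.conj(M).T                  # A-bar^tr = A, so A is conjugate-symmetric

    eigvals = np.linalg.eigvals(A)        # generic (non-symmetric) eigenvalue routine
    print(np.allclose(eigvals.imag, 0))   # True: the eigenvalues are real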

Theorem. Let V be a finite-dimensional real or complex inner product space and


let T : V → V be a self-adjoint linear transformation. If u, v are eigenvectors of T for
distinct eigenvalues λ, µ then hu, vi = 0.

50
Proof. We have T u = λ u, T v = µv and λ 6= µ. Therefore

λhu, vi = hT u, vi = hu, T vi = hu, µvi = µ̄ hu, vi.

We have just seen that µ̄ = µ and so we see that λhu, vi = µhu, vi. Since λ 6= µ we
must have hu, vi = 0 as the theorem states.

Diagonalisability and the spectral theorem for self-adjoint


linear transformations
One of the most important theorems in the area tells us that a self-adjoint transfor-
mation, or equivalently, a symmetric or conjugate-symmetric matrix, is diagonalisable,
and moreover, the diagonalisation can be accomplished with an orthogonal or unitary
transformation. We give this theorem in three different forms.

Theorem. Let V be a finite-dimensional inner product space over R or C and let


T : V → V be a self-adjoint linear transformation. There is an orthonormal basis of V
consisting of eigenvectors of T .

Proof. Let λ1 , . . . , λk be the distinct eigenvalues of T , for 1 6 i 6 k let

Vi := {v ∈ V | T v = λi v},

and let U := V1 + · · · + Vk , so that U is the subspace of V spanned by the eigenvectors


of T . We aim to show that U = V . To this end, let W := U ⊥ . We know that W is
T -invariant (see p. 49). If W were non-trivial then it would contain an eigenvector of T : for T restricts to a self-adjoint transformation of the inner product space W , and by the previous theorem its eigenvalues are real, so it has an eigenvector in W . But all eigenvectors of T lie in U , and U ∩ W = {0}. Therefore W = {0}, and so U = V .
We know from the previous theorems that if vi ∈ Vi \ {0} for 1 6 i 6 k then v1 , . . . , vk is an orthogonal set of non-zero vectors in V and is therefore linearly independent (see p. 35). Therefore
U = V1 ⊕ · · · ⊕ Vk . Now Vi is a finite-dimensional inner product space and so it has
an orthonormal basis Bi , and of course Bi consists of eigenvectors of T for eigen-
value λi . Thus, since members of Bi are orthogonal to members of Bj when i 6= j , if
B := B1 ∪ · · · ∪ Bk then B is an orthonormal basis of V consisting of eigenvectors of T .

Theorem.
(1) If A ∈ Mn×n (R) and Atr = A then there exists U ∈ O(n) and there exists
a diagonal matrix D ∈ Mn×n (R) such that U −1 A U = D . (Recall: U ∈ O(n)
means that U −1 = U tr .)
(2) If A ∈ Mn×n (C) and Ā tr = A then there exists U ∈ U(n) and there exists a
diagonal matrix D ∈ Mn×n (R) such that U −1 A U = D . (Recall: U ∈ U(n) means
that U −1 = Ū tr .)

The force of this theorem is that real symmetric matrices can be diagonalised by an
orthogonal change of basis—that is, by a rotation or a reflection (though in fact one can
always do it with a rotation). Similarly, a conjugate-symmetric complex matrix can be
diagonalised by a unitary transformation. This is simply the special case of the previous
theorem where V is Rn or Cn and T : v 7→ A v .

51
As preparation for our third version of the diagonalisability theorem, the so-called Spectral Theorem, we need to examine self-adjoint projection operators.

Lemma. Let P : V → V be a projection operator (idempotent) where V is a finite-


dimensional real or complex inner product space. Then P is self-adjoint if and only if
Im P = (Ker P )⊥ .

Proof. Let U := Im P and W := Ker P , so that V = U ⊕W and P is the projection


onto U along W . Suppose first that P is self-adjoint. If u ∈ U , w ∈ W , then

hu, wi = hP u, wi = hu, P wi = hu, 0i = 0,

and so W 6 U ⊥ . Comparing dimensions we see that W = U ⊥ .


Now suppose that W = U ⊥ . For v1 , v2 ∈ V write v1 = u1 + w1 , v2 = u2 + w2 ,
where u1 , u2 ∈ U , w1 , w2 ∈ W . Then

hP v1 , v2 i = hu1 , u2 + w2 i = hu1 , u2 i = hu1 + w1 , u2 i = hv1 , P v2 i,

and this shows that P is self-adjoint.
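Numerical illustration. In R2 the dichotomy is easy to see concretely. The sketch below (Python, assuming NumPy; the two matrices are just examples) compares the orthogonal projection onto the x-axis with an oblique projection onto the x-axis along the line y = x: both are idempotent, but only the first is symmetric, that is, self-adjoint.

    import numpy as np

    P_orth = np.array([[1., 0.],
                       [0., 0.]])         # projection onto the x-axis along the y-axis
    P_obl  = np.array([[1., -1.],
                       [0.,  0.]])        # projection onto the x-axis along the line y = x

    for P in (P_orth, P_obl):
        print(np.allclose(P @ P, P),      # idempotent: true for both
              np.allclose(P, P.T))        # symmetric (self-adjoint): true only for P_orth

Here Im P_obl is the x-axis but Ker P_obl is the line y = x, which is not the orthogonal complement of the x-axis, exactly as the lemma predicts.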

The Spectral Theorem. Let V be a finite-dimensional real or complex inner


product space and let T : V → V be a self-adjoint linear transformation. If the distinct
eigenvalues of T are λ1 , . . . , λk then λi ∈ R for 1 6 i 6 k and there are uniquely
determined self-adjoint projection operators Pi : V → V such that Pi Pj = 0 whenever
i 6= j and

    Σi Pi = I    and    T = Σi λi Pi .

Note that P1 , . . . , Pk is a partition of the identity (see p. 9).

Again, this is simply a restatement of previous theorems. The projection operator Pi is the projection of V onto Vi along ⊕j6=i Vj , where Vi is the eigenspace for eigenvalue λi as in the proof of the first of these three theorems.
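Numerical illustration. The spectral projections are easy to construct explicitly for a small matrix. The sketch below, in Python (assuming NumPy; the particular symmetric matrix is just an example), builds each Pi from an orthonormal basis of the eigenspace Vi and checks the identities of the Spectral Theorem.

    import numpy as np

    A = np.array([[2., 1., 0.],
                  [1., 2., 0.],
                  [0., 0., 5.]])          # real symmetric; eigenvalues 1, 3, 5

    evals, Q = np.linalg.eigh(A)          # columns of Q: an orthonormal basis of eigenvectors
    distinct = np.unique(np.round(evals, 10))

    # P_i is the sum of v v^tr over the orthonormal eigenvectors v for eigenvalue lambda_i
    P = []
    for lam in distinct:
        cols = Q[:, np.isclose(evals, lam)]
        P.append(cols @ cols.T)

    print(np.allclose(sum(P), np.eye(3)))                                 # sum of the P_i is I
    print(np.allclose(sum(l * p for l, p in zip(distinct, P)), A))        # A = sum of lambda_i P_i
    print(np.allclose(P[0] @ P[1], 0))                                    # P_i P_j = 0 for i != j
    print(all(np.allclose(p, p.T) and np.allclose(p @ p, p) for p in P))  # self-adjoint projections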

Worked example [FHS 1999, Paper a1, Qn 4].

Let V be a finite-dimensional complex inner product space. Let A be a linear


transformation of V ; define the adjoint A∗ of A and show that it is unique. Show
also that Ker (A) = Ker (A∗ A). If A , B are linear transformations of V show that
(A B)∗ = B ∗ A∗ .
What does it mean to say that S is self-adjoint? Suppose S is self-adjoint.
Show that
(tr(S))2 6 r(S)tr(S 2 ) ,
where r denotes the rank, and tr denotes the trace of a linear transformation of
V . [You may use without proof that a self-adjoint linear transformation has only
real eigenvalues and that there exists an orthonormal basis of eigenvectors of V .]
Deduce that for an arbitrary linear transformation A

(tr(A∗ A))2 6 r(A)tr((A∗ A)2 ) .

52
Response. Definition of adjoint and proof that it is unique is bookwork done above.
To see that Ker (A) = Ker (A∗ A) note first that if u ∈ Ker (A) then certainly (A∗ A) u =
A∗ (A u) = 0, so Ker (A) 6 Ker (A∗ A). On the other hand, if u ∈ Ker (A∗ A) then

hA u, A ui = hu, A∗ A ui = hu, 0i = 0

and so A u = 0. This shows that Ker (A∗ A) 6 Ker (A) and so Ker (A∗ A) = Ker (A).
The definition of self-adjoint is bookwork given above. So now suppose that S is
self-adjoint. We know (and the examiner lets us quote) that there is a basis of V with respect to which the matrix of S is partitioned as

    ( D  0 )
    ( 0  0 )

where D is a diagonal matrix which in a self-explanatory notation may be written Diag (λ1 , . . . , λr ), where λ1 , . . . , λr are non-zero real numbers (not necessarily distinct). Thus in this notation r(S) = r . With respect to that same basis the matrix of S² is

    ( D²  0 )
    ( 0   0 ) ,

and since D² = Diag (λ1² , . . . , λr² ) we see that tr(S²) = Σ λi² . [In fact, this is true for any linear transformation, as one sees from consideration of its triangular form.] Taking vectors u := (λ1 , . . . , λr )tr ∈ Rr and v := (1, . . . , 1)tr ∈ Rr and applying the Cauchy–Schwarz Inequality in the form hu, vi² 6 ||u||² ||v||² we see that (Σ λi )² 6 r Σ λi² , that is, (tr(S))² 6 r(S) tr(S²), as required.
For the last part let A : V → V be any linear transformation and define S := A∗ A.
Then S is self-adjoint (since S ∗ = A∗ A∗∗ = A∗ A = S ) and from the first part of
the question Ker S = Ker A. By the Rank-Nullity Theorem therefore r(S) = r(A).
Applying what has just been proved we see that (tr(A∗ A))2 6 r(A) tr((A∗ A)2 ), as
required.
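Numerical illustration. The inequality just proved can be sanity-checked on a random example. A minimal sketch in Python (assuming NumPy; the matrix A is just a random choice):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 5
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    S = np.conj(A).T @ A                  # S = A* A is self-adjoint

    r = np.linalg.matrix_rank(S)          # r(S) = r(A)
    lhs = np.trace(S).real ** 2
    rhs = r * np.trace(S @ S).real
    print(lhs <= rhs)                     # (tr(A* A))^2 <= r(A) tr((A* A)^2)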

Exercise 57 [FHS 1988, A1, Qn 3]. Let V be a finite-dimensional real inner prod-
uct space, T : V → V a linear transformation and T ∗ its adjoint. Prove the following:

(i) (im T )⊥ = ker T ∗


(ii) ker T ∗ T = ker T
(iii) dim ker (T T ∗ ) = dim ker (T ∗ T )
(iv) if v is an eigenvector of T ∗ T with eigenvalue λ 6= 0, then T v is an eigenvector of
T T ∗ with eigenvalue λ.

Deduce that there exists an orthogonal linear transformation P such that P −1 T T ∗ P =


T ∗ T . [You may assume that any self-adjoint linear transformation has an orthonormal
basis of eigenvectors.]

An application: quadratic forms


As an application of the theory of inner product spaces and self-adjoint linear trans-
formations we treat the beginnings of the theory of quadratic forms. A quadratic form
in n variables x1 , x2 , . . . , xn over a field F is a homogeneous quadratic polynomial Σ aij xi xj where aij ∈ F for 1 6 i 6 n, 1 6 j 6 n.

Note. If charF 6= 2 then we may replace each of aij , aji by ½ (aij + aji ). Thus we may (and we always do) assume that aij = aji .

53
Note. Then Σ aij xi xj = xtr A x where x is the column vector (x1 , x2 , . . . , xn )tr and A is the matrix (aij ), which, it is worth emphasising, now is symmetric.

There is a close connection between quadratic and bilinear forms (which, recall, were
defined on p. 34).

Lemma. Suppose that charF 6= 2. To each quadratic form Q(x1 , . . . , xn ) over F


there corresponds a unique symmetric bilinear form B(u, v) on F n such that

Q(v) = B(v, v).

Proof. Define B(u, v) := ½ (Q(u + v) − Q(u) − Q(v)). Thus if A is the symmetric matrix representing Q then

    B(u, v) = ½ ( (u + v)tr A (u + v) − utr A u − v tr A v ) = utr A v .
The function (u, v) 7→ utr A v is obviously bilinear (by the distributive laws for matrix
multiplication) so B is bilinear. And clearly B(u, u) = Q(u) for all u ∈ F n .
Uniqueness goes as follows: suppose that B1 (u, u) = B2 (u, u) = Q(u) for all u ∈ F n ,
where B1 and B2 are symmetric bilinear forms. Now

B1 (u + v, u + v) − B1 (u, u) − B1 (v, v) = B1 (u, v) + B1 (v, u) = 2B1 (u, v)

since B1 is symmetric. Similarly, 2B2 (u, v) = B2 (u + v, u + v) − B2 (u, u) − B2 (v, v), and


it follows (given that charF 6= 2) that B1 = B2 .
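Numerical illustration. The polarisation identity of the proof is easily checked on an example. A minimal sketch in Python (assuming NumPy; the symmetric matrix below is just an example over R):

    import numpy as np

    A = np.array([[1.0, 1.5],
                  [1.5, 2.0]])             # symmetric matrix of Q(x1, x2) = x1^2 + 3 x1 x2 + 2 x2^2

    def Q(x):
        return x @ A @ x

    def B(u, v):
        # polarisation: recover the symmetric bilinear form from the quadratic form
        return 0.5 * (Q(u + v) - Q(u) - Q(v))

    rng = np.random.default_rng(4)
    u, v = rng.standard_normal(2), rng.standard_normal(2)
    print(np.isclose(B(u, v), u @ A @ v))  # B(u, v) = u^tr A v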

Real quadratic forms, that is to say, quadratic forms over R are of particular impor-
tance in most branches of mathematics, both pure and applied.

Theorem. Let Q(x1 , . . . , xn ) be a real quadratic form. There exists an invertible


matrix P such that if
    (X1 , . . . , Xn )tr = P (x1 , . . . , xn )tr

then

    Q(x1 , . . . , xn ) = X1² + · · · + Xp² − Xp+1² − · · · − Xr²

for all (x1 , . . . , xn )tr ∈ Rn .

Proof. Write Q(x1 , . . . , xn ) = utr A u, where u = (x1 , . . . , xn )tr ∈ Rn and A is the


real symmetric matrix of the quadratic form. By the theorem about diagonalisability of
real symmetric matrices (see p. 51) there exists U ∈ O(n) and there exists a diagonal
matrix D ∈ Mn×n (R) such that U −1 A U = D . Now U is orthogonal and so U −1 = U tr .
Therefore U tr A U = D . We will see the significance of this in a moment.
We may write D = Diag (λ1 , . . . , λn ), where

λi > 0 if 1 6 i 6 p, λi < 0 if p + 1 6 i 6 r , λi = 0 if r + 1 6 i 6 n.

54
Define

    µi := 1/√λi if 1 6 i 6 p,    µi := 1/√(−λi ) if p + 1 6 i 6 r,    µi := 1 if r + 1 6 i 6 n,

and E := Diag (µ1 , . . . , µn ). Clearly, E is invertible and, with q := r − p,

    E tr D E = ( Ip    0    0 )
               ( 0   −Iq    0 )
               ( 0     0    0 ) .

It follows that

    (U E)tr A (U E) = E tr D E ,

and so if P := (U E)−1 and v := (X1 , . . . , Xn )tr where (X1 , . . . , Xn )tr := P u then

    utr A u = (P −1 v)tr A (P −1 v) = v tr ( (U E)tr A (U E) ) v ,

that is, Q(x1 , . . . , xn ) = X1² + · · · + Xp² − Xp+1² − · · · − Xr² , as required.
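Numerical illustration. The proof is constructive, and the construction can be followed step by step in Python (a sketch assuming NumPy; the matrix A below, the matrix of the form Q(x1 , x2 , x3 ) = 4 x1 x2 , is just an example).

    import numpy as np

    A = np.array([[0., 2., 0.],
                  [2., 0., 0.],
                  [0., 0., 0.]])          # Q(x1, x2, x3) = 4 x1 x2; expect p = 1, q = 1, r = 2

    lams, U = np.linalg.eigh(A)           # U orthogonal with U^tr A U diagonal
    pos = np.where(lams >  1e-12)[0]
    neg = np.where(lams < -1e-12)[0]
    zer = np.where(np.abs(lams) <= 1e-12)[0]
    order = np.concatenate([pos, neg, zer])   # positive eigenvalues first, then negative, then zero
    lams, U = lams[order], U[:, order]

    mu = np.ones_like(lams)
    nz = ~np.isclose(lams, 0)
    mu[nz] = 1.0 / np.sqrt(np.abs(lams[nz]))
    E = np.diag(mu)

    D = (U @ E).T @ A @ (U @ E)           # should be Diag(I_p, -I_q, 0)
    print(np.round(D, 10))
    p = int(np.sum(np.isclose(np.diag(D), 1)))
    q = int(np.sum(np.isclose(np.diag(D), -1)))
    print(p, q, p + q)                    # signature (p, q) = (1, 1), rank r = 2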

Note 1. The parameter r in the theorem is called the rank of Q. It is the matrix
rank of A.

Note 2. The number p is also an invariant of Q. Define q := r − p. Sometimes


p, sometimes p − q , sometimes the pair (p, q) is known as the signature of Q. Although
the invariance of signature is not part of the syllabus, it is worth understanding. Here is
a simple proof.
Let R be an invertible n × n matrix over R such that if
    (Y1 , . . . , Yn )tr = R (x1 , . . . , xn )tr

then

    Q(x1 , . . . , xn ) = Y1² + · · · + Yp′² − Yp′+1² − · · · − Yr′²
for all (x1 , . . . , xn )tr ∈ Rn . Define q := r − p (as above) and q ′ := r′ − p′ . Call a
subspace U of Rn positive if Q(u) > 0 for all u ∈ U \ {0}, and call a subspace W
non-positive if Q(u) 6 0 for all u ∈ W \ {0}. Clearly, if U is a positive subspace and W
is a non-positive subspace then U ∩ W = {0}, so dim U + dim W 6 n. Define subspaces
U1 , U2 , W1 , W2 by

U1 := {u ∈ Rn | Xp+1 = . . . = Xn = 0},
W1 := {u ∈ Rn | X1 = . . . = Xp = 0},
U2 := {u ∈ Rn | Yp′ +1 = . . . = Yn = 0},
W2 := {u ∈ Rn | Y1 = . . . = Yp′ = 0}.

55
(Note that here Xi = 0 and Yj = 0 are to be construed as linear equations in the
coordinates x1 , . . . , xn of u.) Then U1 and U2 are positive subspaces of dimensions
p, p′ respectively and W1 , W2 are non-positive subspaces of dimensions n − p, n − p′
respectively. It follows that p + (n − p′ ) 6 n so that p 6 p′ , and p′ + (n − p) 6 n so that
p′ 6 p. Therefore p = p′ .
Note that a similar argument with negative and non-negative subspaces will prove
that q = q ′ , and therefore r = p + q = p′ + q ′ = r′ . But of course the fact that r = r′
also comes from the fact that this is the rank of the matrix A of the quadratic form.

Note 3. The invariance of rank and signature is known as Sylvester’s Law of Inertia.

Note 4. If Q(u) > 0 whenever u ∈ Rn \ {0} then Q is said to be positive definite.


This holds if and only if r = p = n. Then the associated bilinear form is an inner product
on Rn .

Exercise 58 [FHS 1992, A1, Qn 3]. For u := (x, y, z, t) ∈ R4 we define q(u) :=


(x2 +y 2 +z 2 −t2 ). A subspace U of R4 is said to be ‘positive’ if q(u) > 0 for all non-zero
u in U , it is said to be ‘negative’ if q(u) < 0 for all non-zero u in U , and it is said to
be ‘null’ if q(u) = 0 for all u ∈ U . Prove or disprove each of the following assertions:
(i) if U is positive then dim U 6 3;
(ii) there is a unique positive subspace of dimension 3;
(iii) if U is negative then dim U 6 1;
(iv) there exist non-zero subspaces U , V , W such that U is positive, V is null, W is
negative and R4 = U ⊕ V ⊕ W .

In geometry and in mechanics we often need to study two real quadratic forms si-
multaneously. The following theorem has a number of applications, in particular to the
study of small vibrations of a mechanical system about a state of stable equilibrium.

Theorem. Let Q(x1 , . . . , xn ), R(x1 , . . . , xn ) be real quadratic forms. Suppose that


Q is positive definite. Then there is an invertible n × n matrix P over R and there exist
a1 , . . . , an ∈ R such that if
X1 x1
   
 ...  = P −1  ... 
Xn xn
then the equations

Q(x1 , . . . , xn ) = X12 + · · · + Xn2


R(x1 , . . . , xn ) = a1 X12 + · · · + an Xn2

hold simultaneously for all (x1 , . . . , xn )tr ∈ Rn .


The coefficients a1 , a2 , . . . , an are the roots of the equation

det(xA − B) = 0

where A, B are the symmetric matrices associated with Q, R respectively.

56
Proof. Let A and B be the real symmetric matrices associated with Q and R
respectively, so that

Q(x1 , . . . , xn ) = utr A u, R(x1 , . . . , xn ) = utr B u,

where u = (x1 , . . . , xn )tr ∈ Rn . The bilinear form associated with Q is h , iQ , say,


where hu, viQ = utr A v . Since Q is positive definite this is an inner product on Rn . Let
u1 , . . . , un be an orthonormal basis for Rn with respect to this new inner product, and
let U be the n × n matrix whose columns are u1 , . . . , un . The equations uitr A uj = 1 if i = j and uitr A uj = 0 if i 6= j may be written U tr A U = I .
Now let C := U tr B U . Then, since B is symmetric,

C tr = U tr B tr (U tr )tr = U tr B U = C,

that is C is symmetric. By the Diagonalisability Theorem, Version 2, there exists


an orthogonal matrix T such that T −1 C T = D , where D is diagonal, say D =
Diag (a1 , . . . , an ). Since T is orthogonal T −1 = T tr and so T tr C T = D . Now if
P := U T then
P tr A P = T tr U tr A U T = T tr I T = I,
and
P tr B P = T tr U tr B U T = T tr C T = D.
This means that if v = P −1 u and v = (X1 , . . . , Xn )tr then

Q(x1 , . . . , xn ) = (P v)tr A (P v) = v tr (P tr A P ) v = X12 + · · · + Xn2

and
R(x1 , . . . , xn ) = (P v)tr B (P v) = v tr (P tr B P ) v = a1 X12 + · · · + an Xn2 ,
as required.
We have seen that P tr A P = I and P tr B P = D = Diag (a1 , . . . , an ). Clearly, det(xI − D) = ∏(x − ai ). Thus det(x P tr A P − P tr B P ) = ∏(x − ai ). Therefore det(x A − B) = c ∏(x − ai ) where c := (det P )−2 . So a1 , . . . , an are the roots of the equation det(x A − B) = 0, and this completes the proof of the theorem.
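Numerical illustration. This proof, too, is constructive. The sketch below (Python, assuming NumPy; the matrices A and B are random examples, and a Cholesky factorisation is used as one convenient way of producing a basis orthonormal for h , iQ ) builds P = U T as in the proof and checks the conclusions.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 4
    M = rng.standard_normal((n, n))
    A = M.T @ M + n * np.eye(n)           # symmetric positive definite: the matrix of Q
    B = rng.standard_normal((n, n))
    B = B + B.T                           # symmetric: the matrix of R

    # Step 1: columns of U are orthonormal for <u, v>_Q = u^tr A v (here via A = L L^tr)
    L = np.linalg.cholesky(A)
    U = np.linalg.inv(L.T)                # then U^tr A U = I

    # Step 2: diagonalise C = U^tr B U by an orthogonal matrix T
    C = U.T @ B @ U
    a, T = np.linalg.eigh(C)              # T^tr C T = Diag(a_1, ..., a_n)

    P = U @ T
    print(np.allclose(P.T @ A @ P, np.eye(n)))            # P^tr A P = I
    print(np.allclose(P.T @ B @ P, np.diag(a)))           # P^tr B P = Diag(a_1, ..., a_n)
    roots = np.linalg.eigvals(np.linalg.inv(A) @ B)       # roots of det(xA - B) = 0
    print(np.allclose(np.sort(a), np.sort(roots.real)))   # the a_i are exactly those roots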

Exercise 59 [An old FHS question]. (i) Let A, B be real symmetric n×n matrices.
Suppose that all the eigenvalues of A are positive and let T be an invertible n×n matrix
such that T tr AT = In . [It is a theorem of the course that such a matrix exists]. Show
that T tr BT is symmetric and deduce that there exists an invertible n × n matrix S
such that both S tr AS = In and S tr BS is diagonal. Show also that the diagonal entries
of S tr BS are the roots of the equation det(xA − B) = 0.

(ii) Show that if

Q(x, y, z) = 3x2 + 5y 2 + 3z 2 + 2xy − 2xz + 2yz,


R(x, y, z) = x2 + y 2 + z 2 + 10xy + 2xz − 6yz,

then there exist linear forms l , m, n in x, y , z such that


    Q(x, y, z) = l² + m² + n²    and    R(x, y, z) = l² + √2 m² − √2 n² .

57
Further exercises IV
Exercise 61. Express the following quadratic forms as sums or differences of
squares of linearly independent linear forms in x, y , z : x2 + 2xy + 2y 2 − 2yz − 3z 2 ;
xy + yz + xz . [Note: it is probably quicker to use the method of ‘completing the
square’ than to use the method given by the proof of the theorem on diagonalisation of
real quadratic forms.]

Exercise 62 [FHS 1983, I, Qn 3]. Let α be a self-adjoint linear transformation of


a finite dimensional real inner-product space V . Show that V has an orthonormal basis
consisting of eigenvectors of α.
Suppose the eigenvalues of α are strictly positive, and let

G = {π | π is a linear transformation of V and π ∗ απ = α}.

Show that G is a group isomorphic to O(V ), the group of orthogonal transformations


of V .

Exercise 63. Let X1 and X2 be subspaces of a real finite-dimensional vector space


V , and suppose that V = X1 + X2 . Suppose that X1 and X2 have inner products h , i1
and h , i2 which agree on X1 ∩ X2 . Show that:
(i) there exists a basis for V which contains an orthonormal basis for X1 and an
orthonormal basis for X2 ;
(ii) there exists an inner product on V which agrees with h , i1 on X1 and h , i2 on
X2 .
Is this inner product unique? Justify your answer.

Exercise 64 [Cambridge Tripos Part IA, 1995]. Let V be a real inner product
space, and let ||v|| := hv, vi1/2 . Prove that

||x − y|| 6 ||z − x|| + ||y − z||

for all x, y , z ∈ V . When does equality occur?


Prove that

    || (x − y)/||x − y||²  −  x/||x||² || = ||y|| / ( ||x − y|| ||x|| ) .

Hence show that ||y|| ||z|| 6 ||z − x + y|| ||x|| + ||z − x|| ||x − y||.

Exercise 65. Let V be the vector space of infinitely differentiable functions f : [−π, π] → R equipped with the inner product hf, gi := ∫_{−π}^{π} f (t) g(t) dt, and let ∆ : V → V be the linear transformation ∆ : f 7→ f ′′ .

(i) Show that if V0 := {f ∈ V | f (−π) = f (π) = 0} then V0 is a subspace of V and


dim(V /V0 ) = 2.
(ii) Show that if f, g ∈ V0 then h∆f, gi = hf, ∆gi.
(iii) What is wrong with the assertion that ∆ is a self-adjoint transformation of V0 ?

58
Exercise 66. [FHS 1996, A1, Qn 3.] Let V be a finite-dimensional real inner
product space.
(a) If v ∈ V show that there is a unique element θv in the dual space V ′ of V such
that
θv (u) = hu, vi for all u ∈ V.
(b) Show that the map θ : V → V ′ given by θ(v) = θv (for v ∈ V ) is an isomorphism.
[You may assume that dim V = dim V ′ .]
(c) Let W be a subspace of V , and let W ⊥ be the orthogonal complement of W in
V . Show that θ(W ⊥ ) = W ◦ , where W ◦ is the annihilator of W in V ′ .
Now let V be the space of polynomials in x of degree at most 2, with real coefficients.
Define an inner product on V by setting

    hf, gi = ∫_0^1 f (x) g(x) dx

for f, g ∈ V . You may assume that {1, √3 (1 − 2x), √5 (6x² − 6x + 1)} is an orthonormal
basis for V . Show that the map φ : V → R given by φ(f ) = f ′′ (0) defines a linear
functional on V (i.e. show that φ ∈ V ′ ), and find v ∈ V such that θv = φ.

Exercise 67 [FHS 1995, A1, Qn 3]. Let V be a finite-dimensional vector space


over a field F , and let T : V → V be a linear transformation. Let v ∈ V , v 6= 0, and
suppose that V is spanned by {v, T v, T 2 v, . . . , T j v, . . .}.
(i) Show that there is an integer k > 1 such that v, T v, T 2 v, . . . , T k−1 v are linearly
independent but
T k v = α0 v + α1 T v + · · · + αk−1 T k−1 v
for some α0 , α1 , . . . , αk−1 ∈ F .
(ii) Prove that {v, T v, T 2 v, . . . , T k−1 v} is a basis for V .
Let T have minimum polynomial m(x) and characteristic polynomial c(x).
(iii) Prove that m(x) = xk − αk−1 xk−1 − · · · − α1 x − α0 .
(iv) By considering the matrix of T with respect to the basis {v, T v, T 2 v, . . . , T k−1 v},
prove (without using the Cayley–Hamilton Theorem) that m(x) = c(x).
 
Now let A be the matrix

    ( 1 1 0 1 )
    ( 0 1 0 1 )
    ( 0 0 1 1 )
    ( 0 0 0 1 ) .

Show that there does not exist a column vector v ∈ R4 such that R4 is spanned by {v, A v, A2 v, . . . , Aj v, . . .}.

Exercise 68. Let V be a finite-dimensional vector space, and let S, T : V → V


be linear transformations. Let m1 and m2 denote the minimal polynomials of S T
and T S respectively. By considering relations such as T (S T )r S = (T S)r+1 show that
m2 (x) = xi m1 (x), where i = −1, 0, or +1. Show that λ is an eigenvalue of S T if and
only if λ is an eigenvalue of T S .
 
Exercise 69. Let A denote the real matrix

    (  1    1   0 )
    (  0    2   1 )
    ( −4  −11  −4 ) .

Calculate the matrix Σ_{r=1}^{99} A^r .

ΠMN: Queen’s: 13.xi.2007

59
