
Algebra I Advanced Linear Algebra (MA251) Lecture Notes

Dmitriy Rumynin
year 2012

Contents

1 Review of Some Linear Algebra
  1.1 The matrix of a linear map with respect to a fixed basis
  1.2 Change of basis

2 The Jordan Canonical Form
  2.1 Introduction
  2.2 The Cayley-Hamilton theorem
  2.3 The minimal polynomial
  2.4 Jordan chains and Jordan blocks
  2.5 Jordan bases and the Jordan canonical form
  2.6 The JCF when n = 2 and 3
  2.7 The general case
  2.8 Examples
  2.9 Proof of Theorem 2.9 (non-examinable)
  2.10 Powers of matrices
  2.11 Applications to difference equations
  2.12 Functions of matrices
  2.13 Applications to differential equations

3 Bilinear Maps and Quadratic Forms
  3.1 Bilinear maps: definitions
  3.2 Bilinear maps: change of basis
  3.3 Quadratic forms: introduction
  3.4 Quadratic forms: definitions
  3.5 Change of variable under the general linear group
  3.6 Change of variable under the orthogonal group
  3.7 Applications of quadratic forms to geometry
    3.7.1 Reduction of the general second degree equation
    3.7.2 The case n = 2
    3.7.3 The case n = 3
  3.8 Unitary, hermitian and normal matrices
  3.9 Applications to quantum mechanics (non-examinable)

4 Finitely Generated Abelian Groups
  4.1 Definitions
  4.2 Subgroups, cosets and quotient groups
  4.3 Homomorphisms and the first isomorphism theorem
  4.4 Free abelian groups
  4.5 Unimodular elementary row and column operations and the Smith normal form for integral matrices
  4.6 Subgroups of free abelian groups
  4.7 General finitely generated abelian groups
  4.8 Finite abelian groups
  4.9 Tensor products
  4.10 Third Hilbert's problem
  4.11 Possible topics for the second year essays

1 Review of Some Linear Algebra

Students will need to be familiar with the whole of the contents of the First Year Linear
Algebra module (MA106). In this section, we shall review the material on matrices of linear
maps and change of basis. Other material will be reviewed as it arises.

1.1 The matrix of a linear map with respect to a fixed basis

Let V and W be vector spaces over a field¹ K. Let T : V → W be a linear map, where
dim(V) = n, dim(W) = m. Choose a basis e1, . . . , en of V and a basis f1, . . . , fm of W.
Now, for 1 ≤ j ≤ n, T(ej) ∈ W, so T(ej) can be written uniquely as a linear combination of
f1, . . . , fm. Let

  T(e1) = α11 f1 + α21 f2 + · · · + αm1 fm
  T(e2) = α12 f1 + α22 f2 + · · · + αm2 fm
    ⋮
  T(en) = α1n f1 + α2n f2 + · · · + αmn fm

where the coefficients αij ∈ K (for 1 ≤ i ≤ m, 1 ≤ j ≤ n) are uniquely determined.

The coefficients αij form an m × n matrix

  A = ( α11 α12 . . . α1n
        α21 α22 . . . α2n
         ⋮            ⋮
        αm1 αm2 . . . αmn )

over K. Then A is called the matrix of the linear map T with respect to the chosen bases of
V and W. Note that the columns of A are the images T(e1), . . . , T(en) of the basis vectors
of V represented as column vectors with respect to the basis f1, . . . , fm of W.

It was shown in MA106 that T is uniquely determined by A, and so there is a one-one
correspondence between linear maps T : V → W and m × n matrices over K, which depends
on the choice of bases of V and W.

For v ∈ V, we can write v uniquely as a linear combination of the basis vectors ei; that is,
v = x1e1 + · · · + xnen, where the xi are uniquely determined by v and the basis ei. We shall
call xi the coordinates of v with respect to the basis e1, . . . , en. We associate the column
vector

  v = (x1, x2, . . . , xn)^T ∈ K^{n,1}

to v, where K^{n,1} denotes the space of n × 1 column vectors with entries in K. Notice that v
is equal to (x1, x2, . . . , xn)^T, the transpose of the row vector (x1, x2, . . . , xn). To simplify the
typography, we shall often write column vectors in this manner.

It was proved in MA106 that if A is the matrix of the linear map T, then for v ∈ V, we
have T(v) = w if and only if Av = w, where w ∈ K^{m,1} is the column vector associated with
w ∈ W.

¹ It is conventional to use either F or K as the letter to denote a field. F stands for a field, while K comes
from the German word Körper.

1.2 Change of basis

Let V be a vector space of dimension n over a field K, and let e1, . . . , en and e′1, . . . , e′n be
two bases of V. Then there is an invertible n × n matrix P = (σij) such that

  e′j = Σ_{i=1}^{n} σij ei  for 1 ≤ j ≤ n.   (∗)

P is called the basis change matrix or transition matrix for the original basis e1, . . . , en and
the new basis e′1, . . . , e′n. Note that the columns of P are the new basis vectors e′i written as
column vectors in the old basis vectors ei. (Recall also that P is the matrix of the identity
map V → V using basis e′1, . . . , e′n in the domain and basis e1, . . . , en in the codomain.)
Usually the original basis e1, . . . , en will be the standard basis of K^n.

Example. Let V = R³, e1 = (1 0 0), e2 = (0 1 0), e3 = (0 0 1) (the standard basis) and
e′1 = (0 1 2), e′2 = (1 2 0), e′3 = (1 0 0). Then

  P = ( 0 1 1
        1 2 0
        2 0 0 ).

The following result was proved in MA106.

Proposition 1.1 With the above notation, let v ∈ V, and let v and v′ denote the column
vectors associated with v when we use the bases e1, . . . , en and e′1, . . . , e′n, respectively. Then
Pv′ = v.

So, in the example above, if we take v = (1 −2 4) = e1 − 2e2 + 4e3 then v = 2e′1 − 2e′2 + 3e′3,
and you can check that Pv′ = v.

This equation Pv′ = v describes the change of coordinates associated with the basis change.
In Section 3 below, such basis changes will arise as changes of coordinates, so we will use this
relationship quite often.

Now let T : V → W, ei, fi and A be as in Subsection 1.1 above, and choose new bases
e′1, . . . , e′n of V and f′1, . . . , f′m of W. Then

  T(e′j) = Σ_{i=1}^{m} βij f′i  for 1 ≤ j ≤ n,

where B = (βij) is the m × n matrix of T with respect to the bases {e′i} and {f′i} of V and
W. Let the n × n matrix P = (σij) be the basis change matrix for original basis {ei} and
new basis {e′i}, and let the m × m matrix Q = (τij) be the basis change matrix for original
basis {fi} and new basis {f′i}. The following theorem was proved in MA106:

Theorem 1.2 With the above notation, we have AP = QB, or equivalently B = Q⁻¹AP.

In most of the applications in this course we will have V = W (= K^n), {ei} = {fi}, {e′i} = {f′i}
and P = Q, and hence B = P⁻¹AP.

2 The Jordan Canonical Form

2.1 Introduction

Throughout this section V will be a vector space of dimension n over a field K, T : V → V
will be a linear operator², and A will be the matrix of T with respect to a fixed basis e1, . . . , en
of V. Our aim is to find a new basis e′1, . . . , e′n for V, such that the matrix of T with respect
to the new basis is as simple as possible. Equivalently (by Theorem 1.2), we want to find
an invertible matrix P (the associated basis change matrix) such that P⁻¹AP is as simple as
possible.

² i.e. a linear map from a space to itself

Our preferred form of matrix is a diagonal matrix, but we saw in MA106 that the matrix

  ( 1 1
    0 1 ),

for example, is not similar to a diagonal matrix. We shall generally assume that
K = C. This is to ensure that the characteristic polynomial of A factorises into linear factors.
Under this assumption, it can be proved that A is always similar to a matrix B = (βij) of
a certain type (called the Jordan canonical form or sometimes Jordan normal form of the
matrix), which is not far off being diagonal. In fact βij is zero except when j = i or j = i + 1,
and βi,i+1 is either 0 or 1.

We start by summarising some definitions and results from MA106. We shall use 0 both
for the zero vector in V and the zero n × n matrix. The zero linear operator 0V : V → V
corresponds to the zero matrix 0, and the identity linear operator IV : V → V corresponds
to the identity n × n matrix In.

Because of the correspondence between linear maps and matrices, which respects addition
and multiplication, all statements about A can be rephrased as equivalent statements about
T. For example, if p(x) is a polynomial equation in a variable x, then p(A) = 0 ⇔ p(T) = 0V.

If T(v) = λv for λ ∈ K and 0 ≠ v ∈ V, or equivalently, if Av = λv, then λ is an eigenvalue,
and v a corresponding eigenvector of T and A. The eigenvalues can be computed as the roots
of the characteristic polynomial cA(x) = det(A − xIn) of A.

The eigenvectors corresponding to λ are the non-zero elements in the nullspace (= kernel)
of the linear operator T − λIV. This nullspace is called the eigenspace of T with respect to
the eigenvalue λ. In other words, the eigenspace is equal to { v ∈ V | T(v) = λv }, which is
equal to the set of eigenvectors together with 0.

The dimension of the eigenspace, which is called the nullity of T − λIV, is therefore equal to
the number of linearly independent eigenvectors corresponding to λ. This number plays an
important role in the theory of the Jordan canonical form. From the Dimension Theorem,
proved in MA106, we know that

  rank(T − λIV) + nullity(T − λIV) = n,

where rank(T − λIV) is equal to the dimension of the image of T − λIV.

For the sake of completeness, we shall now repeat the results proved in MA106 about the
diagonalisability of matrices. We shall use the theorem that a set of n linearly independent
vectors of V form a basis of V without further explicit reference.

Theorem 2.1 Let T : V → V be a linear operator. Then the matrix of T is diagonal with
respect to some basis of V if and only if V has a basis consisting of eigenvectors of T.

Proof: Suppose that the matrix A = (αij) of T is diagonal with respect to the basis
e1, . . . , en of V. Recall from Subsection 1.1 that the image of the i-th basis vector of V is
represented by the i-th column of A. But since A is diagonal, this column has the single
non-zero entry αii. Hence T(ei) = αii ei, and so each basis vector ei is an eigenvector of A.
Conversely, suppose that e1, . . . , en is a basis of V consisting entirely of eigenvectors of T.
Then, for each i, we have T(ei) = λi ei for some λi ∈ K. But then the matrix of A with
respect to this basis is the diagonal matrix A = (αij) with αii = λi for each i. □

Theorem 2.2 Let λ1, . . . , λr be distinct eigenvalues of T : V → V, and let v1, . . . , vr be
corresponding eigenvectors. (So T(vi) = λi vi for 1 ≤ i ≤ r.) Then v1, . . . , vr are linearly
independent.

Proof: We prove this by induction on r. It is true for r = 1, because eigenvectors are
non-zero by definition. For r > 1, suppose that for some α1, . . . , αr ∈ K we have

  α1 v1 + α2 v2 + · · · + αr vr = 0.

Then, applying T to this equation gives

  α1 λ1 v1 + α2 λ2 v2 + · · · + αr λr vr = 0.

Now, subtracting λ1 times the first equation from the second gives

  α2 (λ2 − λ1) v2 + · · · + αr (λr − λ1) vr = 0.

By inductive hypothesis, v2, . . . , vr are linearly independent, so αi (λi − λ1) = 0 for 2 ≤ i ≤ r.
But, by assumption, λi − λ1 ≠ 0 for i > 1, so we must have αi = 0 for i > 1. But then
α1 v1 = 0, so α1 is also zero. Thus αi = 0 for all i, which proves that v1, . . . , vr are linearly
independent. □

Corollary 2.3 If the linear operator T : V → V (or equivalently the n × n matrix A) has n
distinct eigenvalues, where n = dim(V), then T (or A) is diagonalisable.

Proof: Under the hypothesis, there are n linearly independent eigenvectors, which therefore
form a basis of V. The result follows from Theorem 2.1. □

2.2 The Cayley-Hamilton theorem

This theorem says that a matrix satisfies its own characteristic equation. It is easy to visualise
with the following non-proof:

  cA(A) = det(A − AI) = det(0) = 0.

This argument is faulty because you cannot really plug the matrix A into det(A − xI): you
must compute this polynomial first.

Theorem 2.4 (Cayley-Hamilton) Let cA(x) be the characteristic polynomial of the n × n
matrix A over an arbitrary field K. Then cA(A) = 0.

Proof: Recall from MA106 that, for any n × n matrix B, we have B adj(B) = det(B)In,
where adj(B) is the n × n matrix whose (j, i)-th entry is the cofactor cij = (−1)^{i+j} det(Bij),
and Bij is the (n − 1) × (n − 1) matrix obtained by deleting the i-th row and the j-th column
of B.

By definition, cA(x) = det(A − xIn), and (A − xIn) adj(A − xIn) = det(A − xIn)In. Now
det(A − xIn) is a polynomial of degree n in x; that is, det(A − xIn) = a0x⁰ + a1x¹ + · · · + anx^n,
with ai ∈ K. Similarly, putting B = A − xIn in the last paragraph, we see that the (j, i)-th
entry (−1)^{i+j} det(Bij) of adj(B) is a polynomial of degree at most n − 1 in x. Hence
adj(A − xIn) is itself a polynomial of degree at most n − 1 in x in which the coefficients are
n × n matrices over K. That is, adj(A − xIn) = B0x⁰ + B1x + · · · + Bn−1x^{n−1}, where each Bi
is an n × n matrix over K. So we have

  (A − xIn)(B0x⁰ + B1x + · · · + Bn−1x^{n−1}) = (a0x⁰ + a1x¹ + · · · + anx^n)In.

Since this is a polynomial identity, we can equate coefficients of the powers of x on the left
and right hand sides. In the list of equations below, the equations on the left are the result
of equating coefficients of x^i for 0 ≤ i ≤ n, and those on the right are obtained by multiplying
the corresponding left hand equation by A^i.

  AB0 = a0 In,                  AB0 = a0 In
  AB1 − B0 = a1 In,             A²B1 − AB0 = a1 A
  AB2 − B1 = a2 In,             A³B2 − A²B1 = a2 A²
    ⋮                             ⋮
  ABn−1 − Bn−2 = an−1 In,       A^n Bn−1 − A^{n−1} Bn−2 = an−1 A^{n−1}
  −Bn−1 = an In,                −A^n Bn−1 = an A^n

Now summing all of the equations in the right hand column gives

  0 = a0 A⁰ + a1 A + · · · + an−1 A^{n−1} + an A^n

(remember A⁰ = In), which says exactly that cA(A) = 0. □

By the correspondence between linear maps and matrices, we also have cA(T) = 0V.

2.3 The minimal polynomial

We start this section with a brief general discussion of polynomials in a single variable x with
coefficients in a field K, such as p = p(x) = 2x² − 3x + 11. The set of all such polynomials
is denoted by K[x]. There are two binary operations on this set: addition and multiplication
of polynomials. These operations turn K[x] into a ring, which will be studied in great detail
in Algebra-II.

As a ring, K[x] has a number of properties in common³ with the integers Z. The notation
a|b means a divides b. It can be applied to integers: e.g. 3|12; and also to polynomials: e.g.
(x − 3)|(x² − 4x + 3).

³ Technically speaking, they are both Euclidean Domains, which is an important topic in Algebra-II.

We can divide one polynomial p (with p ≠ 0) into another polynomial q and get a remainder
with degree less than p. For example, if q = x⁵ − 3, p = x² + x + 1, then we find q = sp + r
with s = x³ − x² + 1 and r = −x − 4. For both Z and K[x], this is known as the Euclidean
Algorithm.

A polynomial r is said to be a greatest common divisor of p, q ∈ K[x] if r|p, r|q, and, for any
polynomial r′ with r′|p, r′|q, we have r′|r. Any two polynomials p, q ∈ K[x] have a greatest
common divisor and a least common multiple (which is defined similarly), but these are only
determined up to multiplication by a constant. For example, x − 1 is a greatest common
divisor of x² − 2x + 1 and x² − 3x + 2, but so are 1 − x and 2x − 2. To resolve this ambiguity,
we make the following definition.

Definition. A polynomial with coefficients in a field K is called monic if the coefficient of
the highest power of x is 1.

For example, x³ − 2x² + x + 11 is monic, but 2x² − x − 1 is not.

Now we can define gcd(p, q) to be the unique monic greatest common divisor of p and q, and
similarly for lcm(p, q).

As with the integers, we can use the Euclidean Algorithm to compute gcd(p, q). For example,
if p = x⁴ − 3x³ + 2x², q = x³ − 2x² − x + 2, then p = q(x − 1) + r with r = x² − 3x + 2, and
q = r(x + 1), so gcd(p, q) = r.

Theorem 2.5 Let A be an n × n matrix over K representing the linear operator T : V → V.
The following statements hold:

(i) there is a unique monic non-zero polynomial p(x) with minimal degree and coefficients
in K such that p(A) = 0,

(ii) if q(x) is any polynomial with q(A) = 0, then p|q.

Proof: (i) If we have any polynomial p(x) with p(A) = 0, then we can make p monic
by multiplying it by a constant. By Theorem 2.4, there exists such a p(x), namely cA(x).
If we had two distinct monic polynomials p1(x), p2(x) of the same minimal degree with
p1(A) = p2(A) = 0, then p = p1 − p2 would be a non-zero polynomial of smaller degree with
p(A) = 0, contradicting the minimality of the degree, so p is unique.

(ii) Let p(x) be the minimal monic polynomial in (i) and suppose that q(A) = 0. As we saw
above, we can write q = sp + r where r has smaller degree than p. If r is non-zero, then
r(A) = q(A) − s(A)p(A) = 0, contradicting the minimality of p, so r = 0 and p|q. □

Definition. The unique monic polynomial μA(x) of minimal degree with μA(A) = 0 is
called the minimal polynomial of A or of the corresponding linear operator T. (Note that
p(A) = 0 ⇔ p(T) = 0 for p ∈ K[x].)

By Theorem 2.4 and Theorem 2.5 (ii), we have:

Corollary 2.6 The minimal polynomial of a square matrix A divides its characteristic polynomial.

Similar matrices A and B represent the same linear operator T, and so their minimal polynomial is the same as that of T. Hence we have

Proposition 2.7 Similar matrices have the same minimal polynomial.

For a vector v ∈ V, we can also define a relative minimal polynomial μA,v as the unique
monic polynomial p of minimal degree for which p(T)(v) = 0V. Since p(T) = 0 if and only
if p(T)(v) = 0V for all v ∈ V, μA is the least common multiple of the polynomials μA,v for
all v ∈ V.

But p(T)(v) = 0V for all v ∈ V if and only if p(T)(bi) = 0V for all bi in a basis b1, . . . , bn
of V (exercise), so μA is the least common multiple of the polynomials μA,bi.

This gives a method of calculating μA. For any v ∈ V, we can compute μA,v by calculating the
sequence of vectors v, T(v), T²(v), T³(v), . . . and stopping when it becomes linearly dependent.
In practice, we compute T(v) etc. as Av for the corresponding column vector v ∈ K^{n,1}.

For example, let K = R and

  A = (  3  1  0  1
        −1  1  0 −1
         0  0  1  0
         0  0  0  1 ).

Using the standard basis b1 = (1 0 0 0)^T, b2 = (0 1 0 0)^T, b3 = (0 0 1 0)^T, b4 = (0 0 0 1)^T
of R^{4,1}, we have:

Ab1 = (3 −1 0 0)^T, A²b1 = A(Ab1) = (8 −4 0 0)^T = 4Ab1 − 4b1, so (A² − 4A + 4)b1 = 0, and
hence μA,b1 = x² − 4x + 4 = (x − 2)².

Ab2 = (1 1 0 0)^T, A²b2 = (4 0 0 0)^T = 4Ab2 − 4b2, so μA,b2 = x² − 4x + 4.

Ab3 = b3, so μA,b3 = x − 1.

Ab4 = (1 −1 0 1)^T, A²b4 = (3 −3 0 1)^T = 3Ab4 − 2b4, so μA,b4 = x² − 3x + 2 = (x − 2)(x − 1).

So we have μA = lcm(μA,b1, μA,b2, μA,b3, μA,b4) = (x − 2)²(x − 1).
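
If you want to check such a computation by machine, the defining identity μA(A) = 0 is easy to test numerically. A minimal sketch, assuming NumPy is available (A is the matrix of the example above):

    import numpy as np

    A = np.array([[3, 1, 0, 1],
                  [-1, 1, 0, -1],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    I = np.eye(4)

    # The candidate minimal polynomial (x - 2)^2 (x - 1), evaluated at A, vanishes:
    print(np.allclose((A - 2*I) @ (A - 2*I) @ (A - I), np.zeros((4, 4))))   # True
    # The smaller candidate (x - 2)(x - 1) does not annihilate A:
    print(np.allclose((A - 2*I) @ (A - I), np.zeros((4, 4))))               # False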

2.4 Jordan chains and Jordan blocks

The Cayley-Hamilton theorem and the theory of minimal polynomials are valid for any matrix
over an arbitrary field K, but the theory of Jordan forms will require an additional assumption
that the characteristic polynomial cA(x) is split in K[x], i.e. it factorises into linear factors. If
the field K = C then all polynomials in K[x] factorise into linear factors by the Fundamental
Theorem of Algebra, and the JCF works for any matrix.

Definition. A Jordan chain of length k is a sequence of non-zero vectors v1, . . . , vk ∈ K^{n,1}
that satisfies

  Av1 = λv1,  Avi = λvi + vi−1,  2 ≤ i ≤ k,

for some eigenvalue λ of A.

Equivalently, (A − λIn)v1 = 0 and (A − λIn)vi = vi−1 for 2 ≤ i ≤ k, so (A − λIn)^i vi = 0 for
1 ≤ i ≤ k.

It is instructive to keep in mind the following model of a Jordan chain that works over the
complex or real field. Let V be the vector space of functions of the form f(z)e^{λz} where f(z)
is a polynomial of degree less than k. Consider the derivative, that is, the linear operator
T : V → V given by T(φ(z)) = φ′(z). The vectors vi = z^{i−1}e^{λz}/(i − 1)! form a Jordan chain
for T and a basis of V. In particular, the matrix of T in this basis is the Jordan block defined
below.

Definition. A non-zero vector v ∈ V such that (A − λIn)^i v = 0 for some i > 0 is called a
generalised eigenvector of A with respect to the eigenvalue λ.

Note that, for fixed i > 0, { v ∈ V | (A − λIn)^i v = 0 } is the nullspace of (A − λIn)^i, and is
called the generalised eigenspace of index i of A with respect to λ. When i = 1, this is the
ordinary eigenspace of A with respect to λ.

Notice that v ∈ V is an eigenvector with eigenvalue λ if and only if μA,v = x − λ. Similarly,
generalised eigenvectors are characterised by the property μA,v = (x − λ)^i.

For example, consider the matrix

  A = ( 3 1 0
        0 3 1
        0 0 3 ).

We see that, for the standard basis of K^{3,1}, we have Ab1 = 3b1, Ab2 = 3b2 + b1, Ab3 =
3b3 + b2, so b1, b2, b3 is a Jordan chain of length 3 for the eigenvalue 3 of A. The generalised
eigenspaces of index 1, 2, and 3 are respectively ⟨b1⟩, ⟨b1, b2⟩, and ⟨b1, b2, b3⟩.

Notice that the dimension of a generalised eigenspace of A is the nullity of (T − λIV)^i, which
is a function of the linear operator T associated with A. Since similar matrices represent
the same linear operator, we have

Proposition 2.8 The dimensions of corresponding generalised eigenspaces of similar matrices are the same.

Definition. We define a Jordan block with eigenvalue λ of degree k to be a k × k matrix
Jλ,k = (γij), such that γii = λ for 1 ≤ i ≤ k, γi,i+1 = 1 for 1 ≤ i < k, and γij = 0 if j is not
equal to i or i + 1. So, for example,

  J1,2 = ( 1 1      Jλ,3 = ( 3i/2  1     0         J0,4 = ( 0 1 0 0
           0 1 ),            0     3i/2  1                  0 0 1 0
                             0     0     3i/2 ),            0 0 0 1
                                                            0 0 0 0 )

are Jordan blocks, where λ = 3i/2 in the second example.

It should be clear that the matrix of T with respect to the basis v1, . . . , vn of K^{n,1} is a Jordan
block of degree n if and only if v1, . . . , vn is a Jordan chain for A.

Note also that for A = Jλ,k, μA,vi = (x − λ)^i, so μA = (x − λ)^k. Since Jλ,k is an upper
triangular matrix with entries λ on the diagonal, we see that the characteristic polynomial
cA of A is also equal to (λ − x)^k.

Warning: Some authors put the 1s below rather than above the main diagonal in a Jordan
block. This corresponds to either writing the Jordan chain in the reversed order or using
rows instead of columns for the standard vector space. However, if an author does both (uses
rows and reverses the order) then the 1s will go back above the diagonal.

2.5 Jordan bases and the Jordan canonical form

Definition. A Jordan basis for A is a basis of K^{n,1} which is a disjoint union of Jordan chains.

We denote the m × n matrix in which all entries are 0 by 0m,n. If A is an m × m matrix and
B an n × n matrix, then we denote the (m + n) × (m + n) matrix with block form

  (   A    0m,n
    0n,m    B   )

by A ⊕ B. For example



  ( 2 1     ( 0 1 −1      ( 2 1 0 0  0
    1 0 ) ⊕   0 0  1   =    1 0 0 0  0
              2 0 −2 )      0 0 0 1 −1
                            0 0 0 0  1
                            0 0 2 0 −2 ).

So, if

  w1,1, . . . , w1,k1, w2,1, . . . , w2,k2, . . . , ws,1, . . . , ws,ks

is a Jordan basis for A in which wi,1, . . . , wi,ki is a Jordan chain for the eigenvalue λi for 1 ≤ i ≤
s, then the matrix of T with respect to this basis is the direct sum Jλ1,k1 ⊕ Jλ2,k2 ⊕ · · · ⊕ Jλs,ks
of the corresponding Jordan blocks.
We can now state the main theorem of this section, which says that Jordan bases exist.

Theorem 2.9 Let A be an n × n matrix over K such that cA(x) splits into linear factors in
K[x]. Then there exists a Jordan basis for A, and hence A is similar to a matrix J which is
a direct sum of Jordan blocks. The Jordan blocks occurring in J are uniquely determined by
A.

The matrix J in the theorem is said to be the Jordan canonical form (JCF) or sometimes
Jordan normal form of A. It is uniquely determined by A up to the order of the blocks.

We will prove the theorem later. First we derive some consequences and study methods for
calculating the JCF of a matrix. As we have discussed before, polynomials over C always
split. This gives the following corollary.

Corollary 2.10 Let A be an n × n matrix over C. Then there exists a Jordan basis for A.

The proof of the following corollary requires algebraic techniques beyond the scope of this
course. You can try to prove it yourself after you have done Algebra-II⁴. The trick is to find
a field extension F ⊇ K such that cA(x) splits in F[x]. For example, consider the rotation
by 90 degrees matrix

  A = ( 0 −1
        1  0 ).

Since cA(x) = x² + 1, its eigenvalues are the imaginary numbers i and −i. Hence, it admits
no JCF over R, but over the complex numbers it has JCF

  ( i  0
    0 −i ).

⁴ Or you can take Galois Theory next year and this should become obvious.

Corollary 2.11 Let A be an n × n matrix over K. Then there exists a field extension F ⊇ K
and a Jordan basis for A in F^{n,1}.

The next two corollaries are immediate⁵ consequences of Theorem 2.9, but they are worth
stating because of their computational significance. The first one needs Theorem 1.2 as well.

Corollary 2.12 Let A be an n × n matrix over K that admits a Jordan basis. If P is the
matrix having a Jordan basis as columns, then P⁻¹AP is the JCF of A.

Notice that a Jordan basis is not, in general, unique. Thus, there exist multiple matrices
P such that J = P⁻¹AP is the JCF of A. Suppose now that the eigenvalues of A are
λ1, . . . , λt, and that the Jordan blocks in J for the eigenvalue λi are Jλi,ki,1, . . . , Jλi,ki,ji,
where ki,1 ≥ ki,2 ≥ · · · ≥ ki,ji. The final corollary follows from an explicit calculation⁶ for J,
because both the minimal and characteristic polynomials of J and A are the same.

Corollary 2.13 The characteristic polynomial cA(x) = ∏_{i=1}^{t} (λi − x)^{ki}, where ki = ki,1 +
· · · + ki,ji for 1 ≤ i ≤ t. The minimal polynomial μA(x) = ∏_{i=1}^{t} (x − λi)^{ki,1}.

⁵ This means I am not proving them here, but I expect you to be able to prove them.
⁶ The characteristic polynomial of J is the product of the characteristic polynomials of the Jordan blocks,
and the minimal polynomial of J is the least common multiple of the characteristic polynomials of the Jordan
blocks.

2.6 The JCF when n = 2 and 3

When n = 2 and n = 3, the JCF can be deduced just from the minimal and characteristic
polynomials. Let us consider these cases.

When n = 2, we have either two distinct eigenvalues λ1, λ2, or a single repeated eigenvalue
λ1. If the eigenvalues are distinct, then by Corollary 2.3 A is diagonalisable and the JCF is
the diagonal matrix Jλ1,1 ⊕ Jλ2,1.

Example 1. A = ( 1 4
                 1 1 ). We calculate cA(x) = x² − 2x − 3 = (x − 3)(x + 1), so there are
two distinct eigenvalues, 3 and −1. Associated eigenvectors are (2 1)^T and (−2 1)^T, so we put

  P = ( 2 −2        and then   P⁻¹AP = ( 3  0
        1  1 )                           0 −1 ).

If the eigenvalues are equal, then there are two possible JCFs, Jλ1,1 ⊕ Jλ1,1, which is a scalar
matrix, and Jλ1,2. The minimal polynomial is respectively (x − λ1) and (x − λ1)² in these two
cases. In fact, these cases can be distinguished without any calculation whatsoever, because
in the first case A = PJP⁻¹ = J, so A is its own JCF.

In the second case, a Jordan basis consists of a single Jordan chain of length 2. To find such
a chain, let v2 be any vector for which (A − λ1I2)v2 ≠ 0 and let v1 = (A − λ1I2)v2. (Note
that, in practice, it is often easier to find the vectors in a Jordan chain in reverse order.)

Example 2. A = (  1  4
                 −1 −3 ). We have cA(x) = x² + 2x + 1 = (x + 1)², so there is a single
eigenvalue −1 with multiplicity 2. Since the first column of A + I2 is non-zero, we can choose
v2 = (1 0)^T and v1 = (A + I2)v2 = (2 −1)^T, so

  P = (  2 1        and   P⁻¹AP = ( −1  1
        −1 0 )                       0 −1 ).
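
A quick machine check of Example 2, as a sketch assuming NumPy is available:

    import numpy as np

    A = np.array([[1.0, 4.0], [-1.0, -3.0]])
    v2 = np.array([1.0, 0.0])
    v1 = (A + np.eye(2)) @ v2          # the eigenvector (2, -1) starting the chain
    P = np.column_stack([v1, v2])
    print(np.linalg.inv(P) @ A @ P)    # [[-1.  1.] [ 0. -1.]], the block J_{-1,2}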

Now let n = 3. If there are three distinct eigenvalues, then A is diagonalisable.

Suppose that there are two distinct eigenvalues, so one has multiplicity 2, and the other
has multiplicity 1. Let the eigenvalues be λ1, λ1, λ2, with λ1 ≠ λ2. Then there are two
possible JCFs for A, Jλ1,1 ⊕ Jλ1,1 ⊕ Jλ2,1 and Jλ1,2 ⊕ Jλ2,1, and the minimal polynomial is
(x − λ1)(x − λ2) in the first case and (x − λ1)²(x − λ2) in the second.

In the first case, a Jordan basis is a union of three Jordan chains of length 1, each of which
consists of an eigenvector of A.

Example 3. A = (  2  0  0
                  1  5  2
                 −2 −6 −2 ). Then

  cA(x) = (2 − x)[(5 − x)(−2 − x) + 12] = (2 − x)(x² − 3x + 2) = (2 − x)²(1 − x).

We know from the theory above that the minimal polynomial must be (x − 2)(x − 1) or
(x − 2)²(x − 1). We can decide which simply by calculating (A − 2I3)(A − I3) to test whether
or not it is 0. We have

  A − 2I3 = (  0  0  0        A − I3 = (  1  0  0
               1  3  2                    1  4  2
              −2 −6 −4 ),                −2 −6 −3 ),

and the product of these two matrices is 0, so μA = (x − 2)(x − 1).

The eigenvectors v for λ1 = 2 satisfy (A − 2I3)v = 0, and we must find two linearly independent solutions; for example we can take v1 = (0 2 −3)^T, v2 = (1 −1 1)^T. An eigenvector
for the eigenvalue 1 is v3 = (0 1 −2)^T, so we can choose

  P = (  0  1  0
         2 −1  1
        −3  1 −2 )

and then P⁻¹AP is diagonal with entries 2, 2, 1.

In the second case, there are two Jordan chains, one for λ1 of length 2, and one for λ2
of length 1. For the first chain, we need to find a vector v2 with (A − λ1I3)²v2 = 0 but
(A − λ1I3)v2 ≠ 0, and then the chain is v1 = (A − λ1I3)v2, v2. For the second chain, we
simply need an eigenvector for λ2.

Example 4. A = (  3  2  1
                  0  3  1
                 −1 −4 −1 ). Then

  cA(x) = (3 − x)[(3 − x)(−1 − x) + 4] − 2 + (3 − x) = −x³ + 5x² − 8x + 4 = (2 − x)²(1 − x),

as in Example 3. We have

  A − 2I3 = (  1  2  1      (A − 2I3)² = (  0  0  0      A − I3 = (  2  2  1
               0  1  1                     −1 −3 −2                  0  2  1
              −1 −4 −3 ),                   2  6  4 ),              −1 −4 −2 ),

and we can check that (A − 2I3)(A − I3) is non-zero, so we must have μA = (x − 2)²(x − 1).

For the Jordan chain of length 2, we need a vector with (A − 2I3)²v2 = 0 but (A − 2I3)v2 ≠ 0,
and we can choose v2 = (2 0 −1)^T. Then v1 = (A − 2I3)v2 = (1 −1 1)^T. An eigenvector for
the eigenvalue 1 is v3 = (0 1 −2)^T, so we can choose

  P = (  1  2  0
        −1  0  1
         1 −1 −2 )

and then

  P⁻¹AP = ( 2 1 0
            0 2 0
            0 0 1 ).

Finally, suppose that there is a single eigenvalue, λ1, so cA = (λ1 − x)³. There are three
possible JCFs for A, Jλ1,1 ⊕ Jλ1,1 ⊕ Jλ1,1, Jλ1,2 ⊕ Jλ1,1, and Jλ1,3, and the minimal polynomials
in the three cases are (x − λ1), (x − λ1)², and (x − λ1)³, respectively.

In the first case, J is a scalar matrix, and A = PJP⁻¹ = J, so this is recognisable immediately.

In the second case, there are two Jordan chains, one of length 2 and one of length 1. For the
first, we choose v2 with (A − λ1I3)v2 ≠ 0, and let v1 = (A − λ1I3)v2. (This case is easier
than the case illustrated in Example 4, because we have (A − λ1I3)²v = 0 for all v ∈ C^{3,1}.)
For the second Jordan chain, we choose v3 to be an eigenvector for λ1 such that v1 and v3
are linearly independent.

Example 5. A = (  0  2  1
                 −1 −3 −1
                  1  2  0 ). Then

  cA(x) = −x[(3 + x)x + 2] − 2(x + 1) − 2 + (3 + x) = −x³ − 3x² − 3x − 1 = −(1 + x)³.

We have

  A + I3 = (  1  2  1
             −1 −2 −1
              1  2  1 ),

and we can check that (A + I3)² = 0. The first column of A + I3 is non-zero, so (A +
I3)(1 0 0)^T ≠ 0, and we can choose v2 = (1 0 0)^T and v1 = (A + I3)v2 = (1 −1 1)^T. For v3
we need to choose a vector which is not a multiple of v1 such that (A + I3)v3 = 0, and we
can choose v3 = (0 1 −2)^T. So we have

  P = (  1 1  0
        −1 0  1
         1 0 −2 )

and then

  P⁻¹AP = ( −1  1  0
             0 −1  0
             0  0 −1 ).

In the third case, there is a single Jordan chain, and we choose v3 such that (A − λ1I3)²v3 ≠ 0,
v2 = (A − λ1I3)v3, v1 = (A − λ1I3)²v3.

Example 6. A = (  0  1  0
                 −1 −1  1
                  1  0 −2 ). Then

  cA(x) = −x[(2 + x)(1 + x)] − (2 + x) + 1 = −(1 + x)³.

We have

  A + I3 = (  1  1  0        (A + I3)² = ( 0  1  1
             −1  0  1                      0 −1 −1
              1  0 −1 ),                   0  1  1 ),

so (A + I3)² ≠ 0 and μA = (x + 1)³. For v3, we need a vector that is not in the nullspace of
(A + I3)². Since the second column, which is the image of (0 1 0)^T, is non-zero, we can choose
v3 = (0 1 0)^T, and then v2 = (A + I3)v3 = (1 0 0)^T and v1 = (A + I3)v2 = (1 −1 1)^T. So
we have

  P = (  1 1 0
        −1 0 1
         1 0 0 )

and then

  P⁻¹AP = ( −1  1  0
             0 −1  1
             0  0 −1 ).

2.7 The general case

For dimensions higher than 3, we cannot always determine the JCF just from the characteristic and minimal polynomials. For example, when n = 4, Jλ,2 ⊕ Jλ,2 and Jλ,2 ⊕ Jλ,1 ⊕ Jλ,1
both have cA = (λ − x)⁴ and μA = (x − λ)².

In general, we can compute the JCF from the dimensions of the generalised eigenspaces.

Let Jλ,k be a Jordan block and let A = Jλ,k − λIk. Then we calculate that, for 1 ≤ i < k,
A^i has (k − i) 1s on the i-th diagonal upwards from the main diagonal, and A^k = 0. For
example, when k = 4,

  A = ( 0 1 0 0      A² = ( 0 0 1 0      A³ = ( 0 0 0 1      A⁴ = 0.
        0 0 1 0            0 0 0 1             0 0 0 0
        0 0 0 1            0 0 0 0             0 0 0 0
        0 0 0 0 ),         0 0 0 0 ),          0 0 0 0 ),

(A matrix A for which A^k = 0 for some k > 0 is called nilpotent.)

It should be clear from this that, for 1 ≤ i ≤ k, rank(A^i) = k − i, so nullity(A^i) = i, and for
i ≥ k, rank(A^i) = 0, nullity(A^i) = k.

On the other hand, if μ ≠ λ and A = Jλ,k − μIk, then, for any integer i, A^i is an upper
triangular matrix with non-zero entries (λ − μ)^i on the diagonal, and so rank(A^i) = k,
nullity(A^i) = 0.

It is easy to see that, for square matrices A and B, rank(A ⊕ B) = rank(A) + rank(B) and
nullity(A ⊕ B) = nullity(A) + nullity(B). So, for a matrix J in JCF, we can determine the
sizes of the Jordan blocks for an eigenvalue λ of J from a knowledge of the nullities of the
matrices (J − λIn)^i for i > 0.

For example, suppose that J = J−2,3 ⊕ J−2,3 ⊕ J−2,1 ⊕ J1,2. Then nullity(J + 2I9) = 3,
nullity((J + 2I9)²) = 5, nullity((J + 2I9)^i) = 7 for i ≥ 3, nullity(J − I9) = 1, and
nullity((J − I9)^i) = 2 for i ≥ 2.

First observe that the total number of Jordan blocks with eigenvalue λ is equal to
nullity(J − λIn).

More generally, the number of Jordan blocks Jλ,j for λ with j ≥ i is equal to
nullity((J − λIn)^i) − nullity((J − λIn)^{i−1}).

The nullspace of (J − λIn)^i was defined earlier to be the generalised eigenspace of index i of
J with respect to the eigenvalue λ. If J is the JCF of a matrix A, then A and J are similar
matrices, so it follows from Proposition 2.8 that nullity((J − λIn)^i) = nullity((A − λIn)^i).
So, summing up, we have:

Theorem 2.14 Let λ be an eigenvalue of a matrix A and let J be the JCF of A. Then
(i) The number of Jordan blocks of J with eigenvalue λ is equal to nullity(A − λIn).
(ii) More generally, for i > 0, the number of Jordan blocks of J with eigenvalue λ and degree
at least i is equal to nullity((A − λIn)^i) − nullity((A − λIn)^{i−1}).

Note that this proves the uniqueness part of Theorem 2.9.
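
These nullity counts are easy to compute mechanically. A minimal sketch, assuming NumPy is available, that rebuilds the direct sum J = J−2,3 ⊕ J−2,3 ⊕ J−2,1 ⊕ J1,2 from the example above and reads off the nullities:

    import numpy as np

    def jordan_block(lam, k):
        J = lam * np.eye(k)
        for i in range(k - 1):
            J[i, i + 1] = 1.0
        return J

    def direct_sum(*blocks):
        n = sum(b.shape[0] for b in blocks)
        out, i = np.zeros((n, n)), 0
        for b in blocks:
            k = b.shape[0]
            out[i:i+k, i:i+k] = b
            i += k
        return out

    def nullity(M):
        return M.shape[0] - np.linalg.matrix_rank(M)

    J = direct_sum(jordan_block(-2, 3), jordan_block(-2, 3),
                   jordan_block(-2, 1), jordan_block(1, 2))
    print([nullity(np.linalg.matrix_power(J + 2*np.eye(9), i)) for i in (1, 2, 3)])  # [3, 5, 7]
    print([nullity(np.linalg.matrix_power(J - np.eye(9), i)) for i in (1, 2)])       # [1, 2]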

2.8 Examples

Example 7. A = ( −2  0  0  0
                  0 −2  1  0
                  0  0 −2  0
                  1  0 −2 −2 ). Then cA(x) = (−2 − x)⁴, so there is a single
eigenvalue −2 with multiplicity 4. We find

  A + 2I4 = ( 0 0  0 0
              0 0  1 0
              0 0  0 0
              1 0 −2 0 ),

and (A + 2I4)² = 0, so μA = (x + 2)², and the JCF of A could be J−2,2 ⊕ J−2,2 or
J−2,2 ⊕ J−2,1 ⊕ J−2,1.

To decide which case holds, we calculate the nullity of A + 2I4 which, by Theorem 2.14, is
equal to the number of Jordan blocks with eigenvalue −2. Since A + 2I4 has just two non-zero
rows, which are distinct, its rank is clearly 2, so its nullity is 4 − 2 = 2, and hence the JCF
of A is J−2,2 ⊕ J−2,2.

A Jordan basis consists of a union of two Jordan chains, which we will call v1, v2, and
v3, v4, where v1 and v3 are eigenvectors and v2 and v4 are generalised eigenvectors of index
2. To find such chains, it is probably easiest to find v2 and v4 first and then to calculate
v1 = (A + 2I4)v2 and v3 = (A + 2I4)v4.

Although it is not hard to find v2 and v4 in practice, we have to be careful, because they
need to be chosen so that no linear combination of them lies in the nullspace of (A + 2I4).
In fact, since this nullspace is spanned by the second and fourth standard basis vectors, the
obvious choice is v2 = (1 0 0 0)^T, v4 = (0 0 1 0)^T, and then v1 = (A + 2I4)v2 = (0 0 0 1)^T,
v3 = (A + 2I4)v4 = (0 1 0 −2)^T, so to transform A to JCF, we put

  P = ( 0 1  0 0      P⁻¹ = ( 0 2 0 1      P⁻¹AP = ( −2  1  0  0
        0 0  1 0             1 0 0 0                  0 −2  0  0
        0 0  0 1             0 1 0 0                  0  0 −2  1
        1 0 −2 0 ),          0 0 1 0 ),               0  0  0 −2 ).
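
This is also a good example on which to try a computer algebra system. A sketch assuming SymPy is available (jordan_form may choose a different Jordan basis, but the block structure must agree with the hand calculation):

    from sympy import Matrix

    A = Matrix([[-2, 0, 0, 0],
                [0, -2, 1, 0],
                [0, 0, -2, 0],
                [1, 0, -2, -2]])
    P, J = A.jordan_form()
    print(J)                       # two blocks J_{-2,2} on the diagonal
    print(P.inv() * A * P == J)    # True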

Example 8. A = ( −1  3  1  0
                  0  2  1  0
                  0  0  2  0
                  0  3  1 −1 ). Then cA(x) = (−1 − x)²(2 − x)², so there are two
eigenvalues, −1 and 2, both with multiplicity 2. There are four possibilities for the JCF (one or
two blocks for each of the two eigenvalues). We could determine the JCF by computing the
minimal polynomial μA, but it is probably easier to compute the nullities of the eigenspaces
and use Theorem 2.14. We have

  A + I4 = ( 0 3 1 0      A − 2I4 = ( −3 3 1  0      (A − 2I4)² = ( 9 −9 0 0
             0 3 1 0                   0 0 1  0                     0  0 0 0
             0 0 3 0                   0 0 0  0                     0  0 0 0
             0 3 1 0 ),                0 3 1 −3 ),                  0 −9 0 9 ).

The rank of A + I4 is clearly 2, so its nullity is also 2, and hence there are two Jordan blocks
with eigenvalue −1. The three non-zero rows of (A − 2I4) are linearly independent, so its
rank is 3, hence its nullity is 1, so there is just one Jordan block with eigenvalue 2, and the JCF
of A is J−1,1 ⊕ J−1,1 ⊕ J2,2.

For the two Jordan chains of length 1 for eigenvalue −1, we just need two linearly independent
eigenvectors, and the obvious choice is v1 = (1 0 0 0)^T, v2 = (0 0 0 1)^T. For the Jordan
chain v3, v4 for eigenvalue 2, we need to choose v4 in the nullspace of (A − 2I4)² but not in
the nullspace of A − 2I4. (This is why we calculated (A − 2I4)².) An obvious choice here is
v4 = (0 0 1 0)^T, and then v3 = (1 1 0 1)^T, and to transform A to JCF, we put

  P = ( 1 0 1 0      P⁻¹ = ( 1 −1 0 0      P⁻¹AP = ( −1  0 0 0
        0 0 1 0             0 −1 0 1                  0 −1 0 0
        0 0 0 1             0  1 0 0                  0  0 2 1
        0 1 1 0 ),          0  0 1 0 ),               0  0 0 2 ).

2.9 Proof of Theorem 2.9 (non-examinable)

We proceed by induction on n = dim(V). The case n = 1 is clear.

Let λ be an eigenvalue of T and let U = im(T − λIV) and m = dim(U). Then m =
rank(T − λIV) = n − nullity(T − λIV) < n, because the eigenvectors for λ lie in the nullspace
of T − λIV. For u ∈ U, we have u = (T − λIV)(v) for some v ∈ V, and hence T(u) =
T(T − λIV)(v) = (T − λIV)T(v) ∈ U. So T restricts to TU : U → U, and we can apply our
inductive hypothesis to TU to deduce that U has a basis e1, . . . , em, which is a disjoint union
of Jordan chains for TU.

We now show how to extend the Jordan basis of U to one of V. We do this in two stages. For
the first stage, suppose that l of the Jordan chains of TU are for the eigenvalue λ (possibly
l = 0). For each such chain v1, . . . , vk with T(v1) = λv1, T(vi) = λvi + vi−1, 2 ≤ i ≤ k,
since vk ∈ U = im(T − λIV), we can find vk+1 ∈ V with T(vk+1) = λvk+1 + vk, thereby
extending the chain by an extra vector. So far we have adjoined l new vectors to the basis,
by extending the l Jordan chains for λ in length by 1. Let us call these new vectors w1, . . . , wl.

For the second stage, observe that the first vector in each of the l chains lies in the eigenspace
of TU for λ. We know that the eigenspace of T for λ is the nullspace of
(T − λIV), which has dimension n − m. So we can adjoin (n − m) − l further eigenvectors
of T to the l that we have already to complete a basis of the nullspace of (T − λIV). Let us
call these (n − m) − l new vectors wl+1, . . . , wn−m. They are adjoined to our basis of V in
the second stage. They each form a Jordan chain of length 1, so we now have a collection of
n vectors which form a disjoint union of Jordan chains.

To complete the proof, we need to show that these n vectors form a basis of V, for which it
is enough to show that they are linearly independent.

Partly because of notational difficulties, we provide only a sketch proof of this, and leave the
details to the student. Suppose that α1w1 + · · · + αn−mwn−m + x = 0, where x is a linear
combination of the basis vectors of U. Applying T − λIn gives

  α1(T − λIn)(w1) + · · · + αl(T − λIn)(wl) + (T − λIn)(x) = 0.

Each of the (T − λIn)(wi) for 1 ≤ i ≤ l is the last member of one of the l Jordan chains for
TU. When we apply (T − λIn) to one of the basis vectors of U, we get a linear combination
of the basis vectors of U other than the (T − λIn)(wi) for 1 ≤ i ≤ l. Hence, by the linear
independence of the basis of U, we deduce that αi = 0 for 1 ≤ i ≤ l. This implies that
(T − λIn)(x) = 0, so x is in the eigenspace of TU for the eigenvalue λ. But, by construction,
wl+1, . . . , wn−m extend a basis of this eigenspace of TU to a basis of the eigenspace of T in V,
so we also get αi = 0 for l + 1 ≤ i ≤ n − m, which completes the proof.

2.10 Powers of matrices

The theory we developed can be used to compute powers of matrices efficiently. Suppose we
need to compute A^2012 where

  A = ( −2  0  0  0
         0 −2  1  0
         0  0 −2  0
         1  0 −2 −2 )

is the matrix from Example 7 in Section 2.8.

There are two practical ways of computing A^n for a general matrix. The first one involves
Jordan forms. If J = P⁻¹AP is the JCF of A then it is sufficient to compute J^n because of
the telescoping product:

  A^n = (PJP⁻¹)^n = PJP⁻¹ · PJP⁻¹ · · · PJP⁻¹ = PJ^nP⁻¹.

If J = Jλ1,k1 ⊕ Jλ2,k2 ⊕ · · · ⊕ Jλt,kt then J^n = J^n_{λ1,k1} ⊕ J^n_{λ2,k2} ⊕ · · · ⊕ J^n_{λt,kt}.

Finally, the power of an individual Jordan block can be computed as

  J^n_{λ,k} = ( λ^n   nλ^{n−1}   . . .   C^n_{k−2} λ^{n−k+2}   C^n_{k−1} λ^{n−k+1}
                0     λ^n        . . .   C^n_{k−3} λ^{n−k+3}   C^n_{k−2} λ^{n−k+2}
                ⋮                 ⋱                             ⋮
                0     0          . . .   λ^n                   nλ^{n−1}
                0     0          . . .   0                     λ^n )

where C^n_t = n!/((n − t)! t!) is the choose function, interpreted as C^n_t = 0 whenever t > n.
Let us apply it to the matrix from Example 7 in 2.8:

  A^n = PJ^nP⁻¹
      = ( 0 1  0 0   ( (−2)^n  n(−2)^{n−1}  0       0             ( 0 2 0 1
          0 0  1 0     0       (−2)^n       0       0               1 0 0 0
          0 0  0 1     0       0            (−2)^n  n(−2)^{n−1}     0 1 0 0
          1 0 −2 0 )   0       0            0       (−2)^n      )   0 0 1 0 )

      = ( (−2)^n       0       0            0
          0            (−2)^n  n(−2)^{n−1}  0
          0            0       (−2)^n       0
          n(−2)^{n−1}  0       n(−2)^n      (−2)^n ).

The second method of computing A^n uses Lagrange's interpolation polynomial. It is less
labour intensive and more suitable for pen-and-paper calculations. Suppose φ(A) = 0 for a
polynomial φ(z); in practice, φ(z) is either the minimal or the characteristic polynomial.
Dividing with a remainder, z^n = q(z)φ(z) + h(z), we conclude that

  A^n = q(A)φ(A) + h(A) = h(A).

Division with a remainder may appear problematic⁷ for large n but there is a shortcut. If we
know the roots of φ(z), say λ1, . . . , λk with their multiplicities m1, . . . , mk, then h(z) can be
found by solving the system of simultaneous equations in the coefficients of h(z):

  f^{(t)}(λj) = h^{(t)}(λj),  1 ≤ j ≤ k,  0 ≤ t < mj,

where f(z) = z^n and f^{(t)} = (f^{(t−1)})′ is the t-th derivative. In other words, h(z) is Lagrange's
interpolation polynomial for the function z^n at the roots of φ(z).

⁷ Try to divide z^2012 by z² + z + 1 without reading any further.

We know that μA(z) = (z + 2)² for the matrix A above. Suppose the Lagrange interpolation
of z^n at the roots of (z + 2)² is h(z) = αz + β. The condition on the coefficients is given by

  (−2)^n = h(−2) = −2α + β
  n(−2)^{n−1} = h′(−2) = α.

Solving them gives α = n(−2)^{n−1} and β = (1 − n)(−2)^n. It follows that

  A^n = n(−2)^{n−1}A + (1 − n)(−2)^n I = ( (−2)^n       0       0            0
                                            0            (−2)^n  n(−2)^{n−1}  0
                                            0            0       (−2)^n       0
                                            n(−2)^{n−1}  0       n(−2)^n      (−2)^n ).

2.11 Applications to difference equations

Let us consider an initial value problem for an autonomous system with discrete time:

  x(n + 1) = Ax(n),  n ∈ N,  x(0) = w.

Here x(n) ∈ K^m is a sequence of vectors in a vector space over a field K. One thinks of x(n)
as the state of the system at time n. The initial state is x(0) = w. The m × m matrix A with
coefficients in K describes the evolution of the system. The adjective autonomous means
that the evolution equation does not change with time⁸.

⁸ A nonautonomous system would be described by x(n + 1) = A(n)x(n) here.

It takes longer to formulate this problem than to solve it. The solution is a no-brainer:

  x(n) = Ax(n − 1) = A²x(n − 2) = . . . = A^n x(0) = A^n w.

As a working example, let us consider a 2-step linearly recursive sequence. It is determined
by a quadruple (a, b, c, d) ∈ K⁴ and the rules

  s0 = a,  s1 = b,  sn = c·sn−1 + d·sn−2 for n ≥ 2.

Such sequences are ubiquitous. Arithmetic sequences form a subclass with c = 2, d = −1.
In general, (a, b, 2, −1) determines the arithmetic sequence starting at a with the difference
b − a. For instance, (0, 1, 2, −1) determines the sequence of natural numbers sn = n.

A geometric sequence starting at a with ratio q admits a non-unique description. One obvious
quadruple giving it is (a, aq, q, 0). However, it is conceptually better to use the quadruple
(a, aq, 2q, −q²) because the sequences coming from (a, b, 2q, −q²) include both arithmetic and
geometric sequences and can be called arithmo-geometric sequences.

If c = d = 1 then this is a Fibonacci-type sequence. For instance, (0, 1, 1, 1) determines the
Fibonacci numbers Fn while (2, 1, 1, 1) determines the Lucas numbers Ln.

All of these examples admit closed⁹ formulae for a generic term sn. Can we find a closed
formula for sn in general? Yes, we can, because this problem reduces to an initial value
problem with discrete time if we set

  x(n) = ( sn          w = ( a        A = ( 0 1
           sn+1 ),           b ),           d c ).

⁹ Closed means non-recursive; for instance, sn = a + n(b − a) for the arithmetic sequence.
polynomial, cA (z) = z 2 cz d. If c2 + 4d = 0, the JCF of
 the characteristic

c/2 1
A is J =
. Let q = c/2. Then d = q 2 and we are dealing with the arithmo0 c/2
geometric sequence (a,b, 2q, q 2 ). Let us find the closed formula for sn in 
this caseusing

0
1
0
1
Jordan forms. As A =
one can choose the Jordan basis e2 =
, e1 =
.
q 2 2q
1
q




1 0
1 0
If P =
then P 1 =
and
q 1
q 1
 n



q nq n1
(1 n)q n
nq n1
n
1 n
n 1
1
A = (P JP ) = P J P = P
P =
.
0
qn
nq n+1 (1 + n)q n
This gives the closed formula for arithmo-geometric sequence we were seeking:
sn = (1 n)q n a + nq n1 b.


If c² + 4d ≠ 0, the JCF of A is

  ( (c + √(c² + 4d))/2   0
    0                    (c − √(c² + 4d))/2 )

and the closed formula for sn will involve the sum of two geometric sequences. Let us see it
through for the Fibonacci and Lucas numbers using Lagrange's polynomial. Since c = d = 1,
c² + 4d = 5 and the roots of cA(z) are the golden ratio φ = (1 + √5)/2 and 1 − φ = (1 − √5)/2.
It is useful to observe that 2φ − 1 = √5 and φ(1 − φ) = −1. Let us introduce the number
ψn = φ^n − (1 − φ)^n.

Suppose the Lagrange interpolation of z^n at the roots of z² − z − 1 is h(z) = αz + β. The
condition on the coefficients is given by

  φ^n = h(φ) = αφ + β
  (1 − φ)^n = h(1 − φ) = α(1 − φ) + β.

Solving them gives α = ψn/√5 and β = ψn−1/√5. It follows that

  A^n = αA + βI2 = (ψn/√5)A + (ψn−1/√5)I2 = ( ψn−1/√5   ψn/√5
                                               ψn/√5     (ψn + ψn−1)/√5 ).

Since (Fn Fn+1)^T = A^n (0 1)^T, it immediately implies that

  A^n = ( Fn−1  Fn          and   Fn = ψn/√5.
          Fn    Fn+1 )

Similarly for the Lucas numbers, we get (Ln Ln+1)^T = A^n (2 1)^T and

  Ln = 2Fn−1 + Fn = Fn−1 + Fn+1 = (ψn−1 + ψn+1)/√5.
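
The identity A^n = (Fn−1 Fn; Fn Fn+1) also gives a quick machine check on the closed formula. A sketch assuming Python with NumPy:

    import numpy as np

    A = np.array([[0, 1], [1, 1]])
    n = 10
    print(np.linalg.matrix_power(A, n))              # [[34 55] [55 89]] = [[F9 F10] [F10 F11]]

    phi = (1 + 5**0.5) / 2
    print(round((phi**n - (1 - phi)**n) / 5**0.5))   # 55 = F10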

2.12 Functions of matrices

We restrict to K = R in this section. Let us consider a power series Σn an z^n, an ∈ R, with a
positive radius of convergence ρ. It defines a function f : (−ρ, ρ) → R by

  f(x) = Σ_{n=0}^{∞} an x^n

and the power series is the Taylor series of f(x) at zero. In particular,

  an = f^{[n]}(0) = f^{(n)}(0)/n!,

where f^{[n]}(z) = f^{(n)}(z)/n! is a divided derivative. We extend the function f(z) to matrices
by the formula

  f(A) = Σ_{n=0}^{∞} f^{[n]}(0) A^n.

The right hand side of this formula is a matrix whose entries are series. All these series need
to converge for f(A) to be well defined. If the norm¹⁰ of A is less than ρ then f(A) is well
defined. Alternatively, if all eigenvalues of A belong to (−ρ, ρ) then f(A) is well defined, as
can be seen from the JCF method of computing f(A). If

  J = Jλ1,k1 ⊕ Jλ2,k2 ⊕ · · · ⊕ Jλt,kt = P⁻¹AP

is the JCF of A then

  f(A) = P f(J) P⁻¹ = P ( f(Jλ1,k1) ⊕ f(Jλ2,k2) ⊕ · · · ⊕ f(Jλt,kt) ) P⁻¹,

while

  f(Jλ,k) = ( f(λ)  f^{[1]}(λ)  . . .  f^{[k−1]}(λ)
              0     f(λ)        . . .  f^{[k−2]}(λ)
              ⋮                  ⋱      ⋮
              0     0           . . .  f(λ) ).

¹⁰ This notion is beyond the scope of this module and will be discussed in Differentiation.

Lagrange's method of computing f(A) works as well, despite the fact that there is no sensible
way to divide with a remainder in analytic functions. For instance,

  e^z = (e^z/(z² + 1))·(z² + 1) + 0 = ((e^z − 1)/(z² + 1))·(z² + 1) + 1 = ((e^z − z)/(z² + 1))·(z² + 1) + z

for φ(z) = z² + 1. Thus, there are infinitely many ways to divide with a remainder as
f(z) = q(z)φ(z) + h(z). The point is that f(A) = h(A) only if q(A) is well defined. Notice
that the naive expression q(A) = (f(A) − h(A))φ(A)⁻¹ involves division by zero. However,
if h(z) is the interpolation polynomial then q(A) is well defined and the calculation f(A) =
q(A)φ(A) + h(A) = h(A) carries through.

Let us compute e^A for the matrix A from Example 7, Section 2.8. Recall that the Taylor
series for the exponential, e^x = Σ_{n=0}^{∞} x^n/n!, converges for all x. Consequently the matrix
exponent e^A = Σ_{n=0}^{∞} A^n/n! is defined for all real m-by-m matrices.

Suppose the Lagrange interpolation of e^z at the roots of μA(z) = (z + 2)² is h(z) = αz + β.
The condition on the coefficients is given by

  e^{−2} = h(−2) = −2α + β
  e^{−2} = h′(−2) = α.

Solving them gives α = e^{−2} and β = 3e^{−2}. It follows that

  e^A = e^{−2}A + 3e^{−2}I = ( e^{−2}   0       0         0
                               0        e^{−2}  e^{−2}    0
                               0        0       e^{−2}    0
                               e^{−2}   0      −2e^{−2}   e^{−2} ).
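
SciPy's numerical matrix exponential gives an independent check. A sketch, assuming SciPy is available:

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[-2, 0, 0, 0],
                  [0, -2, 1, 0],
                  [0, 0, -2, 0],
                  [1, 0, -2, -2]], dtype=float)

    # e^A = e^{-2} A + 3 e^{-2} I, from the interpolation above
    print(np.allclose(np.exp(-2) * A + 3 * np.exp(-2) * np.eye(4), expm(A)))   # True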

2.13 Applications to differential equations

Let us now consider an initial value problem for an autonomous system with continuous time:

  dx(t)/dt = Ax(t),  t ∈ [0, ∞),  x(0) = w.

Here A ∈ R^{n×n} and w ∈ R^n are given, and x : R≥0 → R^n is a smooth function to be found.
One thinks of x(t) as the state of the system at time t. The solution to this problem is

  x(t) = e^{tA} w

because, as one can easily check,

  d/dt (x(t)) = Σn d/dt (t^n/n! · A^n w) = Σn t^{n−1}/(n − 1)! · A^n w = A Σk t^k/k! · A^k w = Ax(t).

Let us consider a harmonic oscillator described by the equation y′′(t) + y(t) = 0. The general
solution y(t) = α sin(t) + β cos(t) is well known. Let us obtain it using matrix exponents.
Setting

  x(t) = ( y(t)          A = (  0  1
           y′(t) ),            −1  0 ),

the harmonic oscillator becomes the initial value problem with a solution x(t) = e^{tA}x(0).
The eigenvalues of A are i and −i. Interpolating e^{zt} at these values of z gives the following
condition on h(z) = αz + β:

  e^{it} = h(i) = αi + β
  e^{−it} = h(−i) = −αi + β.

Solving them gives α = (e^{it} − e^{−it})/2i = sin(t) and β = (e^{it} + e^{−it})/2 = cos(t). It follows
that

  e^{tA} = sin(t)A + cos(t)I2 = (  cos(t)  sin(t)
                                  −sin(t)  cos(t) )

and y(t) = cos(t)y(0) + sin(t)y′(0).


As another example, let us consider the system of differential equations

  y1′ =  y1      − 3y3,                                y1(0) = 1
  y2′ =  y1 − y2 − 6y3,   with the initial condition   y2(0) = 1
  y3′ = −y1 + 2y2 + 5y3,                               y3(0) = 0.

Using matrices

  x(t) = ( y1(t)        w = ( 1        A = (  1  0 −3
           y2(t)              1               1 −1 −6
           y3(t) ),           0 ),           −1  2  5 ),

it becomes an initial value problem. The characteristic polynomial is cA(z) = −z³ + 5z² −
8z + 4 = (1 − z)(2 − z)². We need to interpolate e^{tz} at 1 and 2 by h(z) = αz² + βz + γ. At
the multiple root 2 we need to interpolate up to order 2, which involves tracking the derivative
(e^{tz})′ = te^{tz}:

  e^t   = h(1) = α + β + γ
  e^{2t}  = h(2) = 4α + 2β + γ
  te^{2t} = h′(2) = 4α + β.

Solving, α = (t − 1)e^{2t} + e^t, β = (4 − 3t)e^{2t} − 4e^t, γ = (2t − 3)e^{2t} + 4e^t. It follows that

  e^{tA} = e^{2t} ( 3t − 3   −6t + 6   −9t + 6       + e^t (  4 −6 −6
                    3t − 2   −6t + 4   −9t + 3                2 −3 −3
                    −t        2t        3t + 1 )              0  0  0 )

and

  x(t) = ( y1(t)            ( 1        ( (3 − 3t)e^{2t} − 2e^t
           y2(t)   = e^{tA}   1    =     (2 − 3t)e^{2t} − e^t
           y3(t) )            0 )        te^{2t}              ).
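
This too can be checked against a numerical matrix exponential. A sketch assuming SciPy, evaluating both the closed-form solution and e^{tA}w at t = 1:

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[1, 0, -3],
                  [1, -1, -6],
                  [-1, 2, 5]], dtype=float)
    w = np.array([1.0, 1.0, 0.0])

    t = 1.0
    closed = np.array([(3 - 3*t)*np.exp(2*t) - 2*np.exp(t),
                       (2 - 3*t)*np.exp(2*t) - np.exp(t),
                       t*np.exp(2*t)])
    print(np.allclose(expm(t * A) @ w, closed))   # True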

3 Bilinear Maps and Quadratic Forms

3.1 Bilinear maps: definitions

Let V and W be vector spaces over a field K.

Definition. A bilinear map on W and V is a map τ : W × V → K such that

(i) τ(α1w1 + α2w2, v) = α1τ(w1, v) + α2τ(w2, v) and
(ii) τ(w, α1v1 + α2v2) = α1τ(w, v1) + α2τ(w, v2)

for all w, w1, w2 ∈ W, v, v1, v2 ∈ V, and α1, α2 ∈ K. Notice the difference between linear
and bilinear maps. For instance, let V = W = K. Addition is a linear map but not bilinear.
On the other hand, multiplication is bilinear but not linear.

Let us choose a basis e1, . . . , en of V and a basis f1, . . . , fm of W.

Let τ : W × V → K be a bilinear map, and let αij = τ(fi, ej), for 1 ≤ i ≤ m, 1 ≤ j ≤ n.
Then the m × n matrix A = (αij) is defined to be the matrix of τ with respect to the bases
e1, . . . , en and f1, . . . , fm of V and W.
For v ∈ V, w ∈ W, let v = x1e1 + · · · + xnen and w = y1f1 + · · · + ymfm, and hence

  v = (x1, x2, . . . , xn)^T ∈ K^{n,1}  and  w = (y1, y2, . . . , ym)^T ∈ K^{m,1}.

Then, by using the equations (i) and (ii) above, we get

  τ(w, v) = Σ_{i=1}^{m} Σ_{j=1}^{n} yi τ(fi, ej) xj = Σ_{i=1}^{m} Σ_{j=1}^{n} yi αij xj = w^T A v.   (2.1)

For example, let V = W = R² and use the natural basis of V. Suppose that

  A = ( 1 −1
        2  0 ).

Then

  τ((y1, y2), (x1, x2)) = (y1 y2) A (x1 x2)^T = y1x1 − y1x2 + 2y2x1.

3.2 Bilinear maps: change of basis

We retain the notation of the previous subsection. As in Subsection 1.2 above, suppose that
we choose new bases e′1, . . . , e′n of V and f′1, . . . , f′m of W, and let P and Q be
the associated basis change matrices. Then, by Proposition 1.1, if v′ and w′ are the column
vectors representing the vectors v and w with respect to the bases {e′i} and {f′i}, we have
Pv′ = v and Qw′ = w, and so

  w^T A v = (w′)^T Q^T A P v′,

and hence, by Equation (2.1):

Theorem 3.1 Let A be the matrix of the bilinear map τ : W × V → K with respect to the
bases e1, . . . , en and f1, . . . , fm of V and W, and let B be its matrix with respect to the bases
e′1, . . . , e′n and f′1, . . . , f′m of V and W. Let P and Q be the basis change matrices, as defined
above. Then B = Q^T AP.

Compare this result with Theorem 1.2.

We shall be concerned from now on only with the case where V = W. A bilinear map
τ : V × V → K is called a bilinear form on V. Theorem 3.1 then becomes:

Theorem 3.2 Let A be the matrix of the bilinear form τ on V with respect to the basis
e1, . . . , en of V, and let B be its matrix with respect to the basis e′1, . . . , e′n of V. Let P be the
basis change matrix with original basis {ei} and new basis {e′i}. Then B = P^T AP.

So, in the example at the end of Subsection 3.1, if we choose the new basis e′1 = (1 −1),
e′2 = (1 0) then

  P = (  1 1        P^T AP = ( 0 −1
        −1 0 ),                2  1 ),

and

  τ(y1e′1 + y2e′2, x1e′1 + x2e′2) = −y1x2 + 2y2x1 + y2x2.
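
The rule B = P^T AP is a one-line check in NumPy. A sketch for the example above, assuming NumPy is available:

    import numpy as np

    A = np.array([[1, -1],
                  [2, 0]])
    P = np.array([[1, 1],
                  [-1, 0]])          # columns are the new basis vectors e'_1, e'_2
    B = P.T @ A @ P
    print(B)                          # [[ 0 -1] [ 2  1]]

    # tau(w, v) = w^T A v in old coordinates equals (w')^T B v' in new coordinates
    v_new, w_new = np.array([1, 2]), np.array([3, 1])
    print(w_new @ B @ v_new == (P @ w_new) @ A @ (P @ v_new))   # True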
Definition. Matrices A and B are called congruent if there exists an invertible matrix P
with B = P^T AP.

Definition. A bilinear form τ on V is called symmetric if τ(w, v) = τ(v, w) for all v, w ∈ V.
An n × n matrix A is called symmetric if A^T = A.

We then clearly have:

Proposition 3.3 The bilinear form τ is symmetric if and only if its matrix (with respect to
any basis) is symmetric.

The best known example is when V = R^n, and τ is defined by

  τ((x1, x2, . . . , xn), (y1, y2, . . . , yn)) = x1y1 + x2y2 + · · · + xnyn.

This form has matrix equal to the identity matrix In with respect to the standard basis of
R^n. Geometrically, it is equal to the usual scalar product τ(v, w) = |v||w| cos θ, where θ is
the angle between the vectors v and w.

3.3 Quadratic forms: introduction

A quadratic form on the standard vector space K^n is a polynomial function of several variables
x1, . . . , xn in which each term has total degree two, such as 3x² + 2xz + z² − 4yz + xy.

Figure 1: 5x² + 5y² − 6xy = 2


One motivation to study them comes from the geometry of curves or surfaces defined by
quadratic equations. Consider, for example, the equation 5x² + 5y² − 6xy = 2 (see Fig. 1).

This represents an ellipse, in which the two principal axes are at an angle of π/4 with the
x- and y-axes. To study such curves in general, it is desirable to change variables (which
will turn out to be equivalent to a change of basis) so as to make the principal axes of the
ellipse coincide with the x- and y-axes. This is equivalent to eliminating the xy-term in the
equation. We can do this easily by completing the square. In the example,

  5x² + 5y² − 6xy = 2  ⇔  5(x − 3y/5)² − 9y²/5 + 5y² = 2  ⇔  5(x − 3y/5)² + 16y²/5 = 2,

so if we change variables, and put x′ = x − 3y/5 and y′ = y, then the equation becomes
5x′² + 16y′²/5 = 2 (see Fig. 2).

Here we have allowed an arbitrary basis change. We shall be studying this situation in Subsection 3.5.

One disadvantage of doing this is that the shape of the curve has become distorted. If we wish to preserve the shape, then we should restrict our basis changes to those that preserve distance and angle. These are called orthogonal basis changes, and we shall be studying that situation in Subsection 3.6. In the example, we can use the change of variables x' = (x + y)/√2, y' = (x − y)/√2 (which represents a non-distorting rotation through an angle of π/4), and the equation becomes x'^2 + 4y'^2 = 1. See Fig. 3.
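Both substitutions can be checked symbolically; a small sketch with sympy (not part of the original notes; the names x1, y1 are just for illustration):

import sympy as sp

x, y, x1, y1 = sp.symbols('x y x1 y1')
q = 5*x**2 + 5*y**2 - 6*x*y

# completing the square: x = x1 + 3*y1/5, y = y1  (so x1 = x - 3y/5)
print(sp.expand(q.subs({x: x1 + sp.Rational(3, 5)*y1, y: y1})))   # 5*x1**2 + 16*y1**2/5

# orthogonal change: x = (x1 + y1)/sqrt(2), y = (x1 - y1)/sqrt(2)
r = q.subs({x: (x1 + y1)/sp.sqrt(2), y: (x1 - y1)/sp.sqrt(2)})
print(sp.expand(r))                                               # 2*x1**2 + 8*y1**2, i.e. x1**2 + 4*y1**2 = 1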

[Figure 2: 5x^2 + 16y^2/5 = 2]

[Figure 3: x^2 + 4y^2 = 1]
3.4  Quadratic forms: definitions

Definition. Let V be a vector space over the field K. A quadratic form on V is a function q : V → K that is defined by q(v) = τ(v, v), where τ : V × V → K is a bilinear form.

As this is the official definition of a quadratic form we will use, we do not really need to observe that it yields the same notion for the standard vector space K^n as the definition in the previous section. However, it is a good exercise that an inquisitive reader should definitely do. The key is to observe that the function x_i x_j comes from the bilinear form τ_{i,j} such that τ_{i,j}(e_i, e_j) = 1 and zero elsewhere.

In Proposition 3.4 we need to be able to divide by 2 in the field K. This means that we must assume[11] that 1 + 1 ≠ 0 in K. For example, we would like to avoid the field of two elements. If you prefer to avoid worrying about such technicalities, then you can safely assume that K is either Q, R or C.
Let us consider the following three sets. The first set Q(V, K) consists of all quadratic forms on V. It is a subset of the set of all functions from V to K. The second set Bil(V × V, K) consists of all bilinear forms on V. It is a subset of the set of all functions from V × V to K. Finally, we need Sym(V × V, K), the subset of Bil(V × V, K) consisting of symmetric bilinear forms.

There are two interesting functions connecting these sets. We have already defined a square function σ : Bil(V × V, K) → Q(V, K) by σ(τ)(v) = τ(v, v). The second function π : Q(V, K) → Bil(V × V, K) is a polarisation[12] defined by π(q)(u, v) = q(u + v) − q(u) − q(v).
Proposition 3.4 The following statements hold for all q ∈ Q(V, K) and τ ∈ Sym(V × V, K):

(i) π(q) ∈ Sym(V × V, K),
(ii) σ(π(q)) = 2q,
(iii) π(σ(τ)) = 2τ,
(iv) if 1 + 1 ≠ 0 in K then there are natural[13] bijections between Q(V, K) and Sym(V × V, K).

Proof: Observe that q = σ(τ) for some bilinear form τ. For u, v ∈ V,

  q(u + v) = τ(u + v, u + v) = τ(u, u) + τ(v, v) + τ(u, v) + τ(v, u) = q(u) + q(v) + τ(u, v) + τ(v, u).

It follows that π(q)(u, v) = τ(u, v) + τ(v, u) and that π(q) is a symmetric bilinear form. Besides, it follows that π(σ(τ)) = 2τ if τ is symmetric.

Since q(λv) = λ^2 q(v) for all λ ∈ K, v ∈ V, we get σ(π(q))(v) = π(q)(v, v) = q(2v) − q(v) − q(v) = 2q(v). Finally, if we can divide by 2 then σ and π/2, defined by (π/2)(q)(u, v) = (q(u + v) − q(u) − q(v))/2, provide inverse bijections between symmetric bilinear forms on V and quadratic forms on V.  □
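The relationship between a quadratic form and its polarisation is easy to check numerically; the following sketch (numpy, with an arbitrarily chosen symmetric matrix, not part of the original notes) illustrates parts (ii) and (iii):

import numpy as np

A = np.array([[1., 3.], [3., -2.]])           # a symmetric matrix, so tau(u, v) = u^T A v is in Sym
tau = lambda u, v: u @ A @ v
q = lambda v: tau(v, v)                       # the square map sigma: q = sigma(tau)
pol = lambda u, v: q(u + v) - q(u) - q(v)     # the polarisation pi(q)

u, v = np.array([1., 2.]), np.array([-3., 1.])
assert np.isclose(pol(u, v), 2 * tau(u, v))   # pi(sigma(tau)) = 2*tau for symmetric tau
assert np.isclose(pol(v, v), 2 * q(v))        # sigma(pi(q)) = 2*q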
Let e_1, ..., e_n be a basis of V. Recall that the coordinates of v with respect to this basis are defined to be the field elements x_i such that v = Σ_{i=1}^n x_i e_i.

Let A = (α_ij) be the matrix of τ with respect to this basis. We will also call A the matrix of q with respect to this basis. Then A is symmetric because τ is, and by Equation (2.1) of Subsection 3.1, we have
[11] Fields with 1 + 1 = 0 are fields of characteristic 2. One can actually do quadratic and bilinear forms over them but the theory is quite specific. It could be a good topic for a second year essay.
[12] Some authors call it linearisation.
[13] There is a precise mathematical way of defining "natural" using Category Theory but it is far beyond the scope of this course. The only meaning we can endow this word with is that we do not make any choices for this bijection.

  q(v) = v^T A v = Σ_{i=1}^n Σ_{j=1}^n x_i α_ij x_j = Σ_{i=1}^n α_ii x_i^2 + 2 Σ_{i=1}^n Σ_{j=1}^{i−1} α_ij x_i x_j.        (3.1)

When n ≤ 3, we shall usually write x, y, z instead of x_1, x_2, x_3. For example, if n = 2 and A = (1 3; 3 −2), then q(v) = x^2 − 2y^2 + 6xy.

Conversely, if we are given a quadratic form as in the right hand side of Equation (3.1), then it is easy to write down its matrix A. For example, if n = 3 and q(v) = 3x^2 + y^2 − 2z^2 + 4xy − xz, then

  A = (  3    2   −1/2
         2    1    0
       −1/2   0   −2  ).

3.5  Change of variable under the general linear group

Our general aim is to make a change of basis so as to eliminate the terms in q(v) that involve x_i x_j for i ≠ j, leaving only terms of the form α_ii x_i^2. In this section, we will allow arbitrary basis changes; in other words, we allow basis change matrices P from the general linear group GL(n, K). It follows from Theorem 3.2 that when we make such a change, the matrix A of q is replaced by P^T A P.
As with other results in linear algebra, we can formulate theorems either in terms of abstract
concepts like quadratic forms, or simply as statements about matrices.
Theorem 3.5 Assume that 1 + 1 ≠ 0 in K. Let q be a quadratic form on V. Then there is a basis e'_1, ..., e'_n of V such that q(v) = Σ_{i=1}^n α_i (x'_i)^2, where the x'_i are the coordinates of v with respect to e'_1, ..., e'_n.

Equivalently, given any symmetric matrix A, there is an invertible matrix P such that P^T A P is a diagonal matrix; that is, A is congruent to a diagonal matrix.

Proof: This is by induction on n. There is nothing to prove when n = 1. As usual, let A = (α_ij) be the matrix of q with respect to the initial basis e_1, ..., e_n.

Case 1. First suppose that α_11 ≠ 0. As in the example in Subsection 3.3, we can complete the square. We have

  q(v) = α_11 x_1^2 + 2α_12 x_1 x_2 + ... + 2α_1n x_1 x_n + q_0(v),

where q_0 is a quadratic form involving only the coordinates x_2, ..., x_n. So

  q(v) = α_11 (x_1 + (α_12/α_11) x_2 + ... + (α_1n/α_11) x_n)^2 + q_1(v),

where q_1(v) is another quadratic form involving only x_2, ..., x_n.

We now make the change of coordinates x'_1 = x_1 + (α_12/α_11) x_2 + ... + (α_1n/α_11) x_n, x'_i = x_i for 2 ≤ i ≤ n. Then we have q(v) = α_1 (x'_1)^2 + q_1(v), where α_1 = α_11 and q_1(v) involves only x'_2, ..., x'_n. By inductive hypothesis (applied to the subspace of V spanned by e_2, ..., e_n), we can change the coordinates of q_1 from x'_2, ..., x'_n to x''_2, ..., x''_n, say, to bring it to the required form, and then we get q(v) = Σ_{i=1}^n α_i (x''_i)^2 (where x''_1 = x'_1) as required.

Case 2. α_11 = 0 but α_ii ≠ 0 for some i > 1. In this case, we start by interchanging e_1 with e_i (or equivalently x_1 with x_i), which takes us back to Case 1.

Case 3. α_ii = 0 for 1 ≤ i ≤ n. If α_ij = 0 for all i and j then there is nothing to prove, so assume that α_ij ≠ 0 for some i, j. Then we start by making a coordinate change x_i = x'_i + x'_j, x_j = x'_i − x'_j, x_k = x'_k for k ≠ i, j. This introduces terms 2α_ij ((x'_i)^2 − (x'_j)^2) into q, taking us back to Case 2.  □
Notice that, in the first change of coordinates in Case 1 of the proof, we have

  ( x'_1 )   ( 1  α_12/α_11  α_13/α_11  ...  α_1n/α_11 ) ( x_1 )
  ( x'_2 )   ( 0      1          0      ...      0     ) ( x_2 )
  (  ..  ) = ( ..                                       ) (  .. )
  ( x'_n )   ( 0      0          0      ...      1     ) ( x_n )

or equivalently

  ( x_1 )   ( 1  −α_12/α_11  −α_13/α_11  ...  −α_1n/α_11 ) ( x'_1 )
  ( x_2 )   ( 0       1           0       ...      0     ) ( x'_2 )
  (  .. ) = ( ..                                          ) (  ..  )
  ( x_n )   ( 0       0           0       ...      1     ) ( x'_n )

In other words, v = P v', where

  P = ( 1  −α_12/α_11  −α_13/α_11  ...  −α_1n/α_11
        0       1           0       ...      0
        0       0           1       ...      0
        ..
        0       0           0       ...      1 ),

so by Proposition 1.1, P is the basis change matrix with original basis {e_i} and new basis {e'_i}.

Example. Let n = 3 and q(v) = xy + 3yz − 5xz, so

  A = (  0    1/2  −5/2
        1/2    0    3/2
       −5/2   3/2    0  ).

Since we are using x, y, z for our variables, we can use x_1, y_1, z_1 (rather than x', y', z') for the variables with respect to a new basis, which will make things typographically simpler!

We are in Case 3 of the proof above, and so we start with a coordinate change x = x_1 + y_1, y = x_1 − y_1, z = z_1, which corresponds to the basis change matrix P_1 = (1 1 0; 1 −1 0; 0 0 1). Then we get q(v) = x_1^2 − y_1^2 − 2x_1 z_1 − 8y_1 z_1.

We are now in Case 1 of the proof above, and the next basis change, from completing the square, is x_2 = x_1 − z_1, y_2 = y_1, z_2 = z_1, or equivalently x_1 = x_2 + z_2, y_1 = y_2, z_1 = z_2, and then the associated basis change matrix is P_2 = (1 0 1; 0 1 0; 0 0 1), and q(v) = x_2^2 − y_2^2 − 8y_2 z_2 − z_2^2.

We now proceed by induction on the 2-coordinate form in y_2, z_2, and completing the square again leads to the basis change x_3 = x_2, y_3 = y_2 + 4z_2, z_3 = z_2, which corresponds to the basis change matrix P_3 = (1 0 0; 0 1 −4; 0 0 1), and q(v) = x_3^2 − y_3^2 + 15z_3^2.

The total basis change in moving from the original basis with coordinates x, y, z to the final basis with coordinates x_3, y_3, z_3 is

  P = P_1 P_2 P_3 = ( 1   1  −3
                      1  −1   5
                      0   0   1 ),

and you can check that P^T A P = (1 0 0; 0 −1 0; 0 0 15), as expected.
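It is worth checking such a computation by machine; a small sketch in exact arithmetic (sympy, not part of the original notes), using the matrices found above:

import sympy as sp

A  = sp.Matrix([[0, sp.Rational(1, 2), sp.Rational(-5, 2)],
                [sp.Rational(1, 2), 0, sp.Rational(3, 2)],
                [sp.Rational(-5, 2), sp.Rational(3, 2), 0]])
P1 = sp.Matrix([[1, 1, 0], [1, -1, 0], [0, 0, 1]])
P2 = sp.Matrix([[1, 0, 1], [0, 1, 0], [0, 0, 1]])
P3 = sp.Matrix([[1, 0, 0], [0, 1, -4], [0, 0, 1]])
P  = P1 * P2 * P3
print(P)             # Matrix([[1, 1, -3], [1, -1, 5], [0, 0, 1]])
print(P.T * A * P)   # diagonal matrix with entries 1, -1, 15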

Since P is an invertible matrix, P^T is also invertible (its inverse is (P^{−1})^T), and so the matrices P^T A P and A are equivalent, and hence have the same rank. (This was proved in MA106.) The rank of the quadratic form q is defined to be the rank of its matrix A. So we have just shown that the rank of q is independent of the choice of basis used for the matrix A. If P^T A P is diagonal, then its rank is equal to the number of non-zero terms on the diagonal.

Notice that both statements of Theorem 3.5 fail in characteristic 2. Let K be the field of two elements. The quadratic form q(x, y) = xy cannot be diagonalised. Similarly, the symmetric matrix (0 1; 1 0) is not congruent to a diagonal matrix. Do it as an exercise: there are six possible changes of variable (you need to choose two among the three possible variables x, y and x + y) and you can observe directly what happens with each change of variables.
In the case K = C, after reducing q to the form q(v) = Σ_{i=1}^n α_ii x_i^2, we can permute the coordinates if necessary to get α_ii ≠ 0 for 1 ≤ i ≤ r and α_ii = 0 for r+1 ≤ i ≤ n, where r = rank(q). We can then make a further coordinate change x'_i = √(α_ii) x_i (1 ≤ i ≤ r), giving q(v) = Σ_{i=1}^r (x'_i)^2. Hence we have proved:

Proposition 3.6 A quadratic form q over C has the form q(v) = Σ_{i=1}^r x_i^2 with respect to a suitable basis, where r = rank(q).

Equivalently, given a symmetric matrix A ∈ C^{n,n}, there is an invertible matrix P ∈ C^{n,n} such that P^T A P = B, where B = (β_ij) is a diagonal matrix with β_ii = 1 for 1 ≤ i ≤ r, β_ii = 0 for r+1 ≤ i ≤ n, and r = rank(A).
When K = R, we cannot take square roots of negative numbers, but we can replace each positive α_i by 1 and each negative α_i by −1 to get:

Proposition 3.7 (Sylvester's Theorem) A quadratic form q over R has the form q(v) = Σ_{i=1}^t x_i^2 − Σ_{i=1}^u x_{t+i}^2 with respect to a suitable basis, where t + u = rank(q).

Equivalently, given a symmetric matrix A ∈ R^{n,n}, there is an invertible matrix P ∈ R^{n,n} such that P^T A P = B, where B = (β_ij) is a diagonal matrix with β_ii = 1 for 1 ≤ i ≤ t, β_ii = −1 for t+1 ≤ i ≤ t+u, and β_ii = 0 for t+u+1 ≤ i ≤ n, and t + u = rank(A).

We shall now prove that the numbers t and u of positive and negative terms are invariants of q. The difference t − u between the numbers of positive and negative terms is called the signature of q.
Theorem 3.8 Suppose that q is a quadratic form on the vector space V over R, and that e_1, ..., e_n and e'_1, ..., e'_n are two bases of V with associated coordinates x_i and x'_i, such that

  q(v) = Σ_{i=1}^t x_i^2 − Σ_{i=1}^u x_{t+i}^2 = Σ_{i=1}^{t'} (x'_i)^2 − Σ_{i=1}^{u'} (x'_{t'+i})^2.

Then t = t' and u = u'.

Proof: We know that t + u = t' + u' = rank(q), so it is enough to prove that t = t'. Suppose not, and suppose that t > t'. Let V_1 = {v ∈ V | x_{t+1} = x_{t+2} = ... = x_n = 0}, and let V_2 = {v ∈ V | x'_1 = x'_2 = ... = x'_{t'} = 0}. Then V_1 and V_2 are subspaces of V with dim(V_1) = t and dim(V_2) = n − t'. It was proved in MA106 that

  dim(V_1 + V_2) = dim(V_1) + dim(V_2) − dim(V_1 ∩ V_2).

However, dim(V_1 + V_2) ≤ dim(V) = n, and so t > t' implies that dim(V_1) + dim(V_2) > n. Hence dim(V_1 ∩ V_2) > 0, and there is a non-zero vector v ∈ V_1 ∩ V_2. But it is easily seen from the expressions for q(v) in the statement of the theorem that 0 ≠ v ∈ V_1 ⟹ q(v) > 0, whereas v ∈ V_2 ⟹ q(v) ≤ 0, which is a contradiction, and completes the proof.  □
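Because an orthogonal diagonalisation (Theorem 3.14 in the next subsection) is in particular a congruence, the rank and signature of a real symmetric matrix can be read off from the signs of its eigenvalues. A small sketch (numpy, not part of the original notes):

import numpy as np

def rank_and_signature(A, tol=1e-9):
    """Rank and signature t - u of a real symmetric matrix, via the signs of its eigenvalues."""
    eig = np.linalg.eigvalsh(A)              # real eigenvalues of a symmetric matrix
    t = int(np.sum(eig >  tol))              # number of positive squares
    u = int(np.sum(eig < -tol))              # number of negative squares
    return t + u, t - u

A = np.array([[0, 0.5, -2.5], [0.5, 0, 1.5], [-2.5, 1.5, 0]])   # the example above: q = xy + 3yz - 5xz
print(rank_and_signature(A))                                     # (3, 1), matching diag(1, -1, 15)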

3.6  Change of variable under the orthogonal group

In this subsection, we assume throughout that K = R.

Definition. A quadratic form q on V is said to be positive definite if q(v) > 0 for all 0 ≠ v ∈ V.

It is clear that this is the case if and only if t = n and u = 0 in Proposition 3.7; that is, if q has rank and signature n. In this case, Proposition 3.7 says that there is a basis {e_i} of V with respect to which q(v) = Σ_{i=1}^n x_i^2 or, equivalently, such that the matrix A of q is the identity matrix I_n.

The associated symmetric bilinear form τ is also called positive definite when q is. If we use a basis such that A = I_n, then τ is just the standard scalar (or inner) product on V.

Definition. A vector space V over R together with a positive definite symmetric bilinear form τ is called a Euclidean space.

We shall assume from now on that V is a Euclidean space, and that the basis e_1, ..., e_n has been chosen such that the matrix of τ is I_n. Since τ is the standard scalar product, we shall write v · w instead of τ(v, w).

Note that v · w = v^T w where, as usual, v and w are the column vectors associated with v and w.

For v ∈ V, define |v| = √(v · v). Then |v| is the length of v. Hence the length, and also the cosine v · w/(|v||w|) of the angle between two vectors, can be defined in terms of the scalar product.
Definition. A linear operator T : V → V is said to be orthogonal if it preserves the scalar product on V. That is, if T(v) · T(w) = v · w for all v, w ∈ V.

Since length and angle can be defined in terms of the scalar product, an orthogonal linear operator preserves distance and angle, so geometrically it is a rigid map. In R^2, for example, an orthogonal operator is a rotation about the origin or a reflection about a line through the origin.

If A is the matrix of T, then T(v) = Av, so T(v) · T(w) = v^T A^T A w, and hence T is orthogonal if and only if A^T A = I_n, or equivalently if A^T = A^{−1}.

Definition. An n × n matrix A is called orthogonal if A^T A = I_n.

So we have proved:

Proposition 3.9 A linear operator T : V → V is orthogonal if and only if its matrix A (with respect to a basis such that the matrix of the bilinear form τ is I_n) is orthogonal.

Incidentally, the fact that A^T A = I_n tells us that A and hence T is invertible, and so we have also proved:

Proposition 3.10 An orthogonal linear operator is invertible.

Let c_1, c_2, ..., c_n be the columns of the matrix A. As we observed in Subsection 1.1, c_i is equal to the column vector representing T(e_i). In other words, if T(e_i) = f_i then f_i = c_i. Since the (i, j)-th entry of A^T A is c_i^T c_j = f_i · f_j, we see that T and A are orthogonal if and only if[14]

  f_i · f_j = δ_{i,j},   1 ≤ i, j ≤ n.        (∗)

Definition. A basis f_1, ..., f_n of V that satisfies (∗) is called orthonormal.

By Proposition 3.10, an orthogonal linear operator is invertible, so T(e_i) (1 ≤ i ≤ n) form a basis of V, and we have:

Proposition 3.11 A linear operator T is orthogonal if and only if T(e_1), ..., T(e_n) is an orthonormal basis of V.


Example. For any θ ∈ R, let A = (cos θ  −sin θ; sin θ  cos θ). (This represents a counter-clockwise rotation through an angle θ.) Then it is easily checked that A^T A = A A^T = I_2. Notice that the columns of A are mutually orthogonal vectors of length 1, and the same applies to the rows of A.
The following theorem tells us that we can always complete an orthonormal set of vectors to an orthonormal basis.

Theorem 3.12 (Gram-Schmidt) Let V be a Euclidean space of dimension n, and suppose that, for some r with 0 ≤ r ≤ n, f_1, ..., f_r are vectors in V that satisfy the equations (∗) for 1 ≤ i, j ≤ r. Then f_1, ..., f_r can be extended to an orthonormal basis f_1, ..., f_n of V.
Proof: We prove first that f_1, ..., f_r are linearly independent. Suppose that Σ_{i=1}^r x_i f_i = 0 for some x_1, ..., x_r ∈ R. Then, for each j with 1 ≤ j ≤ r, the scalar product of the left hand side of this equation with f_j is Σ_{i=1}^r x_i f_j · f_i = x_j, by (∗). Since f_j · 0 = 0, this implies that x_j = 0 for all j, so the f_i are linearly independent.

The proof of the theorem will be by induction on n − r. We can start the induction with the case n − r = 0, when r = n, and there is nothing to prove. So assume that n − r > 0; i.e. that r < n. By a result from MA106, we can extend any linearly independent set of vectors to a basis of V, so there is a basis f_1, ..., f_r, g_{r+1}, ..., g_n of V containing the f_i. The trick is to define

  f'_{r+1} = g_{r+1} − Σ_{i=1}^r (f_i · g_{r+1}) f_i.

If we take the scalar product of this equation with f_j for some 1 ≤ j ≤ r, then we get

  f_j · f'_{r+1} = f_j · g_{r+1} − Σ_{i=1}^r (f_i · g_{r+1})(f_j · f_i)

and then, by (∗), f_j · f_i is non-zero only when j = i, so the sum on the right hand side simplifies to f_j · g_{r+1}, and the whole equation simplifies to f_j · f'_{r+1} = f_j · g_{r+1} − f_j · g_{r+1} = 0.

The vector f'_{r+1} is non-zero by linear independence of the basis, and if we define f_{r+1} = f'_{r+1}/|f'_{r+1}|, then we still have f_j · f_{r+1} = 0 for 1 ≤ j ≤ r, and we also have f_{r+1} · f_{r+1} = 1. Hence f_1, ..., f_{r+1} satisfy the equations (∗), and the result follows by inductive hypothesis.  □

[14] We are using Kronecker's delta symbol in this formula. It is just the identity matrix I_m = (δ_{i,j}) of sufficiently large size. In layman's terms, δ_{i,i} = 1 and δ_{i,j} = 0 if i ≠ j.
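The proof is constructive and translates directly into code. A minimal sketch (numpy, not part of the original notes), extending a partial orthonormal list by the recipe above:

import numpy as np

def extend_to_orthonormal_basis(F, G):
    """F: list of orthonormal vectors; G: vectors completing F to a basis.  Returns an orthonormal basis."""
    F = [np.array(f, dtype=float) for f in F]
    for g in G:
        g = np.array(g, dtype=float)
        f_new = g - sum((f @ g) * f for f in F)   # subtract the components along f_1, ..., f_r
        F.append(f_new / np.linalg.norm(f_new))   # normalise; f_new is non-zero by linear independence
    return F

basis = extend_to_orthonormal_basis([[1/np.sqrt(2), 1/np.sqrt(2), 0]], [[0, 1, 0], [0, 0, 1]])
for f in basis:
    print(np.round(f, 4))       # three mutually orthogonal vectors of length 1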

Recall from MA106 that if T is a linear operator with matrix A, and v is a non-zero vector such that T(v) = λv (or equivalently Av = λv), then λ is called an eigenvalue and v an associated eigenvector of T and A. It was proved in MA106 that the eigenvalues are the roots of the characteristic equation det(A − xI_n) = 0 of A.
Proposition 3.13 Let A be a real symmetric matrix. Then A has an eigenvalue in R, and all complex eigenvalues of A lie in R.

Proof: (To simplify the notation, we will write just v for a column vector v in this proof.) The characteristic equation det(A − xI_n) = 0 is a polynomial equation of degree n in x, and since C is an algebraically closed field, it certainly has a root λ ∈ C, which is an eigenvalue for A if we regard A as a matrix over C. We shall prove that any such λ lies in R, which will prove the proposition.

For a column vector v or matrix B over C, we denote by v̄ or B̄ the result of replacing all entries of v or B by their complex conjugates. Since the entries of A lie in R, we have Ā = A.

Let v be a complex eigenvector associated with λ. Then

  Av = λv,        (1)

so, taking complex conjugates and using Ā = A, we get

  Av̄ = λ̄v̄.        (2)

Transposing (1) and using A^T = A gives

  v^T A = λv^T,        (3)

so by (2) and (3) we have

  λ v^T v̄ = v^T A v̄ = λ̄ v^T v̄.

But if v = (α_1, α_2, ..., α_n)^T, then v^T v̄ = α_1 ᾱ_1 + ... + α_n ᾱ_n, which is a non-zero real number (eigenvectors are non-zero by definition). Thus λ = λ̄, so λ ∈ R.  □
Before coming to the main theorem of this section, we recall the notation A ⊕ B for matrices, which we introduced in Subsection 2.5. It is straightforward to check that (A_1 ⊕ B_1)(A_2 ⊕ B_2) = (A_1 A_2 ⊕ B_1 B_2), provided that A_1 and A_2 are matrices with the same dimensions.

Theorem 3.14 Let q be a quadratic form defined on a Euclidean space V. Then there is an orthonormal basis e'_1, ..., e'_n of V such that q(v) = Σ_{i=1}^n α_i (x'_i)^2, where x'_i are the coordinates of v with respect to e'_1, ..., e'_n. Furthermore, the numbers α_i are uniquely determined by q.

Equivalently, given any symmetric matrix A, there is an orthogonal matrix P such that P^T A P is a diagonal matrix. Since P^T = P^{−1}, this is saying that A is simultaneously similar and congruent to a diagonal matrix.

Proof: We start with a general remark about orthogonal basis changes. The matrix A represents the quadratic form q on V with respect to the initial orthonormal basis e_1, ..., e_n of V, but it also represents a linear operator T : V → V with respect to the same basis. When we make an orthogonal basis change with original basis e_1, ..., e_n and a new orthonormal basis f_1, ..., f_n with basis change matrix P, then P is orthogonal, so P^T = P^{−1} and hence P^T A P = P^{−1} A P. Hence, by Theorems 1.2 and 3.2, the matrix P^T A P simultaneously represents both the linear operator T and the quadratic form q with respect to the new basis.

Recall from MA106 that two n × n matrices A and B are called similar if there exists an invertible n × n matrix P with B = P^{−1} A P. In particular, if P is orthogonal, then A and P^T A P are similar. It was proved in MA106 that similar matrices have the same eigenvalues. But the α_i are precisely the eigenvalues of the diagonalised matrix P^T A P, and so the α_i are the eigenvalues of the original matrix A, and hence are uniquely determined by A and q. This proves the uniqueness part of the theorem.

The equivalence of the two statements in the theorem follows from Proposition 3.11 and Theorem 3.2. Their proof will be by induction on n = dim(V). There is nothing to prove when n = 1. By Proposition 3.13, A and its associated linear operator T have a real eigenvalue α_1. Let v be a corresponding eigenvector (of T). Then f_1 = v/|v| is also an eigenvector (i.e. T f_1 = α_1 f_1), and f_1 · f_1 = 1. By Theorem 3.12, there is an orthonormal basis f_1, ..., f_n of V containing f_1. Let B be the matrix of q with respect to this basis, so B = P^T A P with P orthogonal. By the remark above, B is also the matrix of T with respect to f_1, ..., f_n, and because T f_1 = α_1 f_1, the first column of B is (α_1, 0, ..., 0)^T. But B is the matrix of the quadratic form q, so it is symmetric, and hence the first row of B is (α_1, 0, ..., 0), and therefore B = P^T A P = P^{−1} A P = (α_1) ⊕ A_1, where A_1 is an (n−1) × (n−1) matrix and (α_1) is a 1 × 1 matrix.

Furthermore, B symmetric implies A_1 symmetric, and by inductive assumption there is an (n−1) × (n−1) orthogonal matrix Q_1 with Q_1^T A_1 Q_1 diagonal. Let Q = (1) ⊕ Q_1. Then Q is also orthogonal (check!) and (PQ)^T A (PQ) = Q^T (P^T A P) Q = (α_1) ⊕ Q_1^T A_1 Q_1 is diagonal. But PQ is the product of two orthogonal matrices and so is itself orthogonal. This completes the proof.  □
Although it is not used in the proof of the theorem above, the following proposition is useful when calculating examples. It helps us to write down more vectors in the final orthonormal basis immediately, without having to use Theorem 3.12 repeatedly.

Proposition 3.15 Let A be a real symmetric matrix, and let λ_1, λ_2 be two distinct eigenvalues of A, with corresponding eigenvectors v_1, v_2. Then v_1 · v_2 = 0.
Proof: (As in Proposition 3.13, we will write v rather than v for a column vector in this proof. So v_1 · v_2 is the same as v_1^T v_2.) We have

  Av_1 = λ_1 v_1   (1)   and   Av_2 = λ_2 v_2   (2).

Transposing (1) and using A = A^T gives v_1^T A = λ_1 v_1^T, and so

  v_1^T A v_2 = λ_1 v_1^T v_2   (3)

and similarly

  v_2^T A v_1 = λ_2 v_2^T v_1   (4).

Transposing (4) gives v_1^T A v_2 = λ_2 v_1^T v_2, and subtracting (3) from this gives (λ_2 − λ_1) v_1^T v_2 = 0. Since λ_2 − λ_1 ≠ 0 by assumption, we have v_1^T v_2 = 0.  □


Example 1. Let n = 2 and q(v) = x^2 + y^2 + 6xy, so A = (1 3; 3 1). Then

  det(A − xI_2) = (1 − x)^2 − 9 = x^2 − 2x − 8 = (x − 4)(x + 2),

so the eigenvalues of A are 4 and −2. Solving Av = λv for λ = 4 and −2, we find corresponding eigenvectors (1 1)^T and (1 −1)^T. Proposition 3.15 tells us that these vectors are orthogonal to each other (which we can of course check directly!), so if we divide them by their lengths to give vectors of length 1, namely (1/√2  1/√2)^T and (1/√2  −1/√2)^T, we get an orthonormal basis consisting of eigenvectors of A, which is what we want. The corresponding basis change matrix P has these vectors as columns, so P = (1/√2  1/√2; 1/√2  −1/√2), and we can check that P^T P = I_2 (i.e. P is orthogonal) and that P^T A P = (4 0; 0 −2).
Example 2. Let n = 3 and

  q(v) = 3x^2 + 6y^2 + 3z^2 − 4xy − 4yz + 2xz,

so A = (3 −2 1; −2 6 −2; 1 −2 3). Then, expanding by the first row,

  det(A − xI_3) = (3 − x)(6 − x)(3 − x) − 4(3 − x) − 4(3 − x) + 4 + 4 − (6 − x)
                = −x^3 + 12x^2 − 36x + 32 = (2 − x)(x − 8)(x − 2),

so the eigenvalues are 2 (repeated) and 8. For the eigenvalue 8, if we solve Av = 8v then we find a solution v = (1 −2 1)^T. Since 2 is a repeated eigenvalue, we need two corresponding eigenvectors, which must be orthogonal to each other. The equations Av = 2v all reduce to x − 2y + z = 0, and so any vector (x, y, z)^T satisfying this equation is an eigenvector for λ = 2. By Proposition 3.15 these eigenvectors will all be orthogonal to the eigenvector for λ = 8, but we will have to choose them orthogonal to each other. We can choose the first one arbitrarily, so let's choose (1 0 −1)^T. We now need another solution that is orthogonal to this. In other words, we want x, y and z not all zero satisfying x − 2y + z = 0 and x − z = 0, and x = y = z = 1 is a solution. So we now have a basis (1 −2 1)^T, (1 0 −1)^T, (1 1 1)^T of three mutually orthogonal eigenvectors. To get an orthonormal basis, we just need to divide by their lengths, which are, respectively, √6, √2, and √3, and then the basis change matrix P has these vectors as columns, so

  P = (  1/√6   1/√2   1/√3
        −2/√6     0    1/√3
         1/√6  −1/√2   1/√3 ).

It can then be checked that P^T P = I_3 and that P^T A P is the diagonal matrix with entries 8, 2, 2.
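These hand computations are easy to confirm with numpy (a sketch, not part of the original notes; numpy returns the eigenvalues in increasing order, so they appear as 2, 2, 8):

import numpy as np

A = np.array([[3., -2., 1.], [-2., 6., -2.], [1., -2., 3.]])
evals, evecs = np.linalg.eigh(A)     # for a symmetric matrix: real eigenvalues, orthonormal eigenvectors
print(evals)                         # [2. 2. 8.]
P = evecs                            # columns form an orthonormal basis of eigenvectors
print(np.round(P.T @ P, 10))         # identity matrix, so P is orthogonal
print(np.round(P.T @ A @ P, 10))     # diagonal matrix with entries 2, 2, 8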

3.7  Applications of quadratic forms to geometry

3.7.1  Reduction of the general second degree equation

The general equation of the second degree in n variables x_1, ..., x_n is

  Σ_{i=1}^n α_i x_i^2 + Σ_{i=1}^n Σ_{j=1}^{i−1} α_ij x_i x_j + Σ_{i=1}^n β_i x_i + γ = 0.

This defines a quadric hypersurface[15] in n-dimensional Euclidean space. To study the possible shapes of the curves and surfaces defined, we first simplify this equation by applying coordinate changes resulting from isometries of R^n.

By Theorem 3.14, we can apply an orthogonal basis change (that is, an isometry of R^n that fixes the origin) which has the effect of eliminating the terms α_ij x_i x_j in the above sum.

[15] also called a quadric surface if n = 3 or a quadric curve if n = 2.

Now, whenever α_i ≠ 0, we can replace x_i by x_i − β_i/(2α_i), and thereby eliminate the term β_i x_i from the equation. This transformation is just a translation, which is also an isometry. If α_i = 0, then we cannot eliminate the term β_i x_i. Let us permute the coordinates such that α_i ≠ 0 for 1 ≤ i ≤ r, and β_i ≠ 0 for r+1 ≤ i ≤ r+s. Then if s > 1, by using Theorem 3.12, we can find an orthogonal transformation that leaves x_i unchanged for 1 ≤ i ≤ r and replaces Σ_{j=1}^s β_{r+j} x_{r+j} by βx_{r+1} (where β is the length of Σ_{j=1}^s β_{r+j} x_{r+j}), and then we have only a single non-zero β_i; namely β_{r+1} = β.

Finally, if there is a non-zero β_{r+1} = β, then we can perform the translation that replaces x_{r+1} by x_{r+1} − γ/β, and thereby eliminate γ. We have now reduced to one of two possible types of equation:

  Σ_{i=1}^r α_i x_i^2 + γ = 0   and   Σ_{i=1}^r α_i x_i^2 + βx_{r+1} = 0.

In fact, by dividing through by γ or β, we can assume that γ = 0 or 1 in the first equation, and that β = 1 in the second. In both cases, we shall assume that α_r ≠ 0, because otherwise we have a linear equation. The curve defined by the first equation is called a central quadric because it has central symmetry; i.e. if a vector v satisfies the equation, then so does −v.

We shall now consider the types of curves and surfaces that can arise in the familiar cases n = 2 and n = 3. These different types correspond to whether the α_i are positive, negative or zero, and whether γ = 0 or 1.

We shall use x, y, z instead of x_1, x_2, x_3, and α, β, γ instead of α_1, α_2, α_3. We shall assume also that α, β, γ are all strictly positive, and write −α, etc., for the negative case.
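For n = 2 the orthogonal step of this reduction is easy to carry out numerically; a small sketch (numpy, not part of the original notes) for the ellipse of Subsection 3.3:

import numpy as np

# quadratic part of 5x^2 + 5y^2 - 6xy = 2
A = np.array([[5., -3.], [-3., 5.]])
evals, P = np.linalg.eigh(A)      # orthogonal change of variables (Theorem 3.14)
print(evals)                      # [2. 8.]: the equation becomes 2x'^2 + 8y'^2 = 2, the ellipse x'^2 + 4y'^2 = 1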
3.7.2  The case n = 2

When n = 2 we have the following possibilities.


(i) αx^2 = 0. This just defines the line x = 0 (the y-axis).

(ii) αx^2 = 1. This defines the two parallel lines x = ±1/√α.

(iii) −αx^2 = 1. This is the empty curve!

(iv) αx^2 + βy^2 = 0. The single point (0, 0).

(v) αx^2 − βy^2 = 0. This defines the two straight lines y = ±√(α/β) x, which intersect at (0, 0).

(vi) αx^2 + βy^2 = 1. An ellipse.

(vii) αx^2 − βy^2 = 1. A hyperbola.

(viii) −αx^2 − βy^2 = 1. The empty curve again.

(ix) αx^2 − y = 0. A parabola.
3.7.3  The case n = 3

When n = 3, we still get the nine possibilities (i)–(ix) that we had in the case n = 2, but now they must be regarded as equations in the three variables x, y, z that happen not to involve z.

So, in Case (i), we now get the plane x = 0, in Case (ii) we get two parallel planes x = ±1/√α, in Case (iv) we get the line x = y = 0 (the z-axis), in Case (v) two intersecting planes y = ±√(α/β) x, and in Cases (vi), (vii) and (ix), we get, respectively, elliptical, hyperbolic and parabolic cylinders.

The remaining cases involve all of x, y and z. We omit −αx^2 − βy^2 − γz^2 = 1, which is empty.

(x) αx^2 + βy^2 + γz^2 = 0. The single point (0, 0, 0).

(xi) αx^2 + βy^2 − γz^2 = 0. See Fig. 4.

This is an elliptical cone. The cross sections parallel to the xy-plane are ellipses of the form αx^2 + βy^2 = c, whereas the cross sections parallel to the other coordinate planes are generally hyperbolas. Notice also that if a particular point (a, b, c) is on the surface, then so is t(a, b, c) for any t ∈ R. In other words, the surface contains the straight line through the origin and any of its points. Such lines are called generators. When each point of a 3-dimensional surface lies on one or more generators, it is possible to make a model of the surface with straight lengths of wire or string.
(xii) αx^2 + βy^2 + γz^2 = 1. An ellipsoid. See Fig. 5.

(xiii) αx^2 + βy^2 − γz^2 = 1. A hyperboloid. See Fig. 6.

There are two types of 3-dimensional hyperboloids. This one is connected, and is known as a hyperboloid of one sheet. Although it is not immediately obvious, each point of this surface lies on exactly two generators; that is, lines that lie entirely on the surface. For each λ ∈ R, the line defined by the pair of equations

  √α x − √γ z = λ(1 − √β y);   λ(√α x + √γ z) = 1 + √β y

lies entirely on the surface; to see this, just multiply the two equations together. The same applies to the lines defined by the pairs of equations

  √β y − √γ z = λ(1 − √α x);   λ(√β y + √γ z) = 1 + √α x.

It can be shown that each point on the surface lies on exactly one of the lines in each of these two families.
(xiv) αx^2 − βy^2 − γz^2 = 1. A hyperboloid. See Fig. 7.

This one has two connected components and is called a hyperboloid of two sheets. It does not have generators. Besides, it is easy to observe that it is disconnected. Substitute x = 0 into its equation. The resulting equation −βy^2 − γz^2 = 1 has no solutions. This means that the hyperboloid does not intersect the plane x = 0. A closer inspection confirms that the two parts of the hyperboloid lie on both sides of the plane: intersect the hyperboloid with the line y = z = 0 to see two points on both sides.
(xv) αx^2 + βy^2 − z = 0. An elliptical paraboloid. See Fig. 8.

(xvi) αx^2 − βy^2 − z = 0. A hyperbolic paraboloid. See Fig. 9.

As in the case of the hyperboloid of one sheet, there are two generators passing through each point of this surface, one from each of the following two families of lines:

  λ(√α x − √β y) = z;   √α x + √β y = λ.
  μ(√α x + √β y) = z;   √α x − √β y = μ.

3.8  Unitary, hermitian and normal matrices

The results in Subsections 3.1–3.6 apply to vector spaces over the real numbers R. A naive reformulation of some of the results to complex numbers fails. For instance, the vector v = (i, 1)^T ∈ C^2 is isotropic, i.e. it has v^T v = i^2 + 1^2 = 0, which creates various difficulties.
[Figure 4: x^2/4 + y^2 − z^2 = 0]

[Figure 5: x^2 + 2y^2 + 4z^2 = 7]

[Figure 6: x^2/4 + y^2 − z^2 = 1]

[Figure 7: x^2/4 − y^2 − z^2 = 1]

[Figure 8: z = x^2/2 + y^2]

[Figure 9: z = x^2 − y^2]

To build the theory for vector spaces over C we have to replace bilinear forms with sesquilinear[16] forms, that is, maps τ : W × V → C such that

(i) τ(λ_1 w_1 + λ_2 w_2, v) = λ̄_1 τ(w_1, v) + λ̄_2 τ(w_2, v) and
(ii) τ(w, λ_1 v_1 + λ_2 v_2) = λ_1 τ(w, v_1) + λ_2 τ(w, v_2)

for all w, w_1, w_2 ∈ W, v, v_1, v_2 ∈ V, and λ_1, λ_2 ∈ C. As usual, z̄ denotes the complex conjugate of z.

Sesquilinear forms can be represented by matrices A ∈ C^{m,n} as in Subsection 3.1. Let A* = Ā^T be the conjugate transpose of A, that is, (A*)_ij = ᾱ_ji. The representation comes by choosing a basis and writing α_ij = τ(f_i, e_j). Similarly to equation (2.1), we get

  τ(w, v) = Σ_{i=1}^m Σ_{j=1}^n ȳ_i τ(f_i, e_j) x_j = Σ_{i=1}^m Σ_{j=1}^n ȳ_i α_ij x_j = w* A v.

For instance, the standard inner product on C^n becomes v · w = v* w rather than v^T w. Note that, for v ∈ R^n, v* = v^T, so this definition is compatible with the one for real vectors. The length |v| of a vector is given by |v|^2 = v · v = v* v, which is always a non-negative real number.
We are going to formulate several propositions that generalise results of the previous sections to hermitian matrices. The proofs are very similar and are left for you to fill in as an exercise. The first two propositions are generalisations of Theorems 3.1 and 3.2.

Proposition 3.16 Let A be the matrix of the sesquilinear map τ : W × V → C with respect to the bases e_1, ..., e_n and f_1, ..., f_m of V and W, and let B be its matrix with respect to the bases e'_1, ..., e'_n and f'_1, ..., f'_m of V and W. If P and Q are the basis change matrices then B = Q* A P.

Proposition 3.17 Let A be the matrix of the sesquilinear form τ on V with respect to the basis e_1, ..., e_n of V, and let B be its matrix with respect to the basis e'_1, ..., e'_n of V. If P is the basis change matrix then B = P* A P.
Definition. A matrix A ∈ C^{n,n} is called hermitian if A = A*. A sesquilinear form τ on V is called hermitian if τ(w, v) = \overline{τ(v, w)} for all v, w ∈ V.

These are the complex analogues of symmetric matrices and symmetric bilinear forms. The following proposition is an analogue of Proposition 3.3.

Proposition 3.18 A sesquilinear form τ is hermitian if and only if its matrix is hermitian.

Hermitian matrices A and B are congruent if there exists an invertible matrix P with B = P* A P. A hermitian quadratic form is a function q : V → C given by q(v) = τ(v, v) for some sesquilinear form τ. The following is a hermitian version of Sylvester's Theorem (Proposition 3.7) and the Inertia Law (Theorem 3.8) together.

Proposition 3.19 A hermitian quadratic form q has the form q(v) = Σ_{i=1}^t |x_i|^2 − Σ_{i=1}^u |x_{t+i}|^2 with respect to a suitable basis, where t + u = rank(q).

Equivalently, given a hermitian matrix A ∈ C^{n,n}, there is an invertible matrix P ∈ C^{n,n} such that P* A P = B, where B = (β_ij) is a diagonal matrix with β_ii = 1 for 1 ≤ i ≤ t, β_ii = −1 for t+1 ≤ i ≤ t+u, and β_ii = 0 for t+u+1 ≤ i ≤ n, and t + u = rank(A).

The numbers t and u are uniquely determined by q (or A).


[16] from Latin "one and a half".

Similarly to the real case, the difference t − u is called the signature of q (or A). We say that a hermitian matrix (or a hermitian form) is positive definite if its signature is equal to the dimension of the space. By Proposition 3.19 a positive definite hermitian form looks like the standard inner product on C^n in some choice of basis. A hermitian vector space is a vector space over C equipped with a hermitian positive definite form.

Definition. A linear operator T : V → V on a hermitian vector space (V, τ) is said to be unitary if it preserves the form τ. That is, if τ(T(v), T(w)) = τ(v, w) for all v, w ∈ V.

Definition. A matrix A ∈ C^{n,n} is called unitary if A* A = I_n.

A basis e_1, ..., e_n of a hermitian space (V, τ) is orthonormal if τ(e_i, e_i) = 1 and τ(e_i, e_j) = 0 for all i ≠ j. The following is an analogue of Proposition 3.9.

Proposition 3.20 A linear operator T : V → V is unitary if and only if its matrix A with respect to an orthonormal basis is unitary.

The Gram-Schmidt process works perfectly well in the hermitian setting, so Theorem 3.12 turns into the following statement.

Proposition 3.21 Let (V, τ) be a hermitian space of dimension n, and suppose that, for some r with 0 ≤ r ≤ n, f_1, ..., f_r are vectors in V that satisfy τ(f_i, f_j) = δ_{i,j} for 1 ≤ i, j ≤ r. Then f_1, ..., f_r can be extended to an orthonormal basis f_1, ..., f_n of V.
Proposition 3.21 ensures the existence of orthonormal bases in hermitian spaces. Proposition 3.13 and Theorem 3.14 have analogues as well.

Proposition 3.22 Let A be a complex hermitian matrix. Then A has an eigenvalue in R, and all complex eigenvalues of A lie in R.

Proposition 3.23 Let q be a hermitian quadratic form defined on a hermitian space V. Then there is an orthonormal basis e_1, ..., e_n of V such that q(v) = Σ_{i=1}^n α_i |x_i|^2, where x_i are the coordinates of v with respect to e_1, ..., e_n. Furthermore, the numbers α_i are real and uniquely determined by q.

Equivalently, given any hermitian matrix A, there is a unitary matrix P such that P* A P is a real diagonal matrix.
Notice the crucial difference between Theorem 3.14 and Proposition 3.23. In the former we start with a real matrix and end up with a real diagonal matrix. In the latter we start with a complex matrix but we still end up with a real diagonal matrix. The point is that Theorem 3.14 admits a useful generalisation to a wider class of matrices.
Definition. A matrix A ∈ C^{n,n} is called normal if A A* = A* A.

In particular, all hermitian and all unitary matrices are normal. Consequently, all real symmetric and real orthogonal matrices are normal.
Lemma 3.24 If A ∈ C^{n,n} is normal and P ∈ C^{n,n} is unitary then B = P* A P is normal.

Proof: If B = P* A P then, using the rule (XY)* = Y* X*, we compute that B B* = (P* A P)(P* A* P) = P* A P P* A* P = P* A A* P = P* A* A P = (P* A* P)(P* A P) = B* B.  □
The following theorem is extremely useful as the general criterion for diagonalisability of
matrices.

Theorem 3.25 A matrix A ∈ C^{n,n} is normal if and only if there exists a unitary matrix P ∈ C^{n,n} such that P* A P is diagonal[17].

Proof: The "if" part follows from Lemma 3.24, as diagonal matrices are normal.

For the "only if" part we proceed by induction on n. If n = 1, there is nothing to prove. Let us assume we have proved the statement for all dimensions less than n. The matrix A admits an eigenvector v ∈ C^n with an eigenvalue λ. Let W be the vector subspace of all vectors x satisfying Ax = λx. If W = C^n then A is a scalar matrix and we are done. Otherwise, we have a nontrivial[18] decomposition C^n = W ⊕ W⊥ where W⊥ = {v ∈ C^n | v · w = 0 for all w ∈ W}.

Let us notice that A*W ⊆ W because A A* x = A* A x = A* λx = λ(A* x) for any x ∈ W. It follows that AW⊥ ⊆ W⊥ since (Ay)* x = y* (A* x) = 0 for all x ∈ W, y ∈ W⊥ (as A* x ∈ W). Similarly, A*W⊥ ⊆ W⊥.

Now choose orthonormal bases of W and W⊥. Together they form a new orthonormal basis of C^n. The change of basis matrix P is unitary, hence by Lemma 3.24 the matrix P* A P = B ⊕ C is normal. It follows that the matrices B and C are normal of smaller size and we can use the induction assumption to complete the proof.  □
To find P in practice, we use methods similar to those in the real case.


Example. A = (6  2+2i; 2−2i  4). Then

  c_A(x) = (6 − x)(4 − x) − (2+2i)(2−2i) = x^2 − 10x + 16 = (x − 2)(x − 8),

so the eigenvalues are 2 and 8. (It can be shown that the eigenvalues of any hermitian matrix are real.) Corresponding eigenvectors are v_1 = (1+i, −2)^T and v_2 = (1+i, 1)^T. We find that |v_1|^2 = v_1* v_1 = 6 and |v_2|^2 = 3, so we divide by their lengths to get an orthonormal basis v_1/|v_1|, v_2/|v_2| of C^2. Then the matrix

  P = ( (1+i)/√6   (1+i)/√3
           −2/√6       1/√3  )

having this basis as columns is unitary and satisfies

  P* A P = ( 2  0
             0  8 ).
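Again this is easy to confirm numerically (a sketch with numpy, not part of the original notes; note the conjugate transpose):

import numpy as np

A = np.array([[6, 2+2j], [2-2j, 4]])
v1 = np.array([1+1j, -2]) / np.sqrt(6)
v2 = np.array([1+1j, 1]) / np.sqrt(3)
P = np.column_stack([v1, v2])
print(np.round(P.conj().T @ P, 10))        # identity, so P is unitary
print(np.round(P.conj().T @ A @ P, 10))    # diag(2, 8)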

3.9  Applications to quantum mechanics (non-examinable)

With all the linear algebra we know, it is a little step aside to understand the basics of quantum mechanics. We discuss Schrödinger's picture[19] of quantum mechanics and derive (mathematically) Heisenberg's uncertainty principle.

The main ingredient of quantum mechanics is a hermitian vector space (V, <,>). There are physical arguments showing that real Euclidean vector spaces are no good and that V must be infinite-dimensional. Here we just take their conclusions at face value. The states of the system are lines in V. We denote by [v] the line Cv spanned by v ∈ V. We use normalised vectors, i.e., v such that < v, v > = 1, to present states, as this makes formulae slightly easier.

It is impossible to observe the state of the quantum system but we can try to observe some physical quantities such as momentum, energy, spin, etc. Such physical quantities become observables, i.e., hermitian linear operators φ : V → V. Hermitian in this context means
[17] with complex entries.
[18] i.e., neither W nor W⊥ is zero.
[19] The alternative is Heisenberg's picture but we have no time to discuss it here.

that < φx, y > = < x, φy > for all x, y ∈ V. Sweeping a subtle mathematical point under the carpet[20], we assume that φ is diagonalisable with eigenvectors e_1, e_2, e_3, ... and eigenvalues λ_1, λ_2, .... The proof of Proposition 3.22 goes through in the infinite-dimensional case, so we conclude that all λ_i belong to R. Back to Physics: if we measure φ on a state [v] with normalised v = Σ_n α_n e_n then the measurement will return λ_n as a result with probability |α_n|^2.

One observable is energy H : V → V, often called the hamiltonian. It is central to the theory because it determines the time evolution [v(t)] of the system by Schrödinger's equation:

  dv(t)/dt = (1/(iℏ)) H v(t),

where ℏ ≈ 10^{−34} Joule-seconds[21] is the reduced Planck constant. We know how to solve this equation: v(t) = e^{tH/(iℏ)} v(0).
As a concrete example, let us look at the quantum oscillator. The full energy of the classical harmonic oscillator of mass m and frequency ω is

  h = p^2/(2m) + (1/2) m ω^2 x^2,

where x is the position and p = mẋ is the momentum. To quantise it, we have to play with this expression. The space of all smooth functions C^∞(R, C) admits a convenient vector subspace V = {f(x) e^{−x^2/2} | f(x) ∈ C[x]}, which we make hermitian by

  < f(x), g(x) > = ∫_R \overline{f(x)} g(x) dx.

Quantum momentum and quantum position are linear operators (observables) on this space:

  P(f(x)) = −iℏ f'(x),   X(f(x)) = x f(x).

The quantum hamiltonian is a second order differential operator given by the same equation:

  H = P^2/(2m) + (1/2) m ω^2 X^2 = −(ℏ^2/(2m)) d^2/dx^2 + (1/2) m ω^2 x^2.

As mathematicians, we can assume that m = 1 and ω = 1, so that H(f) = (−f'' + x^2 f)/2.
The eigenvectors of H are the Hermite functions

  ψ_n(x) = (−1)^n e^{x^2/2} (e^{−x^2})^{(n)},   n = 0, 1, 2, ...,

with eigenvalues n + 1/2, which are the discrete energy levels of the quantum oscillator. Notice that < ψ_k, ψ_n > = δ_{k,n} 2^n n! √π, so they are orthogonal but not orthonormal. The states [ψ_n] are pure states: they do not change with time and always give n + 1/2 as energy. If we take a system in a state [v] where

  v = Σ_n α_n ψ_n / (π^{1/4} 2^{n/2} √(n!))

is normalised, then the measurement of energy will return n + 1/2 with probability |α_n|^2.

Notice that the measurement breaks the system!! It changes it to the state [ψ_n] and all future measurements will return the same energy!

[20] If V were finite-dimensional we could have used Proposition 3.23. But V is infinite-dimensional! To ensure diagonalisability V must be complete with respect to the hermitian norm. Such spaces are called Hilbert spaces. Diagonalisability is still subtle as eigenvectors do not span the whole of V but only a dense subspace. Furthermore, if V admits no dense countably dimensional subspace, further difficulties arise... The Pandora's box of functional analysis is wide open, so let us try to keep it shut.
[21] Notice the physical dimensions: H is energy, t is time, i is dimensionless, and ℏ equalises the dimensions on both sides irrespectively of what v is.

Alternatively, it is possible to model the quantum oscillator on the vector space W = C[x] of polynomials. One has to use the natural linear bijection

  Φ : W → V,   Φ(f(x)) = f(x) e^{−x^2/2}

and transfer all the formulae to W. The metric becomes < f, g > = < Φ(f), Φ(g) > = ∫_R \overline{f(x)} g(x) e^{−x^2} dx, the formulae for P and X change accordingly, and at the end one arrives at the Hermite polynomials Φ^{−1}(ψ_n(x)) = (−1)^n e^{x^2} (e^{−x^2})^{(n)} instead of Hermite functions.
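The eigenvector property of the Hermite functions can be verified symbolically; a small sketch (sympy, with m = ω = ℏ = 1 as above, not part of the original notes):

import sympy as sp

x = sp.symbols('x')

def hermite_function(n):
    return (-1)**n * sp.exp(x**2/2) * sp.diff(sp.exp(-x**2), x, n)

H = lambda f: (-sp.diff(f, x, 2) + x**2 * f) / 2      # the quantum oscillator hamiltonian

for n in range(4):
    psi = hermite_function(n)
    assert sp.simplify(H(psi) - (n + sp.Rational(1, 2)) * psi) == 0   # H(psi_n) = (n + 1/2) psi_n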
Let us go back to an abstract system with two observables P and Q. It is pointless to measure Q after measuring P as the system is broken. But can we measure them simultaneously? The answer is given by Heisenberg's uncertainty principle. Mathematically, it is a corollary of Schwarz's inequality:

  ‖v‖^2 ‖w‖^2 = < v, v > < w, w > ≥ | < v, w > |^2.
Let e_1, e_2, e_3, ... be eigenvectors for P with eigenvalues p_1, p_2, .... The probability that p_j is returned after measuring P on [v] with v = Σ_n α_n e_n depends on the multiplicity of the eigenvalue:

  Prob(p_j is returned) = Σ_{p_k = p_j} |α_k|^2.

Hence, we should have the expected value

  E(P, v) = Σ_k p_k |α_k|^2 = Σ_k < α_k e_k, p_k α_k e_k > = < v, P(v) >.

To compute the expected quadratic error we use the shifted observable P_v = P − E(P, v)I:

  D(P, v) = √(E(P_v^2, v)) = √(< v, P_v(P_v(v)) >) = √(< P_v(v), P_v(v) >) = ‖P_v(v)‖,

where we use the fact that P and P_v are hermitian. Notice that D(P, v) has the physical meaning of the uncertainty of a measurement of P. Notice also that the operator PQ − QP is no longer hermitian in general but we can still talk about its expected value. Here goes Heisenberg's principle.
Theorem 3.26

  D(P, v) · D(Q, v) ≥ (1/2) |E(PQ − QP, v)|.
Proof: In the right hand side, E(PQ − QP, v) = E(P_v Q_v − Q_v P_v, v) = < v, P_v Q_v(v) > − < v, Q_v P_v(v) > = < P_v(v), Q_v(v) > − < Q_v(v), P_v(v) >. Remembering that the form is hermitian,

  E(PQ − QP, v) = < P_v(v), Q_v(v) > − \overline{< P_v(v), Q_v(v) >} = 2i · Im(< P_v(v), Q_v(v) >),

that is, twice the imaginary part (times i). So the right hand side is estimated by Schwarz's inequality:

  |Im(< P_v(v), Q_v(v) >)| ≤ | < P_v(v), Q_v(v) > | ≤ ‖P_v(v)‖ ‖Q_v(v)‖.  □
Two cases of particular physical interest are commuting observables, i.e. PQ = QP, and conjugate observables, i.e. PQ − QP = iℏI. Commuting observables can be measured simultaneously with any degree of certainty. Conjugate observables obey Heisenberg's uncertainty:

  D(P, v) · D(Q, v) ≥ ℏ/2.

4  Finitely Generated Abelian Groups

4.1  Definitions

Groups were introduced in the first year in Foundations, and will be studied in detail next term in Algebra II: Groups and Rings. In this course, we are only interested in abelian (= commutative) groups, which are defined as follows.

Definition. An abelian group is a set G together with a binary operation, which we write as addition, and which satisfies the following properties:

(i) (Closure) for all g, h ∈ G, g + h ∈ G;
(ii) (Associativity) for all g, h, k ∈ G, (g + h) + k = g + (h + k);
(iii) there exists an element 0_G ∈ G such that:
  (a) (Identity) for all g ∈ G, g + 0_G = g; and
  (b) (Inverse) for all g ∈ G there exists −g ∈ G such that g + (−g) = 0_G;
(iv) (Commutativity) for all g, h ∈ G, g + h = h + g.

Usually we just write 0 rather than 0G . We only write 0G if we need to distinguish between
the zero elements of different groups.
The commutativity axiom (iv) is not part of the definition of a general group, and for general
(non-abelian) groups, it is more usual to use multiplicative rather than additive notation. All
groups in this course should be assumed to be abelian, although many of the definitions in
this section apply equally well to general groups.
Examples. 1. The integers Z.

2. Fix a positive integer n > 0 and let

  Z_n = {0, 1, 2, ..., n−1} = { x ∈ Z | 0 ≤ x < n },

where addition is computed modulo n. So, for example, when n = 9, we have 2 + 5 = 7, 3 + 8 = 2, 6 + 7 = 4, etc. Note that the inverse −x of x ∈ Z_n is equal to n − x in this example.

3. Examples from linear algebra. Let K be a field.

(i) The elements of K form an abelian group under addition.
(ii) The non-zero elements of K form an abelian group K* under multiplication.
(iii) The vectors in any vector space form an abelian group under addition.

Proposition 4.1 (The cancellation law) Let G be any group, and let g, h, k ∈ G. Then g + h = g + k ⟹ h = k.

Proof: Add −g to both sides of the equation and use the Associativity and Identity axioms.  □
For any group G, g ∈ G, and integer n > 0, we define ng to be g + g + ... + g, with n occurrences of g in the sum. So, for example, 1g = g, 2g = g + g, 3g = g + g + g, etc. We extend this notation to all n ∈ Z by defining 0g = 0 and (−n)g = −(ng) for n < 0. Overall, this defines a scalar action Z × G → G which allows us to think of abelian groups as vector spaces over Z (or, using precise terminology, Z-modules; algebraic modules will play a significant role in Rings and Modules in year 3).
Definition. A group G is called cyclic if there exists an element x ∈ G such that every element of G is of the form mx for some m ∈ Z.

The element x in the definition is called a generator of G. Note that Z and Z_n are cyclic with generator x = 1.

Definition. A bijection φ : G → H between two (abelian) groups is called an isomorphism if φ(g + h) = φ(g) + φ(h) for all g, h ∈ G, and the groups G and H are called isomorphic if there is an isomorphism between them.

The notation G ≅ H means that G is isomorphic to H; isomorphic groups are often thought of as being essentially the same group, but with elements having different names.

Note (exercise) that any isomorphism must satisfy φ(0_G) = 0_H and φ(−g) = −φ(g) for all g ∈ G.
Proposition 4.2 Any cyclic group G is isomorphic either to Z or to Z_n for some n > 0.

Proof: Let G be cyclic with generator x. So G = { mx | m ∈ Z }. Suppose first that the elements mx for m ∈ Z are all distinct. Then the map φ : Z → G defined by φ(m) = mx is a bijection, and it is clearly an isomorphism.

Otherwise, we have lx = mx for some l < m, and so (m−l)x = 0 with m − l > 0. Let n be the least integer with n > 0 and nx = 0. Then the elements 0x = 0, 1x, 2x, ..., (n−1)x of G are all distinct, because otherwise we could find a smaller n. Furthermore, for any mx ∈ G, we can write m = rn + s for some r, s ∈ Z with 0 ≤ s < n. Then mx = (rn + s)x = sx, so G = { 0, 1x, 2x, ..., (n−1)x }, and the map φ : Z_n → G defined by φ(m) = mx for 0 ≤ m < n is a bijection, which is easily seen to be an isomorphism.  □
Definition. For an element g ∈ G, the least integer n > 0 with ng = 0, if it exists, is called the order |g| of g. If there is no such n, then g has infinite order and we write |g| = ∞.

Exercise. If φ : G → H is an isomorphism then |g| = |φ(g)| for all g ∈ G.

Definition. A group G is generated or spanned by a subset X of G if every g ∈ G can be written as a finite sum Σ_{i=1}^k m_i x_i, with m_i ∈ Z and x_i ∈ X. It is finitely generated if it has a finite generating set X = {x_1, ..., x_n}.

So a group is cyclic if and only if it has a generating set X with |X| = 1.

In general, if G is generated by X, then we write G = ⟨X⟩, or G = ⟨x_1, ..., x_n⟩ when X = {x_1, ..., x_n} is finite.

Definition. The direct sum of groups G_1, ..., G_n is defined to be the set { (g_1, g_2, ..., g_n) | g_i ∈ G_i } with component-wise addition

  (g_1, g_2, ..., g_n) + (h_1, h_2, ..., h_n) = (g_1+h_1, g_2+h_2, ..., g_n+h_n).

This is a group with identity element (0, 0, ..., 0) and −(g_1, g_2, ..., g_n) = (−g_1, −g_2, ..., −g_n).

In general (non-abelian) group theory this is more often known as the direct product of groups.

The main result of this section, known as the fundamental theorem of finitely generated abelian groups, is that every finitely generated abelian group is isomorphic to a direct sum of cyclic groups. (This is not true in general for abelian groups, such as the additive group Q of rational numbers, which are not finitely generated.)

4.2  Subgroups, cosets and quotient groups

Definition. A subset H of a group G is called a subgroup of G if it forms a group under the


same operation as that of G.
Lemma 4.3 If H is a subgroup of G, then the identity element 0H of H is equal to the
identity element 0G of G.

Proof: Using the identity axioms for H and G, 0H + 0H = 0H = 0H + 0G . Now by the


cancellation law, 0H = 0G .
2
The definition of a subgroup is semantic in its nature. While it precisely pinpoints what a
subgroup is, it is quite cumbersome to use. The following proposition gives a usable criterion.
Proposition 4.4 Let H be a subset of a group G. The following statements are equivalent.

(i) H is a subgroup of G.
(ii) (a) H is nonempty; and
     (b) h_1, h_2 ∈ H ⟹ h_1 + h_2 ∈ H; and
     (c) h ∈ H ⟹ −h ∈ H.
(iii) (a) H is nonempty; and
      (b) h_1, h_2 ∈ H ⟹ h_1 − h_2 ∈ H.

Proof: If H is a subgroup of G then it is nonempty as it contains 0_H. Moreover, h_1 − h_2 = h_1 + (−h_2) ∈ H if so are h_1 and h_2. Thus, (i) implies (iii).

To show that (iii) implies (ii) we pick x ∈ H. Then 0 = x − x ∈ H. Now −h = 0 − h ∈ H for any h ∈ H. Finally, h_1 + h_2 = h_1 − (−h_2) ∈ H for all h_1, h_2 ∈ H.

To show that (ii) implies (i) we need to verify the four group axioms in H. Two of these, Closure and Inverse, are the conditions (b) and (c). The other two axioms are Associativity and Identity. Associativity holds because it holds in G, and H is a subset of G. Since we are assuming that H is nonempty, there exists h ∈ H, and then −h ∈ H by (c), and h + (−h) = 0 ∈ H by (b), and so Identity holds, and H is a subgroup.  □
Examples. 1. There are two standard subgroups of any group G: the whole group G itself, and the trivial subgroup {0} consisting of the identity alone. Subgroups other than G are called proper subgroups, and subgroups other than {0} are called non-trivial subgroups.

2. If g is any element of any group G, then the set of all integer multiples { mg | m ∈ Z } forms a subgroup of G called the cyclic subgroup generated by g.

Let us look at a few specific examples. If G = Z, then 5Z, which consists of all multiples of 5, is the cyclic subgroup generated by 5. Of course, we can replace 5 by any integer here, but note that the cyclic subgroups generated by 5 and −5 are the same.

If G = ⟨g⟩ is a finite cyclic group of order n and m is a positive integer dividing n, then the cyclic subgroup generated by mg has order n/m and consists of the elements kmg for 0 ≤ k < n/m.

Exercise. What is the order of the cyclic subgroup generated by mg for general m (where we drop the assumption that m | n)?

Exercise. Show that the group of non-zero complex numbers C* under the operation of multiplication has finite cyclic subgroups of all possible orders.
Definition. Let g ∈ G. Then the coset H + g is the subset { h + g | h ∈ H } of G.

(Note: Since our groups are abelian, we have H + g = g + H, but in general group theory the right and left cosets Hg and gH can be different.)

Examples. 3. G = Z, H = 5Z. There are just 5 distinct cosets H = H + 0 = { 5n | n ∈ Z }, H + 1 = { 5n + 1 | n ∈ Z }, H + 2, H + 3, H + 4. Note that H + i = H + j whenever i ≡ j (mod 5).


4. G = Z_6, H = {0, 3}. There are 3 distinct cosets, H = H + 3 = {0, 3}, H + 1 = H + 4 = {1, 4}, and H + 2 = H + 5 = {2, 5}.

5. G = C*, the group of non-zero complex numbers under multiplication, and S^1 = { z : |z| = 1 }, the unit circle. The cosets are circles. There are uncountably many distinct cosets, one for each positive real number (the radius of a circle).
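Cosets are easy to experiment with for small groups; here is a small sketch for Example 4 (Python, not part of the original notes; the helper name is hypothetical):

def coset(H, g, n):
    """The coset H + g inside Z_n."""
    return frozenset((h + g) % n for h in H)

H = {0, 3}
print(sorted({coset(H, g, 6) for g in range(6)}, key=min))
# the three cosets {0, 3}, {1, 4}, {2, 5} partition Z_6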
Proposition 4.5 The following are equivalent for g, k ∈ G:

(i) k ∈ H + g;
(ii) H + g = H + k;
(iii) k − g ∈ H.

Proof: Clearly H + g = H + k ⟹ k ∈ H + g, so (ii) ⟹ (i).

If k ∈ H + g, then k = h + g for some fixed h ∈ H, so g = k − h. Let f ∈ H + g. Then, for some h_1 ∈ H, we have f = h_1 + g = h_1 + k − h ∈ H + k, so H + g ⊆ H + k. Similarly, if f ∈ H + k, then for some h_1 ∈ H, we have f = h_1 + k = h_1 + h + g ∈ H + g, so H + k ⊆ H + g. Thus H + g = H + k, and we have proved that (i) ⟹ (ii).

If k ∈ H + g, then, as above, k = h + g, so k − g = h ∈ H, and (i) ⟹ (iii).

Finally, if k − g ∈ H, then putting h = k − g, we have h + g = k, so k ∈ H + g, proving (iii) ⟹ (i).  □
Corollary 4.6 Two right cosets H + g_1 and H + g_2 of H in G are either equal or disjoint.

Proof: If H + g_1 and H + g_2 are not disjoint, then there exists an element k ∈ (H + g_1) ∩ (H + g_2), but then H + g_1 = H + k = H + g_2 by the proposition.  □

Corollary 4.7 The cosets of H in G partition G.

Proposition 4.8 If H is finite, then all right cosets have exactly |H| elements.

Proof: Since h_1 + g = h_2 + g ⟹ h_1 = h_2 by the cancellation law, it follows that the map φ : H → H + g defined by φ(h) = h + g is a bijection, and the result follows.  □
Corollary 4.7 and Proposition 4.8 together imply:

Theorem 4.9 (Lagrange's Theorem) Let G be a finite (abelian) group and H a subgroup of
G. Then the order of H divides the order of G.
Definition. The number of distinct right cosets of H in G is called the index of H in G and
is written as |G : H|.

If G is finite, then we clearly have |G : H| = |G|/|H|. But, from the example G = Z, H = 5Z


above, we see that |G : H| can be finite even when G and H are infinite.
Proposition 4.10 Let G be a finite (abelian) group. Then for any g ∈ G, the order |g| of g divides the order |G| of G.

Proof: Let |g| = n. We saw in Example 2 above that the integer multiples { mg | m ∈ Z } of g form a subgroup H of G. By minimality of n, the distinct elements of H are {0, g, 2g, ..., (n−1)g}, so |H| = n and the result follows from Lagrange's Theorem.  □

As an application, we can now immediately classify all finite (abelian) groups whose order is
prime.


Proposition 4.11 Let G be an (abelian) group having prime order p. Then G is cyclic; that
is, G ≅ Zp.
Proof: Let g ∈ G with g ≠ 0. Then |g| > 1, but |g| divides p by Proposition 4.10, so |g| = p.
But then G must consist entirely of the integer multiples mg (0 ≤ m < p) of g, so G is cyclic.
2
Definition. If A and B are subsets of a group G, then we define their sum A + B = { a + b |
a ∈ A, b ∈ B }.
Lemma 4.12 If H is a subgroup of the abelian group G and H + g, H + h are cosets of H
in G, then (H + g) + (H + h) = H + (g + h).
Proof: Since G is abelian, this follows directly from commutativity and associativity.

Theorem 4.13 Let H be a subgroup of an abelian group G. Then the set G/H of cosets
H + g of H in G forms a group under addition of subsets.
Proof: We have just seen that (H + g) + (H + h) = H + (g + h), so we have closure, and
associativity follows easily from associativity of G. Since (H + 0) + (H + g) = H + g for all
g ∈ G, H = H + 0 is an identity element, and since (H − g) + (H + g) = H + (−g + g) = H,
H − g is an inverse to H + g for all cosets H + g. Thus the four group axioms are satisfied
and G/H is a group.
2
Definition. The group G/H is called the quotient group (or the factor group) of G by H.
Notice that if G is finite, then |G/H| = |G : H| = |G|/|H|. So, although the quotient group
seems a rather complicated object at first sight, it is actually a smaller group than G.
Examples. 1. Let G = Z and H = mZ for some m > 0. Then there are exactly m distinct
cosets, H, H + 1, . . . , H + (m − 1). If we add together k copies of H + 1, then we get H + k.
So G/H is cyclic of order m with generator H + 1. So by Proposition 4.2, Z/mZ ≅ Zm.
2. G = R and H = Z. The quotient group G/H is isomorphic to the circle subgroup
S^1 of the multiplicative group C^*. One writes an explicit isomorphism φ : G/H → S^1 by
φ(x + Z) = e^{2πxi}.
3. G = Q and H = Z. The quotient group G/H featured in one of the previous exams, where
it was required to show that this group is infinite, is not finitely generated, and that every
element of G/H has finite order.
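As a quick illustration of the last example, the order of a reduced fraction p/q in Q/Z is exactly q, so every element has finite order but the orders are unbounded. A small sketch using Python's Fraction type (the function name is ours):

from fractions import Fraction

def order_in_Q_mod_Z(x):
    # order of the coset x + Z in Q/Z: the least n >= 1 with n*x an integer
    n = 1
    while (n * x).denominator != 1:
        n += 1
    return n

print(order_in_Q_mod_Z(Fraction(3, 7)))     # 7
print(order_in_Q_mod_Z(Fraction(5, 12)))    # 12
# Orders are finite but unbounded, so Q/Z is infinite; it is not finitely
# generated, since finitely many fractions generate a subgroup contained in
# (1/N)Z/Z for a common denominator N, which is finite.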
4. The quotient groups play an important role in Analysis: they are used to define Lebesgue
spaces. Let p ≥ 1 be a real number and U ⊆ R an interval. Consider the vector space V of all
measurable functions f : U → R such that ∫_U |f(x)|^p dx < ∞. It follows from Minkowski's
inequality that this is a vector space and consequently an abelian group. It contains a vector
subspace W of negligible functions, that is, those f(x) satisfying ∫_U |f(x)|^p dx = 0. The quotient
group V/W is actually a vector space, called the Lebesgue space and denoted L^p(U, R).

4.3  Homomorphisms and the first isomorphism theorem

Definition. Let G and H be groups. A homomorphism φ from G to H is a map φ : G → H
such that φ(g1 + g2) = φ(g1) + φ(g2) for all g1, g2 ∈ G.

Homomorphisms correspond to linear transformations between vector spaces.

Note that an isomorphism is just a bijective homomorphism. There are two other types of
morphism that are worth mentioning at this stage.

A homomorphism φ is injective if it is an injection; that is, if φ(g1) = φ(g2) ⇒ g1 = g2.
A homomorphism φ is surjective if it is a surjection; that is, if im(φ) = H. Sometimes, a
surjective homomorphism is called an epimorphism while an injective homomorphism is called
a monomorphism, but we will not use this terminology in these lectures.
Lemma 4.14 Let φ : G → H be a homomorphism. Then φ(0_G) = 0_H and φ(−g) = −φ(g) for
all g ∈ G.
Proof: Exercise. (Similar to results for linear transformations.)

Example. Let G be any group, and let n ∈ Z. Then φ : G → G defined by φ(g) = ng for all
g ∈ G is a homomorphism.

Kernels and images are defined as for linear transformations of vector spaces.

Definition. Let φ : G → H be a homomorphism. Then the kernel ker(φ) of φ is defined to
be the set of elements of G that map onto 0_H; that is,
ker(φ) = { g | g ∈ G, φ(g) = 0_H }.
Note that by Lemma 4.14 above, ker(φ) always contains 0_G.
Proposition 4.15 Let φ : G → H be a homomorphism. Then φ is injective if and only if
ker(φ) = {0_G}.
Proof: Since 0_G ∈ ker(φ), if φ is injective then we must have ker(φ) = {0_G}. Conversely,
suppose that ker(φ) = {0_G}, and let g1, g2 ∈ G with φ(g1) = φ(g2). Then 0_H = φ(g1) −
φ(g2) = φ(g1 − g2) (by Lemma 4.14), so g1 − g2 ∈ ker(φ) and hence g1 − g2 = 0_G and g1 = g2.
So φ is injective.
2
Theorem 4.16 (i) Let φ : G → H be a homomorphism. Then ker(φ) is a subgroup of G and
im(φ) is a subgroup of H.
(ii) Let H be a subgroup of a group G. Then the map φ : G → G/H defined by φ(g) = H + g
is a surjective homomorphism with kernel H.
Proof: (i) is straightforward using Proposition 4.4. For (ii), it is easy to check that φ is
surjective, and φ(g) = 0_{G/H} ⟺ H + g = H + 0_G ⟺ g ∈ H, so ker(φ) = H.
2
The following lemma explains a connection between quotients and homomorphisms. It clarifies the trickiest point in the proof of the forthcoming First Isomorphism Theorem.

Lemma 4.17 Let φ : G → H be a homomorphism with kernel K, and let A be a subgroup of G. The
homomorphism φ determines a homomorphism φ̄ : G/A → H via φ̄(A + g) = φ(g) for all
g ∈ G if and only if A ⊆ K.
Proof: We need to check that the map φ̄ is well-defined, i.e. whenever A + g = A + h with
g ≠ h, we need to ensure that φ(g) = φ(h). In this case g = a + h for some a ∈ A. Hence, φ̄
is well-defined if and only if φ(g) = φ(a) + φ(h) = φ(h) for all g, h ∈ G, a ∈ A, if and only if
φ(a) = 0 for all a ∈ A, if and only if A ⊆ K.
Observe also that, once φ̄ is well-defined, it is trivially a homomorphism, since so is φ:
φ̄(A + h) + φ̄(A + g) = φ(h) + φ(g) = φ(h + g) = φ̄(A + (h + g)) = φ̄((A + h) + (A + g)).


2

50

If one denotes the set of all homomorphisms from G to H by hom(G, H), there is an elegant
way to reformulate Lemma 4.17. The composition with the quotient map π : G → G/A
defines a bijection
hom(G/A, H) → { φ ∈ hom(G, H) | φ(A) = {0} },   φ̄ ↦ φ̄ ∘ π.

Theorem 4.18 (The First Isomorphism Theorem) Let φ : G → H be a homomorphism with
kernel K. Then G/K ≅ im(φ). More precisely, there is an isomorphism φ̄ : G/K → im(φ)
defined by φ̄(K + g) = φ(g) for all g ∈ G.
Proof: The map φ̄ is a well-defined homomorphism by Lemma 4.17. Clearly, im(φ̄) = im(φ).
Finally,
φ̄(K + g) = 0_H ⟺ φ(g) = 0_H ⟺ g ∈ K ⟺ K + g = K + 0 = 0_{G/K}.
By Proposition 4.15, φ̄ is injective. Thus φ̄ : G/K → im(φ) is an isomorphism.

One can associate two quotient groups to a homomorphism φ : G → H. The cokernel of φ
is Coker(φ) = H/im(φ) and the coimage of φ is Coim(φ) = G/ker(φ). In short, the first
isomorphism theorem states that the natural homomorphism from the coimage to the image
is an isomorphism.
We shall be using this theorem later, when we prove the main theorem on finitely generated
abelian groups in Subsection 4.7. The crucial observation is that any finitely generated abelian
group is the cokernel of a homomorphism between two finitely generated free abelian groups,
which we will discuss in the next section.

4.4  Free abelian groups

Definition. The direct sum Zn of n copies of Z is known as a (finitely generated) free abelian
group of rank n.
More generally, a finitely generated abelian group is called free abelian if it is isomorphic to
Z^n for some n ≥ 0.

(The free abelian group Z0 of rank 0 is defined to be the trivial group {0} containing the
single element 0.)

The groups Zn have many properties in common with vector spaces such as Rn , but we must
expect some differences, because Z is not a field.
We can define the standard basis of Zn exactly as for Rn ; that is, x1 , x2 , . . . , xn , where xi
has 1 in its i-th component and 0 in the other components. This has the same properties as
a basis of a vector space; i.e. it is linearly independent and spans Zn .
Definition. Elements x1, . . . , xn of an abelian group G are called linearly independent if, for
α1, . . . , αn ∈ Z, α1 x1 + · · · + αn xn = 0_G implies α1 = α2 = · · · = αn = 0.
Definition. Elements x1 , . . . , xn form a free basis of the abelian group G if and only if they
are linearly independent and generate (span) G.
Now consider elements x1, . . . , xn of an abelian group G. It is possible to extend the assignment
φ(xi) = xi to a group homomorphism φ : Z^n → G. As a function we define
φ((a1, a2, . . . , an)^T) = Σ_{i=1}^n ai xi. We leave the proof of the following result as an exercise.

Proposition 4.19 (i) The function φ is a group homomorphism.
(ii) Elements xi are linearly independent if and only if φ is injective.
(iii) Elements xi span G if and only if φ is surjective.

(iv) Elements xi form a basis of G if and only if φ is an isomorphism.


Note that this proposition makes perfect sense for vector spaces. Also note that the last
statement implies that x1, . . . , xn is a free basis of G if and only if every element g ∈ G has
a unique expression g = α1 x1 + · · · + αn xn with αi ∈ Z, very much like for vector spaces.

Before Proposition 4.19 we were trying to extend the assignment φ(xi) = xi to a group
homomorphism φ : Z^n → G. Note that the extension we wrote is unique. This is the key to
the next corollary. The details of the proof are left to the reader.
Corollary 4.20 (Universal property of the free abelian group). Let G be a free abelian group with
a free basis x1, . . . , xn. Let H be a group and a1, . . . , an ∈ H. Then there exists a unique group
homomorphism φ : G → H such that φ(xi) = ai for all i.
As for finite dimensional vector spaces, it turns out that any two free bases of a free abelian
group have the same size, but this has to be proved. It will follow directly from the next theorem.
Let x1, x2, . . . , xn be the standard free basis of Z^n, and let y1, . . . , ym be another free basis.
As in Linear Algebra, we can define the associated change of basis matrix P (with original
basis {xi} and new basis {yi}), where the columns of P are yi^T; that is, they express yi in
terms of xi. For example, if n = m = 2, y1 = (2 7), y2 = (1 4), then
P = [ 2  1 ]
    [ 7  4 ].
In general, P = (σij) is an n × m matrix with yj = Σ_{i=1}^n σij xi for 1 ≤ j ≤ m.
Theorem 4.21 Let y1, . . . , ym ∈ Z^n with yj = Σ_{i=1}^n σij xi for 1 ≤ j ≤ m. Then the following
are equivalent:
(i) y1, . . . , ym is a free basis of Z^n;
(ii) n = m and P is an invertible matrix such that P^{−1} has entries in Z;
(iii) n = m and det(P) = ±1.
(A matrix P ∈ Z^{n,n} with det(P) = ±1 is called unimodular.)


Proof: (i) ⇒ (ii). If y1, . . . , ym is a free basis of Z^n then it spans Z^n, so there is an m × n
matrix T = (τjk) with xk = Σ_{j=1}^m τjk yj for 1 ≤ k ≤ n. Hence
xk = Σ_{j=1}^m τjk yj = Σ_{j=1}^m τjk Σ_{i=1}^n σij xi = Σ_{i=1}^n ( Σ_{j=1}^m σij τjk ) xi,
and, since x1, . . . , xn is a free basis, this implies that Σ_{j=1}^m σij τjk = 1 when i = k and 0 when
i ≠ k. In other words P T = In, and similarly T P = Im, so P and T are inverse matrices.
But we can think of P and T as inverse matrices over the field Q, so it follows from First
Year Linear Algebra that m = n, and T = P^{−1} has entries in Z.
(ii) ⇒ (i). If T = P^{−1} has entries in Z then, again thinking of them as matrices over the field
Q, rank(P) = n, so the columns of P are linearly independent over Q and hence also over Z.
Since the columns of P are just the column vectors representing y1, . . . , ym, this tells us that
y1, . . . , ym are linearly independent.
Using P T = In, for 1 ≤ k ≤ n we have
Σ_{j=1}^m τjk yj = Σ_{j=1}^m τjk Σ_{i=1}^n σij xi = Σ_{i=1}^n ( Σ_{j=1}^m σij τjk ) xi = xk,
because Σ_{j=1}^m σij τjk is equal to 1 when i = k and 0 when i ≠ k. Since x1, . . . , xn spans Z^n,
and we can express each xk as a linear combination of y1, . . . , ym, it follows that y1, . . . , ym
span Z^n and hence form a free basis of Z^n.

(ii) ⇒ (iii). If T = P^{−1} has entries in Z, then det(P) det(T) = det(P T) = det(In) = 1, and
since det(P), det(T) ∈ Z, this implies det(P) = ±1.
(iii) ⇒ (ii). From First Year Linear Algebra, P^{−1} = (1/det(P)) adj(P), so det(P) = ±1 implies
that P^{−1} has entries in Z.
2

Examples. If n = 2 and y1 = (2 7), y2 = (1 4), then det(P) = 8 − 7 = 1, so y1, y2 is a free
basis of Z^2.
But, if y1 = (1 0), y2 = (0 2), then det(P) = 2, so y1, y2 is not a free basis of Z^2. Recall
that in Linear Algebra over a field, any set of n linearly independent vectors in a vector space
V of dimension n forms a basis of V. This example shows that this result is not true in Z^n,
because y1 and y2 are linearly independent but do not span Z^2.
But as in Linear Algebra, for v ∈ Z^n, if x (= v^T) and y are the column vectors representing
v using the free bases x1, . . . , xn and y1, . . . , yn, respectively, then we have x = P y, so y = P^{−1} x.
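Condition (iii) of Theorem 4.21 gives a practical test: compute det(P) and check that it is ±1. A minimal sketch in Python (cofactor expansion is adequate for the small matrices used here; the function names are ours):

def det(M):
    # integer determinant by cofactor expansion along the first row
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j+1:] for row in M[1:]])
               for j in range(len(M)))

def is_free_basis(*ys):
    # y1, ..., yn in Z^n form a free basis iff the matrix P whose columns are
    # the yi^T has determinant +1 or -1 (Theorem 4.21 (iii))
    n = len(ys)
    P = [[ys[j][i] for j in range(n)] for i in range(n)]
    return det(P) in (1, -1)

print(is_free_basis((2, 7), (1, 4)))    # True:  det P = 1
print(is_free_basis((1, 0), (0, 2)))    # False: det P = 2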

4.5  Unimodular elementary row and column operations and the Smith normal form for integral matrices

We interrupt our discussion of finitely generated abelian groups at this stage to investigate
how the row and column reduction process of Linear Algebra can be adapted to matrices
over Z. Recall from MA106 that we can use elementary row and column operations to reduce
an m × n matrix of rank r over a field K to a matrix B = (βij) with βii = 1 for 1 ≤ i ≤ r
and βij = 0 otherwise. We called this the Smith Normal Form of the matrix. We can do
something similar over Z, but the non-zero elements βii will not necessarily all be equal to 1.
The reason that we disallowed λ = 0 for the row and column operations (R3) and (C3)
(multiply a row or column by a scalar λ) was that we wanted all of our elementary operations
to be reversible. When performed over Z, (R1), (C1), (R2) and (C2) are reversible, but (R3)
and (C3) are reversible only when λ = ±1. So, if A is an m × n matrix over Z, then we define
the three types of unimodular elementary row operations as follows:
the three types of unimodular elementary row operations as follows:
(UR1): Replace some row ri of A by ri + t rj, where j ≠ i and t ∈ Z;
(UR2): Interchange two rows of A;
(UR3): Replace some row ri of A by −ri.

The unimodular column operations (UC1), (UC2), (UC3) are defined similarly. Recall from
MA106 that performing elementary row or column operations on a matrix A corresponds
to multiplying A on the left or right, respectively, by an elementary matrix. These elementary
matrices all have determinant ±1 (1 for (UR1) and −1 for (UR2) and (UR3)), so are
unimodular matrices over Z.
Theorem 4.22 Let A be an m × n matrix over Z with rank r. Then, by using a sequence of
unimodular elementary row and column operations, we can reduce A to a matrix B = (βij)
with βii = di for 1 ≤ i ≤ r and βij = 0 otherwise, and where the integers di satisfy di > 0
for 1 ≤ i ≤ r, and di | di+1 for 1 ≤ i < r. Subject to these conditions, the di are uniquely
determined by the matrix A.
Proof: We shall not prove the uniqueness part here. The fact that the number of non-zero
βii is the rank of A follows from the fact that unimodular row and column operations do not
change the rank. We use induction on m + n. The base case is m = n = 1, where there is
nothing to prove. Also if A is the zero matrix then there is nothing to prove, so assume not.
Let d be the smallest entry with d > 0 in any matrix C = (γij) that we can obtain from A
by using unimodular elementary row and column operations. By using (R2) and (C2), we

can move d to position (1, 1) and hence assume that γ11 = d. If d does not divide γ1j for
some j > 1, then we can write γ1j = qd + r with q, r ∈ Z and 0 < r < d, and then replacing
the j-th column cj of C by cj − q c1 results in the entry r in position (1, j), contrary to the
choice of d. Hence d | γ1j for 2 ≤ j ≤ n and similarly d | γi1 for 2 ≤ i ≤ m.
Now, if γ1j = qd, then replacing cj of C by cj − q c1 results in entry 0 in position (1, j). So
we can assume that γ1j = 0 for 2 ≤ j ≤ n and γi1 = 0 for 2 ≤ i ≤ m. If m = 1 or n = 1,
then we are done. Otherwise, we have C = (d) ⊕ C′ for some (m − 1) × (n − 1) matrix C′.
By inductive hypothesis, the result of the theorem applies to C′, so by applying unimodular
row and column operations to C which do not involve the first row or column, we can reduce
C to D = (δij), which satisfies δ11 = d, δii = di > 0 for 2 ≤ i ≤ r, and δij = 0 otherwise,
where di | di+1 for 2 ≤ i < r. To complete the proof, we still have to show that d | d2. If not,
then adding row 2 to row 1 results in an entry d2 in position (1,2) not divisible by d, and we obtain a
contradiction as before.
2


Example 1. A =
[ 42  21 ]
[−35 −14 ].

The general strategy is to reduce the size of entries in the first row and column, until the
(1,1)-entry divides all other entries in the first row and column. Then we can clear all of
these other entries.

Matrix                     Operation
[ 42  21 ]
[−35 −14 ]                 c1 → c1 − 2c2

[  0  21 ]
[ −7 −14 ]                 r2 → −r2,  r1 ↔ r2

[  7  14 ]
[  0  21 ]                 c2 → c2 − 2c1

[  7   0 ]
[  0  21 ]

Example 2. A =
[−18 −18 −18  90 ]
[ 54  12  45  48 ]
[  9  −6   6  63 ]
[ 18   6  15  12 ].

Matrix                     Operation
[−18 −18 −18  90 ]
[ 54  12  45  48 ]
[  9  −6   6  63 ]
[ 18   6  15  12 ]         c1 → c1 − c3

[  0 −18 −18  90 ]
[  9  12  45  48 ]
[  3  −6   6  63 ]
[  3   6  15  12 ]         r1 ↔ r4

[  3   6  15  12 ]
[  9  12  45  48 ]
[  3  −6   6  63 ]
[  0 −18 −18  90 ]         r2 → r2 − 3r1,  r3 → r3 − r1

[  3   6  15  12 ]
[  0  −6   0  12 ]
[  0 −12  −9  51 ]
[  0 −18 −18  90 ]         c2 → c2 − 2c1,  c3 → c3 − 5c1,  c4 → c4 − 4c1

[  3   0   0   0 ]
[  0  −6   0  12 ]
[  0 −12  −9  51 ]
[  0 −18 −18  90 ]         c2 → −c2,  c2 → c2 + c3

[  3   0   0   0 ]
[  0   6   0  12 ]
[  0   3  −9  51 ]
[  0   0 −18  90 ]         r2 ↔ r3

[  3   0   0   0 ]
[  0   3  −9  51 ]
[  0   6   0  12 ]
[  0   0 −18  90 ]         r3 → r3 − 2r2

[  3   0   0   0 ]
[  0   3  −9  51 ]
[  0   0  18 −90 ]
[  0   0 −18  90 ]         c3 → c3 + 3c2,  c4 → c4 − 17c2

[  3   0   0   0 ]
[  0   3   0   0 ]
[  0   0  18 −90 ]
[  0   0 −18  90 ]         c4 → c4 + 5c3,  r4 → r4 + r3

[  3   0   0   0 ]
[  0   3   0   0 ]
[  0   0  18   0 ]
[  0   0   0   0 ]
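The reduction carried out in these two examples is easy to mechanise. The following Python sketch follows the strategy of the proof of Theorem 4.22 (choose an entry of smallest absolute value as pivot, clear its row and column, and repeat); it returns only the reduced matrix, not the unimodular matrices performing the reduction, and the function name is ours.

def smith_normal_form(A):
    # Smith Normal Form of an integer matrix by unimodular row and column
    # operations; a minimal sketch, not optimised. Returns a new matrix.
    M = [list(row) for row in A]
    m, n = len(M), len(M[0])

    def row_op(i, j, t):                 # r_i -> r_i + t * r_j
        M[i] = [a + t * b for a, b in zip(M[i], M[j])]

    def col_op(i, j, t):                 # c_i -> c_i + t * c_j
        for r in M:
            r[i] += t * r[j]

    for t in range(min(m, n)):
        while True:
            # choose an entry of smallest non-zero absolute value as pivot
            entries = [(abs(M[i][j]), i, j) for i in range(t, m)
                       for j in range(t, n) if M[i][j] != 0]
            if not entries:
                return M                 # the remaining submatrix is zero
            _, pi, pj = min(entries)
            M[t], M[pi] = M[pi], M[t]    # row swap
            for r in M:                  # column swap
                r[t], r[pj] = r[pj], r[t]
            d = M[t][t]
            for i in range(t + 1, m):    # clear column t below the pivot
                row_op(i, t, -(M[i][t] // d))
            for j in range(t + 1, n):    # clear row t to the right of the pivot
                col_op(j, t, -(M[t][j] // d))
            if any(M[i][t] for i in range(t + 1, m)) or \
               any(M[t][j] for j in range(t + 1, n)):
                continue                 # a smaller remainder appeared; repeat
            # make the pivot divide the whole remaining submatrix,
            # which guarantees d_i | d_{i+1} in the final answer
            bad = next(((i, j) for i in range(t + 1, m) for j in range(t + 1, n)
                        if M[i][j] % d != 0), None)
            if bad is None:
                break
            row_op(t, bad[0], 1)         # bring the offending entry into row t
        if M[t][t] < 0:
            M[t] = [-x for x in M[t]]    # unimodular: multiply row t by -1
    return M

print(smith_normal_form([[42, 21], [-35, -14]]))
# [[7, 0], [0, 21]]
print(smith_normal_form([[-18, -18, -18, 90], [54, 12, 45, 48],
                         [9, -6, 6, 63], [18, 6, 15, 12]]))
# [[3, 0, 0, 0], [0, 3, 0, 0], [0, 0, 18, 0], [0, 0, 0, 0]]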

Note: There is also a generalisation to integer matrices of the row reduced normal form
from Linear Algebra, where only row operations are allowed. This is known as the Hermite
Normal Form and is more complicated. It will appear on an exercise sheet.

4.6  Subgroups of free abelian groups

Proposition 4.23 Any subgroup of a finitely generated abelian group is finitely generated.
Proof: Let K < G with G an abelian group generated by x1, . . . , xn. We shall prove by
induction on n that K can be generated by at most n elements. If n = 1 then G is cyclic.
Write G = { nx | n ∈ Z }. Let m be the smallest positive number such that mx ∈ K. If such a
number does not exist then K = {0}. Otherwise, K ⊇ { nmx | n ∈ Z }. The opposite inclusion
follows using division with a remainder: write t = qm + r with 0 ≤ r < m. Then tx ∈ K if
and only if rx = (t − mq)x ∈ K, if and only if r = 0, due to minimality of m. In both cases
K is cyclic.
Suppose n > 1, and let H be the subgroup of G generated by x1, . . . , xn−1. By induction,
K ∩ H is generated by y1, . . . , ym−1, say, with m ≤ n. If K ⊆ H, then K = K ∩ H and we
are done, so suppose not.
Then there exist elements of the form h + t xn ∈ K with h ∈ H and t ≠ 0. Since −(h + t xn) ∈
K, we can assume that t > 0. Choose such an element ym = h + t xn ∈ K with t minimal
subject to t > 0. We claim that K is generated by y1, . . . , ym, which will complete the proof.
Let k ∈ K. Then k = h′ + u xn with h′ ∈ H and u ∈ Z. If t does not divide u then we can
write u = tq + r with q, r ∈ Z and 0 < r < t, and then k − q ym = (h′ − qh) + r xn ∈ K,
contrary to the choice of t. So t | u and hence u = tq and k − q ym ∈ K ∩ H. But K ∩ H is
generated by y1, . . . , ym−1, so we are done.
2


Now let H be a subgroup of the free abelian group Z^n, and suppose that H is generated by
v1, . . . , vm. Then H can be represented by an n × m matrix A in which the columns are
v1^T, . . . , vm^T.
Example 3. If n = 3 and H is generated by v1 = (1 3 −1) and v2 = (2 0 1), then
A = [ 1  2 ]
    [ 3  0 ]
    [−1  1 ].

As we saw above, if we use a different free basis y1, . . . , yn of Z^n with basis change matrix
P, then each column vj^T of A is replaced by P^{−1} vj^T, and hence A itself is replaced by P^{−1}A.
So in Example 3, if we use the basis y1 = (0 1 0), y2 = (1 0 1), y3 = (1 1 0) of Z^n, then
P = [ 0  1  1 ]     P^{−1} = [ −1  1  1 ]     P^{−1}A = [  1  −1 ]
    [ 1  0  1 ]              [  0  0  1 ]               [ −1   1 ]
    [ 0  1  0 ],             [  1  0 −1 ],              [  2   1 ].
For example, the first column (1 −1 2)^T of P^{−1}A represents y1 − y2 + 2y3 = (1 3 −1) = v1.

In particular, if we perform a unimodular elementary row operation on A, then the resulting
matrix represents the same subgroup H of Z^n but using a different free basis of Z^n.
We can clearly replace a generator vi of H by vi + r vj for r ∈ Z without changing the
subgroup H that is generated. We can also interchange two of the generators or replace one
of the generators vi by −vi without changing H. In other words, performing a unimodular
elementary column operation on A amounts to changing the generating set for H, so again
the resulting matrix still represents the same subgroup H of Z^n.
Summing up, we have:
Proposition 4.24 Suppose that the subgroup H of Z^n is represented by the matrix A ∈ Z^{n,m}.
Then if the matrix B ∈ Z^{n,m} is obtained by performing a sequence of unimodular row and
column operations on A, then B represents the same subgroup H of Z^n using a (possibly)
different free basis of Z^n.
In particular, by Theorem 4.22, we can transform A to a matrix B in Smith Normal Form.
So, if B represents H with the free basis y1, . . . , yn of Z^n, then the r non-zero columns
of B correspond to the elements d1 y1, d2 y2, . . . , dr yr of Z^n. So we have:
Theorem 4.25 Let H be a subgroup of Z^n. Then there exists a free basis y1, . . . , yn of Z^n
such that H = ⟨ d1 y1, d2 y2, . . . , dr yr ⟩, where each di > 0 and di | di+1 for 1 ≤ i < r.
In Example 3, it is straightforward to calculate the Smith Normal Form of A, which is
[ 1  0 ]
[ 0  3 ]
[ 0  0 ],
so H = ⟨ y1, 3y2 ⟩.

By keeping track of the unimodular row operations carried out, we can, if we need to, find
the free basis y1 , . . . , yn of Zn . Doing this in Example 3, we get:


Matrix               Operation                New free basis
[ 1  2 ]
[ 3  0 ]             r2 → r2 − 3r1,           y1 = (1 3 −1), y2 = (0 1 0), y3 = (0 0 1)
[−1  1 ]             r3 → r3 + r1

[ 1  2 ]
[ 0 −6 ]             c2 → c2 − 2c1            y1 = (1 3 −1), y2 = (0 1 0), y3 = (0 0 1)
[ 0  3 ]

[ 1  0 ]
[ 0 −6 ]             r2 ↔ r3                  y1 = (1 3 −1), y2 = (0 0 1), y3 = (0 1 0)
[ 0  3 ]

[ 1  0 ]
[ 0  3 ]             r3 → r3 + 2r2            y1 = (1 3 −1), y2 = (0 −2 1), y3 = (0 1 0)
[ 0 −6 ]

[ 1  0 ]
[ 0  3 ]
[ 0  0 ]

4.7  General finitely generated abelian groups

Let G be a finitely generated abelian group. If G has n generators, Proposition 4.19 gives a
surjective homomorphism φ : Z^n → G. From the First Isomorphism Theorem (Theorem 4.18)
we deduce that G ≅ Z^n/K, where K = ker(φ). So we have proved that every finitely
generated abelian group is isomorphic to a quotient group of a free abelian group.
From the definition of φ, we see that
K = { (α1, α2, . . . , αn) ∈ Z^n | α1 x1 + · · · + αn xn = 0_G }.
By Proposition 4.23, this subgroup K is generated by finitely many elements v1, . . . , vm of Z^n.
The notation
⟨ x1, . . . , xn | v1, . . . , vm ⟩
is often used to denote the quotient group Z^n/K, so we have
G ≅ ⟨ x1, . . . , xn | v1, . . . , vm ⟩.
Now we can apply Theorem 4.25 to this subgroup K, and deduce that there is a free basis
y1, . . . , yn of Z^n such that K = ⟨ d1 y1, . . . , dr yr ⟩ for some r ≤ n, where each di > 0 and
di | di+1 for 1 ≤ i < r.
So we also have
G ≅ ⟨ y1, . . . , yn | d1 y1, . . . , dr yr ⟩,
and G has generators y1, . . . , yn with di yi = 0 for 1 ≤ i ≤ r.


Proposition 4.26 The group
⟨ y1, . . . , yn | d1 y1, . . . , dr yr ⟩
is isomorphic to the direct sum of cyclic groups
Zd1 ⊕ Zd2 ⊕ · · · ⊕ Zdr ⊕ Z^{n−r}.
Proof: This is another application of the First Isomorphism Theorem. Let H = Zd1 ⊕ Zd2 ⊕
· · · ⊕ Zdr ⊕ Z^{n−r}, so H is generated by y1′, . . . , yn′, with y1′ = (1, 0, . . . , 0), . . . , yn′ = (0, 0, . . . , 1). Let
y1, . . . , yn be the standard free basis of Z^n. Then, by Proposition 4.19, there is a surjective
homomorphism φ from Z^n to H for which
φ(α1 y1 + · · · + αn yn) = α1 y1′ + · · · + αn yn′
for all α1, . . . , αn ∈ Z. Then, by Theorem 4.18, we have H ≅ Z^n/K, with
K = { (α1, α2, . . . , αn) ∈ Z^n | α1 y1′ + · · · + αn yn′ = 0_H }.
Now α1 y1′ + · · · + αn yn′ is the element (α1, α2, . . . , αn) of H, which is the zero element if and
only if αi is the zero element of Zdi for 1 ≤ i ≤ r and αi = 0 for r + 1 ≤ i ≤ n.
But αi is the zero element of Zdi if and only if di | αi, so we have
K = { (α1, α2, . . . , αr, 0, . . . , 0) ∈ Z^n | di | αi for 1 ≤ i ≤ r },
which is generated by the elements d1 y1, . . . , dr yr. So
H ≅ Z^n/K = ⟨ y1, . . . , yn | d1 y1, . . . , dr yr ⟩.
2
Putting all of these results together, we get the main theorem:
Theorem 4.27 (The fundamental theorem of finitely generated abelian groups) If G is a
finitely generated abelian group, then G is isomorphic to a direct sum of cyclic groups. More
precisely, if G is generated by n elements then, for some r with 0 ≤ r ≤ n, there are integers
d1, . . . , dr with di > 0 and di | di+1 such that
G ≅ Zd1 ⊕ Zd2 ⊕ · · · ⊕ Zdr ⊕ Z^{n−r}.
So G is isomorphic to a direct sum of r finite cyclic groups of orders d1, . . . , dr, and n − r
infinite cyclic groups.
There may be some factors Z1, the trivial group of order 1. These can be omitted from the
direct sum (except in the case when G ≅ Z1 is trivial). It can be deduced from the uniqueness
part of Theorem 4.22, which we did not prove, that the numbers in the sequence d1, d2, . . . , dr
that are greater than 1 are uniquely determined by G.
Note that n − r may be 0, which is the case if and only if G is finite. At the other extreme,
if all di = 1, then G is free abelian.
The group G corresponding to Example 1 in Section 4.5 is
⟨ x1, x2 | 42x1 − 35x2, 21x1 − 14x2 ⟩
and we have G ≅ Z7 ⊕ Z21, a group of order 7 × 21 = 147.

The group defined by Example 2 in Section 4.5 is
⟨ x1, x2, x3, x4 | −18x1 + 54x2 + 9x3 + 18x4, −18x1 + 12x2 − 6x3 + 6x4,
−18x1 + 45x2 + 6x3 + 15x4, 90x1 + 48x2 + 63x3 + 12x4 ⟩,
which is isomorphic to Z3 ⊕ Z3 ⊕ Z18 ⊕ Z, and is an infinite group with a (maximal) finite
subgroup of order 3 × 3 × 18 = 162.
The group defined by Example 3 in Section 4.6 is
⟨ x1, x2, x3 | x1 + 3x2 − x3, 2x1 + x3 ⟩,
and is isomorphic to Z1 ⊕ Z3 ⊕ Z ≅ Z3 ⊕ Z, so it is infinite, with a finite subgroup of order 3.

4.8  Finite abelian groups

In particular, for any finite abelian group G, we have G ≅ Zd1 ⊕ Zd2 ⊕ · · · ⊕ Zdr, where di | di+1
for 1 ≤ i < r, and |G| = d1 d2 · · · dr.

From the uniqueness part of Theorem 4.22 (which we did not prove), it follows that, if di | di+1
for 1 ≤ i < r and ei | ei+1 for 1 ≤ i < s, then Zd1 ⊕ Zd2 ⊕ · · · ⊕ Zdr ≅ Ze1 ⊕ Ze2 ⊕ · · · ⊕ Zes if and only if
r = s and di = ei for 1 ≤ i ≤ r.

So the isomorphism classes of finite abelian groups of order n > 0 are in one-one correspondence
with expressions n = d1 d2 · · · dr for which di | di+1 for 1 ≤ i < r. This enables us to
classify isomorphism classes of finite abelian groups.
Examples. 1. n = 4. The decompositions are 4 and 2 × 2, so G ≅ Z4 or Z2 ⊕ Z2.
2. n = 15. The only decomposition is 15, so G ≅ Z15 is necessarily cyclic.
3. n = 36. Decompositions are 36, 2 × 18, 3 × 12 and 6 × 6, so G ≅ Z36, Z2 ⊕ Z18, Z3 ⊕ Z12
or Z6 ⊕ Z6.
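The decompositions n = d1 d2 · · · dr with d1 | d2 | · · · | dr can be enumerated mechanically. A small Python sketch (the function name is ours); for n = 36 it finds the four decompositions listed above.

def invariant_factor_decompositions(n):
    # All sequences (d_1, ..., d_r) with d_1 d_2 ... d_r = n, every d_i > 1
    # and d_i | d_{i+1}; each sequence is one isomorphism class of abelian
    # groups of order n.
    result = []
    def extend(remaining, last, chosen):
        if remaining == 1:
            result.append(tuple(reversed(chosen)))
            return
        for d in range(2, remaining + 1):            # next (smaller) factor
            if remaining % d == 0 and last % d == 0:
                extend(remaining // d, d, chosen + [d])
    extend(n, n, [])
    return result

print(invariant_factor_decompositions(36))
# [(6, 6), (3, 12), (2, 18), (36,)]
print(invariant_factor_decompositions(4))
# [(2, 2), (4,)]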

Although we have not proved in general that groups of the same order but with different
decompositions of the type above are not isomorphic, this can always be done in specific
examples by looking at the orders of elements.
We saw in an exercise above that if φ : G → H is an isomorphism then |g| = |φ(g)| for all
g ∈ G. So isomorphic groups have the same number of elements of each order.

Note also that, if g = (g1 , g2 , . . . , gn ) is an element of a direct sum of n groups, then |g| is the
least common multiple of the orders |gi | of the components of g.

So, in the four groups of order 36, G1 = Z36, G2 = Z2 ⊕ Z18, G3 = Z3 ⊕ Z12 and G4 = Z6 ⊕ Z6,
we see that only G1 contains elements of order 36. Hence G1 cannot be isomorphic to G2,
G3 or G4. Of the three groups G2, G3 and G4, only G2 contains elements of order 18, so G2
cannot be isomorphic to G3 or G4. Finally, G3 has elements of order 12 but G4 does not, so
G3 and G4 are not isomorphic, and we have now shown that no two of the four groups are
isomorphic to each other.
As a slightly harder example, Z2 ⊕ Z2 ⊕ Z4 is not isomorphic to Z4 ⊕ Z4, because the former
has 7 elements of order 2, whereas the latter has only 3.
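Counts like these can be checked by brute force, since the order of an element of a direct sum is the least common multiple of the orders of its components, and the order of g in Zm is m/gcd(g, m). A minimal Python sketch (function name ours):

from itertools import product
from math import gcd

def order_counts(moduli):
    # number of elements of each order in Z_{m_1} + ... + Z_{m_k}
    counts = {}
    for g in product(*(range(m) for m in moduli)):
        o = 1
        for gi, m in zip(g, moduli):
            oi = m // gcd(gi, m)          # order of the component gi in Z_m
            o = o * oi // gcd(o, oi)      # lcm of the component orders
        counts[o] = counts.get(o, 0) + 1
    return counts

print(order_counts([2, 2, 4]))    # 7 elements of order 2 (and 8 of order 4)
print(order_counts([4, 4]))       # only 3 elements of order 2 (and 12 of order 4)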

4.9  Tensor products

Given two abelian groups A and G, one can form a new abelian group A ⊗ G, their tensor
product, not to be confused with the direct product. We consider the direct product X = A × G
as a set. Let F be the free abelian group with X as a basis:
F = ⟨ A × G | ⟩.
Elements of F are formal finite Z-linear combinations Σ_i ni (ai, gi), ni ∈ Z, ai ∈ A, gi ∈ G.
Let F0 be the subgroup of F generated by the following elements
(a + b, g) − (a, g) − (b, g),   n(a, g) − (na, g),
(a, g + h) − (a, g) − (a, h),   n(a, g) − (a, ng)
for all possible n ∈ Z, a, b ∈ A, g, h ∈ G. The tensor product is the quotient group
A ⊗ G = F/F0 = ⟨ A × G | relations above ⟩.
We have to get used to this definition, which seems strange at first glance. First, it is easy
to materialise certain elements of A ⊗ G. Elementary tensors are
a ⊗ g = (a, g) + F0

for various a ∈ A, g ∈ G. However, it is important to realise that not all tensors are
elementary. Generators for F0 become relations on elementary tensors,
(a + b) ⊗ g = a ⊗ g + b ⊗ g,   n(a ⊗ g) = (na) ⊗ g,
a ⊗ (g + h) = a ⊗ g + a ⊗ h,   n(a ⊗ g) = a ⊗ (ng),
so a general element of A ⊗ G is a sum Σ_i ai ⊗ gi.

Exercise. Show that a ⊗ 0 = 0 ⊗ g = 0 for all a ∈ A, g ∈ G.

If elements bi span A, while elements hj span G, then bi ⊗ hj span A ⊗ G. Indeed, given
Σ_k ak ⊗ gk ∈ A ⊗ G, we can express all ak = Σ_i nki bi, gk = Σ_j mkj hj. Then
Σ_k ak ⊗ gk = Σ_k ( Σ_i nki bi ) ⊗ ( Σ_j mkj hj ) = Σ_{k,i,j} nki mkj (bi ⊗ hj).
In fact, even a more subtle statement holds.


Exercise. Let bi be a basis of A and hj a basis of G. Then the elementary tensors bi ⊗ hj
constitute a basis of A ⊗ G.

In particular, a tensor product of two free groups is free. However, for general groups tensor
products could behave in quite an unpredictable way. For instance, Z2 ⊗ Z3 = 0. Indeed,
1_{Z2} ⊗ 1_{Z3} = 3·1_{Z2} ⊗ 1_{Z3} = 1_{Z2} ⊗ 3·1_{Z3} = 1_{Z2} ⊗ 0 = 0.
To help sort out zero from nonzero elements in tensor products we need to understand a
connection between tensor products and bilinear maps. Let A, G, and H be abelian groups.

Definition. A function φ : A × G → H is a bilinear map if
φ(a + b, g) = φ(a, g) + φ(b, g),   nφ(a, g) = φ(na, g),
φ(a, g + h) = φ(a, g) + φ(a, h),   nφ(a, g) = φ(a, ng)
for all possible n ∈ Z, a, b ∈ A, g, h ∈ G. Let Bil(A × G, H) be the set of all bilinear maps
from A × G to H.
Lemma 4.28 (Universal property of the tensor product). The function
⊗ : A × G → A ⊗ G,   ⊗(a, g) = a ⊗ g
is a bilinear map. This bilinear map is universal, i.e. the composition with ⊗ defines a
bijection
hom(A ⊗ G, H) → Bil(A × G, H),   ψ ↦ ψ ∘ ⊗.
Proof: The function ⊗ is a bilinear map: the four properties of a bilinear map easily follow
from the corresponding generators of F0. For instance, ⊗(a + b, g) = ⊗(a, g) + ⊗(b, g) because
(a + b, g) − (a, g) − (b, g) ∈ F0.
Let Fun denote the set of functions between two sets. By the universal property of a free
abelian group (Corollary 4.20), we have a bijection
hom(F, H) → Fun(A × G, H).
Bilinear maps correspond to homomorphisms vanishing on F0, i.e., to homomorphisms from F/F0
(Lemma 4.17).
2
In the following section we will need a criterion for elements of R ⊗ S^1 to be nonzero. The circle
group S^1 is a group under multiplication, creating certain confusion for tensor products. To
avoid this confusion we identify the multiplicative group S^1 with the additive group R/2πZ
via the natural isomorphism e^{xi} ↦ x + 2πZ.

Proposition 4.29 Let a ⊗ (x + 2πZ) ∈ R ⊗ R/2πZ where x ∈ R. Then a ⊗ (x + 2πZ) = 0 if
and only if a = 0 or x/π ∈ Q.
and only if a = 0 or x/ Q.
Proof: If a = 0, then a ⊗ (x + 2πZ) = 0. If a ≠ 0 and x = πn/m with m, n ∈ Z, then
a ⊗ (x + 2πZ) = 2m(a/2m) ⊗ (x + 2πZ) = a/2m ⊗ 2m(x + 2πZ) = a/2m ⊗ (2πn + 2πZ) = a/2m ⊗ 0 = 0.
In the opposite direction, let us consider a ⊗ (x + 2πZ) with a ≠ 0 and x/π ∉ Q. It suffices to
construct a bilinear map φ : R × R/2πZ → A to some group A such that φ(a, x + 2πZ) ≠ 0.
By Lemma 4.28 this gives a homomorphism φ̃ : R ⊗ R/2πZ → A with φ̃(a ⊗ (x + 2πZ)) =
φ(a, x + 2πZ) ≠ 0. Hence, a ⊗ (x + 2πZ) ≠ 0.
Let us consider R as a vector space over Q. The subgroup πQ of R is a vector subspace,
hence the quotient group A = R/πQ is also a vector space over Q. Since 2πZ ⊆ πQ, we have
a homomorphism
σ : R/2πZ → R/πQ,   σ(z + 2πZ) = z + πQ.
Since x/π ∉ Q, σ(x + 2πZ) ≠ 0. Choose a basis ei of R over Q such that e1 = a. Let
ei∗ : R → Q be the linear function (commonly known as a covector) computing the i-th coordinate in this basis:
ei∗( Σ_j xj ej ) = xi.
The required bilinear map is defined using multiplication by a scalar in A = R/πQ: φ(b, z +
2πZ) = e1∗(b)·σ(z + 2πZ). Clearly, φ(a, x + 2πZ) = e1∗(e1)·σ(x + 2πZ) = 1·σ(x + 2πZ) =
x + πQ ≠ 0.
2

Exercise. Zn ⊗ Zm = Zgcd(n,m).
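Granting this exercise, together with the fact that the tensor product distributes over direct sums (which follows from the universal property of Lemma 4.28), the tensor product of two finite abelian groups written as direct sums of cyclic groups can be computed factor by factor. A small Python sketch (the representation by lists of cyclic orders and the function name are ours):

from math import gcd

def tensor_of_cyclic_sums(factors_a, factors_b):
    # If A = Z_{a_1} + ... + Z_{a_r} and B = Z_{b_1} + ... + Z_{b_s}, then
    # A ⊗ B is the direct sum over all pairs of Z_{gcd(a_i, b_j)}.
    # Returns the list of cyclic orders; an entry 1 is a trivial factor.
    return [gcd(a, b) for a in factors_a for b in factors_b]

print(tensor_of_cyclic_sums([2], [3]))       # [1]     i.e. Z2 ⊗ Z3 = 0
print(tensor_of_cyclic_sums([4, 6], [10]))   # [2, 2]  i.e. (Z4 ⊕ Z6) ⊗ Z10 = Z2 ⊕ Z2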

4.10  Third Hilbert's problem

All the hard work we have done is going to pay off now. We will understand a solution of
the third Hilbert problem. In 1900 Hilbert formulated 23 problems that, in his view, would
influence the Mathematics of the 20th century. The third problem was the first to be solved: the same
year 1900 by Dehn, which is quite remarkable as the problem was missing from Hilbert's
lecture and appeared in print only in 1902, two years after its solution.
In his third problem Hilbert asks whether two 3D polytopes of the same volume are scissor
congruent. Recall that M and N are congruent if there is a motion that moves M to N. M
and N are scissor congruent if one can cut M into pieces Mi (cutting along planes) and N
into pieces Ni such that the individual pieces Mi and Ni are congruent for each i.
Let us consider a scissor group generated by all n-dimensional polytopes P:
Pn = ⟨ P | M − N, A − B − C ⟩
with these relations for each pair M, N of congruent polytopes and each cut A = B ∪ C
of a polytope by a hyperplane. For a polytope M, let us denote by [M] ∈ Pn its class in
the scissor group. Clearly, M and N are scissor congruent if and only if [M] = [N]. By
Lemma 4.17, n-dimensional volume is a homomorphism
μn : Pn → R,   μn([M]) = volume(M).
The 3rd Hilbert problem is whether μ3 is injective.


Theorem 4.30 μ2 is injective.

Proof: For a polygon M, there are triangles T1, T2, . . . , Tn such that [M] = [T1] + [T2] + · · · + [Tn].
It follows from a triangulation of M, illustrated in the next picture.
It suffices to show that if two triangles T and T′ have the same area, then [T] = [T′]. Indeed,
using it, one can reshape the triangles to T1′, T2′, . . . , Tn′ so that they add up to a triangle T:
Then [M] = [T1] + [T2] + · · · + [Tn] = [T1′] + [T2′] + · · · + [Tn′] = [T], and we supposedly know that
two triangles of the same area are scissors equivalent. The following picture shows that a
triangle with base b and height h is equivalent to the rectangle with sides b and h/2.
In particular, any polygon is equivalent to a right-angled triangle. The last picture shows
how two right-angled triangles of the same area are scissors congruent.

The equal area triangles are CAB and CPQ. This means that |CA||CB| = |CP||CQ|.
Hence, |CA|/|CQ| = |CP|/|CB| and the triangles CPB and CAQ are similar. In particular,
AQ and PB are parallel, thus the triangles APB and QPB share the same base and height and,
consequently, are scissors congruent. Finally,
[CAB] = [CPB] + [APB] = [CPB] + [QPB] = [CPQ].
2

Observe that μn is surjective, hence μ2 is an isomorphism and P2 ≅ R. However, μ3 is not
injective, answering the 3rd Hilbert problem in the negative.
Theorem 4.31 Let T be a regular tetrahedron and C a cube, both of unit volume. Then
[T] ≠ [C] in P3.
Proof: Let M be a polytope with the set of edges I. For each edge i ∈ I, let hi be its length
and θi ∈ R/2πZ the angle near this edge. The Dehn invariant of M is
δ(M) = Σ_{i∈I} hi ⊗ θi ∈ R ⊗ R/2πZ.
Using Lemma 4.17, we conclude that δ is a well-defined homomorphism δ : P3 → R ⊗ R/2πZ.
Indeed, δ defines a homomorphism from the free group F generated by all polytopes. Keeping
in mind P3 = F/F0, we need to check that δ vanishes on the generators of F0. Clearly, δ(M − N) =
0 if M and N are congruent. It is slightly more subtle to see that δ(A − B − C) = 0 if
A = B ∪ C is a cut. One can collect the terms into 4 types of groups, with zero sum in
each group.
Survivor: an edge of length h with angle θ survives completely in B or C. This contributes
h ⊗ θ − h ⊗ θ = 0.
Edge cut: an edge of length h with angle θ is cut into edges of lengths hB in B and hC in C.
This contributes h ⊗ θ − hB ⊗ θ − hC ⊗ θ = (h − hB − hC) ⊗ θ = 0 ⊗ θ = 0.
Angle cut: an edge of length h with angle θ has its angle cut into angles θB in B and θC in
C. This contributes h ⊗ θ − h ⊗ θB − h ⊗ θC = h ⊗ (θ − θB − θC) = h ⊗ 0 = 0.
New edge: a new edge of length h is created. If its angle in B is θ, then its angle in C is
π − θ. This contributes −h ⊗ θ − h ⊗ (π − θ) = −h ⊗ π = 0, by Proposition 4.29.
Finally, using Proposition 4.29,
δ([C]) = 12 (1 ⊗ π/2) = 12 ⊗ π/2 = 0,   while   δ([T]) = 6 ℓ ⊗ arccos(1/3) ≠ 0,
where ℓ is the edge length of T, by Lemma 4.32. Hence, [C] ≠ [T].

Lemma 4.32 arccos(1/3)/π ∉ Q.


Proof: Let arccos(1/3) = q. We consider a sequence xn = cos(2n q). If q is rational, then
this sequence admits only finitely many values. On the other hand, x0 = 1/3 and
xn+1 = cos(2 2n q) = 2 cos2 (2n q) 1 = 2x2n 1.
Computing several first terms,
x1 =

17
5983
28545857
7
, x2 = , x3 =
, x4 =
,...
8
9
81
3
316

as can be easily shown the denominators grow indefinitely. Contradiction.
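The recursion is easy to run exactly with rational arithmetic; the following Python sketch reproduces the terms above.

from fractions import Fraction

x = Fraction(1, 3)          # x_0 = cos(arccos(1/3))
for n in range(1, 5):
    x = 2 * x * x - 1       # x_n = cos(2^n arccos(1/3))
    print(n, x)
# 1 -7/9
# 2 17/81
# 3 -5983/6561
# 4 28545857/43046721
# The denominator of x_n is 3^(2^n) and never cancels, so the sequence takes
# infinitely many values; hence arccos(1/3) is not a rational multiple of pi.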

Now it is natural to ask what exactly the group P3 is. It was proved later (in 1965) that the
joint homomorphism (μ3, δ) : P3 → R ⊕ (R ⊗ R/2πZ) is injective. It is not surjective, but the
image can be explicitly described; we won't do it here.

4.11  Possible topics for the second year essays

If you would like to write an essay taking something further from this course, here are some
suggestions. Ask me if you want more information.
(i) Bilinear and quadratic forms over fields of characteristic 2 (i.e. where 1 + 1 = 0). You
can do hermitian forms too.
(ii) Grassmann algebras, determinants and tensors.
(iii) Matrix Exponents and Baker-Campbell-Hausdorff formula.
(iv) Abelian groups and public key cryptography (be careful not to repeat whatever is covered in Algebra-2).
(v) Lattices (abelian groups with bilinear forms), lattice E8 and Leech lattice.
(vi) Abelian group law on an elliptic curve.
(vii) Groups Pn for other n, including a precise description of P3 .

The End

