2012:2013 Lecture Notes
2012:2013 Lecture Notes
Dmitriy Rumynin
year 2012
Contents
1 Review of Some Linear Algebra
1.1
1.2
Change of basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2
. . . . . . . . . . . . . . . . . . . . . . . . . .
2.3
2.4
2.5
10
2.6
11
2.7
14
2.8
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
15
2.9
16
17
. . . . . . . . . . . . . . . . . . . . . . .
18
20
21
22
3.1
22
3.2
. . . . . . . . . . . . . . . . . . . . . . . . . .
23
3.3
23
3.4
26
3.5
27
3.6
30
3.7
34
3.7.1
34
3.7.2
The case n = 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
35
3.7.3
The case n = 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
35
3.8
36
3.9
42
45
4.1
Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
45
4.2
46
4.3
49
4.4
51
4.5
Unimodular elementary row and column operations and the Smith normal
form for integral matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
53
4.6
55
4.7
57
4.8
59
4.9
Tensor products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
59
61
63
. . . . . . . . . . . . . . . . . . . .
Students will need to be familiar with the whole of the contents of the First Year Linear
Algebra module (MA106). In this section, we shall review the material on matrices of linear
maps and change of basis. Other material will be reviewed as it arises.
1.1
Let V and W be vector spaces over a field1 K. Let T : V W be a linear map, where
dim(V ) = n, dim(W ) = m. Choose a basis e1 , . . . , en of V and a basis f1 , . . . , fm of W .
Now, for 1 j n, T (ej ) W , so T (ej ) can be written uniquely as a linear combination of
f1 , . . . , fm . Let
T (e1 ) = 11 f1 + 21 f2 + + m1 fm
T (e2 ) = 12 f1 + 22 f2 + + m2 fm
T (en ) = 1n f1 + 2n f2 + + mn fm
where the coefficients ij K (for 1 i m, 1 j n) are uniquely determined.
11 12
21 22
A=
m1 m2
. . . 1n
. . . 2n
...
. . . mn
over K. Then A is called the matrix of the linear map T with respect to the chosen bases of
V and W . Note that the columns of A are the images T (e1 ), . . . , T (en ) of the basis vectors
of V represented as column vectors with respect to the basis f1 , . . . , fm of W .
It was shown in MA106 that T is uniquely determined by A, and so there is a one-one
correspondence between linear maps T : V W and m n matrices over K, which depends
on the choice of bases of V and W .
For v V , we can write v uniquely as a linear combination of the basis vectors ei ; that is,
v = x1 e1 + + xn en , where the xi are uniquely determined by v and the basis ei . We shall
call xi the coordinates of v with respect to the basis e1 , . . . , en . We associate the column
vector
x1
x2
n,1
v=
. K ,
.
xn
to v, where K n,1 denotes the space of n 1-column vectors with entries in K. Notice that v
is equal to (x1 , x2 , . . . , xn )T , the transpose of the row vector (x1 , x2 , . . . , xn ). To simplify the
typography, we shall often write column vectors in this manner.
It was proved in MA106 that if A is the matrix of the linear map T , then for v V , we
have T (v) = w if and only if Av = w, where w K m,1 is the column vector associated with
w W.
1
It is conventional to use either F or K as the letter to denote a field. F stands for a field, while K comes
from the German word k
orper.
1.2
Change of basis
P is called the basis change matrix or transition matrix for the original basis e1 , . . . , en and
the new basis e1 , . . . , en . Note that the columns of P are the new basis vectors ei written as
column vectors in the old basis vectors ei . (Recall also that P is the matrix of the identity
map V V using basis e1 , . . . , en in the domain and basis e1 , . . . , en in the codomain.)
Usually the original basis e1 , . . . , en will be the standard basis of K n .
0 1 1
P = 1 2
0 .
2 0
0
Proposition 1.1 With the above notation, let v V , and let v and v denote the column
vectors associated with v when we use the bases e1 , . . . , en and e1 , . . . , en , respectively. Then
P v = v.
So, in the example above, if we take v = (1 2 4) = e1 2e2 + 4e3 then v = 2e1 2e2 3e3 ,
and you can check that P v = v.
This equation P v = v describes the change of coordinates associated with the basis change.
In Section 3 below, such basis changes will arise as changes of coordinates, so we will use this
relationship quite often.
Now let T : V W , ei , fi and A be as in Subsection 1.1 above, and choose new bases
of W . Then
e1 , . . . , en of V and f1 , . . . , fm
m
X
ij fi for 1 j n,
T (ej ) =
i=1
where B = (ij ) is the m n matrix of T with respect to the bases {ei } and {fi } of V and
W . Let the n n matrix P = (ij ) be the basis change matrix for original basis {ei } and
new basis {ei }, and let the m m matrix Q = (ij ) be the basis change matrix for original
basis {fi } and new basis {fi }. The following theorem was proved in MA106:
Theorem 1.2 With the above notation, we have AP = QB, or equivalently B = Q1 AP .
In most of the applications in this course we will have V = W (= K n ), {ei } = {ei }, {fi } = {fi }
and P = Q, and hence B = P 1 AP .
2.1
Introduction
of V . Our aim is to find a new basis e1 , . . . , en for V , such that the matrix of T with respect
to the new basis is as simple as possible. Equivalently (by Theorem 1.2), we want to find
an invertible matrix P (the associated basis change matrix) such that P 1 AP is a simple as
possible.
Our
form of matrix is a diagonal matrix, but we saw in MA106 that the matrix
preferred
1 1
, for example, is not similar to a diagonal matrix. We shall generally assume that
0 1
K = C. This is to ensure that the characteristic polynomial of A factorises into linear factors.
Under this assumption, it can be proved that A is always similar to a matrix B = (ij ) of
a certain type (called the Jordan canonical form or sometimes Jordan normal form of the
matrix), which is not far off being diagonal. In fact ij is zero except when j = i or j = i + 1,
and i,i+1 is either 0 or 1.
We start by summarising some definitions and results from MA106. We shall use 0 both
for the zero vector in V and the zero n n matrix. The zero linear operator 0V : V V
corresponds to the zero matrix 0, and the identity linear operator IV : V V corresponds
to the identity n n matrix In .
Because of the correspondence between linear maps and matrices, which respects addition
and multiplication, all statements about A can be rephrased as equivalent statements about
T . For example, if p(x) is a polynomial equation in a variable x, then p(A) = 0 p(T ) = 0V .
The dimension of the eigenspace, which is called the nullity of T IV is therefore equal to
the number of linearly independent eigenvectors corresponding to . This number plays an
important role in the theory of the Jordan canonical form. From the Dimension Theorem,
proved in MA106, we know that
rank(T IV ) + nullity(T IV ) = n,
where rank(T IV ) is equal to the dimension of the image of T IV .
For the sake of completeness, we shall now repeat the results proved in MA106 about the
diagonalisability of matrices. We shall use the theorem that a set of n linearly independent
vectors of V form a basis of V without further explicit reference.
Theorem 2.1 Let T : V V be a linear operator. Then the matrix of T is diagonal with
respect to some basis of V if and only if V has a basis consisting of eigenvectors of T .
Proof: Suppose that the matrix A = (ij ) of T is diagonal with respect to the basis
e1 , . . . , en of V . Recall from Subsection 1.1 that the image of the i-th basis vector of V is
represented by the i-th column of A. But since A is diagonal, this column has the single
non-zero entry ii . Hence T (ei ) = ii ei , and so each basis vector ei is an eigenvector of A.
Conversely, suppose that e1 , . . . , en is a basis of V consisting entirely of eigenvectors of T .
Then, for each i, we have T (ei ) = i ei for some i K. But then the matrix of A with
respect to this basis is the diagonal matrix A = (ij ) with ii = i for each i.
2
2.2
This theorem says that a matrix satisfies its own characteristic equation. It is easy to visualise
with the following non-proof:
cA (A) = det(A AI) = det(0) = 0.
This argument is faulty because you cannot really plug the matrix A into det(A xI): you
must compute this polynomial first.
Theorem 2.4 (Cayley-Hamilton) Let cA (x) be the characteristic polynomial of the n n
matrix A over an arbitrary field K. Then cA (A) = 0.
Proof: Recall from MA106 that, for any n n matrix B, we have Badj(B) = det(B)In ,
where adj(B) is the n n matrix whose (j, i)-th entry is the cofactor cij = (1)i+j det(Bij ),
and Bij is the (n 1) (n 1) matrix obtained by deleting the i-th row and the j-th column
of B.
By definition, cA (x) = det(A xIn ), and (A xIn )adj(A xIn ) = det(A xIn )In . Now
det(A xIn ) is a polynomial of degree n in x; that is det(A xIn ) = a0 x0 + a1 x1 + . . . an xn ,
with ai K. Similarly, putting B = A xIn in the last paragraph, we see that the (j, i)th entry (1)i+j det(Bij ) of adj(B) is a polynomial of degree at most n 1 in x. Hence
adj(A xIn ) is itself a polynomial of degree at most n 1 in x in which the coefficients are
n n matrices over K. That is, adj(A xIn ) = B0 x0 + B1 x + . . . Bn1 xn1 , where each Bi
is an n n matrix over K. So we have
(A xIn )(B0 x0 + B1 x + . . . Bn1 xn1 ) = (a0 x0 + a1 x1 + . . . an xn )In .
Since this is a polynomial identity, we can equate coefficients of the powers of x on the left
and right hand sides. In the list of equations below, the equations on the left are the result
of equating coefficients of xi for 0 i n, and those on right are obtained by multiplying
Ai by the corresponding left hand equation.
6
AB0
AB1
AB2
B0
B1
ABn1 Bn2
Bn1
=
a0 I n ,
=
a1 I n ,
=
a2 I n ,
= an1 In ,
=
an I n ,
AB0
A2 B1
A3 B2
AB0
A2 B1
=
a0 I n
=
a1 A
=
a2 A2
= an1 An1
=
an An
Now summing all of the equations in the right hand column gives
0 = a0 A0 + a1 A + . . . + an1 An1 + an An
(remember A0 = In ), which says exactly that cA (A) = 0.
2.3
We start this section with a brief general discussion of polynomials in a single variable x with
coefficients in a field K, such as p = p(x) = 2x2 3x + 11. The set of all such polynomials
is denoted by K[x]. There are two binary operations on this set: addition and multiplication
of polynomials. These operations turn K[x] into a ring, which will be studied in great detail
in Algebra-II.
As a ring K[x] has a number of properties in common3 with the integers Z. The notation
a|b mean a divides b. It can be applied to integers: e.g. 3|12; and also to polynomials: e.g.
(x 3)|(x2 4x + 3).
We can divide one polynomial p (with p 6= 0) into another polynomial q and get a remainder
with degree less than p. For example, if q = x5 3, p = x2 + x + 1, then we find q = sp + r
with s = x3 x2 + 1 and r = x 4. For both Z and K[x], this is known as the Euclidean
Algorithm.
A polynomial r is said to be a greatest common divisor of p, q K[x] if r|p, r|q, and, for any
polynomial r with r |p, r |q, we have r |r. Any two polynomials p, q K[x] have a greatest
common divisor and a least common multiple (which is defined similarly), but these are only
determined up to multiplication by a constant. For example, x 1 is a greatest common
divisor of x2 2x + 1 and x2 3x + 2, but so is 1 x and 2x 2. To resolve this ambiguity,
we make the following definition.
Definition. A polynomial with coefficients in a field K is called monic if the coefficient of
the highest power of x is 1.
For example, x3 2x2 + x + 11 is monic, but 2x2 x 1 is not.
Now we can define gcd(p, q) to be the unique monic greatest common divisor of p and q, and
similarly for lcm(p, q).
As with the integers, we can use the Euclidean Algorithm to compute gcd(p, q). For example,
if p = x4 3x3 + 2x2 , q = x3 2x2 x + 2, then p = q(x 1) + r with r = x2 3x + 2, and
q = r(x + 1), so gcd(p, q) = r.
Theorem 2.5 Let A be an n n matrix over K representing the linear operator T : V V .
The following statements hold:
(i) there is a unique monic non-zero polynomial p(x) with minimal degree and coefficients
in K such that p(A) = 0,
3
Technically speaking, they are both Euclidean Domains that is an important topic in Algebra-II.
Corollary 2.6 The minimal polynomial of a square matrix A divides its characteristic polynomial.
Similar matrices A and B represent the same linear operator T , and so their minimal polynomial is the same as that of T . Hence we have
Proposition 2.7 Similar matrices have the same minimal polynomial.
For a vector v V , we can also define a relative minimal polynomial A,v as the unique
monic polynomial p of minimal degree for which p(T )(v) = 0V . Since p(T ) = 0 if and only
if p(T )(v) = 0V for all v V , A is the least common multiple of the polynomials A,v for
all v V .
But p(T )(v) = 0V for all v V if and only if p(T )(bi ) = 0V for all bi in a basis b1 , . . . , bn
of V (exercise), so A is the least common multiple of the polynomials A,bi .
This gives a method of calculating A . For any v V , we can compute A,v by calculating the
sequence of vectors v, T (v), T 2 (v), T 3 (v) and stopping when it becomes linearly dependent.
In practice, we compute T (v) etc. as Av for the corresponding column vector v K n,1 .
3 1 0 1
1
1 0 1
.
A=
0
0 1 0
0
0 0 1
Ab3 = b3 , so A,b3 = x 1.
2.4
The Cayley-Hamilton theorem and the theory of minimal polynomials are valid for any matrix
over an arbitrary field K, but the theory of Jordan forms will require an additional assumption
that the characteristic polynomial cA (x) is split in K[x], i.e. it factorises into linear factors. If
the field K = C then all polynomials in K[x] factorise into linear factors by the Fundamental
Theorem of Algebra and JCF works for any matrix.
Definition. A Jordan chain of length k is a sequence of non-zero vectors v1 , . . . , vk K n,1
that satisfies
Av1 = v1 , Avi = vi + vi1 , 2 i k,
It is instructive to keep in mind the following model of a Jordan chain that works over
complex or real field. Let V be the vector space of functions in the form f (z)ez where f (z)
is the polynomial of degree less than k. Consider the derivative, that is, the linear operator
T : V V given by T ((z)) = (z). Vectors vi = z i1 ez /(i 1)! form the Jordan chain for
T and the basis of V . In particular, the matrix of T in this basis is the Jordan block defined
below.
Definition. A non-zero vector v V such that (A In )i v = 0 for some i > 0 is called a
generalised eigenvector of A with respect to the eigenvalue .
Note that, for fixed i > 0, { v V | (A In )i v = 0 } is the nullspace of (A In )i , and is
called the generalised eigenspace of index i of A with respect to . When i = 1, this is the
ordinary eigenspace of A with respect to .
Notice that v V is an eigenvector with eigenvalue if and only if A,v = x . Similarly,
generalised eigenvectors are characterised by the property A,v = (x )i .
3 1 0
A = 0 3 1.
0 0 3
We see that, for the standard basis of K 3,1 , we have Ab1 = 3b1 , Ab2 = 3b2 + b1 , Ab3 =
3b3 + b2 , so b1 , b2 , b3 is a Jordan chain of length 3 for the eigenvalue 3 of A. The generalised
eigenspaces of index 1, 2, and 3 are respectively hb1 i, hb1 , b2 i, and hb1 , b2 , b3 i.
Notice that the dimension of a generalised eigenspace of A is the nullity of (T IV )i , which
is a a function of the linear operator T associated with A. Since similar matrices represent
the same linear operator, we have
Proposition 2.8 The dimensions of corresponding generalised eigenspaces of similar matrices are the same.
Definition. We define a Jordan block with eigenvalue of degree k to be a k k matrix
J,k = (ij ), such that ii = for 1 i k, i,i+1 = 1 for 1 i < k, and ij = 0 if j is not
equal to i or i + 1. So, for example,
3i
0 1 0 0
1
0
2
0 0 1 0
1 1
3i
and
J0,4 =
1 ,
J1,2 =
,
J,3 = 0
2
0 0 0 1
0 1
3i
0
0
2
0 0 0 0
9
3i
2
It should be clear that the matrix of T with respect to the basis v1 , . . . , vn of K n,1 is a Jordan
block of degree n if and only if v1 , . . . , vn is a Jordan chain for A.
Note also that for A = J,k , A,vi = (x )i , so A = (x )k . Since J,k is an upper
triangular matrix with entries on the diagonal, we see that the characteristic polynomial
cA of A is also equal to ( x)k .
Warning: Some authors put the 1s below rather than above the main diagonal in a Jordan
block. This corresponds to either writing the Jordan chain in the reversed order or using
rows instead of columns for the standard vector space. However, if an author does both (uses
rows and reverses the order) then 1s will go back above the diagonal.
2.5
Definition. A Jordan basis for A is a basis of K n,1 which is a disjoint union of Jordan chains.
We denote the m n matrix in which all entries are 0 by 0m,n . If A is an m m matrix and
B an n n matrix, then we denote the (m + n) (m + n) matrix with block form
A
0m,n
,
0n,m
B
by A B. For example
So, if
1 1 1
0
1 2
1 0
1 =
0
0 1
0
2 0 2
0
2
1
0
0
0
0
0
1
1
2
0
0
0
0
1 1
.
0
1
0 2
Or you can take Galois Theory next year and this should become obvious.
10
2.6
When n = 2 and n = 3, the JCF can be deduced just from the minimal and characteristic
polynomials. Let us consider these cases.
When n = 2, we have either two distinct eigenvalues 1 , 2 , or a single repeated eigenvalue
1 . If the eigenvalues are distinct, then by Corollary 2.3 A is diagonalisable and the JCF is
the diagonal matrix J1 ,1 J2 ,1 .
1 4
Example 1. A =
. We calculate cA (x) = x2 2x 3 = (x 3)(x + 1), so there are
1 1
two distinct
3 and 1. Associated
eigenvectors
are (2 1)T and (2 1)T , so we
eigenvalues,
2 2
3
0
put P =
and then P 1 AP =
.
1
1
0 1
If the eigenvalues are equal, then there are two possible JCF-s, J1 ,1 J1 ,1 , which is a scalar
matrix, and J1 ,2 . The minimal polynomial is respectively (x 1 ) and (x 1 )2 in these two
cases. In fact, these cases can be distinguished without any calculation whatsoever, because
in the first case A = P JP 1 = J so A is its own JCF
In the second case, a Jordan basis consists of a single Jordan chain of length 2. To find such
a chain, let v2 be any vector for which (A 1 I2 )v2 6= 0 and let v1 = (A 1 I2 )v2 . (Note
that, in practice, it is often easier to find the vectors in a Jordan chain in reverse order.)
1
4
Example 2. A =
. We have cA (x) = x2 + 2x + 1 = (x + 1)2 , so there is a single
1 3
5
This means I am not proving them here but I expect you to be able to prove them
The characteristic polynomial of J is the product of the characteristic polynomials of the Jordan blocks
and the minimal polynomial of J is the least common multiple of characteristic polynomials of the Jordan
blocks
6
11
Suppose that there are two distinct eigenvalues, so one has multiplicity 2, and the other
has multiplicity 1. Let the eigenvalues be 1 , 1 , 2 , with 1 6= 2 . Then there are two
possible JCF-s for A, J1 ,1 J1 ,1 J2 ,1 and J1 ,2 J2 ,1 , and the minimal polynomial is
(x 1 )(x 2 ) in the first case and (x 1 )2 (x 2 ) in the second.
In the first case, a Jordan basis is a union of three Jordan chains of length 1, each of which
consists of an eigenvector of A.
2
0
0
Example 3. A = 1
5
2 . Then
2 6 2
cA (x) = (2 x)[(5 x)(2 x) + 12] = (2 x)(x2 3x + 2) = (2 x)2 (1 x).
We know from the theory above that the minimal polynomial must be (x 2)(x 1) or
(x 2)2 (x 1). We can decide which simply by calculating (A 2I3 )(A I3 ) to test whether
or not it is 0. We have
0
0
0
1
0
0
A 2I3 = 1
3
2 , A I3 = 1
4
2 ,
2 6 4
2 6 3
The eigenvectors v for 1 = 2 satisfy (A 2I3 )v = 0, and we must find two linearly independent solutions; for example we can take v1 = (0 2 3)T , v2 = (1 1 1)T . An eigenvector
for the eigenvalue 1 is v3 = (0 1 2)T , so we can choose
0
1
0
1
P = 2 1
3
1 2
In the second case, there are two Jordan chains, one for 1 of length 2, and one for 2
of length 1. For the first chain, we need to find a vector v2 with (A 1 I3 )2 v2 = 0 but
(A 1 I3 )v2 6= 0, and then the chain is v1 = (A 1 I3 )v2 , v2 . For the second chain, we
simply need an eigenvector for 2 .
3
2
1
Example 4. A = 0
3
1 . Then
1 4 1
cA (x) = (3 x)[(3 x)(1 x) + 4] 2 + (3 x) = x3 + 5x2 8x + 4 = (2 x)2 (1 x),
as in Example 3. We have
2
2
1
0
0
0
1
2
1
2
1.
A2I3 = 0
1
1 , (A2I3 )2 = 1 3 2 , (AI3 ) = 0
1 4 2
2
6
4
1 4 3
and we can check that (A 2I3 )(A I3 ) is non-zero, so we must have A = (x 2)2 (x 1).
12
For the Jordan chain of length 2, we need a vector with (A2I3 )2 v2 = 0 but (A2I3 )v2 6= 0,
and we can choose v2 = (2 0 1)T . Then v1 = (A 2I3 )v2 = (1 1 1)T . An eigenvector for
the eigenvalue 1 is v3 = (0 1 2)T , so we can choose
1
2
0
P = 1
0
1
1 1 2
and then
2 1 0
P 1 AP = 0 2 0 .
0 0 1
Finally, suppose that there is a single eigenvalue, 1 , so cA = (1 x)3 . There are three
possible JCF-s for A, J1 ,1 J1 ,1 J1 ,1 , J1 ,2 J1 ,1 , and J1 ,3 , and the minimal polynomials
in the three cases are (x 1 ), (x 1 )2 , and (x 1 )3 , respectively.
In the first case, J is a scalar matrix, and A = P JP 1 = J, so this is recognisable immediately.
In the second case, there are two Jordan chains, one of length 2 and one of length 1. For the
first, we choose v2 with (A 1 I3 )v2 6= 0, and let v1 = (A 1 I3 )v2 . (This case is easier
than the case illustrated in Example 4, because we have (A 1 I3 )2 v = 0 for all v C3,1 .)
For the second Jordan chain, we choose v3 to be an eigenvector for 1 such that v2 and v3
are linearly independent.
0
2
1
Example 5. A = 1 3 1 . Then
1
2
0
cA (x) = x[(3 + x)x + 2] 2(x + 1) 2 + (3 + x) = x3 3x2 3x 1 = (1 + x)3 .
We have
1
2
1
A + I3 = 1 2 1 ,
1
2
1
1 1
0
P = 1 0
1
1 0 2
and then
1
1
0
P 1 AP = 0 1
0 .
0
0 1
In the third case, there is a single Jordan chain, and we choose v3 such that (A1 I3 )2 v3 6= 0,
v2 = (A 1 I3 )v3 , v1 = (A 1 I3 )2 v3 .
0
1
0
1 . Then
Example 6. A = 1 1
1
0 2
cA (x) = x[(2 + x)(1 + x)] (2 + x) + 1 = (1 + x)3 .
13
We have
1 1
0
0
1
1
1 , (A + I3 )2 = 0 1 1 ,
A + I3 = 1 0
1 0 1
0
1
1
1 1 0
P = 1 0 1
1 0 0
and then
2.7
1
1
0
P 1 AP = 0 1
1 .
0
0 1
For dimensions higher than 3, we cannot always determine the JCF just from the characteristic and minimal polynomials. For example, when n = 4, J,2 J,2 and J,2 J,1 J,1
both have cA = ( x)4 and A = (x )2 ,
In general, we can compute the JCF from the dimensions of the generalised eigenspaces.
Let J,k be a Jordan block and let A = J,k Ik . Then we calculate that, for 1 i < k,
Ai has (k i) 1s on the i-th diagonal upwards from the main diagonal, and Ak = 0. For
example, when k = 4,
0 0 0 1
0 0 1 0
0 1 0 0
0 0 1 0
, A2 = 0 0 0 1 , A3 = 0 0 0 0 , A4 = 0.
A=
0 0 0 0
0 0 0 0
0 0 0 1
0 0 0 0
0 0 0 0
0 0 0 0
It should be clear from this that, for 1 i k, rank(Ai ) = k i, so nullity(Ai ) = i, and for
i k, rank(Ai ) = 0, nullity(Ai ) = k.
On the other hand, if 6= and A = J,k Ik , then, for any integer i, Ai is an upper
triangular matrix with non-zero entries ( )i on the diagonal, and so rank(Ai ) = k,
nullity(Ai ) = 0.
It is easy to see that, for square matrices A and B, rank(A B) = rank(A) + rank(B) and
nullity(A B) = nullity(A) + nullity(B). So, for a matrix J in JCF, we can determine the
sizes of the Jordan blocks for an eigenvalue of J from a knowledge of the nullities of the
matrices (J )i for i > 0.
For example, suppose that J = J2,3 J2,3 J2,1 J1,2 . Then nullity(J + 2I9 ) = 3,
nullity(J + 2I9 )2 = 5, nullity(J + 2I9 )i = 7 for i 3, nullity(J I9 ) = 1, nullity(J I9 )i = 2
for i 2.
First observe that the total number of Jordan blocks with eigenvalue is equal to
nullity(J In ).
More generally, the number of Jordan blocks J,j for with j i is equal to
nullity((J In )i ) nullity((J In )i1 ).
14
Theorem 2.14 Let be an eigenvalue of a matrix A and let J be the JCF of A. Then
(i) The number of Jordan blocks of J with eigenvalue is equal to nullity(A In ).
(ii) More generally, for i > 0, the number of Jordan blocks of J with eigenvalue and degree
at least i is equal to nullity((A In )i ) nullity((A In )i1 ).
Note that this proves the uniqueness part of Theorem 2.9.
2.8
Examples
2
0
0
0
0 2
1
0
. Then cA (x) = (2 x)4 , so there is a single
Example 7. A =
0
0 2
0
1
0 2 2
0 0
0 0
0 0
1 0
, and (A + 2I4 )2 = 0,
eigenvalue 2 with multiplicity 4. We find (A + 2I4 ) =
0 0
0 0
1 0 2 0
2
so A = (x + 2) , and the JCF of A could be J2,2 J2,2 or J2,2 J2,1 J2,1 .
To decide which case holds, we calculate the nullity of A + 2I4 which, by Theorem 2.14, is
equal to the number of Jordan blocks with eigenvalue 2. Since A+2I4 has just two non-zero
rows, which are distinct, its rank is clearly 2, so its nullity is 4 2 = 2, and hence the JCF
of A is J2,2 J2,2 .
A Jordan basis consists of a union of two Jordan chains, which we will call v1 , v2 , and
v3 , v4 , where v1 and v3 are eigenvectors and v2 and v4 are generalised eigenvectors of index
2. To find such chains, it is probably easiest to find v2 and v4 first and then to calculate
v1 = (A + 2I4 )v2 and v3 = (A + 2I4 )v4 .
Although it is not hard to find v2 and v4 in practice, we have to be careful, because they
need to be chosen so that no linear combination of them lies in the nullspace of (A + 2I4 ).
In fact, since this nullspace is spanned by the second and fourth standard basis vectors, the
obvious choice is v2 = (1 0 0 0)T , v4 = (0 0 1 0)T , and then v1 = (A + 2I4 )v2 = (0 0 0 1)T ,
v3 = (A + 2I4 )v4 = (0 1 0 2)T , so to transform A to JCF, we put
2
1
0
0
0 2 0 1
0 1
0 0
0 0
0
0
1 0
.
, P 1 = 1 0 0 0 , P 1 AP = 0 2
P =
0 0
0 1 0 0
0
0 2
1
0 1
0 0 1 0
0
0
0 2
1 0 2 0
1 3 1
0
0
2
1
0
. Then cA (x) = (1 x)2 (2 x)2 , so there are two
Example 8. A =
0
0
2
0
0
3
1 1
eigenvalue 1, 2, both with multiplicity 2. There are four possibilities for the JCF (one or
two blocks for each of the two eigenvalues). We could determine the JCF by computing the
minimal polynomial A but it is probably easier to compute the nullities of the eigenspaces
15
0 3 1 0
3 3 1
0
9
9
0
3
1
0
0
1
0
0
0
, (A2I4 ) =
, (A2I4 )2 =
A+I4 =
0
0
3 0
0
0
0
0
0
0
0
3
1 0
0
3
1 3
0 9
0
0
0
0
The rank of A + I4 is clearly 2, so its nullity is also 2, and hence there are two Jordan blocks
with eigenvalue 1. The three non-zero rows of (A 2I4 ) are linearly independent, so its
rank is 3, hence its nullity 1, so there is just one Jordan block with eigenvalue 2, and the JCF
of A is J1,1 J1,1 J2,2 .
For the two Jordan chains of length 1 for eigenvalue 1, we just need two linearly independent
eigenvectors, and the obvious choice is v1 = (1 0 0 0)T , v2 = (0 0 0 1)T . For the Jordan
chain v3 , v4 for eigenvalue 2, we need to choose v4 in the nullspace of (A 2I4 )2 but not in
the nullspace of A 2I4 . (This is why we calculated (A 2I4 )2 .) An obvious choice here is
v4 = (0 0 1 0)T , and then v3 = (1 1 0 1)T , and to transform A to JCF, we put
1
1 0 0
1
0 0 0
1 0 1 0
0 0
1 0
, P 1 = 0 1 0 1 , P 1 AP = 0 1 0 0 .
P =
0 0
0
1 0 0
0
0 2 1
0 1
0 1
1 0
0
0 1 0
0
0 0 2
2.9
0
0
.
0
9
Each of i (T In )(wi ) for 1 i l is the last member of one of the l Jordan chains for
TU . When we apply (T In ) to one of the basis vectors of U , we get a linear combination
of the basis vectors of U other than i (T In )(wi ) for 1 i l. Hence, by the linear
independence of the basis of U , we deduce that i = 0 for 1 i l This implies that
(T In )(x) = 0, so x is in the eigenspace of TU for the eigenvalue . But, by construction,
wl+1 , . . . , wnm extend a basis of this eigenspace of TU to that the eigenspace of V , so we
also get i = 0 for l + 1 i n m, which completes the proof.
2.10
Powers of matrices
There are two practical ways of computing An for a general matrix. The first one involves
Jordan forms. If J = P 1 AP is the JCF of A then it is sufficient to compute J n because of
the telescoping product:
An = (P JP 1 )n = P JP 1 P JP 1 P . . . JP 1 = P J n P 1 .
Jk1 ,1
0
...
0
Jk1 ,1
0
...
0
0
Jkn2 ,2 . . .
0
Jk2 ,2 . . .
0
then J n = 0
.
If J =
...
...
n
0
0
. . . Jkt ,t
0
0
. . . Jkt ,t
n nk+2 C n nk+1
nn1 . . . Ck2
k1
n nk+3 C n nk+2
0
n
. . . Ck3
k2
..
..
..
..
n
= ...
Jk,
.
.
.
.
n
n1
0
0
...
n
n
0
0
...
0
where Ctn = n!/(n t)!t! is the Choose-function, interpreted as Ctn = 0 whenever t > n.
Let us apply it to the matrix from example 7 in 2.8:
n
0 1
0 0
2
1
0
0
0
0
0
1
0
0
2
0
0
1
An = P J n P 1 =
0 0
0 1
0
0 2
1 0
1 0 2 0
0
0
0 2
0
0 1
0 0
0
(2)n n(2)n1
0
0
n
0 0
1
0
1
0
(2)
0
0
0 0
0 1 0
0
(2)n n(2)n1 0
1 0 2 0
0
0
0
(2)n
0
(2)n
0
0
0
n n(2)n1
0
(2)
0
.
n
0
0
(2)
0
n(2)n1
0
n(2)n (2)n
2
0
1
0
0
0
0
1
2
0
1
0
0
0
0
1
1
0
=
0
0
1
0
=
0
0
(2)n
0
0
0
0 (2)n n(2)n1
0
.
An = n(2)n1 A + (1 n)(2)n I =
n
0
0
(2)
0
n(2)n1
0
n(2)n (2)n
2.11
Let us consider an initial value problem for an autonomous system with discrete time:
x(n + 1) = Ax(n), n N, x(0) = w.
Here x(n) K m is a sequence of vectors in a vector space over a field K. One thinks of x(n)
as a state of the system at time n. The initial state is x(0) = w. The n n-matrix A with
coefficients in K describes the evolution of the system. The adjective autonomous means
that the evolution equation does not change with the time8 .
It takes longer to formulate this problem then to solve it. The solution is a no-brainer:
x(n) = Ax(n 1) = A2 x(n 2) = . . . = An x(0) = An w.
As a working example, let us consider a 2-step linearly recursive sequence. It is determined
by a quadruple (a, b, c, d) K 4 and the rules
s0 = a, s1 = b, sn = csn1 + dsn2 for n 2.
Such sequences are ubiquitous. Arithmetic sequences form a subclass with c = 2, d = 1.
In general, (a, b, 2, 1) determines the arithmetic sequence starting at a with the difference
b a. For instance, (0, 1, 2, 1) determines the sequence of natural numbers sn = n.
A geometric sequence starting at a with ratio q admits a non-unique description. One obvious quadruples giving it is (a, aq, q, 0). However, it is conceptually better to use quadruple
(a, aq, 2q, q 2 ) because the sequences coming from (a, b, 2q, q 2 ) include both arithmetic and
geometric sequences and can be called arithmo-geometric sequences.
If c = d = 1 then this is a Fibonacci type sequence. For instance, (0, 1, 1, 1) determines
Fibonacci numbers Fn while (2, 1, 1, 1) determines Lucas numbers Ln .
7
8
18
All of these examples admit closed9 formulae for a generic term sn . Can we find a closed
formula for sn , in general? Yes, we can because this problem is reduced to an initial value
problem with discrete time if we set
sn
a
0 1
x(n) =
, w=
, A=
.
sn+1
b
c d
Computing
polynomial, cA (z) = z 2 cz d. If c2 + 4d = 0, the JCF of
the characteristic
c/2 1
A is J =
. Let q = c/2. Then d = q 2 and we are dealing with the arithmo0 c/2
geometric sequence (a,b, 2q, q 2 ). Let us find the closed formula for sn in
this caseusing
0
1
0
1
Jordan forms. As A =
one can choose the Jordan basis e2 =
, e1 =
.
q 2 2q
1
q
1 0
1 0
If P =
then P 1 =
and
q 1
q 1
n
q nq n1
(1 n)q n
nq n1
n
1 n
n 1
1
A = (P JP ) = P J P = P
P =
.
0
qn
nq n+1 (1 + n)q n
This gives the closed formula for arithmo-geometric sequence we were seeking:
sn = (1 n)q n a + nq n1 b.
c2 + 4d)/2
0
and the closed formula
If
6= 0, the JCF of A is
0
(c c2 + 4d)/2
for sn will involve the sum of two geometric sequences. Let us see it through for Fibonacci
and Lucas numbers using Lagranges polynomial.
Since c = d =1, c2 + 4d = 5 and the roots
=
h()
=
+
(1 )n = h(1 ) = (1 ) +
n1 / 5
n / 5
n
A = A + = n / 5A + n1 / 5I2 =
.
n / 5 (n + n1 )/ 5
Fn
0
n
=A
Since
, it immediately implies that
Fn+1
1
Fn1 Fn
n
A =
and Fn = n / 5 .
Fn Fn+1
Ln
2
n
and
Similarly for the Lucas numbers, we get
=A
1
Ln+1
(c +
Closed means non-recursive, for instance, sn = a + n(b a) for the arithmetic sequence
19
2.12
Functions of matrices
P
We restrict to K = R in this section. Let us consider a power series n an z n , an R with a
positive radius of convergence . It defines a function f : (, ) R by
f (x) =
an x n
n=0
X
f [n](0)An .
f (A) =
n=0
The right hand side of this formula is a matrix whose entries are series. All these series need
to converge for f (A) to be well defined. If the norm10 of A is less than then f (A) is well
defined. Alternatively, if all eigenvalues of A belong to (, ) then f (A) is well defined as
can be seen from the JCF method of computing f (A). If
Jk1 ,1
0
...
0
0
Jk2 ,2 . . .
0
= P 1 AP
J =
...
0
0
. . . Jkt ,t
f (A) = P f (J)P 1
while
f (Jk1 ,1 )
0
0
f (Jk2 ,2 )
=P
0
0
...
0
...
0
...
. . . f (Jkt ,t )
f () f [1] () . . . f [k1]()
0
f () . . . f [k2]()
.
f (Jk, ) =
...
0
0
...
f ()
Lagranges method of computing f (A) works as well despite the fact that there is no sensible
way to divide with a remainder in analytic functions. For instance,
ez =
ez 1
ez z
ez
(z)
+
0
=
(z)
+
1
=
(z) + z
z2 + 1
z2 + 1
z2 + 1
for (z) = z 2 + 1. Thus, there are infinitely many ways to divide with a remainder as
f (z) = q(z)(z) + h(z). The point is that f (A) = h(A) only if q(A) is well defined. Notice
that the naive expression q(A) = (f (A) h(A))(A)1 involves division by zero. However,
if h(z) is the interpolation polynomial then q(A) is well defined and the calculation f (A) =
q(A)(A) + h(A) = h(A) carries through.
Let us compute eA for
the matrix A from example 7, section 2.8. Recall that Taylors series
P
x =
n
for exponent
e
n=0 x /n! converges for all x. Consequently the matrix exponent
P
n
eA =
n=0 A /n! is defined for all real m-by-m matrices.
10
this notion is beyond the scope of this module and will be discussed in Differentiation
20
e
0
0
0
0 e2
e2
0
.
eA = e2 A + 3e2 I =
2
0
0
e
0
e2
0 2e2 e2
2.13
Let us now consider an initial value problem for an autonomous system with continuous time:
dx(t)
= Ax(t), t [0, ), x(0) = w.
dt
Here A Rnn , w Rn are given, x : R0 Rn is a smooth function to be found. One
thinks of x(t) as a state of the system at time t. The solution to this problem is
x(t) = etA w
because, as one can easily check,
X d tn
X tn1
X tk
d
(x(t)) =
( An w) =
An w = A
Ak w = Ax(t).
dt
dt
n!
(n
1)!
k!
n
n
k
Let us consider a harmonic oscillator described by equation y (t) + y(t) = 0. The general
solution y(t) = sin(t) + cos(t) is well known. Let us obtain it using matrix exponents.
Setting
y(t)
0 1
x(t) =
, A=
1 0
y (t)
the harmonic oscillator becomes the initial value problem with a solution x(t) = etA x(0).
The eigenvalues of A are i and i. Interpolating ezt at these values of z gives the following
condition on h(z) = z +
it
e
= h(i) =
i +
it
e
= h(i) = i +
Solving them gives = (eit eit )/2i = sin(t) and = (eit + eit )/2 = cos(t). It follows that
cos(t) sin(t)
etA = sin(t)A + cos(t)I2 =
sin(t) cos(t)
y1 3y3
y1 =
y1 (0) = 1
y2 =
y1 y2 6y3 , with the initial condition
y2 (0) = 1
y3 = y1 + 2y2 + 5y3
y3 (0) = 0
Using matrices
1
0 3
y1 (t)
1
x(t) = y2 (t) , w = 1 , A = 1 1 6 ,
0
1
2
5
y3 (t)
21
3t 3 6t + 6 9t + 6
4 6 6
etA = e2t 3t 2 6t + 4 9t + 3 + et 2 3 3
t
2t
3t + 1
0
0
0
and
y1 (t)
1
(3 3t)e2t 2et
x(t) = y2 (t) = etA 1 = (2 3t)e2t et .
y3 (t)
te2t
0
3.1
v=
. K , and w = . K .
.
.
xn
ym
Then, by using the equations (i) and (ii) above, we get
m X
m X
n
n
X
X
yi ij xj = wT Av
yi (fi , ej ) xj =
(w, v) =
R2
Then
((y1 , y2 ), (x1 , x2 )) = (y1 , y2 )
(2.1)
i=1 j=1
i=1 j=1
1 1
2
0
22
x1
x2
= y1 x1 y1 x2 + 2y2 x1 .
1 1
.
2
0
3.2
We retain the notation of the previous subsection. As in Subsection 1.2 above, suppose that
of W , and let P = ( ) and Q = ( ) be
we choose new bases e1 , . . . , en of V and f1 , . . . , fm
ij
ij
the associated basis change matrices. Then, by Proposition 1.1, if v and w are the column
vectors representing the vectors v and w with respect to the bases {ei } and {fi }, we have
P v = v and Qw = w, and so
T
wT Av = w QT AP v ,
and hence, by Equation (2.1):
Theorem 3.1 Let A be the matrix of the bilinear map : W V K with respect to the
bases e1 , . . . , en and f1 , . . . , fm of V and W , and let B be its matrix with respect to the bases
of V and W . Let P and Q be the basis change matrices, as defined
e1 , . . . , en and f1 , . . . , fm
T
above. Then B = Q AP .
Compare this result with Theorem 1.2.
We shall be concerned from now on only with the case where V = W . A bilinear map
: V V K is called a bilinear form on V . Theorem 3.1 then becomes:
Theorem 3.2 Let A be the matrix of the bilinear form on V with respect to the basis
e1 , . . . , en of V , and let B be its matrix with respect to the basis e1 , . . . , en of V . Let P the
basis change matrix with original basis {ei } and new basis {ei }. Then B = P T AP .
So, in the example at
if we choose the new basis e1 = (1 1),
of Subsection
3.1,
the end
0 1
1 1
, and
, P T AP =
e2 = (1 0) then P =
2
1
1 0
((y1 e1 + y2 e2 , x1 e1 + x2 e2 )) = y1 x2 + 2y2 x1 + y2 x2 .
Definition. Matrices A and B are called congruent if there exists an invertible matrix P
with B = P T AP .
Definition. A bilinear form on V is called symmetric if (w, v) = (v, w) for all v, w V .
An n n matrix A is called symmetric if AT = A.
Proposition 3.3 The bilinear form is symmetric if and only if its matrix (with respect to
any basis) is symmetric.
The best known example is when V = Rn , and is defined by
((x1 , x2 , . . . , xn ), (y1 , y2 , . . . , yn )) = x1 y1 + x2 y2 + + xn yn .
This form has matrix equal to the identity matrix In with respect to the standard basis of
Rn . Geometrically, it is equal to the normal scalar product (v, w) = |v||w| cos , where is
the angle between the vectors v and w.
3.3
A quadratic form on the standard vector space K n is a polynomial function of several variables
x1 , . . . xn in which each term has total degree two, such as 3x2 + 2xz + z 2 4yz + xy. One
23
0.8
0.6
0.4
y
0.2
0.8
0.6
0.4
0.2
0.2
0.4
x
0.6
0.8
0.2
0.4
0.6
0.8
This represents an ellipse, in which the two principal axes are at an angle of /4 with the
x- and y-axes. To study such curves in general, it is desirable to change variables (which
will turn out to be equivalent to a change of basis) so as to make the principal axes of the
ellipse coincide with the x- and y-axes. This is equivalent to eliminating the xy-term in the
equation. We can do this easily by completing the square.
In the example
5x2 + 5y 2 6xy = 2 5(x 3y/5)2 9y 2 /5 + 5y 2 = 2 5(x 3y/5)2 + 16y 2 /5 = 2
so if we change variables, and put x = x 3y/5 and y = y, then the equation becomes
5x2 + 16y 2 /5 = 2 (see Fig. 2).
Here we have allowed an arbitrary basis change. We shall be studying this situation in
Subsection 3.5.
One disadvantage of doing this is that the shape of the curve has become distorted. If we wish
to preserve the shape, then we should restrict our basis changes to those that preserve distance
and angle. These are called orthogonal basis changes, and we shall be studying that situation
24
0.8
0.6
0.4
y
0.2
0.6
0.4
0.2
0.2
0.4
0.6
0.2
0.4
0.6
0.8
0.4
y
0.2
0.5
0.2
0.4
Figure 3: x2 + 4y 2 = 1
25
0.5
x
3.4
Definition. Let V be a vector space over the field K. A quadratic form on V is a function
q : V K that is defined by q(v) = (v, v), where : V V K is a bilinear form.
As this is the official definition of a quadratic form we will use, we do not really need to
observe that it yields the same notion for the standard vector space K n as the definition in
the previous section. However, it is a good exercise that an inquisitive reader should definitely
do. The key is to observe that the function xi xj comes from the bilinear form i,j such that
i,j (ei , ej ) = 1 and zero elsewhere.
In Proposition 3.4 we need to be able to divide by 2 in the field K. This means that we must
assume11 that 1 + 1 6= 0 in K. For example, we would like to avoid the field of two elements.
If you prefer to avoid worrying about such technicalities, then you can safely assume that K
is either Q, R or C.
Let us consider the following three sets. The first set Q(V, K) consists of all quadratic forms
on V . It is a subset of the set of all functions from V to K. The second set Bil(V V, K)
consists of all bilinear forms on V . It is a subset of the set of all functions from V V to K.
Finally, we need Sym(V V, K), the subset of Bil(V V, K) consisting of symmetric bilinear
forms.
There are two interesting functions connecting these sets. We have already defined a square
function : Bil(V V, K) Q(V, K) by ( )(v) = (v, v). The second function :
Q(V, K) Bil(V V, K) is a polarisation12 defined by (q)(u, v) = q(u + v) q(u) q(v).
Proposition 3.4 The following statements hold for all q Q(V, K) and Sym(V
V, K):
(i)
(ii)
(iii)
(iv)
Let A = (ij ) be the matrix of with respect to this basis. We will also call A the matrix
of q with respect to this basis. Then A is symmetric because is, and by Equation (2.1) of
Subsection 3.1, we have
11
Fields with 1 + 1 = 0 are fields of characteristic 2. One can actually do quadratic and bilinear forms over
them but the theory is quite specific. It could be a good topic for a second year essay.
12
Some authors call it linearisation.
13
There is a precise mathematical way of defining natural using Category Theory but it is far beyond the
scope of this course. The only meaning we can endow this word with is that we do not make any choices for
this bijection.
26
q(v) = v Av =
n
n X
X
xi ij xj =
n
X
i=1
i=1 j=1
ii x2i
+2
i1
n X
X
ij xi xj .
(3.1)
i=1 j=1
Conversely, if we are given a quadratic form as in the right hand side of Equation (3.1), then it
2
2
2
is easy to write
down its matrix
A. For example, if n = 3 and q(v) = 3x +y 2z +4xy xz,
3 2 1/2
then A =
2 1
0 .
1/2 0
2
3.5
Our general aim is to make a change of basis so as to eliminate the terms in q(v) that involve
xi xj for i 6= j, leaving only terms of the form ii x2i . In this section, we will allow arbitrary
basis changes; in other words, we allow basis change matrices P from the general linear group
GL(n, K). It follows from Theorem 3.2 that when we make such a change, the matrix A of
q is replaced by P T AP .
As with other results in linear algebra, we can formulate theorems either in terms of abstract
concepts like quadratic forms, or simply as statements about matrices.
Theorem 3.5 Assume that 1 + 1 6= 0 K.
1n
12
x2 + . . .
xn )2 + q1 (v),
11
11
Case 2. 11 = 0 but ii 6= 0 for some i > 1. In this case, we start by interchanging e1 with
ei (or equivalently x1 with xi ), which takes us back to Case 1.
13
1 12
. . . 1n
11
11
11
x1
x1
0
1
0
.
.
.
0
x2
x2
0
0
1 ...
0
.
. =
or equivalently
...
.
.
...
xn
xn
0
0
0 ...
1
13
12
11
. . . 1n
1 11
11
x1
x1
1
0 ...
0
x2 0
x2
0
0
1 ...
0
. =
. .
...
.
.
...
xn
xn
0
0
0 ...
1
13
11
. . . 1n
1 12
11
11
0
1
0 ...
0
P = 0
0
1 ...
0
,
...
0
0
0 ...
1
so by Proposition 1.1, P is the basis change matrix with original basis {ei } and new basis
{ei }.
0 1/2 5/2
Example. Let n = 3 and q(v) = xy + 3yz 5xz, so A = 1/2
0
3/2 .
5/2 3/2
0
Since we are using x, y, z for our variables, we can use x1 , y1 , z1 (rather than x , y , z ) for the
variables with respect to a new basis, which will make things typographically simpler!
We are in Case 3 of the proof above, and so we start with a coordinatechange x =x1 + y1 ,
1
1 0
We are now in Case 1 of the proof above, and the next basis change, from completing the
square, is x2 = x1 z1 , y2 = y1 , z2 = z1 , or equivalently,
x
1 = x2 + z2 , y1 = y2 , z1 = z2 , and
1 0 1
then the associated basis change matrix is P2 = 0 1 0 , and q(v) = x22 y22 8y2 z2 z22 .
0 0 1
We now proceed by induction on the 2-coordinate form in y2 , z2 , and completing the square
again leads to the basis change
x3 =
x2 , y3 = y2 + 4z2 , z3 = z2 , which corresponds to the
1 0
0
basis change matrix P3 = 0 1 4 , and q(v) = x23 y32 + 15z32 .
0 0
1
The total basis change in moving from the original basis with coordinates x, y, z to the final
28
1
1 3
P = P1 P2 P3 = 1 1
5 ,
0
0
1
1
0 0
T
r = rank(q).
can then make a further coordinates change xi = ii xi (1 i r), giving
Pr We
q(v) = i=1 (xi )2 . Hence we have proved:
Proposition 3.6 A quadratic form q over C has the form q(v) =
suitable basis, where r = rank(q).
Pr
2
i=1 xi
with respect to a
Equivalently, given a symmetric matrix A Cn,n , there is an invertible matrix P Cn,n such
that P T AP = B, where B = (ij ) is a diagonal matrix with ii = 1 for 1 i r, ii = 0 for
r + 1 i n, and r = rank(A).
When K = R, we cannot take square roots of negative numbers, but we can replace each
positive i by 1 and each negative i by 1 to get:
Proposition
P
Pu3.7 2(Sylvesters Theorem) A quadratic form q over R has the form q(v) =
t
2
x
i=1 i
i=1 xt+i with respect to a suitable basis, where t + u = rank(q).
Equivalently, given a symmetric matrix A Rn,n , there is an invertible matrix P Rn,n such
that P T AP = B, where B = (ij ) is a diagonal matrix with ii = 1 for 1 i t, ii = 1
for t + 1 i t + u, and ii = 0 for t + u + 1 i n, and t + u = rank(A).
We shall now prove that the numbers t and u of positive and negative terms are invariants
of q. The difference t u between the numbers of positive and negative terms is called the
signature of q.
Theorem 3.8 Suppose that q is a quadratic form over the vector space V over R, and that
e1 , . . . , en and e1 , . . . , en are two bases of V with associated coordinates xi and xi , such that
q(v) =
t
X
i=1
Then t = t and u = u .
x2i
u
X
x2t+i
t
X
i=1
i=1
29
(xi )2
u
X
(xt +i )2 .
i=1
3.6
It is clear that this is the case if and only if t = n and u = 0 in Proposition 3.7; that is, if
q has rank and signature n. In P
this case, Proposition 3.7 says that there is a basis {ei } of
V with respect to which q(v) = ni=1 x2i or, equivalently, such that the matrix A of q is the
identity matrix In .
The associated symmetric bilinear form is also called positive definite when q is. If we use
a basis such that A = In , then is just the standard scalar (or inner) product on V .
Definition. A vector space V over R together with a positive definite symmetric bilinear
form is called a Euclidean space.
We shall assume from now on that V is a Euclidean space, and that the basis e1 , . . . , en has
been chosen such that the matrix of is In . Since is the standard scalar product, we shall
write v w instead of (v, w).
Note that v w = vT w where, as usual, v and w are the column vectors associated with v
and w.
For v V , define |v| = v v. Then |v| is the length of v. Hence the length, and also the
cosine v w/(|v||w|) of the angle between two vectors can be defined in terms of the scalar
product.
Definition. A linear operator T : V V is said to be orthogonal if it preserves the scalar
product on V . That is, if T (v) T (w) = v w for all v, w V .
Since length and angle can be defined in terms of the scalar product, an orthogonal linear
operator preserves distance and angle, so geometrically it is a rigid map. In R2 , for example,
an orthogonal operator is a rotation about the origin or a reflection about a line through the
origin.
If A is the matrix of T , then T (v) = Av, so T (v) T (w) = vT AT Aw, and hence T is
orthogonal if and only if AT A = In , or equivalently if AT = A1 .
Definition. An n n matrix is called orthogonal if AT A = In .
So we have proved:
Incidentally, the fact that AT A = In tells us that A and hence T is invertible, and so we have
also proved:
Proposition 3.10 An orthogonal linear operator is invertible.
Let c1 , c2 , . . . , cn be the columns of the matrix A. As we observed in Subsection 1.1, ci is
equal to the column vector representing T (ei ). In other words, if T (ei ) = fi then fi = ci .
Since the (i, j)-th entry of AT A is cT
i cj = fi fj , we see that T and A are orthogonal if and
14
only if
fi fi = i,j , 1 i, j n.
()
Definition. A basis f1 , . . . , fn of V that satisfies () is called orthonormal.
By Proposition 3.10, an orthogonal linear operator is invertible, so T (ei ) (1 i n) form a
basis of V , and we have:
Proposition 3.11 A linear operator T is orthogonal if and only if T (e1 ), . . . , T (en ) is an
orthonormal basis of V .
cos sin
Example For any R, let A =
. (This represents a counter-clockwise
sin
cos
rotation through an angle .) Then it is easily checked that AT A = AAT = I2 . Notice that
the columns of A are mutually orthogonal vectors of length 1, and the same applies to the
rows of A.
The following theorem tells us that we can always complete an orthonormal set of vectors to
an orthonormal basis.
Theorem 3.12 (Gram-Schmidt) Let V be a Euclidean space of dimension n, and suppose
that, for some r with 0 r n, f1 , . . . , fr are vectors in V that satisfy the equations () for
1 i, j r. Then f1 , . . . , fr can be extended to an orthonormal basis f1 , . . . , fn of V .
P
Proof: We prove first that f1 , . . . , fr are linearly independent. Suppose that ri=1 xi fi = 0
for some x1 , . . . , xr R. Then, P
for each j with 1 j r, the scalar product of the left hand
side of this equation with fj is ri=1 xi fj fi = xj , by (). Since fj 0 = 0, this implies that
xj = 0 for all j, so the fi are linearly independent.
The proof of the theorem will be by induction on n r. We can start the induction with the
case n r = 0, when r = n, and there is nothing to prove. So assume that n r > 0; i.e.
that r < n. By a result from MA106, we can extend any linearly independent set of vectors
to a basis of V , so there is a basis f1 , . . . , fr , gr+1 , . . . , gn of V containing the fi . The trick is
to define
r
X
If we take the scalar product of this equation by fj for some 0 j r, then we get
fj fr+1
= fj gr+1
r
X
(fi gr+1 )(fj fi )
i=1
and then, by (), fj fi is non-zero only when j = i, so the sum on the right hand side
We are using Kroneckers delta symbol in the next formula. It is just the identity matrix Im = (i,j ) of
sufficiently large size. In laymans terms, i,i = 1 and i,j = 0 if i 6= j.
31
is non-zero by linear independence of the basis, and if we define fr+1 =
The vector fr+1
fr+1 /|fr+1 |, then we still have fj fr+1 = 0 for 1 j r, and we also have fr+1 .fr+1 = 1.
Hence f1 , . . . , fr+1 satisfy the equations (), and the result follows by inductive hypothesis. 2
Recall from MA106 that if T is a linear operator with matrix A, and v is a non-zero vector
such that T (v) = v (or equivalently Av = v), then is called an eigenvalue and v an
associated eigenvector of T and A. It was proved in MA106 that the eigenvalues are the roots
of the characteristic equation det(A xIn ) = 0 of A.
Proposition 3.13 Let A be a real symmetric matrix. Then A has an eigenvalue in R, and
all complex eigenvalues of A lie in R.
Proof: (To simplify the notation, we will write just v for a column vector v in this proof.)
The characteristic equation det(A xIn ) = 0 is a polynomial equation of degree n in x, and
since C is an algebraically closed field, it certainly has a root C, which is an eigenvalue
for A if we regard A as a matrix over C. We shall prove that any such lies in R, which will
prove the proposition.
For a column vector v or matrix B over C, we denote by v or B the result of replacing all
entries of v or B by their complex conjugates. Since the entries of A lie in R, we have A = A.
Let v be a complex eigenvector associated with . Then
Av = v
(1)
(2)
vT A = vT ,
(3)
Proof: We start with a general remark about orthogonal basis changes. The matrix q
represents a quadratic form on V with respect to the initial orthonormal basis e1 , . . . , en of
V , but it also represents a linear operator T : V V with respect to the same basis. When
we make an orthogonal basis change with original basis e1 , . . . , en and a new orthonormal
basis f1 , . . . , fn with the basis change matrix P , then P is orthogonal, so P T = P 1 and
32
and
Av2 = 2 v2 (2).
and similarly
Transposing (4) gives v1T Av2 = 2 v1T v2 and subtracting (3) from this gives (2 1 )v1T v2 = 0.
2
Since 2 1 6= 0 by assumption, we have v1T v2 = 0.
1 3
Example 1. Let n = 2 and q(v) = x2 + y 2 + 6xy, so A =
. Then
3 1
det(A xI2 ) = (1 x)2 9 = x2 2x 8 = (x 4)(x + 2),
so the eigenvalues of A are 4 and 2. Solving Av = v for = 4 and 2, we find corresponding eigenvectors (1 1)T and (1 1)T . Proposition 3.15 tells us that these vectors
are orthogonal to each other (which we can of course check directly!), so if we divide them
1 T
) then we get an
by their lengths to give vectors of length 1, giving ( 12 12 )T and ( 12
2
33
1
2
1
3 2
1
6 2 .
so A = 2
1 2
3
so the eigenvalues are 2 (repeated) and 8. For the eigenvalue 8, if we solve Av = 8v then we
find a solution v = (1 2 1)T . Since 2 is a repeated eigenvalue, we need two corresponding
eigenvectors, which must be orthogonal to each other. The equations Av = 2v all reduce
to x 2y + z = 0, and so any vector (x, y, z)T satisfying this equation is an eigenvector for
= 2. By Proposition 3.15 these eigenvectors will all be orthogonal to the eigenvector for
= 8, but we will have to choose them orthogonal to each other. We can choose the first one
arbitrarily, so lets choose (1 0 1)T . We now need another solution that is orthogonal to
this. In other words, we want x, y and z not all zero satisfying x 2y + z = 0 and x z = 0,
and x = y = z = 1 is a solution. So we now have a basis (1 2 1)T , (1 0 1)T , (1 1 1)T of
three mutually orthogonal eigenvectors. To
basis, we just need to divide
an orthonormal
get
by their lengths, which are, respectively, 6, 2, and 3, and then the basis change matrix
P has these vectors as columns, so
1/ 2 1/3
1/6
P = 2/6
0 1/3 .
1/ 6 1/ 2 1/ 3
It can then be checked that P T P = I3 and that P T AP is the diagonal matrix with entries
8, 2, 2.
3.7
3.7.1
i x2i +
n X
i1
X
ij xi xj +
n
X
i xi + = 0.
i=1
i=1 j=1
This defines a quadric hypersurface15 in n-dimensional Euclidean space. To study the possible shapes of the curves and surfaces defined, we first simplify this equation by applying
coordinate changes resulting from isometries of Rn .
By Theorem 3.14, we can apply an orthogonal basis change (that is, an isometry of Rn that
fixes the origin) which has the effect of eliminating the terms ij xi xj in the above sum.
15
34
Now, whenever i 6= 0, we can replace xi by xi i /(2i ), and thereby eliminate the term
i xi from the equation. This transformation is just a translation, which is also an isometry.
If i = 0, then we cannot eliminate the term i xi . Let us permute the coordinates such that
i 6= 0 for 1 i r, and i 6= 0 for r + 1 i r + s. Then if s > 1, by using Theorem 3.12,
we
Pscan find an orthogonal transformation that leaves
Pxs i unchanged for 1 i r and replaces
i=1 r+j xr+j by xr+1 (where is the length of
i=1 r+j xr+j ), and then we have only a
single non-zero i ; namely r+1 = .
Finally, if there is a non-zero r+1 = , then we can perform the translation that replaces
xr+1 by xr+1 /, and thereby eliminate .
We have now reduced to one of two possible types of equation:
r
X
i x2i + = 0
and
r
X
i x2i + xr+1 = 0.
i=1
i=1
We shall now consider the types of curves and surfaces that can arise in the familiar cases
n = 2 and n = 3. These different types correspond to whether the i are positive, negative
or zero, and whether = 0 or 1.
The case n = 2
(vii) x2 y 2 = 1. A hyperbola.
The case n = 3
When n = 3, we still get the nine possibilities (i) (ix) that we had in the case n = 2, but
now they must be regarded as equations in the three variables x, y, z that happen not to
involve z.
So, in Case (i), we now get the plane x = 0, in case (ii) we get two parallel planes x = 1/ ,
in Case (iv) we get the line x = y = 0 (the z-axis), in case (v) two intersecting planes
35
p
y = /x, and in Cases (vi), (vii) and (ix), we get, respectively, elliptical, hyperbolic
and parabolic cylinders.
The remaining cases involve all of x, y and z. We omit x2 y 2 z 2 = 1, which is empty.
This is an elliptical cone. The cross sections parallel to the xy-plane are ellipses of the form
x2 + y 2 = c, whereas the cross sections parallel to the other coordinate planes are generally
hyperbolas. Notice also that if a particular point (a, b, c) is on the surface, then so is t(a, b, c)
for any t R. In other words, the surface contains the straight line through the origin and
any of its points. Such lines are called generators. When each point of a 3-dimensional surface
lies on one or more generators, it is possible to make a model of the surface with straight
lengths of wire or string.
(xii) x2 + y 2 + z 2 = 1. An ellipsoid. See Fig. 5.
(xiii) x2 + y 2 z 2 = 1. A hyperboloid. See Fig. 6.
There are two types of 3-dimensional hyperboloids. This one is connected, and is known as
a hyperboloid of one sheet. Although it is not immediately obvious, each point of this surface
lies on exactly two generators; that is, lines that lie entirely on the surface. For each R,
the line defined by the pair of equations
p
p
x z = (1 y);
( x + z) = 1 + y.
lies entirely on the surface; to see this, just multiply the two equations together. The same
applies to the lines defined by the pairs of equations
p
p
y z = (1 x);
( y + z) = 1 + x.
It can be shown that each point on the surface lies on exactly one of the lines in each if these
two families.
(xiv) αx² − βy² − γz² = 1. A hyperboloid. See Fig. 7.
This one has two connected components and is called a hyperboloid of two sheets. It does not have generators. Moreover, it is easy to observe that it is disconnected: substitute x = 0 into its equation. The resulting equation −βy² − γz² = 1 has no solutions. This means that the hyperboloid does not intersect the plane x = 0. A closer inspection confirms that the two parts of the hyperboloid lie on the two sides of this plane: intersect the hyperboloid with the line y = z = 0 to see two points, one on each side.
(xv) αx² + βy² − z = 0. An elliptical paraboloid. See Fig. 8.
(xvi) αx² − βy² − z = 0. A hyperbolic paraboloid. See Fig. 9.
As in the case of the hyperboloid of one sheet, there are two generators passing through each point of this last surface, one from each of the following two families of lines:
λ(√α x − √β y) = z;     √α x + √β y = λ.
λ(√α x + √β y) = z;     √α x − √β y = λ.
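Where a quadric is given numerically, the signs of the eigenvalues of the matrix of its quadratic part distinguish the cases above. The following short Python sketch is not part of the original notes; the function name and the example matrix are illustrative choices only.

```python
# A rough numerical aid: count the signs of the eigenvalues of the (symmetric)
# matrix of the quadratic part of a quadric in R^3.
import numpy as np

def quadratic_part_signs(A, tol=1e-9):
    """Return the numbers of positive, negative and (near-)zero eigenvalues of A."""
    eig = np.linalg.eigvalsh(A)          # A is assumed real symmetric
    pos = int(np.sum(eig > tol))
    neg = int(np.sum(eig < -tol))
    return pos, neg, len(eig) - pos - neg

# Quadratic part of x^2/4 + y^2 - z^2 = 1 (hyperboloid of one sheet): signs (2, 1, 0).
A = np.diag([0.25, 1.0, -1.0])
print(quadratic_part_signs(A))
```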
3.8
The results in Subsections 3.1–3.6 apply to vector spaces over the real numbers R. A naive reformulation of some of the results for complex numbers fails. For instance, the vector v = (i, 1)ᵀ ∈ C² is isotropic, i.e. it has vᵀv = i² + 1² = 0, which creates various difficulties.
Figure 4: x²/4 + y² − z² = 0
Figure 5: x² + 2y² + 4z² = 7
37
z0
2
2
2
0x
1
2
y0
1
4
Figure 6: x2 /4 + y 2 z 2 = 1
Figure 7: x²/4 − y² − z² = 1
Figure 8: z = x²/2 + y²
Figure 9: z = x² − y²
To build the theory for vector spaces over C we have to replace bilinear forms with sesquilinear forms, that is, maps φ : W × V → C such that
(i) φ(α1 w1 + α2 w2, v) = ᾱ1 φ(w1, v) + ᾱ2 φ(w2, v), and
(ii) φ(w, α1 v1 + α2 v2) = α1 φ(w, v1) + α2 φ(w, v2)
for all w, w1, w2 ∈ W, v, v1, v2 ∈ V and α1, α2 ∈ C. If e1, . . . , en is a basis of V and f1, . . . , fm a basis of W, then φ has matrix A = (αij) with αij = φ(fi, ej), and for v = Σj xj ej, w = Σi yi fi we get
φ(w, v) = Σ_{i=1}^m Σ_{j=1}^n ȳi φ(fi, ej) xj = Σ_{i=1}^m Σ_{j=1}^n ȳi αij xj = w*Av,
where v ∈ Cⁿ, w ∈ Cᵐ are the coordinate column vectors and w* denotes the conjugate transpose of w.
A sesquilinear form φ is called hermitian if φ(w, v) is the complex conjugate of φ(v, w) for all v, w, and a matrix A ∈ Cn,n is called hermitian if A* = A, where A* is the conjugate transpose of A. These are the complex analogues of symmetric matrices and symmetric bilinear forms. The following proposition is an analogue of Proposition 3.3.
Proposition 3.18 A sesquilinear form is hermitian if and only if its matrix is hermitian.
Hermitian matrices A and B are congruent if there exists an invertible matrix P with B = P*AP. A hermitian quadratic form is a function q : V → C given by q(v) = φ(v, v) for some sesquilinear form φ. The following is a hermitian version of Sylvester's Theorem (Proposition 3.7) and the Inertia Law (Theorem 3.8) together.
Proposition 3.19 A hermitian quadratic form q has the form q(v) = |x1|² + · · · + |xt|² − |xt+1|² − · · · − |xt+u|² with respect to a suitable basis, where t + u = rank(q).
Equivalently, given a hermitian matrix A ∈ Cn,n, there is an invertible matrix P ∈ Cn,n such that P*AP = B, where B = (βij) is a diagonal matrix with βii = 1 for 1 ≤ i ≤ t, βii = −1 for t + 1 ≤ i ≤ t + u, and βii = 0 for t + u + 1 ≤ i ≤ n, and t + u = rank(A).
Similarly to the real case, the difference t − u is called the signature of q (or of A). We say that a hermitian matrix (or a hermitian form) is positive definite if its signature is equal to the dimension of the space. By Proposition 3.19 a positive definite hermitian form looks like the standard inner product on Cⁿ in some choice of basis. A hermitian vector space is a vector space over C equipped with a positive definite hermitian form.
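As a quick illustration of Proposition 3.19, the sketch below (an illustrative aid, not part of the notes) computes t, u and the signature t − u of a hermitian matrix from the signs of its eigenvalues, which give the same counts as a congruence reduction.

```python
# Signature of a hermitian matrix via eigenvalue signs.
import numpy as np

def signature(A, tol=1e-9):
    eig = np.linalg.eigvalsh(A)        # real, since A is hermitian
    t = int(np.sum(eig > tol))
    u = int(np.sum(eig < -tol))
    return t, u, t - u

A = np.array([[6, 2 + 2j], [2 - 2j, 4]])
print(signature(A))                    # (2, 0, 2): positive definite
```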
Definition. A linear operator T : V → V on a hermitian vector space (V, φ) is said to be unitary if it preserves the form φ; that is, if φ(T(v), T(w)) = φ(v, w) for all v, w ∈ V.
Definition. A matrix A ∈ Cn,n is called unitary if A*A = In.
Definition. A matrix A ∈ Cn,n is called normal if AA* = A*A. In particular, all hermitian and all unitary matrices are normal. Consequently, all real symmetric and real orthogonal matrices are normal.
Lemma 3.24 If A ∈ Cn,n is normal and P ∈ Cn,n is unitary then P*AP is normal.
Proof: Let B = P*AP, so that B* = P*A*P. Using (BC)* = C*B*, we compute BB* = (P*AP)(P*A*P) = P*APP*A*P = P*AA*P = P*A*AP = (P*A*P)(P*AP) = B*B.
2
The following theorem is extremely useful as the general criterion for diagonalisability of matrices.
Theorem 3.25 A matrix A ∈ Cn,n is normal if and only if there exists a unitary matrix P ∈ Cn,n such that P*AP is diagonal.
Proof: The "if" part follows from Lemma 3.24, as diagonal matrices are normal.
For the "only if" part we proceed by induction on n. If n = 1, there is nothing to prove. Let us assume we have proved the statement for all dimensions less than n. The matrix A admits an eigenvector v ∈ Cⁿ with an eigenvalue λ. Let W be the vector subspace of all vectors x satisfying Ax = λx. If W = Cⁿ then A is a scalar matrix and we are done. Otherwise, we have a nontrivial decomposition Cⁿ = W ⊕ W⊥, where W⊥ = { v ∈ Cⁿ | w*v = 0 for all w ∈ W }.
Now choose orthonormal bases of W and W⊥. Together they form a new orthonormal basis of Cⁿ. The change of basis matrix P is unitary, hence by Lemma 3.24 the matrix
P*AP = [ B 0 ]
       [ 0 C ]
is normal. It follows that the matrices B and C are normal of smaller size, and we can use the induction assumption to complete the proof.
2
Theorem 3.25 is an extremely useful criterion for diagonalisability of matrices. To find P in
practice, we use similar methods to those in the real case.
Example. A = [ 6      2+2i ]
             [ 2−2i   4    ].
Then cA(x) = (6 − x)(4 − x) − (2+2i)(2−2i) = x² − 10x + 16 = (x − 2)(x − 8), so the eigenvalues are 2 and 8. (It can be shown that the eigenvalues of any hermitian matrix are real.) Corresponding eigenvectors are v1 = (1+i, −2)ᵀ and v2 = (1+i, 1)ᵀ. We find that |v1|² = v1*v1 = 6 and |v2|² = 3, so we divide by their lengths to get an orthonormal basis v1/|v1|, v2/|v2| of C². Then the matrix
P = [ (1+i)/√6   (1+i)/√3 ]
    [ −2/√6      1/√3     ]
is unitary and P*AP = [ 2 0 ]
                      [ 0 8 ].
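The computation in this example can be checked numerically. The snippet below is an illustrative numpy check, not part of the notes; it verifies that P is unitary and that P*AP is diag(2, 8).

```python
import numpy as np

A = np.array([[6, 2 + 2j],
              [2 - 2j, 4]])

# eigenvectors from the example, normalised to unit length
v1 = np.array([1 + 1j, -2]) / np.sqrt(6)
v2 = np.array([1 + 1j, 1]) / np.sqrt(3)
P = np.column_stack([v1, v2])

print(np.allclose(P.conj().T @ P, np.eye(2)))   # True: P is unitary
print(np.round(P.conj().T @ A @ P, 10))         # diag(2, 8)
```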
3.9
With all the linear algebra we know, it is a little step aside to understand the basics of quantum mechanics. We discuss Schrödinger's picture of quantum mechanics and derive (mathematically) Heisenberg's uncertainty principle.
The main ingredient of quantum mechanics is a hermitian vector space (V, < , >). There are physical arguments showing that real Euclidean vector spaces are no good and that V must be infinite-dimensional. Here we just take their conclusions at face value. The states of the system are lines in V. We denote by [v] the line Cv spanned by v ∈ V. We use normalised vectors, i.e., v such that < v, v > = 1, to represent states, as this makes formulae slightly easier.
It is impossible to observe the state of the quantum system, but we can try to observe some physical quantities such as momentum, energy, spin, etc. Such physical quantities become observables, i.e., hermitian linear operators A : V → V. Hermitian in this context means
that < A(x), y > = < x, A(y) > for all x, y ∈ V. Sweeping a subtle mathematical point under the carpet20, we assume that A is diagonalisable with eigenvectors e1, e2, e3, . . . and eigenvalues λ1, λ2, . . .. The proof of Proposition 3.22 goes through in the infinite-dimensional case, so we conclude that all λi belong to R. Back to physics: if we measure A on a state [v] with normalised v = Σn αn en, then the measurement will return λn as a result with probability |αn|².
One observable is energy H : V → V, often called the hamiltonian. It is central to the theory because it determines the time evolution [v(t)] of the system by Schrödinger's equation:
dv(t)/dt = (1/(iℏ)) Hv(t),
where ℏ ≈ 10⁻³⁴ Joule·seconds21 is the reduced Planck constant. We know how to solve this equation: v(t) = e^{tH/(iℏ)} v(0).
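For a finite-dimensional toy model one can solve Schrödinger's equation directly with a matrix exponential. The sketch below is purely illustrative: the 2 × 2 "hamiltonian" H and the value ℏ = 1 are arbitrary choices, not anything taken from the notes.

```python
import numpy as np
from scipy.linalg import expm

hbar = 1.0
H = np.array([[1.0, 0.5], [0.5, 2.0]])          # a small hermitian "hamiltonian"
v0 = np.array([1.0, 0.0], dtype=complex)

def v(t):
    # v(t) = exp(tH / (i*hbar)) v(0)
    return expm(t * H / (1j * hbar)) @ v0

print(abs(np.vdot(v(3.0), v(3.0))))             # stays 1: the evolution is unitary
```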
As a concrete example, let us look at the quantum oscillator. The full energy of the classical harmonic oscillator of mass m and frequency ω is
h = p²/(2m) + (1/2) m ω² x²,
where x is the position and p = mẋ is the momentum. To quantise it, we have to play with this expression. The space of all smooth functions C∞(R, C) admits a convenient subspace V = { f(x)e^{−x²/2} | f(x) ∈ C[x] }, which we make hermitian by < u, v > = ∫R u(x) v̄(x) dx. Setting m = ω = ℏ = 1, the position becomes the operator X(f) = xf, the momentum becomes P(f) = −if′, and the hamiltonian H = (P² + X²)/2 has as eigenvectors the Hermite functions
ψn(x) = (−1)ⁿ e^{x²/2} (e^{−x²})^{(n)},   n = 0, 1, 2, . . . ,
with eigenvalues n + 1/2, which are the discrete energy levels of the quantum oscillator. Notice that < ψk, ψn > = δk,n 2ⁿ n! √π, so they are orthogonal but not orthonormal. The states [ψn] are pure states: they do not change with time and always give n + 1/2 as energy. If we take a system in a state [v] where
v = Σn (αn / (π^{1/4} 2^{n/2} √(n!))) ψn
is normalised, then the measurement of energy will return n + 1/2 with probability |αn|².
Notice that the measurement breaks the system!! It changes it to the state [ψn] and all future measurements will return the same energy!
20 If V were finite-dimensional we could have used Proposition 3.23. But V is infinite-dimensional! To ensure diagonalisability, V must be complete with respect to the hermitian norm. Such spaces are called Hilbert spaces. Diagonalisability is still subtle, as the eigenvectors do not span the whole of V but only a dense subspace. Furthermore, if V admits no dense countably-dimensional subspace, further difficulties arise. . . Pandora's box of functional analysis is wide open, so let us try to keep it shut.
21 Notice the physical dimensions: H is energy, t is time, i is dimensionless, and ℏ equalises the dimensions on both sides irrespective of what v is.
Alternatively, it is possible to model the quantum oscillator on the vector space W = C[x] of polynomials. One has to use the natural linear bijection
φ : W → V,   φ(f(x)) = f(x)e^{−x²/2}
and transfer all the formulae to W. The metric becomes < f, g > = < φ(f), φ(g) > = ∫R f(x) ḡ(x) e^{−x²} dx, the formulae for P and X change accordingly, and at the end one arrives at the Hermite polynomials φ⁻¹(ψn(x)) = (−1)ⁿ e^{x²} (e^{−x²})^{(n)} instead of the Hermite functions.
Let us go back to an abstract system with two observables P and Q. It is pointless to measure Q after measuring P, as the system is broken. But can we measure them simultaneously? The answer is given by Heisenberg's uncertainty principle. Mathematically, it is a corollary of Schwarz's inequality:
‖v‖² ‖w‖² = < v, v > < w, w > ≥ | < v, w > |².
Let e1, e2, e3, . . . be eigenvectors for P with eigenvalues p1, p2, . . .. The probability that pj is returned after measuring P on [v] with v = Σn αn en depends on the multiplicity of the eigenvalue:
Prob(pj is returned) = Σ_{pk = pj} |αk|²,
and the expected value of the measurement is E(P, v) = Σk pk |αk|² = < v, P(v) >.
To compute the expected quadratic error we use the shifted observable Pv = P − E(P, v)I:
D(P, v) = √E(Pv², v) = √< v, Pv(Pv(v)) > = √< Pv(v), Pv(v) > = ‖Pv(v)‖,
where we use the fact that P and Pv are hermitian. Notice that D(P, v) has the physical meaning of the uncertainty of a measurement of P. Notice also that the operator PQ − QP is no longer hermitian in general, but we can still talk about its expected value. Here goes Heisenberg's principle.
Theorem 3.26 D(P, v) · D(Q, v) ≥ (1/2) |E(PQ − QP, v)|.
Proof: In the right hand side, E(PQ − QP, v) = E(PvQv − QvPv, v) = < v, PvQv(v) > − < v, QvPv(v) > = < Pv(v), Qv(v) > − < Qv(v), Pv(v) >. Remembering that the form is hermitian, this is < Pv(v), Qv(v) > minus its complex conjugate, so
E(PQ − QP, v) = 2i · Im(< Pv(v), Qv(v) >),
that is, 2i times the imaginary part. So the right hand side is estimated by Schwarz's inequality:
|Im(< Pv(v), Qv(v) >)| ≤ | < Pv(v), Qv(v) > | ≤ ‖Pv(v)‖ · ‖Qv(v)‖ = D(P, v) · D(Q, v).
2
Two cases of particular physical interest are commuting observables, i.e. PQ = QP, and conjugate observables, i.e. PQ − QP = iℏI. Commuting observables can be measured simultaneously with any degree of certainty. Conjugate observables obey Heisenberg's uncertainty:
D(P, v) · D(Q, v) ≥ ℏ/2.
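Theorem 3.26 is easy to test numerically in finite dimensions. The sketch below is illustrative only: the random matrices, the helper names and the dimension are assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def rand_hermitian(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

def expectation(A, v):                     # E(A, v) = <v, A v> for a unit vector v
    return np.vdot(v, A @ v).real

def uncertainty(A, v):                     # D(A, v) = ||(A - E(A, v) I) v||
    return np.linalg.norm((A - expectation(A, v) * np.eye(n)) @ v)

P, Q = rand_hermitian(n), rand_hermitian(n)
v = rng.normal(size=n) + 1j * rng.normal(size=n)
v = v / np.linalg.norm(v)

lhs = uncertainty(P, v) * uncertainty(Q, v)
rhs = 0.5 * abs(np.vdot(v, (P @ Q - Q @ P) @ v))
print(lhs >= rhs)                          # True, as Theorem 3.26 predicts
```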
4.1
Definitions
Groups were introduced in the first year in Foundations, and will be studied in detail next
term in Algebra II: Groups and Rings. In this course, we are only interested in abelian (=
commutative) groups, which are defined as follows.
Definition. An abelian group is a set G together with a binary operation, which we write
as addition, and which satisfies the following properties:
(i) (Closure) for all g, h ∈ G, g + h ∈ G;
(ii) (Associativity) for all g, h, k ∈ G, (g + h) + k = g + (h + k);
(iii) there exists an element 0G ∈ G such that:
(a) (Identity) for all g ∈ G, g + 0G = g; and
(b) (Inverse) for all g ∈ G there exists −g ∈ G such that g + (−g) = 0G;
(iv) (Commutativity) for all g, h ∈ G, g + h = h + g.
Usually we just write 0 rather than 0G . We only write 0G if we need to distinguish between
the zero elements of different groups.
The commutativity axiom (iv) is not part of the definition of a general group, and for general
(non-abelian) groups, it is more usual to use multiplicative rather than additive notation. All
groups in this course should be assumed to be abelian, although many of the definitions in
this section apply equally well to general groups.
Examples. 1. The integers Z.
2. Fix a positive integer n > 0 and let
Zn = {0, 1, 2, . . . , n−1} = { x ∈ Z | 0 ≤ x < n },
where addition is computed modulo n. So, for example, when n = 9, we have 2 + 5 = 7, 3 + 8 = 2, 6 + 7 = 4, etc. Note that the inverse −x of x ∈ Zn is equal to n − x in this example.
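These computations in Z9 can be checked with a couple of lines of Python (nothing assumed beyond ordinary modular arithmetic):

```python
n = 9
add = lambda x, y: (x + y) % n
print(add(2, 5), add(3, 8), add(6, 7))   # 7 2 4
print(n - 3, add(3, n - 3))              # the inverse of 3 is 6, and 3 + 6 = 0
```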
3. Examples from linear algebra. Let K be a field.
(i) The elements of K form an abelian group under addition.
(ii) The non-zero elements of K form an abelian group K* under multiplication.
(iii) The vectors in any vector space form an abelian group under addition.
Proposition 4.1 (The cancellation law) Let G be any group, and let g, h, k ∈ G. Then g + h = g + k ⇒ h = k.
Proof: Add −g to both sides of the equation and use the Associativity and Identity axioms.
2
For any group G, g ∈ G, and integer n > 0, we define ng to be g + g + · · · + g, with n occurrences of g in the sum. So, for example, 1g = g, 2g = g + g, 3g = g + g + g, etc. We extend this notation to all n ∈ Z by defining 0g = 0 and (−n)g = −(ng) for n < 0. Overall, this defines a scalar action Z × G → G which allows us to think of abelian groups as vector spaces over Z (or, using precise terminology, Z-modules; modules over a ring will play a significant role in Rings and Modules in year 3).
Definition. A group G is called cyclic if there exists an element x G such that every
element of G is of the form mx for some m Z.
The element x in the definition is called a generator of G. Note that Z and Zn are cyclic with
generator x = 1.
Note (exercise) that any isomorphism φ must satisfy φ(0G) = 0H and φ(−g) = −φ(g) for all g ∈ G.
Proposition 4.2 Any cyclic group G is isomorphic either to Z or to Zn for some n > 0.
Proof: Let G be cyclic with generator x. So G = { mx | m ∈ Z }. Suppose first that the elements mx for m ∈ Z are all distinct. Then the map φ : Z → G defined by φ(m) = mx is a bijection, and it is clearly an isomorphism.
Otherwise, we have lx = mx for some l < m, and so (m−l)x = 0 with m − l > 0. Let n be the least integer with n > 0 and nx = 0. Then the elements 0x = 0, 1x, 2x, . . . , (n−1)x of G are all distinct, because otherwise we could find a smaller n. Furthermore, for any mx ∈ G, we can write m = rn + s for some r, s ∈ Z with 0 ≤ s < n. Then mx = (rn + s)x = sx, so G = { 0, 1x, 2x, . . . , (n−1)x }, and the map φ : Zn → G defined by φ(m) = mx for 0 ≤ m < n is a bijection, which is easily seen to be an isomorphism.
2
Definition. For an element g ∈ G, the least integer n > 0 with ng = 0, if it exists, is called the order |g| of g. If there is no such n, then g has infinite order and we write |g| = ∞.
Definition. The direct sum G1 ⊕ G2 ⊕ · · · ⊕ Gn of abelian groups G1, . . . , Gn is defined to be the set { (g1, g2, . . . , gn) | gi ∈ Gi } with componentwise addition. In general (non-abelian) group theory this is more often known as the direct product of the groups.
The main result of this section, known as the fundamental theorem of finitely generated abelian
groups, is that every finitely generated abelian group is isomorphic to a direct sum of cyclic
groups. (This is not true in general for abelian groups, such as the additive group Q of
rational numbers, which are not finitely generated.)
4.2
Definition. A subset H of a group G which is itself a group under the same operation is called a subgroup of G. The statement (i) that H is a subgroup of G is equivalent to either of the following two conditions:
(ii) (a) H is nonempty; and (b) h1, h2 ∈ H ⇒ h1 + h2 ∈ H; and (c) h ∈ H ⇒ −h ∈ H;
(iii) (a) H is nonempty; and (b) h1, h2 ∈ H ⇒ h1 − h2 ∈ H.
To show that (ii) implies (i) we need to verify the four group axioms in H. Two of these, Closure and Inverse, are the conditions (b) and (c). The other two axioms are Associativity and Identity. Associativity holds because it holds in G, and H is a subset of G. Since we are assuming that H is nonempty, there exists h ∈ H, and then −h ∈ H by (c), and h + (−h) = 0 ∈ H by (b), and so Identity holds, and H is a subgroup.
2
Examples. 1. There are two standard subgroups of any group G: the whole group G itself,
and the trivial subgroup {0} consisting of the identity alone. Subgroups other than G are
called proper subgroups, and subgroups other than {0} are called non-trivial subgroups.
2. If g is any element of any group G, then the set of all integer multiples { mg | m Z }
forms a subgroup of G called the cyclic subgroup generated by g.
Let us look at a few specific examples. If G = Z, then 5Z, which consists of all multiples of 5, is the cyclic subgroup generated by 5. Of course, we can replace 5 by any integer here, but note that the cyclic subgroups generated by 5 and −5 are the same.
If G = ⟨g⟩ is a finite cyclic group of order n and m is a positive integer dividing n, then the cyclic subgroup generated by mg has order n/m and consists of the elements kmg for 0 ≤ k < n/m.
Exercise. What is the order of the cyclic subgroup generated by mg for general m (where we drop the assumption that m | n)?
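One way to experiment with this exercise is to list the cyclic subgroup generated by m in Zn directly; the sketch below is illustrative only (the function name and the sample values are arbitrary choices).

```python
def cyclic_subgroup(n, m):
    """Elements of the subgroup of Z_n generated by m."""
    H, k = set(), 0
    while True:
        k = (k + m) % n
        H.add(k)
        if k == 0:
            return sorted(H)

print(cyclic_subgroup(12, 4))        # [0, 4, 8] -- order 12/4 = 3
print(len(cyclic_subgroup(12, 8)))   # 3 as well, even though 8 does not divide 12
```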
Exercise. Show that the group C* of non-zero complex numbers under the operation of multiplication has finite cyclic subgroups of all possible orders.
Definition. Let H be a subgroup of G and let g ∈ G. Then the coset H + g is the subset { h + g | h ∈ H } of G.
(Note: Since our groups are abelian, we have H + g = g + H, but in general group theory the right and left cosets Hg and gH can be different.)
Examples. 3. G = Z, H = 5Z. There are just 5 distinct cosets H = H + 0 = { 5n | n ∈ Z }, H + 1 = { 5n + 1 | n ∈ Z }, H + 2, H + 3, H + 4. Note that H + i = H + j whenever i ≡ j (mod 5).
Theorem 4.9 (Lagrange's Theorem) Let G be a finite (abelian) group and H a subgroup of G. Then the order of H divides the order of G.
Definition. The number of distinct right cosets of H in G is called the index of H in G and is written as |G : H|.
Proposition 4.10 Let G be a finite group and g ∈ G. Then the order |g| of g divides the order of G.
Proof: Let |g| = n. We saw in Example 2 above that the integer multiples { mg | m ∈ Z } of g form a subgroup H of G. By minimality of n, the distinct elements of H are {0, g, 2g, . . . , (n−1)g}, so |H| = n and the result follows from Lagrange's Theorem.
2
As an application, we can now immediately classify all finite (abelian) groups whose order is prime.
Proposition 4.11 Let G be an (abelian) group having prime order p. Then G is cyclic; that is, G ≅ Zp.
Proof: Let g ∈ G with g ≠ 0. Then |g| > 1, but |g| divides p by Proposition 4.10, so |g| = p. But then G must consist entirely of the integer multiples mg (0 ≤ m < p) of g, so G is cyclic.
2
Definition. If A and B are subsets of a group G, then we define their sum A + B = { a + b | a ∈ A, b ∈ B }.
Lemma 4.12 If H is a subgroup of the abelian group G and H + g, H + h are cosets of H in G, then (H + g) + (H + h) = H + (g + h).
Proof: Since G is abelian, this follows directly from commutativity and associativity.
Theorem 4.13 Let H be a subgroup of an abelian group G. Then the set G/H of cosets H + g of H in G forms a group under addition of subsets.
Proof: We have just seen that (H + g) + (H + h) = H + (g + h), so we have closure, and associativity follows easily from associativity of G. Since (H + 0) + (H + g) = H + g for all g ∈ G, H = H + 0 is an identity element, and since (H − g) + (H + g) = H − g + g = H, H − g is an inverse to H + g for all cosets H + g. Thus the four group axioms are satisfied and G/H is a group.
2
Definition. The group G/H is called the quotient group (or the factor group) of G by H.
Notice that if G is finite, then |G/H| = |G : H| = |G|/|H|. So, although the quotient group seems a rather complicated object at first sight, it is actually a smaller group than G.
Examples. 1. Let G = Z and H = mZ for some m > 0. Then there are exactly m distinct cosets, H, H + 1, . . . , H + (m − 1). If we add together k copies of H + 1, then we get H + k. So G/H is cyclic of order m with generator H + 1. So by Proposition 4.2, Z/mZ ≅ Zm.
2. G = R and H = Z. The quotient group G/H is isomorphic to the circle subgroup S¹ of the multiplicative group C*. One writes an explicit isomorphism φ : G/H → S¹ by φ(x + Z) = e^{2πxi}.
3. G = Q and H = Z. The quotient group G/H features in one of the previous exams. It was required to show that this group is infinite, is not finitely generated, and that every element of G/H has finite order.
4. Quotient groups play an important role in Analysis: they are used to define Lebesgue spaces. Let p ≥ 1 be a real number and U ⊆ R an interval. Consider the vector space V of all measurable functions f : U → R such that ∫U |f(x)|^p dx < ∞. It follows from Minkowski's inequality that this is a vector space and consequently an abelian group. It contains a vector subspace W of negligible functions, that is, those f(x) satisfying ∫U |f(x)|^p dx = 0. The quotient group V/W is actually a vector space, called Lebesgue space and denoted L^p(U, R).
4.3
Note that an isomorphism is just a bijective homomorphism. There are two other types of morphism that are worth mentioning at this stage.
Example. Let G be any group, and let n ∈ Z. Then φ : G → G defined by φ(g) = ng for all g ∈ G is a homomorphism.
Kernels and images are defined as for linear transformations of vector spaces.
If one denotes the set of all homomorphisms from G to H by hom(G, H), there is an elegant way to reformulate Lemma 4.17: composition with the quotient map π : G → G/A defines a bijection
hom(G/A, H) → { φ ∈ hom(G, H) | φ(A) = {0} },   ψ ↦ ψ ∘ π.
4.4
Definition. The direct sum Zn of n copies of Z is known as a (finitely generated) free abelian
group of rank n.
More generally, a finitely generated abelian group is called free abelian if it is isomorphic to
Zn for some n 0.
(The free abelian group Z0 of rank 0 is defined to be the trivial group {0} containing the
single element 0.)
The groups Zn have many properties in common with vector spaces such as Rn , but we must
expect some differences, because Z is not a field.
We can define the standard basis of Zn exactly as for Rn ; that is, x1 , x2 , . . . , xn , where xi
has 1 in its i-th component and 0 in the other components. This has the same properties as
a basis of a vector space; i.e. it is linearly independent and spans Zn .
Definition. Elements x1, . . . , xn of an abelian group G are called linearly independent if, for λ1, . . . , λn ∈ Z, λ1x1 + · · · + λnxn = 0G implies λ1 = λ2 = · · · = λn = 0.
Definition. Elements x1, . . . , xn form a free basis of the abelian group G if and only if they are linearly independent and generate (span) G.
Now consider elements x1, . . . , xn of an abelian group G. It is possible to extend the assignment φ(xi) = xi to a group homomorphism φ : Zn → G: as a function we define φ((a1, a2, . . . , an)ᵀ) = a1x1 + · · · + anxn. We leave the proof of the following result as an exercise.
Proposition 4.19 The map φ : Zn → G just defined is a group homomorphism, and it is surjective if and only if x1, . . . , xn generate G.
Before Proposition 4.19 we were trying to extend the assignment φ(xi) = xi to a group homomorphism φ : Zn → G. Note that the extension we wrote is unique. This is the key to the next corollary. The details of the proof are left to the reader.
Corollary 4.20 (Universal property of the free abelian group) Let G be a free abelian group with a free basis x1, . . . , xn. Let H be a group and a1, . . . , an ∈ H. Then there exists a unique group homomorphism φ : G → H such that φ(xi) = ai for all i.
As for finite-dimensional vector spaces, it turns out that any two free bases of a free abelian group have the same size, but this has to be proved. It will follow directly from the next theorem.
Let x1, x2, . . . , xn be the standard free basis of Zn, and let y1, . . . , ym be another free basis. As in Linear Algebra, we can define the associated change of basis matrix P (with original basis {xi} and new basis {yi}), where the columns of P are the yi written as column vectors; that is, they express the yi in terms of the xi. For example, if n = m = 2, y1 = (2, 7), y2 = (1, 4), then
P = [ 2 1 ]
    [ 7 4 ].
In general, P = (ρij) is an n × m matrix with yj = Σ_{i=1}^n ρij xi for 1 ≤ j ≤ m.
Theorem 4.21 Let y1, . . . , ym ∈ Zn with yj = Σ_{i=1}^n ρij xi for 1 ≤ j ≤ m. Then the following are equivalent:
(i) y1, . . . , ym is a free basis of Zn;
(ii) n = m and P is an invertible matrix such that P⁻¹ has entries in Z;
(iii) n = m and det(P) = ±1.
Proof: (i) ⇒ (ii). Suppose that y1, . . . , ym is a free basis of Zn. Then each xk can be written as xk = Σ_{j=1}^m τjk yj for some integers τjk; let T = (τjk). Then
xk = Σ_{j=1}^m τjk yj = Σ_{j=1}^m τjk Σ_{i=1}^n ρij xi = Σ_{i=1}^n (Σ_{j=1}^m ρij τjk) xi,
and, since x1, . . . , xn is a free basis, this implies that Σ_{j=1}^m ρij τjk = 1 when i = k and 0 when i ≠ k. In other words P T = In, and similarly T P = Im, so P and T are inverse matrices. But we can think of P and T as inverse matrices over the field Q, so it follows from First Year Linear Algebra that m = n, and T = P⁻¹ has entries in Z.
(ii) ⇒ (i). If T = P⁻¹ has entries in Z then, again thinking of them as matrices over the field Q, rank(P) = n, so the columns of P are linearly independent over Q and hence also over Z. Since the columns of P are just the column vectors representing y1, . . . , ym, this tells us that y1, . . . , ym are linearly independent. Using P T = In, for 1 ≤ k ≤ n we have
Σ_{j=1}^m τjk yj = Σ_{j=1}^m τjk Σ_{i=1}^n ρij xi = Σ_{i=1}^n (Σ_{j=1}^m ρij τjk) xi = xk,
so y1, . . . , ym also span Zn and hence form a free basis.
(ii) ⇒ (iii). If T = P⁻¹ has entries in Z, then det(P T) = det(P) det(T) = det(In) = 1, and since det(P), det(T) ∈ Z, this implies det(P) = ±1.
(iii) ⇒ (ii). We have P⁻¹ = (1/det(P)) adj(P), where the adjugate matrix adj(P) has entries in Z, so det(P) = ±1 implies that P⁻¹ has entries in Z.
2
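Condition (iii) of Theorem 4.21 is easy to check in practice. The snippet below is an illustrative numpy check (not from the notes) for the 2 × 2 basis change matrix P used above; for large matrices exact integer arithmetic would be preferable to floating point.

```python
import numpy as np

P = np.array([[2, 1], [7, 4]])
det = int(round(np.linalg.det(P)))
print(det)                      # 1, so P is unimodular
print(np.linalg.inv(P))         # [[ 4, -1], [-7, 2]] -- integer entries, as (ii) predicts
```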
4.5
We interrupt our discussion of finitely generated abelian groups at this stage to investigate how the row and column reduction process of Linear Algebra can be adapted to matrices over Z. Recall from MA106 that we can use elementary row and column operations to reduce an m × n matrix of rank r over a field K to a matrix B = (βij) with βii = 1 for 1 ≤ i ≤ r and βij = 0 otherwise. We called this the Smith Normal Form of the matrix. We can do something similar over Z, but the non-zero elements βii will not necessarily all be equal to 1.
The reason that we disallowed λ = 0 for the row and column operations (R3) and (C3) (multiply a row or column by a scalar λ) was that we wanted all of our elementary operations to be reversible. When performed over Z, (R1), (C1), (R2) and (C2) are reversible, but (R3) and (C3) are reversible only when λ = ±1. So, if A is an m × n matrix over Z, then we define the three types of unimodular elementary row operations as follows:
(UR1): Replace some row ri of A by ri + t rj, where j ≠ i and t ∈ Z;
(UR2): Interchange two rows ri and rj of A;
(UR3): Replace some row ri of A by −ri.
The unimodular column operations (UC1), (UC2), (UC3) are defined similarly. Recall from MA106 that performing elementary row or column operations on a matrix A corresponds to multiplying A on the left or right, respectively, by an elementary matrix. These elementary matrices all have determinant ±1 (1 for (UR1) and −1 for (UR2) and (UR3)), so they are unimodular matrices over Z.
Theorem 4.22 Let A be an m × n matrix over Z with rank r. Then, by using a sequence of unimodular elementary row and column operations, we can reduce A to a matrix B = (βij) with βii = di for 1 ≤ i ≤ r and βij = 0 otherwise, where the integers di satisfy di > 0 for 1 ≤ i ≤ r, and di | di+1 for 1 ≤ i < r. Subject to these conditions, the di are uniquely determined by the matrix A.
Proof: We shall not prove the uniqueness part here. The fact that the number of non-zero βii is the rank of A follows from the fact that unimodular row and column operations do not change the rank. We use induction on m + n. The base case is m = n = 1, where there is nothing to prove. Also if A is the zero matrix then there is nothing to prove, so assume not.
Let d be the smallest entry with d > 0 in any matrix C = (γij) that we can obtain from A by using unimodular elementary row and column operations. By using (UR2) and (UC2), we can move d to position (1, 1) and hence assume that γ11 = d. If d does not divide γ1j for some j > 1, then we can write γ1j = qd + r with q, r ∈ Z and 0 < r < d, and then replacing the j-th column cj of C by cj − qc1 results in the entry r in position (1, j), contrary to the choice of d. Hence d | γ1j for 2 ≤ j ≤ n and similarly d | γi1 for 2 ≤ i ≤ m.
Now, if γ1j = qd, then replacing cj of C by cj − qc1 results in entry 0 in position (1, j). So we can assume that γ1j = 0 for 2 ≤ j ≤ n and γi1 = 0 for 2 ≤ i ≤ m. If m = 1 or n = 1, then we are done. Otherwise, we have C = (d) ⊕ C′ for some (m−1) × (n−1) matrix C′. By the inductive hypothesis, the result of the theorem applies to C′, so by applying unimodular row and column operations to C which do not involve the first row or column, we can reduce C to D = (δij), which satisfies δ11 = d, δii = di > 0 for 2 ≤ i ≤ r, and δij = 0 otherwise, where di | di+1 for 2 ≤ i < r. To complete the proof, we still have to show that d | d2. If not, then adding row 2 to row 1 results in an entry d2 in position (1, 2) not divisible by d, and we obtain a contradiction as before.
2
Example 1. A = [ 42 21 ]
               [ 35 14 ].
The general strategy is to reduce the size of entries in the first row and column, until the (1,1)-entry divides all other entries in the first row and column. Then we can clear all of these other entries.
Matrix                 Operation
[ 42 21 ]
[ 35 14 ]              c1 → c1 − 2c2
[  0 21 ]
[  7 14 ]              r1 ↔ r2
[  7 14 ]
[  0 21 ]              c2 → c2 − 2c1
[  7  0 ]
[  0 21 ]
So the Smith normal form of A is diag(7, 21).
Example 2. A is the 4 × 4 matrix
[ 18 18 18 90 ]
[ 54 12 45 48 ]
[  9 −6  6 63 ]
[ 18  6 15 12 ].
The same strategy applies: the column operation c1 → c1 − c3 followed by the row swap r1 ↔ r4 brings the smallest entry 3 into position (1, 1); the operations c2 → c2 − 2c1, c3 → c3 − 5c1, c4 → c4 − 4c1, r2 → r2 − 3r1 and r3 → r3 − r1 then clear the rest of the first row and column, and the reduction continues on the remaining 3 × 3 block in the same way. The end result is
[ 3 0  0 0 ]
[ 0 3  0 0 ]
[ 0 0 18 0 ]
[ 0 0  0 0 ],
so d1 = 3, d2 = 3, d3 = 18.
Note: There is also a generalisation to integer matrices of the row reduced normal form from Linear Algebra, where only row operations are allowed. This is known as the Hermite Normal Form and is more complicated. It will appear on an exercise sheet.
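The reduction procedure in the proof of Theorem 4.22 can be turned into a short program. The sketch below is written from scratch for these notes' algorithm (all names are illustrative; it returns only the reduced matrix, not the unimodular transformation matrices): pick a smallest non-zero entry, move it to the top left corner, clear its row and column, enforce the divisibility condition as in the proof, and recurse on the remaining block.

```python
def smith_normal_form(A):
    A = [row[:] for row in A]                       # work on a copy
    m, n = len(A), len(A[0])
    for k in range(min(m, n)):
        while True:
            # non-zero entry of smallest absolute value in the lower-right block
            pivot = min(((i, j) for i in range(k, m) for j in range(k, n) if A[i][j] != 0),
                        key=lambda p: abs(A[p[0]][p[1]]), default=None)
            if pivot is None:
                return A                            # remaining block is zero
            i, j = pivot
            A[k], A[i] = A[i], A[k]                 # (UR2) row swap
            for row in A:                           # (UC2) column swap
                row[k], row[j] = row[j], row[k]
            d = A[k][k]
            for i in range(k + 1, m):               # (UR1): clear column k
                q = A[i][k] // d
                A[i] = [a - q * b for a, b in zip(A[i], A[k])]
            for j in range(k + 1, n):               # (UC1): clear row k
                q = A[k][j] // d
                for i in range(m):
                    A[i][j] -= q * A[i][k]
            cleared = all(A[i][k] == 0 for i in range(k + 1, m)) and \
                      all(A[k][j] == 0 for j in range(k + 1, n))
            divisible = all(A[i][j] % d == 0
                            for i in range(k + 1, m) for j in range(k + 1, n))
            if cleared and divisible:
                break
            if cleared:                             # force d | d_{i+1}, as in the proof
                bad = next(i for i in range(k + 1, m)
                           if any(A[i][j] % d != 0 for j in range(k + 1, n)))
                A[k] = [a + b for a, b in zip(A[k], A[bad])]
        if A[k][k] < 0:                             # (UR3): make the diagonal entry positive
            A[k] = [-a for a in A[k]]
    return A

print(smith_normal_form([[42, 21], [35, 14]]))      # [[7, 0], [0, 21]], as in Example 1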
4.6
Proposition 4.23 Any subgroup of a finitely generated abelian group is finitely generated.
Proof: Let K ≤ G with G an abelian group generated by x1, . . . , xn. We shall prove by induction on n that K can be generated by at most n elements. If n = 1 then G is cyclic; write G = { nx | n ∈ Z }. Let m be the smallest positive number such that mx ∈ K. If such a number does not exist then K = {0}. Otherwise, K ⊇ { nmx | n ∈ Z }. The opposite inclusion follows using division with remainder: write t = qm + r with 0 ≤ r < m. Then tx ∈ K if and only if rx = (t − mq)x ∈ K, if and only if r = 0, by the minimality of m. In both cases K is cyclic.
Suppose n > 1, and let H be the subgroup of G generated by x1, . . . , xn−1. By induction, K ∩ H is generated by y1, . . . , ym−1, say, with m ≤ n. If K ⊆ H, then K = K ∩ H and we are done, so suppose not.
Then there exist elements of the form h + txn ∈ K with h ∈ H and t ≠ 0. Since −(h + txn) ∈ K, we can assume that t > 0. Choose such an element ym = h + txn ∈ K with t minimal subject to t > 0. We claim that K is generated by y1, . . . , ym, which will complete the proof.
Let k ∈ K. Then k = h′ + uxn with h′ ∈ H and u ∈ Z. If t does not divide u then we can write u = tq + r with q, r ∈ Z and 0 < r < t, and then k − qym = (h′ − qh) + rxn ∈ K, contrary to the choice of t. So t | u and hence u = tq and k − qym ∈ K ∩ H. But K ∩ H is generated by y1, . . . , ym−1, so we are done.
2
Now let H be a subgroup of the free abelian group Zn, and suppose that H is generated by v1, . . . , vm. Then H can be represented by an n × m matrix A in which the columns are v1ᵀ, . . . , vmᵀ.
Example 3. Let H ≤ Z³ be generated by v1 = (1, 3, −1) and v2 = (2, 0, 1), so that
A = [ 1  2 ]
    [ 3  0 ]
    [−1  1 ].
As we saw above, if we use a different free basis y1, . . . , yn of Zn with basis change matrix P, then each column vjᵀ of A is replaced by P⁻¹vjᵀ, and hence A itself is replaced by P⁻¹A. So in Example 3, if we use the free basis y1 = (1, 3, −1), y2 = (0, −2, 1), y3 = (0, 1, 0) of Z³, then
P = [ 1  0 0 ]     P⁻¹ = [ 1 0 0 ]     P⁻¹A = [ 1 2 ]
    [ 3 −2 1 ]            [ 1 0 1 ]            [ 0 3 ]
    [−1  1 0 ]            [−1 1 2 ]            [ 0 0 ],
so H = ⟨y1, 2y1 + 3y2⟩ = ⟨y1, 3y2⟩.
By keeping track of the unimodular row operations carried out, we can, if we need to, find the free basis y1, . . . , yn of Zn. Doing this in Example 3, we get:
Matrix                  Operation             New basis y1, y2, y3
(1 2; 3 0; −1 1)        r2 → r2 − 3r1,
                        r3 → r3 + r1          (1, 3, −1), (0, 1, 0), (0, 0, 1)
(1 2; 0 −6; 0 3)        c2 → c2 − 2c1         (1, 3, −1), (0, 1, 0), (0, 0, 1)
(1 0; 0 −6; 0 3)        r2 ↔ r3               (1, 3, −1), (0, 0, 1), (0, 1, 0)
(1 0; 0 3; 0 −6)        r3 → r3 + 2r2         (1, 3, −1), (0, −2, 1), (0, 1, 0)
(1 0; 0 3; 0 0)
4.7
Let G be a finitely generated abelian group. If G has n generators, then Proposition 4.19 gives a surjective homomorphism φ : Zn → G. From the First Isomorphism Theorem (Theorem 4.18) we deduce that G ≅ Zn/K, where K = ker(φ). So we have proved that every finitely generated abelian group is isomorphic to a quotient group of a free abelian group.
From the definition of φ, we see that
K = { (α1, α2, . . . , αn) ∈ Zn | α1x1 + · · · + αnxn = 0G }.
By Proposition 4.23, this subgroup K is generated by finitely many elements v1, . . . , vm of Zn. The notation
⟨ x1, . . . , xn | v1, . . . , vm ⟩
is often used to denote the quotient group Zn/K, so we have
G ≅ ⟨ x1, . . . , xn | v1, . . . , vm ⟩.
Now we can apply Theorem 4.25 to this subgroup K, and deduce that there is a free basis y1, . . . , yn of Zn such that K = ⟨ d1y1, . . . , dryr ⟩ for some r ≤ n, where each di > 0 and di | di+1 for 1 ≤ i < r. So we also have
G ≅ ⟨ y1, . . . , yn | d1y1, . . . , dryr ⟩.
K = { (α1, α2, . . . , αr, 0, . . . , 0) ∈ Zn | di | αi for 1 ≤ i ≤ r },
which is generated by the elements d1y1, . . . , dryr. So
H ≅ Zn/K = ⟨ y1, . . . , yn | d1y1, . . . , dryr ⟩.
2
Putting all of these results together, we get the main theorem:
Theorem 4.27 (The fundamental theorem of finitely generated abelian groups) If G is a finitely generated abelian group, then G is isomorphic to a direct sum of cyclic groups. More precisely, if G is generated by n elements then, for some r with 0 ≤ r ≤ n, there are integers d1, . . . , dr with di > 0 and di | di+1 such that
G ≅ Zd1 ⊕ Zd2 ⊕ · · · ⊕ Zdr ⊕ Z^{n−r}.
For the matrix of Example 1 we have G ≅ Z7 ⊕ Z21, a group of order 7 × 21 = 147.
The group of Example 3 is
⟨ x1, x2, x3 | x1 + 3x2 − x3, 2x1 + x3 ⟩,
and is isomorphic to Z1 ⊕ Z3 ⊕ Z ≅ Z3 ⊕ Z, so it is infinite, with a finite subgroup of order 3.
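Reading off the isomorphism type of a presented group amounts to computing the Smith normal form of its relation matrix. The snippet below reuses the smith_normal_form sketch from Section 4.5 above (so it is not self-contained, and the helper is illustrative rather than anything from the notes) to recover the decomposition of the group of Example 3.

```python
# Columns of A are the relations x1 + 3x2 - x3 and 2x1 + x3.
A = [[1, 2], [3, 0], [-1, 1]]
print(smith_normal_form(A))     # [[1, 0], [0, 3], [0, 0]]
# invariant factors 1 and 3, with one generator left free:
# G = Z_1 + Z_3 + Z = Z_3 + Z, as stated above.
```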
4.8
From the uniqueness part of Theorem 4.22 (which we did not prove), it follows that, if di | di+1 for 1 ≤ i < r and ei | ei+1 for 1 ≤ i < s, then Zd1 ⊕ Zd2 ⊕ · · · ⊕ Zdr ≅ Ze1 ⊕ Ze2 ⊕ · · · ⊕ Zes if and only if r = s and di = ei for 1 ≤ i ≤ r.
So the isomorphism classes of finite abelian groups of order n > 0 are in one-one correspondence with expressions n = d1 d2 · · · dr for which di | di+1 for 1 ≤ i < r. This enables us to classify isomorphism classes of finite abelian groups.
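The correspondence with expressions n = d1 d2 · · · dr can be enumerated mechanically. The following sketch (an illustrative recursion, not from the notes) lists, for a given n, one sequence (d1, . . . , dr) per isomorphism class.

```python
def decompositions(n, multiple_of=1):
    """Yield all [d1, ..., dr] with d1 | d2 | ... | dr, each di >= 2 and product n."""
    if n == 1:
        yield []
        return
    for d in range(2, n + 1):
        if n % d == 0 and d % multiple_of == 0:
            for tail in decompositions(n // d, d):
                yield [d] + tail

print(list(decompositions(4)))    # [[2, 2], [4]]
print(list(decompositions(36)))   # [[2, 18], [3, 12], [6, 6], [36]]
```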
Examples. 1. n = 4. The decompositions are 4 and 2 × 2, so G ≅ Z4 or Z2 ⊕ Z2.
Although we have not proved in general that groups of the same order but with different decompositions of the type above are not isomorphic, this can always be done in specific examples by looking at the orders of elements.
We saw in an exercise above that if φ : G → H is an isomorphism then |g| = |φ(g)| for all g ∈ G. So isomorphic groups have the same number of elements of each order.
Note also that, if g = (g1, g2, . . . , gn) is an element of a direct sum of n groups, then |g| is the least common multiple of the orders |gi| of the components of g.
So, in the four groups of order 36, G1 = Z36, G2 = Z2 ⊕ Z18, G3 = Z3 ⊕ Z12 and G4 = Z6 ⊕ Z6, we see that only G1 contains elements of order 36. Hence G1 cannot be isomorphic to G2, G3 or G4. Of the three groups G2, G3 and G4, only G2 contains elements of order 18, so G2 cannot be isomorphic to G3 or G4. Finally, G3 has elements of order 12 but G4 does not, so G3 and G4 are not isomorphic, and we have now shown that no two of the four groups are isomorphic to each other.
As a slightly harder example, Z2 ⊕ Z2 ⊕ Z4 is not isomorphic to Z4 ⊕ Z4, because the former has 7 elements of order 2, whereas the latter has only 3.
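The counts 7 and 3 quoted above are easy to verify by brute force; the sketch below (illustrative helper names only) counts the elements of order 2 in each group.

```python
from itertools import product
from math import gcd

def order(g, moduli):
    # the order of g = (g1, ..., gk) is the lcm of the component orders n_i / gcd(g_i, n_i)
    o = 1
    for gi, ni in zip(g, moduli):
        oi = ni // gcd(gi, ni)
        o = o * oi // gcd(o, oi)
    return o

for moduli in [(2, 2, 4), (4, 4)]:
    count = sum(1 for g in product(*[range(n) for n in moduli]) if order(g, moduli) == 2)
    print(moduli, count)        # (2, 2, 4) -> 7,  (4, 4) -> 3
```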
4.9
Tensor products
Given two abelian groups A and G, one can form a new abelian group A ⊗ G, their tensor product (not to be confused with the direct product). We consider the direct product X = A × G as a set. Let F be the free abelian group with X as a basis:
F = ⟨ A × G | ⟩.
Elements of F are formal finite Z-linear combinations Σi ni(ai, gi), with ni ∈ Z, ai ∈ A, gi ∈ G. Let F0 be the subgroup of F generated by the elements
(a + b, g) − (a, g) − (b, g),   n(a, g) − (na, g),
(a, g + h) − (a, g) − (a, h),   n(a, g) − (a, ng),
for various a, b ∈ A, g, h ∈ G, n ∈ Z. The tensor product is the quotient group A ⊗ G = F/F0, and the class of (a, g) in A ⊗ G is denoted a ⊗ g; elements of this form are called elementary tensors. However, it is important to realise that not all tensors are elementary. The generators of F0 become relations on elementary tensors:
(a + b) ⊗ g = a ⊗ g + b ⊗ g,   n(a ⊗ g) = (na) ⊗ g,
a ⊗ (g + h) = a ⊗ g + a ⊗ h,   n(a ⊗ g) = a ⊗ (ng),
so a general element of A ⊗ G is a sum Σi ai ⊗ gi.
If elements bi span A and elements hj span G, then the elements bi ⊗ hj span A ⊗ G. Indeed, given Σk ak ⊗ gk ∈ A ⊗ G, we can express all ak = Σi nki bi and gk = Σj mkj hj. Then
Σk ak ⊗ gk = Σk (Σi nki bi) ⊗ (Σj mkj hj) = Σ_{k,i,j} nki mkj bi ⊗ hj.
In particular, a tensor product of two free groups is free. However, for general groups tensor products can behave in quite an unpredictable way. For instance, Z2 ⊗ Z3 = 0. Indeed,
1Z2 ⊗ 1Z3 = 3 · 1Z2 ⊗ 1Z3 = 1Z2 ⊗ 3 · 1Z3 = 1Z2 ⊗ 0 = 0.
To help sort out zero from nonzero elements in tensor products we need to understand the connection between tensor products and bilinear maps. Let A, G, and H be abelian groups. The map
⊗ : A × G → A ⊗ G,   ⊗(a, g) = a ⊗ g
is a bilinear map. This bilinear map is universal, i.e. composition with ⊗ defines a bijection
hom(A ⊗ G, H) → Bil(A × G, H),   ψ ↦ ψ ∘ ⊗.
Proof: The function ⊗ is a bilinear map: the four properties of a bilinear map follow easily from the corresponding generators of F0. For instance, ⊗(a + b, g) = ⊗(a, g) + ⊗(b, g) because (a + b, g) − (a, g) − (b, g) ∈ F0.
Let Fun denote the set of functions between two sets. By the universal property of a free abelian group (Corollary 4.20), we have a bijection
hom(F, H) → Fun(A × G, H).
Bilinear maps correspond to functions vanishing on F0, i.e., to linear maps from F/F0 (Lemma 4.17).
2
In the following section we will need a criterion for elements of R ⊗ S¹ to be nonzero. The circle group S¹ is a group under multiplication, creating certain confusion for tensor products. To avoid this confusion we identify the multiplicative group S¹ with the additive group R/2πZ via the natural isomorphism e^{xi} ↦ x + 2πZ.
The required bilinear map is defined using multiplication by a scalar in A = R/Q: β(b, z + 2πZ) = e1(b)(z + 2πZ). Clearly, β(a, x + 2πZ) = e1(a)(x + 2πZ) = 1 · (x + 2πZ) = x + Q ≠ 0.
2
Exercise. Zn ⊗ Zm ≅ Zgcd(n,m).
4.10
All the hard work we have done is going to pay off now. We will understand a solution of the third Hilbert problem. In 1900 Hilbert formulated 23 problems that, in his view, would influence the Mathematics of the 20th century. The third problem was the first to be solved: in the same year, 1900, by Dehn, which is quite remarkable as the problem was missing from Hilbert's lecture and appeared in print only in 1902, two years after its solution.
In his third problem Hilbert asks whether two 3-dimensional polytopes of the same volume are scissor congruent. Recall that M and N are congruent if there is a motion that moves M to N. M and N are scissor congruent if one can cut M into pieces Mi (cutting along planes) and N into pieces Ni such that the individual pieces Mi and Ni are congruent for each i.
Let us consider the scissor group generated by all n-dimensional polytopes,
Pn = ⟨ P | M − N, A − B − C ⟩,
with one relation M − N for each pair M, N of congruent polytopes and one relation A − B − C for each cut A = B ∪ C of a polytope by a hyperplane. For a polytope M, let us denote by [M] ∈ Pn its class in the scissor group. Clearly, M and N are scissor congruent if and only if [M] = [N]. By Lemma 4.17, n-dimensional volume is a homomorphism
νn : Pn → R,   νn([M]) = volume(M).
Proof: For a polygon M, there are triangles T1, T2, . . . , Tn such that [M] = [T1] + [T2] + · · · + [Tn]. This follows from a triangulation of M, illustrated in the next picture.
It suffices to show that if two triangles T and T′ have the same area, then [T] = [T′]. Indeed, using this, one can reshape the triangles into T1′, T2′, . . . , Tn′ so that they add up to a triangle T: then [M] = [T1] + [T2] + · · · + [Tn] = [T1′] + [T2′] + · · · + [Tn′] = [T], and we are supposing that two triangles of the same area are scissors equivalent. The following picture shows that a triangle with base b and height h is equivalent to the rectangle with sides b and h/2. In particular, any polygon is equivalent to a right-angled triangle. The last picture shows how two right-angled triangles of the same area are scissors congruent.
[Figure: triangles CAB and CPQ with a common vertex C]
The equal-area triangles are CAB and CPQ. This means that |CA||CB| = |CP||CQ|. Hence |CA|/|CQ| = |CP|/|CB|, and the triangles CPB and CAQ are similar. In particular, AQ and PB are parallel, thus the triangles APB and QPB share the same base and height and, consequently, are scissors congruent. Finally,
[CAB] = [CPB] + [APB] = [CPB] + [QPB] = [CPQ].
2
x1 = −7/9,   x2 = 17/81,   x3 = −5983/3⁸,   x4 = 28545857/3¹⁶,   . . .
Now it is natural to ask what exactly the group P3 is. It was proved later (in 1965) that the joint homomorphism (ν3, δ) : P3 → R ⊕ (R ⊗ R/2πZ) is injective. It is not surjective, but the image can be explicitly described; we won't do it here.
4.11
If you would like to write an essay taking something further from this course, here are some
suggestions. Ask me if you want more information.
(i) Bilinear and quadratic forms over fields of characteristic 2 (i.e. where 1 + 1 = 0). You can do hermitian forms too.
(ii) Grassmann algebras, determinants and tensors.
(iii) Matrix exponentials and the Baker–Campbell–Hausdorff formula.
(iv) Abelian groups and public key cryptography (be careful not to repeat whatever is covered in Algebra-2).
(v) Lattices (abelian groups with bilinear forms), the lattice E8 and the Leech lattice.
(vi) The abelian group law on an elliptic curve.
(vii) The groups Pn for other n, including a precise description of P3.
The End