A Concise Text On Advanced Linear Algebra
A Concise Text On Advanced Linear Algebra
This engaging textbook for advanced undergraduate students and beginning graduates
covers the core subjects in linear algebra. The author motivates the concepts by
drawing clear links to applications and other important areas.
The book places particular emphasis on integrating ideas from analysis wherever
appropriate and features many novelties in its presentation. For example, the notion of
determinant is shown to appear from calculating the index of a vector field which leads
to a self-contained proof of the Fundamental Theorem of Algebra; the
CayleyHamilton theorem is established by recognizing the fact that the set of
complex matrices of distinct eigenvalues is dense; the existence of a real eigenvalue of
a self-adjoint map is deduced by the method of calculus; the construction of the Jordan
decomposition is seen to boil down to understanding nilpotent maps of degree two;
and a lucid and elementary introduction to quantum mechanics based on linear algebra
is given.
The material is supplemented by a rich collection of over 350 mostly proof-oriented
exercises, suitable for readers from a wide variety of backgrounds. Selected solutions
are provided at the back of the book, making it ideal for self-study as well as for use as
a course text.
A Concise Text on
Advanced Linear Algebra
YISONG YANG
Polytechnic School of Engineering, New York University
For Sheng,
Peter, Anna, and Julia
Contents
Preface
Notation and convention
page ix
xiii
Vector spaces
1.1
Vector spaces
1.2
Subspaces, span, and linear dependence
1.3
Bases, dimensionality, and coordinates
1.4
Dual spaces
1.5
Constructions of vector spaces
1.6
Quotient spaces
1.7
Normed spaces
1
1
8
13
16
20
25
28
Linear mappings
2.1
Linear mappings
2.2
Change of basis
2.3
Adjoint mappings
2.4
Quotient mappings
2.5
Linear mappings from a vector space into itself
2.6
Norms of linear mappings
34
34
45
50
53
55
70
Determinants
3.1
Motivational examples
3.2
Definition and properties of determinants
3.3
Adjugate matrices and Cramers rule
3.4
Characteristic polynomials and CayleyHamilton
theorem
78
78
88
102
Scalar products
4.1
Scalar products and basic properties
115
115
vii
107
viii
Contents
4.2
4.3
4.4
4.5
120
127
137
142
147
147
151
157
164
170
172
180
180
184
188
194
199
Jordan decomposition
7.1
Some useful facts about polynomials
7.2
Invariant subspaces of linear mappings
7.3
Generalized eigenspaces as invariant subspaces
7.4
Jordan decomposition theorem
205
205
208
211
218
Selected topics
8.1
Schur decomposition
8.2
Classification of skewsymmetric bilinear forms
8.3
PerronFrobenius theorem for positive matrices
8.4
Markov matrices
226
226
230
237
242
248
248
252
257
262
267
311
313
315
Preface
Preface
then to show how the exponential of a linear mapping may be constructed and
understood.
In Chapter 3 we cover determinants. As a non-traditional but highly
motivating example, we show that the calculation of the topological degree
of a differentiable map from a closed curve into the unit circle in R2 involves
computing a two-by-two determinant, and the knowledge gained allows us to
prove the Fundamental Theorem of Algebra. We then formulate the definition
of a general determinant inductively, without resorting to the notion of permutations, and establish all its properties. We end the chapter by establishing the
CayleyHamilton theorem. Two independent proofs of this important theorem
are given. The first proof is analytic and consists of two steps. In the first step,
we show that the theorem is valid for a matrix of distinct eigenvalues. In the
second step, we show that any matrix may be regarded as a limiting point of a
sequence of matrices of distinct eigenvalues. Hence the theorem follows again
by taking the limit. The second proof, on the other hand, is purely algebraic.
In Chapter 4 we discuss vector spaces with scalar products. We start from the
most general notion of scalar products without requiring either non-degeneracy
or positive definiteness. We then carry out detailed studies on non-degenerate
and positive definite scalar products, respectively, and elaborate on adjoint
mappings in terms of scalar products. We end the chapter with a discussion
of isometric mappings in both real and complex space settings and noting their
subtle differences.
In Chapter 5 we focus on real vector spaces with positive definite scalar
products and quadratic forms. We first establish the main spectral theorem for
self-adjoint mappings. We will not take the traditional path of first using the
Fundamental Theorem of Algebra to assert that there is an eigenvalue and then
applying the self-adjointness to show that the eigenvalue must be real. Instead
we shall formulate an optimization problem and use calculus to prove directly
that a self-adjoint mapping must have a real eigenvalue. We then present a
series of characteristic conditions for a symmetric bilinear form, a symmetric
matrix, or a self-adjoint mapping, to be positive definite. We end the chapter
by a discussion of the commutativity of self-adjoint mappings and the usefulness of self-adjoint mappings for the investigation of linear mappings between
different spaces.
In Chapter 6 we study complex vector spaces with Hermitian scalar products
and related notions. Much of the theory here is parallel to that of the real space
situation with the exception that normal mappings can only be fully understood
and appreciated within a complex space formalism.
In Chapter 7 we establish the Jordan decomposition theorem. We start with
a discussion of some basic facts regarding polynomials. We next show how
Preface
xi
to reduce a linear mapping over its generalized eigenspaces via the Cayley
Hamilton theorem and the prime factorization of the characteristic polynomial
of the mapping. We then prove the Jordan decomposition theorem. The key
and often the most difficult step in this construction is a full understanding
of how a nilpotent mapping is reduced canonically. We approach this problem
inductively with the degree of a nilpotent mapping and show that it is crucial to
tackle a mapping of degree 2. Such a treatment eases the subtlety of the subject
considerably.
In Chapter 8 we present four selected topics that may be used as materials for some optional extra-curricular study when time and interest permit. In
the first section we present the Schur decomposition theorem, which may be
viewed as a complement to the Jordan decomposition theorem. In the second
section we give a classification of skewsymmetric bilinear forms. In the third
section we state and prove the PerronFrobenius theorem regarding the principal eigenvalues of positive matrices. In the fourth section we establish some
basic properties of the Markov matrices.
In Chapter 9 we present yet another selected topic for the purpose of
optional extra-curricular study: a short excursion into quantum mechanics
using gadgets purely from linear algebra. Specifically we will use Cn as the
state space and Hermitian matrices as quantum mechanical observables to formulate the over-simplified quantum mechanical postulates including Bohrs
statistical interpretation of quantum mechanics and the Schrdinger equation
governing the time evolution of a state. We next establish Heisenbergs uncertainty principle. Then we prove the equivalence of the Schrdinger description
via the Schrdinger equation and the Heisenberg description via the Heisenberg equation of quantum mechanics.
Also provided in the book is a rich collection of mostly proof-oriented
exercises to supplement and consolidate the main course materials. The
diversity and elasticity of these exercises aim to satisfy the needs and interests of students from a wide variety of backgrounds.
At the end of the book, solutions to some selected exercises are presented.
These exercises and solutions provide additional illustrative examples, extend
main course materials, and render convenience for the reader to master the
subjects and methods covered in a broader range.
Finally some bibliographic notes conclude the book.
This text may be curtailed to meet the time constraint of a semester-long
course. Here is a suggested list of selected sections for such a plan: Sections 1.11.5, 2.12.3, 2.5, 3.1.2, 3.2, and 3.3 (present the concept of adjugate
matrices only), Section 3.4 (give the second proof of the CayleyHamilton theorem only, based on an adjugate matrix expansion), Sections 4.3, 4.4, 5.1, 5.2
xii
Preface
(omit the analytic proof that a self-adjoint mapping must have an eigenvalue
but resort to Exercise 5.2.1 instead), Sections 5.3, 6.1, 6.2, 6.3.1, and 7.17.4.
Depending on the pace of lectures and time available, the instructor may
decide in the later stage of the course to what extent the topics in Sections
7.17.4 (the Jordan decomposition) can be presented productively.
The author would like to take this opportunity to thank Patrick Lin, Thomas
Otway, and Robert Sibner for constructive comments and suggestions, and
Roger Astley of Cambridge University Press for valuable editorial advice,
which helped improve the presentation of the book.
West Windsor, New Jersey
Yisong Yang
{c} = b.
We use i, j, k, l, m, n to denote integer-valued indices or space dimension numbers, a, b, c scalars, u, v, w, x, y, z vectors, A, B, C, D matrices,
P , R, S, T mappings, and U, V , W, X, Y, Z vector spaces, unless otherwise
stated.
We use t to denote the variable in a polynomial or a function or the transpose
operation on a vector or a matrix.
When X or Y is given, we use X Y to denote that Y , or X, is defined to
be X, or Y , respectively.
Occasionally, we use the symbol to express for all.
Let X be a set and Y, Z subsets of X. We use Y \ Z to denote the subset of
elements in Y which are not in Z.
xiii
1
Vector spaces
In this chapter we study vector spaces and their basic properties and structures.
We start by stating the definition and a discussion of the examples of vector
spaces. We next introduce the notions of subspaces, linear dependence, bases,
coordinates, and dimensionality. We then consider dual spaces, direct sums,
and quotient spaces. Finally we cover normed vector spaces.
1.1.1 Fields
The scalars to operate on vectors in a vector space are required to form a field,
which may be denoted by F, where two operations, usually called addition,
denoted by +, and multiplication, denoted by or omitted, over F are performed between scalars, such that the following axioms are satisfied.
(1) (Closure) If a, b F, then a + b F and ab F.
(2) (Commutativity) For a, b F, there hold a + b = b + a and ab = ba.
(3) (Associativity) For a, b, c F, there hold (a + b) + c = a + (b + c) and
a(bc) = (ab)c.
(4) (Distributivity) For a, b, c F, there hold a(b + c) = ab + ac.
(5) (Existence of zero) There is a scalar, called zero, denoted by 0, such that
a + 0 = a for any a F.
(6) (Existence of unity) There is a scalar different from zero, called one,
denoted by 1, such that 1a = a for any a F.
1
Vector spaces
(1.1.1)
It is clear that Z is divided into exactly p cosets, [0], [1], . . . , [p 1]. Use
Zp to denote the set of these cosets and pass the additive and multiplicative
operations in Z over naturally to the elements in Zp so that
[i] + [j ] = [i + j ],
[i][j ] = [ij ].
(1.1.2)
It can be verified that, with these operations, Zp becomes a field with its obvious zero and unit elements, [0] and [1]. Of course, p[1] = [1] + + [1] (p
terms)= [p] = [0]. In fact, p is the smallest positive integer whose multiplication with unit element results in zero element. A number of such a property
is called the characteristic of the field. Thus, Zp is a field of characteristic p.
For Q, R, and C, since no such integer exists, we say that these fields are of
characteristic 0.
a1
.
. or (a1 , . . . , an ) where a1 , . . . , an F.
(1.1.3)
.
an
Furthermore, we can define the addition of two vectors and the scalar multiplication of a vector by a scalar following the rules such as
an
b1
a1 + b1
..
..
. =
.
bn
(1.1.4)
an + bn
a1
a1
. .
.. = .. where F.
an
(1.1.5)
an
The set Fn , modeled over the field F and equipped with the above operations,
is a prototype example of a vector space.
More generally, we say that a set U is a vector space over a field F if U is
non-empty and there is an operation called addition, denoted by +, between
the elements of U , called vectors, and another operation called scalar multiplication between elements in F, called scalars, and vectors, such that the
following axioms hold.
(1) (Closure) For u, v U , we have u + v U . For u U and a F, we
have au U .
(2) (Commutativity) For u, v U , we have u + v = v + u.
(3) (Associativity of addition) For u, v, w U , we have u + (v + w) =
(u + v) + w.
(4) (Existence of zero vector) There is a vector, called zero and denoted by 0,
such that u + 0 = u for any u U .
(5) (Existence of additive inverse) For any u U , there is a vector, denoted
as (u), such that u + (u) = 0.
(6) (Associativity of scalar multiplication) For any a, b F and u U , we
have a(bu) = (ab)u.
(7) (Property of unit scalar) For any u U , we have 1u = u.
(8) (Distributivity) For any a, b F and u, v U , we have (a+b)u = au+bu
and a(u + v) = au + av.
As in the case of the definition of a field, we see that it readily follows from
the definition that zero vector and additive inverse vectors are all unique in
a vector space. Besides, any vector multiplied by zero scalar results in zero
vector. That is, 0u = 0 for any u U .
Other examples of vector spaces (with obviously defined vector addition and
scalar multiplication) include the following.
(1) The set of all polynomials with coefficients in F defined by
P = {a0 + a1 t + + an t n | a0 , a1 , . . . , an F, n N},
(1.1.6)
Vector spaces
dn x
dx
+ + a1
+ a0 x = 0,
dt n
dt
a0 , a1 , . . . , an R.
(1.1.7)
(4) In addition, we can also consider the set of arrays of scalars in F consisting
of m rows of vectors in Fn or n columns of vectors in Fm of the form
(aij ) =
(1.1.8)
1.1.3 Matrices
Here we consider some of the simplest manipulations on, and properties of,
matrices.
Let A be the matrix given in (1.1.8). Then At , called the transpose of A, is
defined to be
A =
(1.1.9)
It will now be useful to introduce the notion of dot product. For any two
vectors u = (a1 , . . . , an ) and v = (b1 , . . . , bn ) in Fn , their dot product uv F
is defined to be
u v = a1 b1 + + an bn .
(1.1.10)
i = 1, . . . , m,
j = 1, . . . , n,
(1.1.11)
where cij is the dot product of the ith row of A and the j th column of B. Thus
AB F(m, n).
Alternatively, if we use u, v to denote column vectors in Fn , then
u v = ut v.
(1.1.12)
That is, the dot product of u and v may be viewed as a matrix product of the
1 n matrix ut and n 1 matrix v as well.
Matrix product (or matrix multiplication) enjoys the following properties.
(1) (Associativity of scalar multiplication) a(AB) = (aA)B = A(aB) for
any a F and any A F(m, k), B F(k, n).
(2) (Distributivity) A(B + C) = AB + AC for any A F(m, k) and B, C
F(k, n); (A + B)C = AC + BC for any A, B F(m, k) and C F(k, n).
(3) (Associativity) A(BC) = (AB)C for any A F(m, k), B F(k, l),
C F(l, n).
Alternatively, if we express A F(m, k) and B F(k, n) as made of m row
vectors and n column vectors, respectively, rewritten as
A1
.
A=
.. ,
Am
B = (B1 , . . . , Bn ),
(1.1.13)
Vector spaces
A1 B1
A1 B2
A2 B1 A2 B2
AB =
Am B1 Am B2
A1
.
=
.. (B1 , . . . , Bn )
A1 Bn
A2 Bn
Am B n
Am
A1 B1
A2 B1
=
A m B1
A1 B2
A2 B2
Am B2
A1 Bn
A2 Bn
,
Am Bn
(1.1.14)
which suggests that matrix multiplication may be carried out with legitimate
multiplications executed over appropriate matrix blocks.
If A F(m, k) and B F(k, n), then At F(k, m) and B t F(n, k) so
that B t At F(n, m). Regarding how AB and B t At are related, here is the
conclusion.
Theorem 1.1 For A F(m, k) and B F(k, n), there holds
(AB)t = B t At .
(1.1.15)
(1.1.16)
In this situation, B is unique (cf. Exercise 1.1.7) and called the inverse of A
and denoted by A1 .
If A, B F(n, n) are such that AB = I , then we say that A is a left inverse
of B and B a right inverse of A. It can be shown that a left or right inverse is
simply the inverse. In other words, if A is a left inverse of B, then both A and
B are invertible and the inverses of each other.
If A R(n, n) enjoys the property AAt = At A = I , then A is called an
orthogonal matrix. For A = (aij ) C(m, n), we adopt the notation A = (a ij )
for taking the complex conjugate of A and use A to denote taking the complex
t
conjugate of the transpose of A, A = A , which is also commonly referred
to as taking the Hermitian conjugate of A. If A C(n, n), we say that A is
Hermitian symmetric, or simply Hermitian, if A = A, and skew-Hermitian
or anti-Hermitian, if A = A. If A C(n, n) enjoys the property AA =
A A = I , then A is called a unitary matrix. We will see the importance of
these notions later.
Exercises
1.1.1 Show that it follows from the definition of a field that zero, unit, additive,
and multiplicative inverse scalars are all unique.
1.1.2 Let p N be a prime and [n] Zp . Find [n] and prove the existence
of [n]1 when [n]
= [0]. In Z5 , find [4] and [4]1 .
1.1.3 Show that it follows from the definition of a vector space that both zero
and additive inverse vectors are unique.
1.1.4 Prove the associativity of matrix multiplication by showing that
A(BC) = (AB)C for any A F(m, k), B F(k, l), C F(l, n).
1.1.5 Prove Theorem 1.1.
1.1.6 Let A F(n, n) (n 2) and rewrite A as
A1 A2
,
(1.1.17)
A=
A3 A4
where A1 F(k, k), A2 F(k, l), A3 F(l, k), A4 F(l, l), k, l 1,
k + l = n. Show that
At1 At3
t
A =
.
(1.1.18)
At2 At4
Vector spaces
1.1.7 Prove that the inverse of an invertible matrix is unique by showing the
fact that if A, B, C F(n, n) satisfy AB = I and CA = I then B = C.
1.1.8 Let A C(n, n). Show that A is Hermitian if and only if iA is antiHermitian.
(1.2.1)
(1.2.2)
(1.2.4)
e2 = (0, 1, 0, . . . , 0),
en = (0, 0, . . . , 0, 1).
(1.2.5)
(1.2.6)
(1.2.7)
For F(m, n), we define Mij F(m, n) to be the vector such that all its
entries vanish except that its entry at the position (i, j ) (at the ith row and j th
column) is 1, i = 1, . . . , m, j = 1, . . . , n. We have
F(m, n) = Span{Mij | i = 1, . . . , m, j = 1, . . . , n}.
(1.2.8)
The notion of spans can be extended to cover some useful situations. Let U
be a vector space and S be a (finite or infinite) subset of U . Define
Span(S) = the set of linear combinations
of all possible finite subsets of S.
(1.2.9)
(1.2.10)
(1.2.11)
10
Vector spaces
(1.2.12)
= 0,
a11 x1 + + a1n xn
(1.2.13)
.................. ... ...
am1 x1 + + amn xn = 0,
over F with unknowns x1 , . . . , xn .
Theorem 1.4 In the system (1.2.13), if m < n, then the system has a nontrivial
solution (x1 , . . . , xn )
= (0, . . . , 0).
Proof We prove the theorem by using induction on m + n.
The beginning situation is m + n = 3 when m = 1 and n = 2. It is clear that
we always have a nontrivial solution.
Assume that the statement of the theorem is true when m + n k where
k 3.
Let m + n = k + 1. If k = 3, the condition m < n implies m = 1, n = 3 and
the existence of a nontrivial solution is obvious. Assume then k 4. If all the
coefficients of the variable x1 in (1.2.13) are zero, i.e. a11 = = am1 = 0,
then x1 = 1, x2 = = xn = 0 is a nontrivial solution. So we may assume
one of the coefficients of x1 is nonzero. Without loss of generality, we assume
a11
= 0. If m = 1, there is again nothing to show. Assume m 2. Dividing the
first equation in (1.2.13) by a11 if necessary, we can further assume a11 = 1.
Then, adding the (ai1 )-multiple of the first equation into the ith equation, in
(1.2.13), for i = 2, . . . , m, we arrive at
= 0,
x1 + a12 x2 + + a1n xn
= 0,
b22 x2 + + b2n xn
(1.2.14)
..................
... ...
bm2 x2 + + bmn xn = 0.
The system below the first equation in (1.2.14) contains m 1 equations and
n 1 unknowns x2 , . . . , xn . Of course, m 1 < n 1. So, in view of the
11
(1.2.15)
for some x1 , . . . , xn F.
Since each vj Span{u1 , . . . , um }, j = 1, . . . , n, there are scalars aij F
(i = 1, . . . , m, j = 1, . . . , n) such that
vj =
m
aij ui ,
j = 1, . . . , n.
(1.2.16)
i=1
m
n
aij xj ui = 0,
i=1
(1.2.17)
i=1
(1.2.18)
j =1
aij xj = 0,
i = 1, . . . , m.
(1.2.19)
j =1
This system of equations is exactly the system (1.2.13) which allows a nontrivial solution in view of Theorem 1.4. Hence the proof follows.
We are now prepared to study in the next section several fundamental properties of vector spaces.
12
Vector spaces
Exercises
1.2.1 Let U1 and U2 be subspaces of a vector space U . Show that U1 U2 is
a subspace of U if and only if U1 U2 or U2 U1 .
1.2.2 Let Pn denote the vector space of the polynomials of degrees up to n
over a field F expressed in terms of a variable t. Show that the vectors
1, t, . . . , t n in Pn are linearly independent.
1.2.3 Show that the vectors in Fn defined in (1.2.5) are linearly independent.
1.2.4 Show that S0 defined in (1.2.2) may also be expressed as
S0 = Span{e1 en , . . . , en1 en },
(1.2.20)
1
1
1
1, , . . . , n2 , 2 n1 1 , e1 en , . . . , en1 en , (1.2.21)
2
2
2
i = 2, . . . , n;
vn = un + u1 .
(1.2.22)
13
a1 , . . . , an F.
(1.3.1)
b1 , . . . , bn F,
(1.3.2)
(1.3.3)
(1.3.4)
14
Vector spaces
(1.3.5)
(1.3.6)
Of course, a
= 0, otherwise it contradicts the assumption that u1 , . . . , un
are linearly independent. So u = (a 1 )(a1 u1 + + an un ). Thus u
Span{u1 , . . . , un }.
Definition 1.8 Let {u1 , . . . , un } be a basis of the vector space U . Given u U
there are unique scalars a1 , . . . , an F such that
u = a1 u1 + + an un .
(1.3.7)
(1.3.8)
n
i=1
aij ui ,
j = 1, . . . , n.
(1.3.9)
15
n
n
n
ai ui =
aij bj ui .
(1.3.10)
i=1
i=1
j =1
n
aij bj ,
i = 1, . . . , n.
(1.3.11)
j =1
Note that the relation (1.3.9) between bases may be formally and conveniently expressed in a matrix form as
(v1 , . . . , vn ) = (u1 , . . . , un )A,
or concisely V = U A, or
v1
.
. = At
.
vn
u1
..
. ,
(1.3.12)
(1.3.13)
un
where multiplications between scalars and vectors are made in a well defined
manner. On the other hand, the relation (1.3.11) between coordinate vectors
may be rewritten as
a1
b1
.
. = A .. ,
(1.3.14)
.
.
an
bn
or
(a1 , . . . , an ) = (b1 , . . . , bn )At .
(1.3.15)
Exercises
1.3.1 Let U be a vector space with dim(U ) = n 2 and V a subspace of U
with a basis {v1 , . . . , vn1 }. Prove that for any u U \ V the vectors
u, v1 , . . . , vn1 form a basis for U .
1.3.2 Show that dim(F(m, n)) = mn.
1.3.3 Determine dim(FS (n, n)), dim(FA (n, n)), and dim(FD (n, n)).
1.3.4 Let P be the vector space of all polynomials with coefficients in a field
F. Show that dim(P) = .
16
Vector spaces
1.3.5 Consider the vector space R3 and the bases U = {e1 , e2 , e3 } and
V = {e1 , e1 + e2 , e1 + e2 + e3 }. Find the basis transition matrix A from
U into V satisfying V = U A. Find the coordinate vectors of the given
vector (1, 2, 3) R3 with respect to the bases U and V, respectively,
and relate these vectors with the matrix A.
1.3.6 Prove that a basis transition matrix must be invertible.
1.3.7 Let U be an n-dimensional vector space over a field F where n 2
(say). Consider the following construction.
(i)
(ii)
(iii)
(iv)
Take u1 U \ {0}.
Take u2 U \ Span{u1 }.
Take (if any) u3 U \ Span{u1 , u2 }.
In general, take ui U \ Span{u1 , . . . , ui1 } (i 2).
u, v U ;
f (au) = af (u),
a F, u U.
(1.4.1)
u U.
(1.4.2)
u U.
(1.4.3)
It is a simple exercise to check that these two operations make the set of all
functionals over U a vector space over F. This vector space is called the dual
space of U , denoted by U .
Let {u1 , . . . , un } be a basis of U . For any f U and any u = a1 u1 + +
an un U , we have
n
n
f (u) = f
ai ui =
ai f (ui ).
(1.4.4)
i=1
i=1
17
(1.4.5)
n
for any u =
ai fi
i=1
n
ai ui U.
(1.4.6)
i=1
f (f1 , . . . , fn ).
(1.4.7)
Especially we may use u1 , . . . , un to denote the elements in U corresponding to the vectors e1 , . . . , en in Fn given by (1.2.5). Then we have
0,
i
= j,
ui (uj ) = ij =
i, j = 1, . . . , n.
(1.4.8)
1,
i = j,
It is clear that u1 , . . . , un are linearly independent and span U because an
element f of U satisfying (1.4.5) is simply given by
f = f1 u1 + + fn un .
(1.4.9)
In other words, {u1 , . . . , un } is a basis of U , commonly called the dual basis
of U with respect to the basis {u1 , . . . , un } of U . In particular, we have seen
that U and U are of the same dimensionality.
Let U = {u1 , . . . , un } and V = {v1 , . . . , vn } be two bases of the vector space U . Let their dual bases be denoted by U = {u1 , . . . , un } and
V = {v1 , . . . , vn }, respectively. Suppose that the bases U and V are related
through
uj
n
aij vi ,
j = 1, . . . , n.
(1.4.10)
i=1
ui (vj )
k=1
n
k=1
aki
vk (vj )
(1.4.11)
k=1
n
k=1
aki
kj = aj i ,
(1.4.12)
18
Vector spaces
n
aj i vi ,
j = 1, . . . , n.
(1.4.13)
i=1
u1
.
. = A
un
v1
..
. .
(1.4.14)
(1.4.15)
vn
(1.4.16)
the discussion in the previous section and the above immediately allow us
to get
bi =
n
aj i aj ,
i = 1, . . . , n.
(1.4.17)
j =1
b1
.
. = At
bn
a1
..
. .
(1.4.18)
(1.4.19)
an
Comparing the above results with those established in the previous section,
we see that, with respect to bases and dual bases, the coordinates vectors in U
and U follow opposite rules of correspondence. For this reason, coordinate
vectors in U are often called covariant vectors, and those in U contravariant
vectors.
Using the relation stated in (1.4.8), we see that we may naturally view
u1 , . . . , un as elements in (U ) = U so that they form a basis of U dual
to {u1 , . . . , un } since
uj (ui )
= ij =
0,
i
= j,
1,
i = j,
19
i, j = 1, . . . , n.
(1.4.20)
(1.4.21)
(1.4.22)
(1.4.23)
u U,
u U ,
(1.4.24)
u U,
u U .
(1.4.25)
(1.4.26)
(1.4.27)
Exercises
1.4.1 Let F be a field. Describe the dual spaces F and (F2 ) .
1.4.2 Let U be a finite-dimensional vector space. Prove that for any vectors
u, v U (u
= v) there exists an element f U such that f (u)
= f (v).
1.4.3 Let U be a finite-dimensional vector space and f, g U . For any v
U , f (v) = 0 if and only if g(v) = 0. Show that f and g are linearly
dependent.
20
Vector spaces
(1.4.28)
n
xi ,
(x1 , . . . , xn ) Fn ,
(1.4.29)
i=1
for some c F.
1.4.5 Let U = P2 and f, g, h U be defined by
f (p) = p(1),
g(p) = p(0),
h(p) = p(1),
p(t) P2 .
(1.4.30)
(1.4.31)
(1.5.1)
21
BW = {u1 , . . . , uk , w1 , . . . , wm }. (1.5.2)
(1.5.3)
(1.5.5)
(1.5.6)
is valid for the sum of any two subspaces V and W of finite dimensions in a
vector space.
Of great importance is the situation when dim(V W ) = 0 or V W = {0}.
In this situation, the sum is called direct sum, and rewritten as V W . Thus,
we have
dim(V W ) = dim(V ) + dim(W ).
(1.5.7)
22
Vector spaces
v1 , v2 V ,
w1 , w2 W.
(1.5.8)
(1.5.9)
v V,
w W,
(1.5.10)
w1 , w2 W,
v V,
w W,
(1.5.11)
a F.
(1.5.12)
It is clear that the set U of all vectors of the form (1.5.10) equipped with the
vector addition (1.5.11) and scalar multiplication (1.5.12) is a vector space over
F. Naturally we may identify V and W with the subspaces of U given by
V = {(v, 0) | v V },
W = {(0, w) | w W }.
(1.5.13)
23
(1.5.14)
(1.5.15)
i
= j,
i, j = 1, . . . , k,
(1.5.16)
(1.5.18)
Vi
Vj = {0}, i = 1, . . . , k.
1j k,j
=i
In other words, when the condition (1.5.18) is fulfilled, then (1.5.15) is valid.
The proof of this fact is left as an exercise.
Exercises
1.5.1 For
U = {(x1 , . . . , xn ) Fn | x1 + + xn = 0},
V = {(x1 , . . . , xn ) Fn | x1 = = xn },
prove that Fn = U V .
(1.5.19)
24
Vector spaces
1.5.2 Consider the vector space of all n n matrices over a field F, denoted by
F(n, n). As before, use FS (n, n) and FA (n, n) to denote the subspaces of
symmetric and anti-symmetric matrices. Assume that the characteristic
of F is not equal to 2. For any M F(n, n), rewrite M as
M=
1
1
(M + M t ) + (M M t ).
2
2
(1.5.20)
1
1
Check that (M + M t ) FS (n, n) and (M M t ) FA (n, n). Use
2
2
this fact to prove the decomposition
F(n, n) = FS (n, n) FA (n, n).
(1.5.21)
(i) Prove that if R is identified with the set of all constant functions over
[a, b] then X = R Y .
(ii) For a = 0, b = 1, and f (t) = t 2 + t 1, find the unique c R and
g Y such that f (t) = c + g(t) for all t [0, 1].
1.5.6 Let U be a vector space and V , W its subspaces such that U = V + W .
If X is subspace of U , is it true that X = (X V ) + (X W )?
1.5.7 Let U be a vector space and V , W, X some subspaces of U such that
U = V W,
U = V X.
(1.5.23)
(1.5.24)
25
(1.5.25)
is a basis of V .
(iii) There holds the dimensionality relation
dim(V ) = dim(V1 ) + + dim(Vk ).
(1.5.26)
(1.6.1)
represents the line passing through the origin and along (or opposite to) the
direction of v. More generally, for any u R2 , the coset
[u] = {u + w | w V } = {x R2 | x u V }
(1.6.2)
represents the line passing through the vector u and parallel to the vector v.
Naturally, we define [u1 ] + [u2 ] = {x + y | x [u1 ], y [u2 ]} and claim
[u1 ] + [u2 ] = [u1 + u2 ].
(1.6.3)
In fact, let z [u1 ] + [u2 ]. Then there exist x [u1 ] and y [u2 ] such that
z = x + y. Rewrite x, y as x = u1 + w1 , y = u2 + w2 for some w1 , w2 V .
Hence z = (u1 + u2 ) + (w1 + w2 ), which implies z [u1 + u2 ]. Conversely,
if z [u1 + u2 ], then there is some w V such that z = (u1 + u2 ) + w =
(u1 + w) + u2 . Since u1 + w [u1 ] and u2 [u2 ], we see that z [u1 ] + [u2 ].
Hence the claim follows.
From the property (1.6.3), we see clearly that the coset [0] = V serves as an
additive zero element among the set of all cosets.
Similarly, we may also naturally define a[u] = {ax | x [u]} for a R
where a
= 0. Note that this last restriction is necessary because otherwise 0[u]
would be a single-point set consisting of zero vector only. We claim
a[u] = [au],
a R \ {0}.
(1.6.4)
In fact, if z a[u], there is some x [u] such that z = ax. Since x [u],
there is some w V such that x = u + w. So z = au + aw which implies z
[au]. Conversely, if z [au], then there is some w V such that z = au + w.
Since z = a(u + a 1 w), we get z a[u]. So (1.6.4) is established.
26
Vector spaces
Since the coset [0] is already seen to be the additive zero when adding cosets,
we are prompted to define
0[u] = [0].
(1.6.5)
a R,
u R2 .
(1.6.6)
We may examine how the above introduced addition between cosets and
scalar multiplication with cosets make the set of all cosets into a vector space
over R, denoted by R2 /V , and called the quotient space of R2 modulo V . As
investigated, the geometric meaning of R2 /V is that it is the set of all the lines
in R2 parallel to the vector v and that these lines can be added and multiplied
by real scalars so that the set of lines enjoys the structure of a real vector space.
There is no difficulty extending the discussion to the case of R3 with V a
line or plane through the origin.
More generally, the above quotient-space construction may be formulated
as follows.
Definition 1.12 Let U be a vector space over a field F and V a subspace of U .
The set of cosets represented by u U given by
[u] = {u + w | w V } = {u} + V u + V ,
(1.6.7)
u, v U,
a[u] = [au],
a F,
u U, (1.6.8)
forms a vector space over F, called the quotient space of U modulo V , and is
denoted by U/V .
Let BV = {v1 , . . . , vk } be a basis of V . Extend BV to get a basis of U , say
BU = {v1 , . . . , vk , u1 , . . . , ul }.
(1.6.9)
a1 , . . . , al F.
(1.6.10)
(1.6.11)
27
(1.6.12)
S2 = (2, 1) + V .
(1.6.13)
(a, b, c)
= (0, 0, 0),
(1.6.15)
28
Vector spaces
(1.7.1)
n
i=1
|ai |,
where
u=
n
ai ui .
(1.7.2)
i=1
(1.7.3)
i=1
It can be shown that p also defines a norm over U . We will not check this
fact. What interests us here, however, is the limit
29
(1.7.4)
To prove (1.7.4), we note that the right-hand side of (1.7.4) is simply |ai0 |
for some i0 {1, . . . , n}. Thus, in view of (1.7.3), we have
n
1
p
1
p
up
|ai0 |
= |ai0 |n p .
(1.7.5)
i=1
(1.7.6)
(1.7.7)
Thus,
lim inf up |ai0 |.
(1.7.8)
u U.
(1.7.10)
30
Vector spaces
Theorem 1.16 Let U be a vector space and and two norms over U .
Then norm is stronger than norm if and only if there is a constant
C > 0 such that
u Cu,
u U.
(1.7.11)
k = 1, 2, . . . .
(1.7.12)
Define
vk =
1
uk ,
uk
k = 1, 2, . . . .
(1.7.13)
1
,
k
k = 1, 2, . . . .
(1.7.14)
Consequently, vk 0 as k with respect to norm . However, according to (1.7.13), we have vk = 1, k = 1, 2, . . . , and vk
0 as k with
respect to norm . This reaches a contradiction.
Definition 1.17 Let U be a vector space and and two norms over U .
We say that norms and are equivalent if convergence in norm is
equivalent to convergence in norm .
In view of Theorem 1.16, we see that norms and are equivalent if
and only if the inequality
C1 u u C2 u,
u U,
(1.7.15)
u
n
|ai |ui 2
i=1
n
31
|ai | = 2 u1 ,
(1.7.16)
i=1
(1.7.17)
u U.
(1.7.18)
Suppose otherwise that (1.7.18) fails to be valid. In other words, the set of
ratios
u
u
U,
u
=
0
(1.7.19)
u1
does not have a positive infimum. Then there is a sequence {vk } in U such that
1
vk 1 vk ,
k
vk
= 0,
k = 1, 2, . . . .
(1.7.20)
Now set
wk =
1
vk ,
vk 1
k = 1, 2, . . . .
(1.7.21)
a1,k , . . . , an,k F,
k = 1, 2, . . . . (1.7.22)
as
s ,
i = 1, . . . , n.
(1.7.23)
s = 1, 2, . . . .
(1.7.24)
32
Vector spaces
Summarizing the above study, we see that there are some constants 1 , 2 >
0 such that
1 u1 u 2 u1 ,
u U.
(1.7.25)
u U.
(1.7.26)
u U,
(1.7.27)
a1,k , . . . , an,k F,
k = 1, 2, . . . , (1.7.28)
[u] U/V ,
u U,
(1.7.29)
|u(t)| dt,
33
u C[a, b].
(1.7.30)
u C[a, b].
(1.7.32)
1.7.3 Let (U, ) be a normed space and use U to denote the dual space of
U . For each u U , define
u = sup{|u (u)| | u U, u = 1}.
(1.7.33)
(1.7.34)
2
Linear mappings
34
35
u U.
(2.1.1)
u U.
(2.1.2)
We can directly check that the mapping addition (2.1.1) and scalar-mapping
multiplication (2.1.2) make L(U, V ) a vector space over F. We adopt the
notation L(U ) = L(U, U ).
As an example, consider the space of matrices, F(m, n). For
A = (aij ) F(m, n), define
a11
TA (x) = Ax =
am1
amn
a1n
x1
..
. ,
x1
.
n
x=
.. F .
xn
xn
(2.1.3)
a11
a1n
.
.
(2.1.4)
TA (e1 ) =
.. , . . . , TA (en ) = .. .
am1
amn
In other words, the images of e1 , . . . , en under the linear mapping TA are simply the column vectors of the matrix A, respectively.
Conversely, take any element T L(Fn , Fm ). Let v1 , . . . , vn Fm be
images of e1 , . . . , en under T such that
a11
m
.
=
.
T (e1 ) = v1 =
ai1 ei , . . . ,
.
am1
i=1
36
Linear mappings
a1n
m
.
=
.
ain ei ,
T (en ) = vn =
.
(2.1.5)
i=1
amn
x1
n
n
.
T (x) = T
xj en =
xj vj = (v1 , . . . , vn )
..
j =1
a11
am1
j =1
a1n
amn
xn
x1
..
. .
(2.1.6)
xn
m
aij vi ,
aij F,
i = 1, . . . , m,
j = 1, . . . , n.
(2.1.7)
i=1
j =1
T (x) = T
n
j =1
xj uj =
n
xj T (uj ).
(2.1.8)
j =1
37
Thus we again see, after specifying bases for U and V , that we may identify
L(U, V ) with F(m, n) in a natural way.
After the above general description of linear mappings, especially their identification with matrices, we turn our attention to some basic properties of linear
mappings.
x U.
(2.1.9)
m
l
i=1 k=1
aij bki wk =
i=1
l
m
bik akj wi .
(2.1.10)
i=1 k=1
(2.1.12)
38
Linear mappings
and
((R S) T )(u) = (R S)(T (u)) = R(S(T (u)))
(2.1.13)
for any u U .
Thus, when applying (2.1.11) to the situation of linear mappings between
finite-dimensional vector spaces and using their associated matrix representations, we obtain another proof of the associativity of matrix multiplication.
(2.1.14)
(2.1.15)
(2.1.16)
(2.1.17)
(2.1.18)
39
(2.1.19)
V \ R(T ) =
[v],
(2.1.20)
where the union of cosets on the right-hand side of (2.1.20) is made over all
[v] V /R(T ) except the zero element [0]. In other words, the union of all
cosets in V /R(T ) \ {[0]} gives us the set of all such vectors v in V that the
equation (2.1.17) fails to have a solution in U . With set notation, this last statement is
[v].
(2.1.21)
{v V | T 1 (v) = } =
Thus, loosely speaking, the quotient space or cokernel V /R(T ) measures
the size of the set of vectors in V that are missed by the image of U
under T .
Definition 2.1 Let U and V be finite-dimensional vector spaces over a field F.
The nullity and rank of a linear mapping T : U V , denoted by n(T ) and
r(T ), respectively, are the dimensions of N (T ) and R(T ). That is,
n(T ) = dim(N (T )),
(2.1.22)
40
Linear mappings
(2.1.23)
(2.1.24)
We now show that T (w1 ), . . . , T (wl ) form a basis for R(T ) by establishing
their linear independence. To this end, consider b1 T (w1 ) + + bl T (wl ) = 0
for some b1 , . . . , bl F. Hence T (b1 w1 + + bl wl ) = 0 or b1 w1 + +
bl wl N(T ). So there are a1 , . . . , ak F such that b1 w1 + + bl wl =
a1 u1 + + ak uk . Since u1 , . . . , uk , w1 , . . . , wl are linearly independent, we
arrive at a1 = = ak = b1 = = bl = 0. In particular, r(T ) = l and the
rank equation is valid.
As an immediate application of the above theorem, we have the following.
Theorem 2.4 Let U, V be finite-dimensional vector spaces over a field F. If
there is some T L(U, V ) such that T is both 1-1 and onto, then there must
hold dim(U ) = dim(V ). Conversely, if dim(U ) = dim(V ), then there is some
T L(U, V ) which is both 1-1 and onto.
Proof If T is 1-1 and onto, then n(T ) = 0 and r(T ) = dim(V ). Thus
dim(U ) = dim(V ) follows from the rank equation.
41
(2.1.25)
so that
T (u) =
n
ai vi ,
u =
i=1
n
ai ui U.
(2.1.26)
i=1
(2.1.27)
so that
S(v) =
n
i=1
bi ui ,
v =
n
bi vi V .
(2.1.28)
i=1
42
Linear mappings
(2.1.29)
(2.1.30)
It is interesting that if T is known to be invertible, then the left and right inverses coincide and are simply the inverse of T . To see this, we let T 1
L(V , U ) denote the inverse of T and use the associativity of composition of
mappings to obtain from (2.1.29) and (2.1.30) the results
S = S IV = S (T T 1 ) = (S T ) T 1 = IV T 1 = T 1 ,
(2.1.31)
and
R = IU R = (T 1 T ) R = T 1 (T R) = T 1 IV = T 1 ,
(2.1.32)
respectively.
It is clear that, regarding T L(U, V ), the condition (2.1.29) implies that
T is 1-1, and the condition (2.1.30) implies that T is onto. In view of Theorem 2.5, we see that when U, V are finite-dimensional and dim(U ) = dim(V ),
then T is invertible. Thus we have S = R = T 1 . On the other hand, if
dim(U )
= dim(V ), T can never be invertible and the notion of left and right
inverses is of separate interest.
As an example, we consider the linear mappings S : F3 F2 and T :
2
F F3 associated with the matrices
1 0
1 0 0
A=
, B = 0 1 ,
(2.1.33)
0 1 0
0 0
respectively according to (2.1.3). Then we may examine that S is a left inverse
of T (thus T is a right inverse of S). However, it is absurd to talk about the
invertibility of these mappings.
Let U and V be two vector spaces over the same field. An invertible element
T L(U, V ) (if it exists) is also called an isomorphism. If there is an isomorphism between U and V , they are said to be isomorphic to each other, written
as U V .
43
Tij (uk ) = 0,
1 k n,
k
= j.
(2.1.34)
(2.1.35)
a11 a1n
MA (x) = xA = (x1 , . . . , xm ) , x Fm ,
am1
amn
(2.1.36)
(2.1.37)
necessarily a subspace of U ?
2.1.4 Let U, V be finite-dimensional vector spaces and T L(U, V ). Prove
that r(T ) dim(U ). In particular, if T is onto, then dim(U )
dim(V ).
44
Linear mappings
(2.1.38)
(2.1.39)
(2.1.40)
(2.1.41)
45
k
l
yi vi +
zj w j = 0 .
S = (y1 , . . . , yk , z1 , . . . , zl ) Fk+l
i=1
j =1
(2.1.42)
Show that dim(S) = dim(V W ).
2.1.16 Let T L(U, V ) be invertible and U = U1 Uk . Show that
there holds V = V1 Vk where Vi = T (Ui ), i = 1, . . . , k.
2.1.17 Let U be finite-dimensional and T L(U ). Show that U = N (T )
R(T ) if and only if N (T ) R(T ) = {0}.
j = 1, . . . , n.
(2.2.1)
n
bj k uj ,
k = 1, . . . , n,
(2.2.2)
j =1
where the matrix B = (bj k ) F(n, n) is called the basis transition matrix
from the basis {u1 , . . . , un } to the basis {u 1 , . . . , u n }, which is necessarily
invertible.
Let V be an m-dimensional vector space over F and take T L(U, V ).
Let A = (aij ) and A = (a ij ) be the m n matrices of the linear mapping T
associated with the pairs of the bases
{u1 , . . . , un }
and
{v1 , . . . , vm },
{u 1 , . . . , u n } and
{v1 , . . . , vm },
(2.2.3)
46
Linear mappings
m
T (u j ) =
aij vi ,
m
i=1
a ij vi ,
j = 1, . . . , n.
(2.2.4)
i=1
k=1
=
=
n
m
bkj
aik vi
k=1
i=1
n
m
aik bkj vi ,
i=1
j = 1, . . . , n.
(2.2.5)
j = 1, . . . , n,
(2.2.6)
k=1
n
i = 1, . . . , m,
aik bkj ,
k=1
or
A = AB.
(2.2.7)
m
a ij vi ,
j = 1, . . . , n.
(2.2.8)
i=1
Then, with the mapping R L(U, U ) given in (2.2.1), the second relation in
(2.2.4) simply says T = T R, which leads to (2.2.7) immediately.
Similarly, we may consider a change of basis in V . Let {v1 , . . . , vm } be
another basis of V which is related to the basis {v1 , . . . , vm } through an invertible mapping S L(V , V ) with
S(vi ) = vi ,
i = 1, . . . , m,
(2.2.9)
m
cli vl ,
i = 1, . . . , m.
(2.2.10)
l=1
and
{v1 , . . . , vm },
{u1 , . . . , un } and
{v1 , . . . , vm },
(2.2.11)
47
m
aij vi =
i=1
m
a ij vi ,
j = 1, . . . , n.
(2.2.12)
i=1
m
cil a lj ,
i = 1, . . . , m,
j = 1, . . . , n.
(2.2.13)
l=1
A = C A.
(2.2.14)
m
a ij vi ,
j = 1, . . . , n.
(2.2.15)
i=1
Then
(S T )(uj ) =
m
a ij S(vi ) =
i=1
m
a ij vi ,
j = 1, . . . , n.
(2.2.16)
i=1
j = 1, . . . , n.
(2.2.17)
n
aij ui ,
T (u j ) =
i=1
n
a ij u i ,
j = 1, . . . , n.
(2.2.18)
i=1
n
i=1
a ij ui ,
j = 1, . . . , n.
(2.2.19)
48
Linear mappings
(2.2.20)
n
bij ui ,
j = 1, . . . , n,
(2.2.21)
i=1
then (2.2.20) gives us the manner in which the two matrices A and A are
related:
1 .
A = B AB
(2.2.22)
dp(t)
, p P.
(2.2.23)
dt
(i) Find the matrix that represents D L(P2 , P1 ) with respect to the
standard bases
D(p)(t) =
1
BP
= {1, t, t 2 },
2
1
BP
= {1, t}
1
(2.2.24)
of P2 , P1 , respectively.
(ii) If the basis of P2 is changed into
2
BP
= {t 1, t + 1, (t 1)(t + 1)},
2
(2.2.25)
1
find the basis transition matrix from the basis BP
into the basis
2
2
B P2 .
49
(2.2.26)
2.2.3 Let D : Pn Pn be defined by (2.2.23) and consider the linear mapping T : Pn Pn defined by
T (p) = tD(p) p,
p = p(t) Pn .
(2.2.27)
and
(2.2.29)
a
2n a22 a21
a1n a12 a11
an1 an2 ann
50
Linear mappings
in F(n, n) are similar by realizing them as the matrix representatives of
a certain linear mapping over Fn with respect to two appropriate bases
of Fn .
u U,
(2.3.1)
u U.
(2.3.2)
(2.3.3)
u U,
v V .
(2.3.4)
u U,
i = 1, 2.
(2.3.5)
Thus
u, T (v1 ) + T (v2 ) = T (u), v1 + v2 ,
u U.
(2.3.6)
u U,
v V ,
(2.3.7)
51
(2.3.8)
m
i=1
n
aij vi ,
j = 1, . . . , n,
(2.3.9)
akl
uk ,
l = 1, . . . , m.
(2.3.10)
k=1
(2.3.11)
52
Linear mappings
(2.3.12)
(2.3.13)
Proof Using (2) in Theorem 2.8, we have n(T ) = dim(U ) r(T ). However,
applying the rank equation (2.1.23) or Theorem 2.3, we have n(T ) + r(T ) =
dim(U ). Combining these results, we have r(T ) = r(T ).
As an important application of Theorem 2.8, we consider the notion of rank
of a matrix.
Let T L(Fn , Fm ) be defined by (2.1.6) by the matrix A = (aij ) F(m, n).
Then the images of the standard basis vectors e1 , . . . , en are the column vectors
of A. Thus R(T ) is the vector space spanned by the column vectors of A whose
dimension is commonly called the column rank of A, denoted as corank(A).
Thus, we have
r(T ) = corank(A).
(2.3.14)
On the other hand, since the associated matrix of T is At whose column vectors are the row vectors of A. Hence R(T ) is the vector space spanned by the
row vectors of A (since (Fm ) = Fm ) whose dimension is likewise called the
row rank of A, denoted as rorank(A). Thus, we have
r(T ) = rorank(A).
(2.3.15)
53
Exercises
2.3.1 Let U be a finite-dimensional vector space over a field F and regard a
given element f U as an element in L(U, F). Describe the adjoint of
f , namely, f , as an element in L(F , U ), and verify r(f ) = r(f ).
2.3.2 Let U, V , W be finite-dimensional vector spaces over a field F.
Show that for T L(U, V ) and S L(V , W ) there holds
(S T ) = T S .
2.3.3 Let U, V be finite-dimensional vector spaces over a field F and T
L(U, V ). Prove that for given v V the non-homogeneous equation
T (u) = v,
(2.3.16)
has a solution for some u U if and only if for any v V such that
T (v ) = 0 one has v, v = 0. In particular, the equation (2.3.16) has
a solution for any v V if and only if T is injective. (This statement is
also commonly known as the Fredholm alternative.)
2.3.4 Let U be a finite-dimensional vector space over a field F and a F.
Define T L(U, U ) by T (u) = au where u U . Show that T is then
given by T (u ) = au for any u U .
2.3.5 Let U = F(n, n) and B U . Define T L(U, U ) by T (A) = ABBA
where A U . For f U given by
f (A) = f, A = Tr(A),
A U,
(2.3.17)
i, j, k, k = 1, . . . , n.
(2.3.18)
(ii) Apply (i) to show that for any f U there is a unique element
B U such that f (A) = Tr(AB t ).
(iii) For T L(U, U ), defined by T (A) = At , describe T .
54
Linear mappings
(2.4.1)
from T naturally.
As before, we use [] to denote a coset in U/X or V /Y . Define T : U/
X V /Y by setting
T ([u]) = [T (u)],
[u] U/X.
(2.4.2)
We begin by showing that this definition does not suffer any ambiguity by
verifying
T ([u1 ]) = T ([u2 ]) whenever [u1 ] = [u2 ].
(2.4.3)
(2.4.4)
(2.4.5)
(2.4.6)
holds.
In the special situation when X = {0} U and Y is an arbitrary subspace
of V , we see that U/X = U any T L(U, V ) induces the quotient mapping
T : U V /Y,
T (u) = [T (u)].
(2.4.7)
55
Exercises
2.4.1 Let V , W be subspaces of a vector space U . Then the quotient mapping
I : U/V U/W induced from the identity mapping I : U U is
well-defined and given by
I([u]V ) = [u]W ,
u U,
(2.4.8)
56
Linear mappings
k
i =1
bi i ui , i = 1, . . . , k, T (uj ) =
n
cj j uj , j = k + 1, . . . , n,
j =1
(2.5.1)
where B = (bi i ) F(k, k) and C = (cj j ) F(n, [n k]). With respect to
this basis, the associated matrix A F(n, n) becomes
B C1
C1
A=
,
= C.
(2.5.2)
0 C2
C2
Such a matrix is sometimes referred to as boxed upper triangular.
Thus, we see that a linear mapping T over a finite-dimensional vector space
U has a nontrivial invariant subspace if and only if there is a basis of U so that
the associated matrix of T with respect to this basis is boxed upper triangular.
For the matrix A given in (2.5.2), the vanishing of the entries in the leftlower portion of the matrix indeed reduces the complexity of the matrix. We
have seen clearly that such a reduction happens because of the invariance
property
T (Span{u1 , . . . , uk }) Span{u1 , . . . , uk }.
(2.5.3)
(2.5.4)
57
k
bi i vi , i = 1, . . . , k, T (wj ) =
i =1
l
cj j wj , j = 1, . . . , l,
j =1
(2.5.5)
where the matrices B = (bi i ) and C = (cj j ) are in F(k, k) and F(l, l),
respectively. Thus, we see that, with respect to such a basis of U , the associated matrix A F(n, n) assumes the form
B 0
A=
,
(2.5.6)
0 C
which takes the form of a special kind of matrices called boxed diagonal
matrices.
Thus, we see that a linear mapping T over a finite-dimensional vector space
U is reducible if and only if there is a basis of U so that the associated matrix
of T with respect to this basis is boxed diagonal.
An important and useful family of invariant subspaces are called
eigenspaces which may be viewed as an extension of the notion of null-spaces.
Definition 2.13 For T L(U ), a scalar F is called an eigenvalue
of T if the null-space N (T I ) is not the zero space {0}. Any nonzero
vector in N(T I ) is called an eigenvector associated with the eigenvalue
and N(T I ) is called the eigenspace of T associated with the eigenvalue , and often denoted as E . The integer dim(E ) is called the geometric
multiplicity of the eigenvalue .
In particular, for A F(n, n), an eigenvalue, eigenspace, and the eigenvectors associated with and geometric multiplicity of an eigenvalue of the
matrix A are those of the A-induced mapping TA : Fn Fn , defined by
TA (x) = Ax, x Fn .
Let be an eigenvalue of T L(U ). It is clear that E is invariant under
T and T = I (I is the identity mapping) over E . Thus, let {u1 , . . . , uk }
be a basis of E and extend it to obtain a basis for the full space U . Then
the associated matrix A of T with respect to this basis takes a boxed lower
triangular form
Ik C1
A=
,
(2.5.7)
0 C2
where Ik denotes the identity matrix in F(k, k).
58
Linear mappings
c1 , . . . , cm , cm+1 F.
(2.5.8)
(2.5.9)
Multiplying (2.5.8) by m+1 and subtracting the result from (2.5.9), we obtain
c1 (1 m+1 )u1 + + cm (m m+1 )um = 0.
(2.5.10)
(2.5.11)
(2.5.12)
(2.5.13)
(2.5.14)
59
(2.5.15)
(2.5.16)
(2.5.17)
E3 = Span{(1, 1)t }.
(2.5.18)
Thus, with respect to the basis {(1, 1)t , (1, 1)t }, the associated matrix of T is
1 0
.
(2.5.19)
0 3
So T is reducible and reduced over the pair of eigenspaces E1 and E3 .
Next, consider S L(R2 ) defined by
0 1
x1
(2.5.20)
R2 .
S(x) =
x, x =
1 0
x2
We show that S has no nontrivial invariant spaces. In fact, if V is one, then
dim(V ) = 1. Let x V such that x
= 0. Then the invariance of V requires
S(x) = x for some R. Inserting this relation into (2.5.20), we obtain
x2 = x1 ,
x1 = x2 .
(2.5.21)
Hence x1
= 0, x2
= 0. Iterating the two equations in (2.5.21), we obtain
2 + 1 = 0 which is impossible. So S is also irreducible.
60
Linear mappings
2.5.2 Projections
In this subsection, we study an important family of reducible linear mappings
called projections.
Definition 2.15 Let V and W be two complementary subspaces of U . That is,
U = V W . For any u U , express u uniquely as u = v + w, v V , w W ,
and define the mapping P : U U by
P (u) = v.
(2.5.22)
W = N (T ),
(2.5.23)
R(I T ) = N (T ).
(2.5.24)
61
(2.5.25)
Since
(I P )2 = I 2P + P 2 ,
(2.5.26)
62
Linear mappings
(2.5.28)
(2.5.29)
d
so that D(a0 + a1 t + + an t n ) = a1 + 2a2 t + + nan t n1 .
dt
(2.5.30)
63
(2.5.31)
(2.5.32)
are linearly independent so that they span an m-dimensional T -invariant subspace of U . In particular, m dim(U ).
Proof We only need to show that the set of vectors in (2.5.32) are linearly
independent. If m = 1, the statement is self-evident. So we now assume m 2.
Let c0 , . . . , cm1 F so that
c0 u + c1 T (u) + + cm1 T m1 (u) = 0.
(2.5.33)
(2.5.34)
(2.5.35)
(2.5.36)
64
Linear mappings
(2.5.37)
(2.5.38)
i = 1, . . . , n 1,
T (un ) = 0.
(2.5.39)
Hence, if we use S = (sij ) to denote the matrix of T with respect to the basis
B, then
1, j = i 1, i = 2, . . . , n,
(2.5.40)
sij =
0, otherwise.
That is,
S=
1
..
.
0
..
.
..
.
..
.
0
..
.
(2.5.41)
(2.5.42)
0 1
0 0
0 0
1 0
.. . .
..
..
..
.
.
.
S= .
.
.
.
..
.
.
.
0 1
0
65
(2.5.43)
dim(U1 ) = n1 , . . . , dim(Ul ) = nl ,
(2.5.44)
(2.5.45)
S1 0 0
..
..
.
0
.
0
,
S=
(2.5.46)
..
..
..
...
.
.
.
0
0 Sl
where each Si is an ni ni shift matrix (i = 1, . . . , l).
Let k be the degree of T or the matrix S. From (2.5.46), we see clearly that
the relation
k = max{n1 , . . . , nl }
(2.5.47)
holds.
66
Linear mappings
a0 , a1 , . . . , an F,
(2.5.48)
(2.5.49)
(2.5.50)
because the powers of T follow the same rule as the powers of t. That is,
T k T l = T k+l , k, l N.
For T L(U ), let F be an eigenvalue of T . Then, for any p(t) P
given as in (2.5.48), p() is an eigenvalue of p(T ). To see this, we assume that
u U is an eigenvector of T associated with . We have
p(T )(u) = (an T n + + a1 T + a0 I )(u)
= (an n + + a1 + a0 )(u) = p()u,
(2.5.51)
as anticipated.
If p P is such that p(T ) = 0, then we say that T is a root of p. Hence, if
T is a root of p, any eigenvalue of T must also be a root of p, p() = 0, by
virtue of (2.5.51).
For example, an idempotent mapping is a root of the polynomial p(t) =
t 2 t, and a nilpotent mapping is a root of a polynomial of the form p(t) = t m
(m N). Consequently, the eigenvalues of an idempotent mapping can only
be 0 and 1, and that of a nilpotent mapping, 0.
For T L(U ), let 1 , . . . , k be some distinct eigenvalues of T
such that T reduces over E1 , . . . , Ek . Then T must be a root of the
polynomial
p(t) = (t 1 ) (t k ).
(2.5.52)
p(T )u =
k
67
p(T )ui
i=1
k
(T 1 I ) (T
i I ) (T k I )(T i I )ui
i=1
= 0,
(2.5.53)
(2.5.54)
p(t) = a0 + a1 t + + an t n P.
(2.5.56)
aij = a F,
i = 1, . . . , n.
(2.5.57)
68
Linear mappings
(i) Show that a must be an eigenvalue of A and that a
= 0.
(ii) Show that if A1 = (bij ) then
n
bij =
j =1
1
,
a
i = 1, . . . , n.
(2.5.58)
2.5.6 Let S, T L(U ) and assume that S, T are similar. That is, there is an
invertible element R L(U ) such that S = R T R 1 . Show that
F is an eigenvalue of S if and only if it is an eigenvalue of T .
2.5.7 Let U = F(n, n) and T L(U ) be defined by taking matrix transpose,
T (A) = At ,
A U.
(2.5.59)
1 1 1
T (x) = 1 1 1 x,
1
3 1
x1
x = x2 R3 .
(2.5.60)
x3
69
u U.
(2.5.61)
T
= 0,
a F,
a
= 0, 1.
(2.5.62)
k 2.
(2.5.63)
(2.5.64)
Show that T = I .
2.5.22 Let U be a finite-dimensional vector space over a field F and T L(U )
satisfying
T3 = T
so that 1 cannot be the eigenvalues of T . Show that T = 0.
(2.5.65)
70
Linear mappings
(2.5.66)
with u U a T -cyclic vector of period k, and W is an (l 1)dimensional subspace of N (T ), such that T is reducible over the
pair V and W .
(ii) Describe R(T ) and determine r(T ).
2.5.26 Let U be a finite-dimensional vector space over a field F and S, T
L(U ).
(i) Show that if S is invertible and T nilpotent, S T must also be
invertible provided that S and T commute, S T = T S.
(ii) Find an example to show that the condition in (i) that S and T
commute cannot be removed.
2.5.27 Let U be a finite-dimensional vector space, T L(U ), and V a subspace of U which is invariant under T . Show that if U = R(T ) V
then V = N (T ).
71
i=1
1in
(2.6.1)
where we have used the fact that norms over a finite-dimensional space are all
equivalent. This estimate may also be restated as
T (u)V
C,
uU
u U,
u
= 0.
(2.6.2)
u U.
(2.6.4)
(2.6.5)
72
Linear mappings
u U.
(2.6.7)
Consequently, we get
S T ST .
(2.6.8)
u U.
(2.6.9)
73
(2) If T L(U ) is invertible, then there is some > 0 such that S L(U ) is
invertible whenever S satisfies S T < . This property says that the
subset of invertible mappings in L(U ) is open in L(U ) with respect to the
norm of L(U ).
Proof For any scalar , consider S = T I . If dim(U ) = n, there are at
most n possible values of for which S is not invertible. Now T S =
||I = ||. So for any > 0, there is a scalar , || < , such that S is
invertible. This proves (1).
We next consider (2). Let T L(U ) be invertible. Then (2.6.9) holds for
some c > 0. Let S L(U ) be such that S T < for some > 0. Then,
for any u U , we have
cu T (u) = (T S)(u) + S(u)
T Su + S(u) u + S(u), u U,
(2.6.10)
m
1 k
T ,
k!
(2.6.11)
k=0
m
T k
.
k!
(2.6.12)
k=l+1
lim
(2.6.13)
k=0
1 k
e =
T ,
k!
T
k=0
(2.6.14)
74
Linear mappings
(2.6.15)
(2.6.16)
hk
=T
T k etT
(k + 1)!
k=0
hk
= etT
T k T , h R, h
= 0. (2.6.17)
(k + 1)!
k=0
(2.6.18)
(t) =
d tT
e = T etT = etT T = T
(t) =
(t)T ,
dt
t R.
(2.6.19)
In particular,
(0) = T , which intuitively says that T is the initial rate of
change of the one-parameter group
. One also refers to this relationship as
T generates
or T is the generator of
.
The relation (2.6.19) suggests that it is legitimate to differentiate the series
1 k k
(t) =
t T term by term:
k!
k=0
d 1 k k 1 d k k
1 k k
t T =
(t )T = T
t T ,
dt
k!
k! dt
k!
k=0
k=0
k=0
t R.
(2.6.20)
75
cosh T =
$
1# T
e eT ,
2
$
1 # iT
e eiT ,
sin T =
2i
sinh T =
(2.6.21)
(2.6.22)
are all well defined and enjoy similar properties of the corresponding classical
functions.
The matrix version of the discussion here can also be easily formulated analogously and is omitted.
Exercises
2.6.1 Let U be a finite-dimensional normed space and {Tn } L(U ) a
sequence of invertible mappings that converges to a non-invertible
mapping T L(U ). Show that Tn1 as n .
2.6.2 Let U and V be finite-dimensional normed spaces with the norms U
and V , respectively. For T L(U, V ), show that the induced norm
of T may also be evaluated by the expression
T = sup{T (u)V | u U, uU = 1}.
(2.6.23)
x1
.
n
|aij | .
T = max
(2.6.25)
1im
j =1
76
Linear mappings
T 1 = max
1j n
m
|aij | .
(2.6.26)
i=1
2.6.8 Let A C(n, n). Show that (eA ) = eA and that, if A is antiHermitian, A = A, then eA is unitary.
2.6.9 Let A R(n, n) and consider the initial value problem of the following
system of differential equations
x1 (t)
.
x = x(t) =
.. ;
dx
= Ax,
dt
x1,0
.
x(0) = x0 =
.. .
xn (t)
xn,0
(2.6.27)
(2.6.28)
(This result provides a practical method for computing the matrix exponential, etA , which may also be viewed as the solution of the matrixvalued initial value problem
dX
= AX,
dt
X(0) = In ,
X R(n, n).)
(t) =
, t R.
sin t
cos t
(2.6.29)
(2.6.30)
77
k
t
k=0
k!
Ak
(2.6.31)
and verify
(t) = etA .
(iv) Use the practical method illustrated in Exercise 2.6.9 to obtain
the matrix exponential etA through solving two appropriate initial
value problems as given in (2.6.27).
2.6.11 For the functions of T L(U ) defined in (2.6.21) and (2.6.22), establish the identities
cosh2 T sinh2 T = I,
cos2 T + sin2 T = I.
(2.6.32)
d
(cos tT ) = T sin tT .
dt
(2.6.33)
(2.6.34)
3
Determinants
Consider the parallelogram in $\mathbb{R}^2$ spanned by two vectors $u = (a_1, a_2)$ and $v = (b_1, b_2)$, and write
$$ u = (a_1, a_2) = \|u\|(\cos\theta, \sin\theta). \tag{3.1.1}$$
Thus, we may easily resolve the vector $v$ along the direction of $u$ and the direction perpendicular to $u$ as follows:
$$ v = (b_1, b_2) = c_1(\cos\theta, \sin\theta) + c_2\!\left(\cos\!\left(\theta + \frac{\pi}{2}\right), \sin\!\left(\theta + \frac{\pi}{2}\right)\right) = (c_1\cos\theta - c_2\sin\theta,\ c_1\sin\theta + c_2\cos\theta), \quad c_1, c_2 \in \mathbb{R}. \tag{3.1.2}$$
Here $|c_2|$ may be interpreted as the length of the vector in the resolution that is taken to be perpendicular to $u$. Hence, from (3.1.2), we can read off the result
$$ c_2 = b_2\cos\theta - b_1\sin\theta, \qquad |c_2| = |b_2\cos\theta - b_1\sin\theta|. \tag{3.1.3}$$
Therefore, using (3.1.3) and then (3.1.1), we see that the area $\sigma$ of the parallelogram under consideration is given by
$$ \sigma = |c_2|\,\|u\| = \big|\|u\|\cos\theta\, b_2 - \|u\|\sin\theta\, b_1\big| = |a_1 b_2 - a_2 b_1|. \tag{3.1.4}$$
Thus we see that the quantity $a_1 b_2 - a_2 b_1$ formed from the vectors $(a_1, a_2)$ and $(b_1, b_2)$ stands out, which will be called the determinant of the matrix
$$ A = \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}, \tag{3.1.5}$$
written as $\det(A)$ or denoted by
$$ \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}. \tag{3.1.6}$$
Since $\det(A) = \pm\sigma$, it is also referred to as the signed area of the parallelogram formed by the vectors $(a_1, a_2)$ and $(b_1, b_2)$.
We now consider volume. We shall apply some vector algebra over R3 to
facilitate our discussion.
We use $\cdot$ and $\times$ to denote the usual dot and cross products between vectors in $\mathbb{R}^3$. We use $\mathbf{i}, \mathbf{j}, \mathbf{k}$ to denote the standard mutually orthogonal unit vectors in $\mathbb{R}^3$ that form a right-hand system. For any vectors
$$ u = (a_1, a_2, a_3) = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}, \quad v = (b_1, b_2, b_3) = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k} \tag{3.1.7}$$
in $\mathbb{R}^3$, we know that
$$ u \times v = (a_2 b_3 - a_3 b_2)\mathbf{i} - (a_1 b_3 - a_3 b_1)\mathbf{j} + (a_1 b_2 - a_2 b_1)\mathbf{k} \tag{3.1.8}$$
is perpendicular to both $u$ and $v$, and $\|u \times v\|$ gives us the area of the parallelogram formed from using $u, v$ as two adjacent edges, which generalizes the preceding discussion in $\mathbb{R}^2$. To avoid the trivial situation, we assume $u$ and $v$ are linearly independent. So $u \times v \neq 0$ and
$$ \mathbb{R}^3 = \mathrm{Span}\{u, v, u \times v\}. \tag{3.1.9}$$
Thus, for $w = (c_1, c_2, c_3) \in \mathbb{R}^3$, there are unique scalars such that
$$ w = a u + b v + c\,(u \times v), \quad a, b, c \in \mathbb{R}. \tag{3.1.10}$$
From the geometry of the problem, we see that the volume $\delta$ of the parallelepiped formed from using $u, v, w$ as adjacent edges is given by
$$ \delta = \|u \times v\|\,\|c\,(u \times v)\| = |c|\,\|u \times v\|^2 \tag{3.1.11}$$
because $\|c\,(u \times v)\|$ is the height of the parallelepiped, with the bottom area $\|u \times v\|$.
From (3.1.10), we have
$$ w \cdot (u \times v) = c\,\|u \times v\|^2. \tag{3.1.12}$$
Inserting (3.1.12) into (3.1.11), we obtain
$$ \delta = |w \cdot (u \times v)| = |c_1(a_2 b_3 - a_3 b_2) - c_2(a_1 b_3 - a_3 b_1) + c_3(a_1 b_2 - a_2 b_1)|, \tag{3.1.13}$$
a quantity formed from the entries of the matrix
$$ A = \begin{pmatrix} c_1 & c_2 & c_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{pmatrix}, \tag{3.1.14}$$
which will accordingly be called the determinant of $A$, so that $\delta = |\det(A)|$.
As another motivating example, consider the system of two equations
$$ a_1 x_1 + a_2 x_2 = c_1, \qquad b_1 x_1 + b_2 x_2 = c_2. \tag{3.1.17}$$
Eliminating $x_2$ and then $x_1$, we find
$$ (a_1 b_2 - a_2 b_1)\, x_1 = c_1 b_2 - c_2 a_2, \tag{3.1.18}$$
$$ (a_1 b_2 - a_2 b_1)\, x_2 = a_1 c_2 - b_1 c_1. \tag{3.1.19}$$
Thus, with the notation of determinants and in view of (3.1.18) and (3.1.19), we may express the solution to (3.1.17) elegantly as
$$ x_1 = \frac{\begin{vmatrix} c_1 & a_2 \\ c_2 & b_2 \end{vmatrix}}{\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}}, \qquad x_2 = \frac{\begin{vmatrix} a_1 & c_1 \\ b_1 & c_2 \end{vmatrix}}{\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}}, \qquad \text{if } \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} \neq 0. \tag{3.1.20}$$
The extension of these formulas to $3 \times 3$ systems will be assigned as an exercise.
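The next few lines are a minimal Python sketch (an illustration, not part of the text) of the $2 \times 2$ solution formulas (3.1.20); the particular right-hand sides are chosen arbitrarily.

```python
def cramer_2x2(a1, a2, b1, b2, c1, c2):
    """Solve a1*x1 + a2*x2 = c1, b1*x1 + b2*x2 = c2 via the determinant formulas (3.1.20)."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("coefficient determinant vanishes")
    x1 = (c1 * b2 - c2 * a2) / det
    x2 = (a1 * c2 - b1 * c1) / det
    return x1, x2

print(cramer_2x2(2, 1, 1, 3, 5, 10))   # solves 2x1 + x2 = 5, x1 + 3x2 = 10 -> (1.0, 3.0)
```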
Let $f$ be a differentiable function over an interval $[\alpha, \beta]$ such that $f([\alpha, \beta]) \supset [a, b]$ and $f(\alpha), f(\beta) \in \{a, b\}$. (3.1.21)
The function $f$ maps the interval $[\alpha, \beta]$ to cover the interval $[a, b]$. At a point $t_0 \in [\alpha, \beta]$ where the derivative of $f$ is positive, $f'(t_0) > 0$, $f$ maps a small interval around $t_0$ onto a small interval around $f(t_0)$ and preserves the orientation (from left to right) of the intervals; if $f'(t_0) < 0$, it reverses the orientation. If $f'(t_0) \neq 0$, we say that $t_0$ is a regular point of $f$. For $c \in [a, b]$, if $f'(t) \neq 0$ for any $t \in f^{-1}(c)$, we call $c$ a regular value of $f$. It is clear that $f^{-1}(c)$ is a finite set when $c$ is a regular value of $f$. If $c$ is a regular value of $f$, we define the integer
$$ N(f, c) = N^{+}(f, c) - N^{-}(f, c), \tag{3.1.22}$$
where
$$ N^{\pm}(f, c) = \text{the number of points in } f^{-1}(c) \text{ where } \pm f' > 0. \tag{3.1.23}$$
If $f'(t) \neq 0$ for any $t \in (\alpha, \beta)$, then either $N^{+}(f, c) = 1$ and $N^{-}(f, c) = 0$, or $N^{+}(f, c) = 0$ and $N^{-}(f, c) = 1$, according to whether $f'(t) > 0$ or $f'(t) < 0$ in $(\alpha, \beta)$, which leads to $N(f, c) = \pm 1$ correspondingly. In particular, $N(f, c)$ is independent of $c$.
If $f' = 0$ at some point, such a point is called a critical point of $f$. We assume further that $f$ has finitely many critical points in $(\alpha, \beta)$, say $t_1, \ldots, t_m$,
and that $c \in [a, b]$ is a regular value of $f$ different from the values $f(t_1), \ldots, f(t_m)$. Then
$$ N(f, c) = \begin{cases} 1, & f(\alpha) = a,\ f(\beta) = b, \\ -1, & f(\alpha) = b,\ f(\beta) = a, \\ 0, & f(\alpha) = f(\beta). \end{cases} \tag{3.1.25}$$
This quantity may summarily be rewritten into the form of an integral,
$$ N(f, c) = \frac{1}{b - a}\left(f(\beta) - f(\alpha)\right) = \frac{1}{b - a}\int_{\alpha}^{\beta} f'(t)\, \mathrm{d}t, \tag{3.1.26}$$
and interpreted to be the number count for the orientation-preserving times
minus the orientation-reversing times the function f maps the interval [, ]
to cover the interval [a, b]. The advantage of using the integral representation
for N(f, c) is that it is clearly independent of the choice of the regular value
c. Indeed, the right-hand side of (3.1.26) is well defined for any differentiable
function f not necessarily having only finitely many critical points and the
specification of a regular value c for f becomes irrelevant. In fact, the integral
on the right-hand side of (3.1.26) is the simplest topological invariant called
the degree of the function f , which may now be denoted by
1
deg(f ) =
f (t) dt.
(3.1.27)
ba
The word topological is used to refer to the fact that a small alteration of f
cannot perturb the value of deg(f ) since deg(f ) may take only integer values and the right-hand side of (3.1.27), however, relies on the derivative of f
continuously.
As a simple application, we note that it is not hard to see that for any c
[a, b] the equation f (t) = c has at least one solution when deg(f )
= 0.
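The following Python sketch (an illustration only; the particular function is an arbitrary choice) evaluates the integral in (3.1.27) numerically and recovers the expected integer degree.

```python
import numpy as np

def degree(f, alpha, beta, a, b, n=200001):
    """Approximate deg(f) = (1/(b-a)) * integral of f'(t) dt over [alpha, beta], cf. (3.1.27)."""
    t = np.linspace(alpha, beta, n)
    fp = np.gradient(f(t), t)          # numerical derivative of f
    return np.trapz(fp, t) / (b - a)

f = lambda t: np.sin(t) + 0.3 * np.sin(5 * t)    # non-monotone on [-pi/2, pi/2]
alpha, beta = -np.pi / 2, np.pi / 2
a, b = f(alpha), f(beta)
print(degree(f, alpha, beta, a, b))    # approximately 1: the net covering number of [a, b]
```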
We next extend our discussion of topological invariants to two-dimensional
situations.
Let $\Gamma$ and $C$ be two closed differentiable curves in $\mathbb{R}^2$ oriented counterclockwise and let
$$ u: \Gamma \to C \tag{3.1.28}$$
be a differentiable map. As before, the number of times that $u$ covers $C$ in an orientation-preserving manner minus the number of times that $u$ covers $C$ in an orientation-reversing manner may be expressed as a line integral, called the index of $u$. In particular, taking $C = S^1$ to be the unit circle, a differentiable map $u = (f, g): \Gamma \to S^1$ satisfies
$$ f^2 + g^2 = 1, \tag{3.1.30}$$
and, for a vector field $v$ that is nonvanishing on $\Gamma$, the normalized field $v/\|v\|$ defines such a map, whose index is denoted $\mathrm{ind}(v|_\Gamma)$.
As an example, consider the vector field
$$ v(x, y) = (x^2 - y^2,\ 2xy), \quad (x, y) \in \mathbb{R}^2. \tag{3.1.35}$$
Let $S^1_R$ denote the circle of radius $R > 0$ in $\mathbb{R}^2$ centered at the origin. We may parameterize $S^1_R$ by the polar angle $\theta$: $x = R\cos\theta$, $y = R\sin\theta$, $\theta \in [0, 2\pi]$. With (3.1.35), we have
$$ \left.\frac{1}{\|v\|}\, v\right|_{S^1_R} = \frac{1}{R^2}\left(x^2 - y^2,\ 2xy\right) = (\cos^2\theta - \sin^2\theta,\ 2\cos\theta\sin\theta) = (\cos 2\theta, \sin 2\theta), \tag{3.1.36}$$
so that
$$ \mathrm{ind}\!\left(v|_{S^1_R}\right) = \frac{1}{2\pi}\int_0^{2\pi} 2\, \mathrm{d}\theta = 2. \tag{3.1.37}$$
For any closed curve $\Gamma$ enclosing but not intersecting the origin, we can continuously deform it into a circle $S^1_R$ ($R > 0$), while staying away from the origin in the process. By continuity or topological invariance, we obtain $\mathrm{ind}(v|_\Gamma) = \mathrm{ind}(v|_{S^1_R}) = 2$. The meaning of this result will be seen in the following theorem.
Theorem 3.1 Let $v$ be a vector field that is differentiable over a bounded domain $\Omega$ in $\mathbb{R}^2$ and let $\Gamma$ be a closed curve contained in $\Omega$. If $v \neq 0$ on $\Gamma$ and $\mathrm{ind}(v|_\Gamma) \neq 0$, then there must be at least one point enclosed inside $\Gamma$ where $v = 0$.
Proof Assume otherwise that $v \neq 0$ in the domain enclosed by $\Gamma$. Let $\gamma$ be another closed curve enclosed inside $\Gamma$. Since $\gamma$ may be obtained from $\Gamma$ through a continuous deformation and $v = 0$ nowhere inside $\Gamma$, we have $\mathrm{ind}(v|_\gamma) = \mathrm{ind}(v|_\Gamma)$. On the other hand, if we parameterize the curve $\gamma$ using its arclength $s$, then
$$ \mathrm{ind}(v|_\gamma) = \deg(v_\gamma) = \frac{1}{2\pi}\oint_\gamma v_\gamma'(s) \cdot \nu\, \mathrm{d}s, \qquad v_\gamma = \frac{1}{\|v\|}\, v, \tag{3.1.38}$$
where $\nu(s)$ is the unit tangent vector along the unit circle $S^1$ at the image point $v_\gamma(s)$ under the map $v_\gamma: \gamma \to S^1$. Rewrite $v_\gamma$ as
$$ v_\gamma(s) = \big(f(x(s), y(s)),\ g(x(s), y(s))\big). \tag{3.1.39}$$
Then
$$ v_\gamma'(s) = \big(f_x x'(s) + f_y y'(s),\ g_x x'(s) + g_y y'(s)\big). \tag{3.1.40}$$
The assumption on $v$ gives us the uniform boundedness of $|f_x|, |f_y|, |g_x|, |g_y|$ inside $\Gamma$. Using this property and (3.1.40), we see that there is a $\gamma$-independent constant $C > 0$ such that
$$ \|v_\gamma'(s)\| \leq C, \tag{3.1.41}$$
so that
$$ 1 \leq |\mathrm{ind}(v|_\gamma)| \leq \frac{C}{2\pi}\, |\gamma|, \tag{3.1.42}$$
which leads to a contradiction when the total arclength $|\gamma|$ of the curve $\gamma$ is made small enough.
Thus, returning to the example (3.1.35), we conclude that the vector field $v$ has a zero inside any circle $S^1_R$ ($R > 0$), since we have shown that $\mathrm{ind}(v|_{S^1_R}) = 2 \neq 0$; this zero can only be the origin, as seen directly from (3.1.35).
We now use Theorem 3.1 to establish the celebrated Fundamental Theorem
of Algebra as stated below.
Theorem 3.2 Any polynomial of degree $n \geq 1$ with coefficients in $\mathbb{C}$ of the form
$$ f(z) = a_n z^n + a_{n-1} z^{n-1} + \cdots + a_0, \quad a_0, \ldots, a_{n-1}, a_n \in \mathbb{C}, \quad a_n \neq 0, \tag{3.1.43}$$
must have a zero in $\mathbb{C}$. That is, there is some $z_0 \in \mathbb{C}$ such that $f(z_0) = 0$.
Proof Without loss of generality and for the sake of simplicity, we may assume $a_n = 1$; otherwise we may divide $f(z)$ by $a_n$.
Let $z = x + \mathrm{i}y$, $x, y \in \mathbb{R}$, and write $f(z)$ as
$$ f(z) = P(x, y) + \mathrm{i}Q(x, y), \tag{3.1.44}$$
which gives rise to the vector field
$$ v(x, y) = (P(x, y), Q(x, y)), \quad (x, y) \in \mathbb{R}^2. \tag{3.1.45}$$
Then it is clear that $\|v(x, y)\| = |f(z)|$ and it suffices to show that $v$ vanishes at some $(x_0, y_0) \in \mathbb{R}^2$.
In order to simplify our calculation, we consider a one-parameter deformation of $f(z)$ given by
$$ f^t(z) = z^n + t\,(a_{n-1} z^{n-1} + \cdots + a_0), \quad t \in [0, 1], \tag{3.1.46}$$
and denote the correspondingly constructed vector field by $v^t(x, y)$. So on the circle $S^1_R = \{(x, y) \in \mathbb{R}^2 \mid \|(x, y)\| = |z| = R\}$ ($R > 0$), we have the uniform lower estimate
$$ \|v^t(x, y)\| = |f^t(z)| \geq R^n\left(1 - \frac{1}{R}|a_{n-1}| - \cdots - \frac{1}{R^n}|a_0|\right) \equiv C(R), \quad t \in [0, 1]. \tag{3.1.47}$$
Thus, when $R$ is sufficiently large, we have $C(R) \geq 1$ (say). For such a choice of $R$, by topological invariance, we have
$$ \mathrm{ind}\!\left(v|_{S^1_R}\right) = \mathrm{ind}\!\left(v^1|_{S^1_R}\right) = \mathrm{ind}\!\left(v^0|_{S^1_R}\right). \tag{3.1.48}$$
On the other hand, over $S^1_R$ we may again use the polar angle $\theta$: $x = R\cos\theta$, $y = R\sin\theta$, or $z = R e^{\mathrm{i}\theta}$, to represent $f^0$ as $f^0(z) = R^n e^{\mathrm{i}n\theta}$. Hence $v^0 = R^n(\cos n\theta, \sin n\theta)$. Consequently,
$$ \left.\frac{1}{\|v^0\|}\, v^0\right|_{S^1_R} = (\cos n\theta, \sin n\theta), \tag{3.1.49}$$
so that
$$ \mathrm{ind}\!\left(v^0|_{S^1_R}\right) = \frac{1}{2\pi}\int_0^{2\pi} n\, \mathrm{d}\theta = n. \tag{3.1.50}$$
Since $n \geq 1$, combining (3.1.48) and (3.1.50) with Theorem 3.1, we conclude that $v$ must vanish somewhere inside the circle $S^1_R$.
Use to denote a closed surface in R3 and S 2 the standard unit sphere in R3 .
We may also consider a map u : S 2 . Since the orientation of S 2 is given
by its unit outnormal vector, say , we may analogously express the number
count, for the number of times that u covers S 2 in an orientation-preserving
manner minus the number of times that u covers S 2 in an orientation-reversing
manner, in the form of a surface integral, also called the degree of the map
u : S 2 , by
1
deg(u) = 2
d,
(3.1.51)
|S |
where d is the vector area element over S 2 induced from the map u.
To further facilitate computation, we may assume that $\Sigma$ is parameterized by the parameters $s, t$ over a two-dimensional domain $\Omega$ and $u = (f, g, h)$, where $f, g, h$ are real-valued functions of $s, t$ so that $f^2 + g^2 + h^2 = 1$. At the image point $u$, the unit outnormal of $S^2$ at $u$ is simply $u$ itself. Moreover, the vector area element at $u$ under the mapping $u$ can be represented as
$$ \mathrm{d}\sigma = \frac{\partial u}{\partial s} \times \frac{\partial u}{\partial t}\, \mathrm{d}s\, \mathrm{d}t. \tag{3.1.52}$$
Hence
$$ \deg(u) = \frac{1}{4\pi}\int_{\Omega} u \cdot \left(\frac{\partial u}{\partial s} \times \frac{\partial u}{\partial t}\right) \mathrm{d}s\, \mathrm{d}t = \frac{1}{4\pi}\int_{\Omega} \det\begin{pmatrix} f & g & h \\ f_s & g_s & h_s \\ f_t & g_t & h_t \end{pmatrix} \mathrm{d}s\, \mathrm{d}t. \tag{3.1.53}$$
a1 x1 + a2 x2 + a3 x3
b1 x1 + b2 x2 + b3 x3
c1 x1 + c2 x2 + c3 x3
d1 ,
d2 ,
d3 ,
(3.1.54)
(3.1.56)
$$ f = \frac{2x}{1 + x^2 + y^2}, \quad g = \frac{2y}{1 + x^2 + y^2}, \quad h = \frac{1 - x^2 - y^2}{1 + x^2 + y^2}, \quad (x, y) \in \mathbb{R}^2, \tag{3.1.58}$$
3.1.6 The hedgehog map is a map S 2 S 2 defined in terms of the parameterization of R2 by polar coordinates r, by the expression
u = (cos(n ) sin f (r), sin(n ) sin f (r), cos f (r)),
(3.1.60)
d dr
(3.1.61)
deg(u) =
4 0
r
0
and explain your result.
with cofactors $C_{ij} = (-1)^{i+j} M_{ij}$, where the minor $M_{ij}$ is the determinant of the $(n-1) \times (n-1)$ submatrix of $A$ obtained by deleting the $i$th row and the $j$th column, $i, j = 1, \ldots, n$. (3.2.1)
The determinant of $A$ is then defined inductively by
$$ \det(A) = \sum_{i=1}^{n} a_{i1} C_{i1}. \tag{3.2.2}$$
The formula (3.2.2) is also referred to as the cofactor expansion of the determinant according to the first column.
This definition indicates that if a column of an n n matrix A is zero then
det(A) = 0. To show this, we use induction. When n = 1, it is trivial. Assume
that the statement is true at n 1 (n 2). We now prove the statement at
n (n 2). In fact, if the first column of A is zero, then det(A) = 0 simply
by the definition of determinant (see (3.2.2)); if another column rather than
the first column of A is zero, then all the cofactors Ci1 vanish by the inductive
assumption, which still results in det(A) = 0. The definition also implies that if
A = (aij ) is upper triangular then det(A) is the product of its diagonal entries,
det(A) = a11 ann , as may be shown by induction as well.
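The recursive definition (3.2.2) translates directly into code; the following Python sketch (for illustration only, and inefficient for large matrices) expands along the first column exactly as in the definition.

```python
def det(A):
    """Determinant via cofactor expansion along the first column, cf. (3.2.2)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        minor = [row[1:] for k, row in enumerate(A) if k != i]   # delete row i and column 1
        total += (-1) ** i * A[i][0] * det(minor)
    return total

print(det([[1, 2], [3, 4]]))                    # -2
print(det([[2, 0, 0], [5, 3, 0], [1, 4, 7]]))   # 42: product of the diagonal entries
```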
The above definition of a determinant immediately leads to the following
important properties.
Theorem 3.4 Consider the $n \times n$ matrices $A, B$ given as
$$ A = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{k1} & \ldots & a_{kn} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix}, \qquad B = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ ra_{k1} & \ldots & ra_{kn} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix}, \tag{3.2.3}$$
where $r$ is a scalar. Then $\det(B) = r\det(A)$.
Proof We use induction on $n$; the case $n = 1$ is trivial. For $n \geq 2$, the inductive assumption applied to the minors gives
$$ C^B_{i1} = r\, C^A_{i1}, \quad i \neq k, \qquad C^B_{k1} = C^A_{k1}. \tag{3.2.4}$$
Therefore, we arrive at
$$ \det(B) = \sum_{i \neq k} a_{i1} C^B_{i1} + r a_{k1} C^B_{k1} = \sum_{i \neq k} a_{i1}\, r C^A_{i1} + r a_{k1} C^A_{k1} = r \det(A). \tag{3.2.5}$$
The proof is complete.
This theorem implies that if a row of a matrix A is zero then det(A) = 0.
As an application, we show that if an $n \times n$ matrix $A = (a_{ij})$ is lower triangular then $\det(A) = a_{11} \cdots a_{nn}$. In fact, when $n = 1$, there is nothing to show. Assume that the formula is true at $n - 1$ ($n \geq 2$). At $n \geq 2$, the first row of the minor $M_{i1}$ of $A$ vanishes for each $i = 2, \ldots, n$. So $M_{i1} = 0$, $i = 2, \ldots, n$. However, the inductive assumption gives us $M_{11} = a_{22} \cdots a_{nn}$. Thus $\det(A) = a_{11}(-1)^{1+1} M_{11} = a_{11} \cdots a_{nn}$ as claimed.
Therefore, if an n n matrix A = (aij ) is either upper or lower triangular,
we infer that there holds det(At ) = det(A), although, later, we will show that
such a result is true for general matrices.
Theorem 3.5 For the n n matrices A = (aij ), B = (bij ), C = (cij ) which
have identical rows except the kth row in which ckj = akj + bkj , j = 1, . . . , n,
we have det(C) = det(A) + det(B).
Proof We again use induction on n.
The statement is clear when n = 1.
Assume that the statement is valid for the n 1 case (n 2).
For A, B, C given in the theorem with n 2, with the notation in the proof
of Theorem 3.4 and in view of the inductive assumption, we have
$$ C^C_{k1} = C^A_{k1} = C^B_{k1}; \qquad C^C_{i1} = C^A_{i1} + C^B_{i1}, \quad i \neq k. \tag{3.2.6}$$
Consequently,
$$ \det(C) = \sum_{i \neq k} a_{i1} C^C_{i1} + c_{k1} C^C_{k1} = \sum_{i \neq k} a_{i1}\left(C^A_{i1} + C^B_{i1}\right) + (a_{k1} + b_{k1}) C^A_{k1} = \left(\sum_{i \neq k} a_{i1} C^A_{i1} + a_{k1} C^A_{k1}\right) + \left(\sum_{i \neq k} a_{i1} C^B_{i1} + b_{k1} C^B_{k1}\right) = \det(A) + \det(B), \tag{3.2.7}$$
as asserted.
Theorem 3.6 Let $A, B$ be two $n \times n$ ($n \geq 2$) matrices so that $B$ is obtained from interchanging any two rows of $A$. Then $\det(B) = -\det(A)$.
Proof We use induction on $n$.
At $n = 2$, we can directly check that the statement of the theorem is true.
Assume that the statement is true at $n - 1 \geq 2$.
Let $A, B$ be $n \times n$ ($n \geq 3$) matrices given by
$$ A = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{i1} & \ldots & a_{in} \\ \vdots & & \vdots \\ a_{j1} & \ldots & a_{jn} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix}, \qquad B = \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{j1} & \ldots & a_{jn} \\ \vdots & & \vdots \\ a_{i1} & \ldots & a_{in} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix}, \tag{3.2.8}$$
where $j = i + k$ for some $k \geq 1$. We observe that it suffices to prove the adjacent case when $k = 1$ because when $k \geq 2$ we may obtain $B$ from $A$ simply by interchanging adjacent rows $k$ times downwardly and then $k - 1$ times upwardly, which gives rise to an odd number of adjacent row interchanges.
For the adjacent row interchange, $j = i + 1$, the inductive assumption allows us to arrive at the following relations between the minors of the matrices $A$ and $B$ immediately,
$$ M^B_{k1} = -M^A_{k1}, \quad k \neq i, i+1; \qquad M^B_{i1} = M^A_{i+1,1}, \quad M^B_{i+1,1} = M^A_{i1}, \tag{3.2.9}$$
which implies that the corresponding cofactors of $A$ and $B$ all differ by a sign,
$$ C^B_{k1} = -C^A_{k1}, \quad k \neq i, i+1; \qquad C^B_{i1} = -C^A_{i+1,1}, \quad C^B_{i+1,1} = -C^A_{i1}. \tag{3.2.10}$$
Hence
$$ \det(B) = \sum_{k \neq i, i+1} a_{k1} C^B_{k1} + a_{i+1,1} C^B_{i1} + a_{i1} C^B_{i+1,1} = \sum_{k \neq i, i+1} a_{k1}\left(-C^A_{k1}\right) + a_{i+1,1}\left(-C^A_{i+1,1}\right) + a_{i1}\left(-C^A_{i1}\right) = -\det(A), \tag{3.2.11}$$
as expected.
This theorem indicates that if two rows of an $n \times n$ ($n \geq 2$) matrix $A$ are identical then $\det(A) = 0$. Thus adding a multiple of a row to another row of $A$ does not alter the determinant of $A$:
$$ \begin{vmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{i1} & \ldots & a_{in} \\ \vdots & & \vdots \\ ra_{i1} + a_{j1} & \ldots & ra_{in} + a_{jn} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{vmatrix} = \begin{vmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{i1} & \ldots & a_{in} \\ \vdots & & \vdots \\ ra_{i1} & \ldots & ra_{in} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{vmatrix} + \begin{vmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{i1} & \ldots & a_{in} \\ \vdots & & \vdots \\ a_{j1} & \ldots & a_{jn} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{vmatrix} = \det(A). \tag{3.2.12}$$
The above results provide us with practical computational techniques when
evaluating the determinant of an n n matrix A. In fact, we may perform the
following three types of permissible row operations on A.
(1) Multiply a row of A by a nonzero scalar. Such an operation may also
be realized by multiplying A from the left by the matrix obtained from
multiplying the corresponding row of the n n identity matrix I by the
same scalar.
(2) Interchange any two rows of A when n 2. Such an operation may also
be realized by multiplying A from the left by the matrix obtained from
interchanging the corresponding two rows of the n n identity matrix I .
(3) Add a multiple of a row to another row of A when n 2. Such an operation may also be realized by multiplying A from the left by the matrix
obtained from adding the same multiple of the row to another row, correspondingly, of the n n identity matrix I .
The matrices constructed in the above three types of permissible row operations are called elementary matrices of types 1, 2, 3. Let $E$ be an elementary matrix of a given type. Then $E$ is invertible and $E^{-1}$ is of the same type. More precisely, if $E$ is of type 1 and obtained from multiplying a row of $I$ by the scalar $r \neq 0$, then $\det(E) = r\det(I) = r$ and $E^{-1}$ is simply obtained from multiplying the same row of $I$ by $r^{-1}$, resulting in $\det(E^{-1}) = r^{-1}$; if $E$ is of type 2, then $E^{-1} = E$ and $\det(E) = \det(E^{-1}) = -\det(I) = -1$; if $E$ is of type 3 and obtained from adding an $r$ multiple of the $i$th row to the $j$th row ($i \neq j$) of $I$, then $E^{-1}$ is obtained from adding a $(-r)$ multiple of the $i$th row to the $j$th row of $I$ and $\det(E) = \det(E^{-1}) = \det(I) = 1$. In all cases,
$$ \det(E^{-1}) = \det(E)^{-1}. \tag{3.2.13}$$
In conclusion, the properties of determinant under permissible row operations may be summarized collectively as follows.
Theorem 3.7 Let A be an n n matrix and E be an elementary matrix of the
same dimensions. Then
det(EA) = det(E) det(A).
(3.2.14)
In particular, if $E_1, \ldots, E_k$ are elementary matrices and $E_k \cdots E_1 A = U$, then
$$ \det(U) = \det(E_k) \cdots \det(E_1)\det(A). \tag{3.2.15}$$
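These facts underlie a practical way of computing determinants by row reduction. The following Python sketch (an illustration, not the text's algorithm) reduces a matrix to upper triangular form using only type-2 and type-3 operations, tracking the sign changes.

```python
def det_by_row_reduction(A):
    """Reduce A to upper triangular form with row swaps (type 2) and row additions (type 3);
    the determinant is the product of the diagonal entries times (-1)^(number of swaps)."""
    M = [row[:] for row in A]
    n = len(M)
    sign = 1
    for j in range(n):
        p = next((i for i in range(j, n) if M[i][j] != 0), None)   # find a pivot in column j
        if p is None:
            return 0
        if p != j:
            M[j], M[p] = M[p], M[j]    # type-2 operation: flips the sign
            sign = -sign
        for i in range(j + 1, n):
            r = M[i][j] / M[j][j]
            M[i] = [M[i][k] - r * M[j][k] for k in range(n)]       # type-3: no change
    prod = sign
    for j in range(n):
        prod *= M[j][j]
    return prod

print(det_by_row_reduction([[0, 2], [3, 4]]))   # -6
```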
We next show that Theorem 3.6 remains valid when the row interchange there is replaced with column interchange. That is, we show that if two columns in an $n \times n$ matrix $A$ are interchanged, its determinant will change sign. This property is not so obvious since our definition of
determinant is based on the cofactor expansion by the first column vector and
an interchange of the first column with another column alters the first column
of the matrix. The effect of the value of determinant with respect to such an
alteration needs to be examined closely, which will be our task below.
We still use induction.
At n = 2 the conclusion may be checked directly.
Assume the conclusion holds at n 1 1.
We now prove the conclusion at n 3. As before, it suffices to establish the
conclusion for any adjacent column interchange.
If the column interchange does not involve the first column, we see that the
conclusion about the sign change of the determinant clearly holds in view of
the inductive assumption and the cofactor expansion formula (3.2.2) since all
the cofactors Ci1 (i = 1, . . . , n) change their sign exactly once under any pair
of column interchange.
Now consider the effect of an interchange of the first and second columns
of A. It can be checked that such an operation may be carried out through
multiplying A by the matrix F from the right where F is obtained from the
n n identity matrix I by interchanging the first and second columns of I . Of
course, det(F ) = 1.
Let $E_1, \ldots, E_k$ be a sequence of elementary matrices and $U = (u_{ij})$ an upper triangular matrix so that $E_k \cdots E_1 A = U$. Then we have
$$ UF = \begin{pmatrix} u_{12} & u_{11} & u_{13} & \cdots & u_{1n} \\ u_{22} & 0 & u_{23} & \cdots & u_{2n} \\ 0 & 0 & u_{33} & \cdots & u_{3n} \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & u_{nn} \end{pmatrix}. \tag{3.2.16}$$
Thus the cofactor expansion formula down the first column, as stated in Definition 3.3, and the inductive assumption at $n - 1$ lead us to the result
$$ \det(UF) = u_{12}\begin{vmatrix} 0 & u_{23} & \cdots & u_{2n} \\ 0 & u_{33} & \cdots & u_{3n} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & u_{nn} \end{vmatrix} - u_{22}\begin{vmatrix} u_{11} & u_{13} & \cdots & u_{1n} \\ 0 & u_{33} & \cdots & u_{3n} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & u_{nn} \end{vmatrix} = -u_{11} u_{22} \cdots u_{nn} = -\det(U). \tag{3.2.17}$$
Theorem 3.10 For any $A \in \mathbb{F}(n, n)$ there holds $\det(A^t) = \det(A)$.
Proof Let $E_1, \ldots, E_k$ be elementary matrices such that
$$ U = E_k \cdots E_1 A \tag{3.2.21}$$
is an upper triangular matrix. Thus from $U^t = A^t E_1^t \cdots E_k^t$ and Theorem 3.9 we have
$$ \det(U) = \det(U^t) = \det(A^t E_1^t \cdots E_k^t) = \det(A^t)\det(E_1^t) \cdots \det(E_k^t). \tag{3.2.22}$$
Comparing (3.2.22) with (3.2.15) and noting det(El ) = det(Elt ), l = 1, . . . , k,
because an elementary matrix is either symmetric or lower or upper triangular,
we arrive at det(A) = det(At ) as claimed.
Next we show that a determinant preserves matrix multiplication.
Theorem 3.11 Let A and B be two n n matrices. Then
det(AB) = det(A) det(B).
(3.2.23)
(3.2.24)
(3.2.25)
The formula (3.2.23) can be used immediately to derive a few simple but
basic conclusions about various matrices.
For example, if $A \in \mathbb{F}(n, n)$ is invertible, then there is some $B \in \mathbb{F}(n, n)$ such that $AB = I_n$. Thus $\det(A)\det(B) = \det(AB) = \det(I_n) = 1$, which implies that $\det(A) \neq 0$. In other words, the condition $\det(A) \neq 0$ is necessary for any $A \in \mathbb{F}(n, n)$ to be invertible. In the next section, we shall show that this condition is also sufficient. As another example, if $A \in \mathbb{R}(n, n)$ is orthogonal, then $AA^t = I_n$. Hence $(\det(A))^2 = \det(A)\det(A^t) = \det(AA^t) = \det(I_n) = 1$. In other words, the determinant of an orthogonal matrix can only take the values $\pm 1$. Similarly, if $A \in \mathbb{C}(n, n)$ is unitary, then the condition $AA^\dagger = I_n$ leads us to the conclusion $|\det(A)| = 1$.
In view of Theorem 3.10, the determinant may also be evaluated by a cofactor expansion along any column or along any row,
$$ \det(A) = \sum_{i=1}^{n} a_{ik} C_{ik} = \sum_{j=1}^{n} a_{kj} C_{kj}, \quad k = 1, \ldots, n, \tag{3.2.27}$$
which establishes, in particular, the legitimacy of the cofactor expansion formula along the first row of $A$. The validity of the cofactor expansion along an arbitrary row may be proved by induction as done for the column case.
Assume that $A \in \mathbb{F}(n, n)$ takes a boxed upper triangular form,
$$ A = \begin{pmatrix} A_1 & A_3 \\ 0 & A_2 \end{pmatrix}, \tag{3.2.31}$$
where $A_1 \in \mathbb{F}(k, k)$, $A_2 \in \mathbb{F}(l, l)$, $A_3 \in \mathbb{F}(k, l)$, and $k + l = n$. Then we have the useful formula
$$ \det(A) = \det(A_1)\det(A_2). \tag{3.2.32}$$
Similarly, if $A$ takes a boxed lower triangular form,
$$ A = \begin{pmatrix} A_1 & 0 \\ A_3 & A_2 \end{pmatrix}, \tag{3.2.35}$$
where $A_1 \in \mathbb{F}(k, k)$, $A_2 \in \mathbb{F}(l, l)$, $A_3 \in \mathbb{F}(l, k)$, and $k + l = n$, then (3.2.32) still holds. Indeed, taking the transpose in (3.2.35), we have
$$ A^t = \begin{pmatrix} A_1^t & A_3^t \\ 0 & A_2^t \end{pmatrix}, \tag{3.2.36}$$
which becomes a boxed upper triangular matrix studied earlier. Thus, using Theorem 3.10 and (3.2.32), we obtain
$$ \det(A) = \det(A^t) = \det(A_1^t)\det(A_2^t) = \det(A_1)\det(A_2), \tag{3.2.37}$$
as anticipated.
Exercises
3.2.1 For $A \in \mathbb{C}(n, n)$ use Definition 3.3 to establish the property
$$ \det(\bar{A}) = \overline{\det(A)}, \tag{3.2.38}$$
where $\bar{A}$ denotes the complex conjugate of $A$.
3.2.2 Consider the $n \times n$ matrices
$$ \begin{pmatrix} 0 & \cdots & 0 & a_{1n} \\ \vdots & & a_{2,n-1} & 0 \\ 0 & & & \vdots \\ a_{n1} & 0 & \cdots & 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 & \cdots & 0 & a_{1n} \\ \vdots & & a_{2,n-1} & a_{2n} \\ 0 & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}, \tag{3.2.39}$$
whose entries above the anti-diagonal all vanish and, in the first matrix, whose entries below the anti-diagonal vanish as well, respectively. Establish the formulas to express the determinants of these matrices in terms of the anti-diagonal entries $a_{n1}, \ldots, a_{1n}$.
3.2.3 Show that
$$ \det\begin{pmatrix} x & 1 & 1 & 1 \\ 1 & y & 0 & 0 \\ 1 & 0 & z & 0 \\ 1 & 0 & 0 & t \end{pmatrix} = txyz - yz - tz - ty, \quad x, y, z, t \in \mathbb{R}. \tag{3.2.40}$$
3.2.5 Let A, B F(3, 3) and assume that the first and second columns of
A are same as the first and second columns of B. If det(A) = 5 and
det(B) = 2, find det(3A 2B) and det(3A + 2B).
3.2.6 (Extension of Exercise 3.2.5) Let $A, B \in \mathbb{F}(n, n)$ be such that only their $j$th columns are possibly different. Establish the formula
$$ \det(aA + bB) = (a + b)^{n-1}\left(a\det(A) + b\det(B)\right), \quad a, b \in \mathbb{F}. \tag{3.2.41}$$
3.2.7 Let $A(t) = (a_{ij}(t)) \in \mathbb{R}(n, n)$ be such that each entry $a_{ij}(t)$ is a differentiable function of $t \in \mathbb{R}$. Establish the differentiation formula
$$ \frac{\mathrm{d}}{\mathrm{d}t}\det(A(t)) = \sum_{i,j=1}^{n} \frac{\mathrm{d}a_{ij}(t)}{\mathrm{d}t}\, C_{ij}(t), \tag{3.2.42}$$
where $C_{ij}(t)$ is the cofactor of the entry $a_{ij}(t)$, $i, j = 1, \ldots, n$, in the matrix $A(t)$.
3.2.8 Prove the formula
$$ \det\begin{pmatrix} x & a_1 & a_2 & \cdots & a_n \\ a_1 & x & a_2 & \cdots & a_n \\ a_1 & a_2 & x & \cdots & a_n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_1 & a_2 & a_3 & \cdots & x \end{pmatrix} = \left(x + \sum_{i=1}^{n} a_i\right)\prod_{i=1}^{n}(x - a_i). \tag{3.2.43}$$
3.2.9 Let $p_1, \ldots, p_{n+2}$ be polynomials over a field $\mathbb{F}$ of degrees up to $n$ and let $c_1, \ldots, c_{n+2} \in \mathbb{F}$. Show that
$$ \det\begin{pmatrix} p_1(c_1) & p_1(c_2) & \cdots & p_1(c_{n+2}) \\ p_2(c_1) & p_2(c_2) & \cdots & p_2(c_{n+2}) \\ \vdots & \vdots & & \vdots \\ p_{n+2}(c_1) & p_{n+2}(c_2) & \cdots & p_{n+2}(c_{n+2}) \end{pmatrix} = 0. \tag{3.2.44}$$
3.2.10 Show that
$$ \det\begin{pmatrix} x & -1 & 0 & \cdots & 0 & 0 \\ 0 & x & -1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & x & -1 & 0 \\ 0 & 0 & \cdots & 0 & x & -1 \\ a_0 & a_1 & a_2 & \cdots & a_{n-1} & a_n \end{pmatrix} = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0. \tag{3.2.45}$$
a2
an
a1
a1
a2
an
det
.
.
..
..
..
..
.
.
a1
a2
an
n
n n1
ai .
= (1)
(3.2.45)
(3.2.46)
i=1
3.2.12 Let A F(n, n) be such that AAt = In and det(A) < 0. Show that
det(A + In ) = 0.
3.2.13 Let $A \in \mathbb{F}(n, n)$ be such that the entries of $A$ are either $1$ or $-1$. Show that $\det(A)$ is an even integer when $n \geq 2$.
3.2.14 Let $A = (a_{ij}) \in \mathbb{R}(n, n)$ satisfy the following diagonally dominant condition:
$$ a_{ii} > \sum_{j \neq i} |a_{ij}|, \quad i = 1, \ldots, n. \tag{3.2.47}$$
Show that $\det(A) > 0$. (This result is also known as the Minkowski theorem.)
(3.2.49)
100
x
1
f (x1 , . . . , xn ) = det x2
.
.
.
xn
x1
x2
0
..
.
1
..
.
..
.
..
xn
(3.2.50)
For the matrix of cofactors $C = (C_{ij})$ of $A$ we also have
$$ \sum_{i=1}^{n} a_{ik} C_{il} = 0, \quad k \neq l, \tag{3.3.1}$$
$$ \sum_{j=1}^{n} a_{kj} C_{lj} = 0, \quad k \neq l. \tag{3.3.2}$$
In fact, it is easy to see that the left-hand side of (3.3.1) is the cofactor expansion of the determinant along the lth column of such a matrix that is obtained
from A through replacing the lth column by the kth column of A whose value
must be zero and the left-hand side of (3.3.2) is the cofactor expansion of the
determinant along the lth row of such a matrix that is obtained from A through
replacing the lth row by the kth row of A whose value must also be zero.
We can summarize the properties stated in (3.2.27), (3.3.1), and (3.3.2) by
the expressions
C t A = det(A)In ,
AC t = det(A)In .
(3.3.3)
The transpose $C^t$ of the cofactor matrix is called the adjugate matrix of $A$ and denoted $\mathrm{adj}(A)$. (3.3.4)
With this notation, (3.3.3) reads
$$ \mathrm{adj}(A)\, A = A\, \mathrm{adj}(A) = \det(A)\, I_n. \tag{3.3.5}$$
We can now conclude that $A$ is invertible if and only if $\det(A) \neq 0$, in which case
$$ A^{-1} = \frac{1}{\det(A)}\, \mathrm{adj}(A). \tag{3.3.6}$$
Proof If $\det(A) \neq 0$, from (3.3.5) we arrive at (3.3.6). Conversely, if $A^{-1}$ exists, then from $AA^{-1} = I_n$ and Theorem 3.11 we have $\det(A)\det(A^{-1}) = 1$. Thus $\det(A) \neq 0$.
As an important application, we consider the unique solution of the system
Ax = b
(3.3.7)
where $A \in \mathbb{F}(n, n)$ is invertible and $b \in \mathbb{F}^n$. By (3.3.6), the unique solution is
$$ x = A^{-1} b = \frac{1}{\det(A)}\, \mathrm{adj}(A)\, b, \tag{3.3.8}$$
whose $i$th component reads
$$ x_i = \frac{1}{\det(A)}\sum_{j=1}^{n} (\mathrm{adj}(A))_{ij}\, b_j = \frac{1}{\det(A)}\sum_{j=1}^{n} b_j\, C_{ji} = \frac{\det(A_i)}{\det(A)}, \quad i = 1, \ldots, n, \tag{3.3.9}$$
where Ai is the matrix obtained from A after replacing the ith column of A by
the vector b, i = 1, . . . , n.
The formulas stated in (3.3.9) are called Cramer's formulas. Such a solution method is also called Cramer's rule.
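The following Python sketch (an illustration using numpy's determinant routine, not part of the text) applies Cramer's formulas (3.3.9) to a small system and compares the result with a direct solver; the matrix and right-hand side are arbitrary.

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b by Cramer's formulas (3.3.9): x_i = det(A_i) / det(A),
    where A_i is A with its i-th column replaced by b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    if abs(d) < 1e-12:
        raise ValueError("det(A) is (numerically) zero")
    n = A.shape[0]
    x = np.empty(n)
    for i in range(n):
        Ai = A.copy()
        Ai[:, i] = b
        x[i] = np.linalg.det(Ai) / d
    return x

A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [3.0, 6.0, 5.0]
print(cramer_solve(A, b))
print(np.linalg.solve(A, b))   # same result
```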
Let A F(m, n) and of rank k. Then there are k row vectors of A which are
linearly independent. Use B to denote the submatrix of A consisting of those k
row vectors. Since B is of rank k we know that there are k column vectors of B
which are linearly independent. Use C to denote the submatrix of B consisting
of those k column vectors. Then C is a submatrix of A which lies in F(k, k)
and is of rank k. In particular det(C)
= 0. In other words, we have shown that
if A is of rank k then A has a k k submatrix of nonzero determinant.
To end this section, we consider a practical problem as an application of
determinants: The unique determination of a polynomial by interpolation.
Let $p(t)$ be a polynomial of degree $n - 1 \geq 1$ over a field $\mathbb{F}$ given by
$$ p(t) = a_{n-1} t^{n-1} + \cdots + a_1 t + a_0, \tag{3.3.10}$$
where the coefficients $a_0, \ldots, a_{n-1} \in \mathbb{F}$ are to be determined from the conditions $p(t_i) = p_i$ at $n$ distinct points $t_1, \ldots, t_n \in \mathbb{F}$ with prescribed values $p_1, \ldots, p_n \in \mathbb{F}$. These conditions read
$$ a_0 + a_1 t_1 + \cdots + a_{n-2} t_1^{n-2} + a_{n-1} t_1^{n-1} = p_1, \quad \ldots, \quad a_0 + a_1 t_n + \cdots + a_{n-2} t_n^{n-2} + a_{n-1} t_n^{n-1} = p_n, \tag{3.3.11}$$
a system of $n$ equations in the $n$ unknowns $a_0, \ldots, a_{n-1}$, whose coefficient matrix is
$$ A = \begin{pmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^{n-2} & t_1^{n-1} \\ 1 & t_2 & t_2^2 & \cdots & t_2^{n-2} & t_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & t_n & t_n^2 & \cdots & t_n^{n-2} & t_n^{n-1} \end{pmatrix}. \tag{3.3.12}$$
Adding the $(-t_1)$ multiple of the second last column to the last column, \ldots, the $(-t_1)$ multiple of the second column to the third column, and the $(-t_1)$ multiple of the first column to the second column, we get
$$ \det(A) = \begin{vmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & t_2 - t_1 & t_2(t_2 - t_1) & \cdots & t_2^{n-2}(t_2 - t_1) \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_n - t_1 & t_n(t_n - t_1) & \cdots & t_n^{n-2}(t_n - t_1) \end{vmatrix} = \prod_{i=2}^{n}(t_i - t_1)\begin{vmatrix} 1 & t_2 & t_2^2 & \cdots & t_2^{n-2} \\ 1 & t_3 & t_3^2 & \cdots & t_3^{n-2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_n & t_n^2 & \cdots & t_n^{n-2} \end{vmatrix}. \tag{3.3.13}$$
Iterating this process, we obtain
$$ \det(A) = \prod_{i=2}^{n}(t_i - t_1)\prod_{j=3}^{n}(t_j - t_2) \cdots \prod_{k=n-1}^{n}(t_k - t_{n-2})\,(t_n - t_{n-1}) = \prod_{1 \leq i < j \leq n}(t_j - t_i). \tag{3.3.14}$$
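This Vandermonde determinant formula is easy to check numerically; the following Python sketch (for illustration, with arbitrarily chosen nodes) compares both sides of (3.3.14).

```python
import numpy as np
from itertools import combinations

t = np.array([1.0, 2.0, 4.0, 7.0])
n = len(t)
V = np.vander(t, increasing=True)       # rows (1, t_i, t_i^2, ..., t_i^{n-1}), cf. (3.3.12)
lhs = np.linalg.det(V)
rhs = np.prod([t[j] - t[i] for i, j in combinations(range(n), 2)])   # prod_{i<j}(t_j - t_i)
print(lhs, rhs)                          # both equal 540.0 up to rounding
```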
Let $A \in \mathbb{F}(n, n)$. Show that the rank of the adjugate matrix is given by
$$ r(\mathrm{adj}(A)) = \begin{cases} n, & r(A) = n, \\ 1, & r(A) = n - 1, \\ 0, & r(A) \leq n - 2. \end{cases} \tag{3.3.16}$$
a11 x1 + + a1n xn
an1 x1 + + ann xn
3.3.11
3.3.12
3.3.13
3.3.14
0,
0.
(3.3.17)
3.4 Characteristic polynomials and Cayley–Hamilton theorem
We first consider the concrete case of matrices.
Let $A = (a_{ij})$ be an $n \times n$ matrix over a field $\mathbb{F}$. We first consider the linear mapping $T_A: \mathbb{F}^n \to \mathbb{F}^n$ induced from $A$ defined by
$$ T_A(x) = Ax, \quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{F}^n. \tag{3.4.1}$$
Recall that an eigenvalue of $T_A$ is a scalar $\lambda \in \mathbb{F}$ such that the null-space
$$ E_\lambda = N(T_A - \lambda I) = \{x \in \mathbb{F}^n \mid T_A(x) = \lambda x\} \tag{3.4.2}$$
is nontrivial, which happens exactly when
$$ \det(\lambda I_n - A) = 0. \tag{3.4.3}$$
Of course the converse is true as well: if $\lambda$ satisfies (3.4.3) then $(\lambda I_n - A)x = 0$ has a nontrivial solution, which indicates that $\lambda$ is an eigenvalue of $A$. Consequently the eigenvalues of $A$ are the roots of the function
$$ p_A(\lambda) = \det(\lambda I_n - A), \tag{3.4.4}$$
called the characteristic polynomial of $A$.
Proof
Write $p_A(\lambda) = a_n\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_0$. Then
$$ a_0 = p_A(0) = \det(-A) = (-1)^n\det(A), \tag{3.4.6}$$
as asserted. Besides, using Definition 3.3 and induction, we see that the two leading-degree terms in $p_A(\lambda)$, containing $\lambda^n$ and $\lambda^{n-1}$, can only appear in the product of the entry $\lambda - a_{11}$ and the cofactor $C^{\lambda I_n - A}_{11}$ of the entry $\lambda - a_{11}$ in the matrix $\lambda I_n - A$. Let $A_{n-1}$ be the submatrix of $A$ obtained by deleting the row and column vectors of $A$ occupied by the entry $a_{11}$. Then $C^{\lambda I_n - A}_{11} = \det(\lambda I_{n-1} - A_{n-1})$, whose two leading-degree terms containing $\lambda^{n-1}$ and $\lambda^{n-2}$ can only appear in the product of $\lambda - a_{22}$ and its cofactor in the matrix $\lambda I_{n-1} - A_{n-1}$. Carrying out this process to the end we see that the two leading-degree terms in $p_A(\lambda)$ can appear only in the product
$$ (\lambda - a_{11}) \cdots (\lambda - a_{nn}), \tag{3.4.7}$$
which in particular gives $a_n = 1$.
To determine $a_{n-1}$, write
$$ p_A(\lambda) = \lambda^n\det\!\left(I_n - \frac{1}{\lambda} A\right). \tag{3.4.9}$$
Thus, replacing $\dfrac{1}{\lambda}$ by $t$, we get
$$ q(t) \equiv a_n + a_{n-1} t + \cdots + a_0 t^n = \begin{vmatrix} 1 - t a_{11} & -t a_{12} & \cdots & -t a_{1n} \\ -t a_{21} & 1 - t a_{22} & \cdots & -t a_{2n} \\ \vdots & \vdots & & \vdots \\ -t a_{n1} & -t a_{n2} & \cdots & 1 - t a_{nn} \end{vmatrix}. \tag{3.4.11}$$
Therefore
$$ a_{n-1} = q'(0) = -\sum_{i=1}^{n} a_{ii}, \tag{3.4.12}$$
as before.
We now consider the abstract case.
Let U be an n-dimensional vector space over a field F and T L(U ).
Assume that u U is an eigenvector of T associated to the eigenvalue F.
Given a basis U = {u1 , . . . , un } we express u as
$$ u = x_1 u_1 + \cdots + x_n u_n, \quad x = (x_1, \ldots, x_n)^t \in \mathbb{F}^n. \tag{3.4.13}$$
With respect to $\mathcal{U}$, the mapping $T$ is represented by the matrix $A = (a_{ij})$ through
$$ T(u_j) = \sum_{i=1}^{n} a_{ij} u_i, \quad j = 1, \ldots, n, \tag{3.4.14}$$
so that the eigenvalue relation $T(u) = \lambda u$ becomes
$$ \sum_{j=1}^{n} a_{ij} x_j = \lambda x_i, \quad i = 1, \ldots, n. \tag{3.4.15}$$
Let $\mathcal{V} = \{v_1, \ldots, v_n\}$ be another basis of $U$, with respect to which $T$ is represented by the matrix $B = (b_{ij})$,
$$ T(v_j) = \sum_{i=1}^{n} b_{ij} v_i, \quad j = 1, \ldots, n, \tag{3.4.16}$$
and let $C = (c_{ij})$ be the basis transition matrix,
$$ v_j = \sum_{i=1}^{n} c_{ij} u_i, \quad j = 1, \ldots, n. \tag{3.4.17}$$
Following the study of the previous chapter we know that $A, B, C$ are related through the similarity relation $A = C^{-1} B C$. Hence we have
$$ p_A(\lambda) = \det(\lambda I_n - A) = \det(\lambda I_n - C^{-1} B C) = \det(C^{-1}[\lambda I_n - B] C) = \det(\lambda I_n - B) = p_B(\lambda). \tag{3.4.18}$$
That is, two similar matrices have the same characteristic polynomial. Thus
we may use pA () to define the characteristic polynomial of linear mapping
T L(U ), rewritten as pT (), where A is the matrix representation of T with
respect to any given basis of U , since such a polynomial is independent of the
choice of the basis.
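A quick numerical confirmation of this similarity invariance (a Python sketch, not from the text; the random matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
C = rng.standard_normal((4, 4))            # invertible with probability one
B = C @ A @ np.linalg.inv(C)               # A and B are similar

# np.poly(M) returns the coefficients of det(lambda*I - M)
print(np.round(np.poly(A), 6))
print(np.round(np.poly(B), 6))             # same coefficients, cf. (3.4.18)
```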
The following theorem, known as the CayleyHamilton theorem, is of fundamental importance in linear algebra.
Theorem 3.16 Let A C(n, n) and pA () be its characteristic polynomial.
Then
pA (A) = 0.
(3.4.19)
First assume that $A$ has $n$ distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ with associated eigenvectors $u_1, \ldots, u_n$, which form a basis of $\mathbb{C}^n$. Then
$$ p_A(\lambda) = \prod_{i=1}^{n}(\lambda - \lambda_i). \tag{3.4.20}$$
Since the factors commute, we have, for each $i = 1, \ldots, n$,
$$ p_A(A) u_i = \prod_{j=1}^{n}(A - \lambda_j I_n)\, u_i = \prod_{j \neq i}(A - \lambda_j I_n)\,(A - \lambda_i I_n) u_i = 0, \tag{3.4.21}$$
so that $p_A(A) u = 0$ for any $u = \sum_{i=1}^{n} c_i u_i \in \mathbb{C}^n$, which establishes $p_A(A) = 0$ when the eigenvalues of $A$ are distinct. (3.4.22)
In the general case, let $\lambda_1 \in \mathbb{C}$ be an eigenvalue of $A$ with an associated eigenvector $u_1$, and extend $u_1$ to a basis $\{u_1, \ldots, u_n\}$ of $\mathbb{C}^n$.
That is, the matrix B = (bij ) of TA with respect to the basis {u1 , . . . , un } is of
the form
$$ B = \begin{pmatrix} \lambda_1 & b_0 \\ 0 & B_0 \end{pmatrix}, \tag{3.4.23}$$
n
uij ei ,
j = 1, . . . , n.
(3.4.25)
i=1
(3.4.26)
(3.4.27)
To proceed, we need to recall and re-examine the definition of polynomials in the most general terms. Given a field F, a polynomial p over F is an
expression of the form
p(t) = a0 + a1 t + + an t n ,
a0 , a1 , . . . , an F,
(3.4.28)
(3.4.29)
(3.4.30)
On the other hand, we may expand the left-hand side of (3.4.29) into the form
$$ (\lambda I_n - A)\,\mathrm{adj}(\lambda I_n - A) = (\lambda I_n - A)(A_{n-1}\lambda^{n-1} + \cdots + A_1\lambda + A_0) = A_{n-1}\lambda^n + (A_{n-2} - AA_{n-1})\lambda^{n-1} + \cdots + (A_0 - AA_1)\lambda - AA_0. \tag{3.4.31}$$
Comparing the coefficients of the powers of $\lambda$ on both sides, we obtain
$$ A_{n-1} = I_n, \quad A_{n-2} - AA_{n-1} = a_{n-1} I_n, \quad \ldots, \quad A_0 - AA_1 = a_1 I_n, \quad -AA_0 = a_0 I_n. \tag{3.4.32}$$
Multiplying from the left the first relation in (3.4.32) by An , the second by
An1 ,. . . , and the second last by A, and then summing up the results, we obtain
0 = An + an1 An1 + + a1 A + a0 In = pA (A),
as anticipated.
(3.4.33)
(3.4.34)
(3.4.35)
$$ A^{-1} = \frac{(-1)^{n+1}}{\det(A)}\left(A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1 I_n\right), \tag{3.4.36}$$
or alternatively,
$$ \mathrm{adj}(A) = (-1)^{n+1}\left(A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1 I_n\right), \quad \det(A) \neq 0. \tag{3.4.37}$$
n
Cii ,
(3.4.38)
i=1
where Cii is the cofactor of the entry aii of the matrix A (i = 1, . . . , n).
3.4.2 Consider the subset D of C(n, n) defined by
D = {A C(n, n) | A has n distinct eigenvalues}.
(3.4.39)
A=
(3.4.40)
(3.4.41)
3.4.7 Let $\mathbb{F}$ be a field and $\alpha, \beta \in \mathbb{F}(1, n)$. Find the characteristic polynomial of the matrix $\alpha^t \beta \in \mathbb{F}(n, n)$.
3.4.8 Consider the matrix
1
2
0
(3.4.42)
A= 0
2
0 .
2
1 1
4
Scalar products
(4.1.1)
Let u, v U so that u is not null. Then we can resolve v into the sum of
two mutually perpendicular vectors, one in Span{u}, say cu for some scalar
c, and one in Span{u} , say w. In fact, rewrite v as v = w + cu and require
(u, w) = 0. We obtain the unique solution c = (u, v)/(u, u). In summary, we
have obtained the orthogonal decomposition
$$ v = w + \frac{(u, v)}{(u, u)}\, u, \qquad w = v - \frac{(u, v)}{(u, u)}\, u \in \mathrm{Span}\{u\}^{\perp}. \tag{4.1.2}$$
(4.1.3)
(w1 , vi )
w1 ,
(w1 , w1 )
i = 2, . . . , l.
(4.1.4)
Then wi
= 0 since v1 , vi are linearly independent for all i = 2, . . . , l. It is
clear that wi w1 (i = 2, . . . , l). If (wi , wi )
= 0 for some i = 2, . . . , l,
we may assume i = 2 after renaming the basis vectors {v1 , . . . , vl } of V if
necessary. If (wi , wi ) = 0 for all i = 2, . . . , l, there must be some j
=
i, i, j = 2, . . . , l, such that (wi , wj )
= 0, otherwise wi U0 for i = 2, . . . , l,
which is false. Without loss of generality, we may assume (w2 , w3 )
= 0 and
consider
w2 + w3 = (v2 + v3 )
(w1 , v2 + v3 )
w1 .
(w1 , w1 )
(4.1.5)
117
=
Span{w1 , w2 , v3 , . . . , vl }. Now set
wi = vi
0. Of course Span{v1 , . . . , vl }
(w2 , vi )
(w1 , vi )
w2
w1 ,
(w2 , w2 )
(w1 , w1 )
i = 3, . . . , l.
(4.1.7)
Continuing in this manner, we obtain nonzero vectors $w_1, \ldots, w_l$ given successively by
$$ w_i = v_i - \sum_{j=1}^{i-1} \frac{(w_j, v_i)}{(w_j, w_j)}\, w_j, \quad i = 2, \ldots, l, \tag{4.1.10}$$
satisfying
$$ (w_i, w_i) \neq 0, \quad i = 1, \ldots, l, \qquad (w_i, w_j) = 0, \quad i \neq j, \quad i, j = 1, \ldots, l, \tag{4.1.11}$$
and $\mathrm{Span}\{w_1, \ldots, w_l\} = \mathrm{Span}\{v_1, \ldots, v_l\}$.
In other words {u1 , . . . , uk , w1 , . . . , wl } is seen to be an orthogonal basis of U .
The method described in the proof of Theorem 4.2, especially the scheme
given by the formulas in (4.1.10)–(4.1.11), is known as the Gram–Schmidt
procedure for basis orthogonalization.
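The scheme (4.1.10) is easily implemented; here is a minimal Python sketch (an illustration, assuming the standard Euclidean scalar product and an arbitrarily chosen starting basis).

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize a list of linearly independent vectors by the scheme (4.1.10),
    using the standard Euclidean scalar product."""
    ws = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for u in ws:
            w = w - (u @ w) / (u @ u) * u
        ws.append(w)
    return ws

basis = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
W = gram_schmidt(basis)
print(np.round([[u @ v for v in W] for u in W], 10))   # diagonal Gram matrix: mutual orthogonality
```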
(4.1.12)
(vi , vi ) > 0,
(wi , wi ) < 0,
i = 1, . . . , n .
(4.1.13)
U = Span{w1 , . . . , wn },
(4.1.14)
(4.1.16)
i = 1, . . . , m+ ,
(w i , w i ) < 0,
i = 1, . . . , m .
(4.1.17)
To proceed, we assume n+ m+ for definiteness and we need to establish n+ m+ . For this purpose, we show that u1 , . . . , un0 , v1 , . . . , vn+ ,
w 1 , . . . , w m are linearly independent. Indeed, if there are scalars a1 , . . . , an0 ,
b1 , . . . , bn+ , c1 , . . . , cm in R such that
a1 u1 + + an0 un0 + b1 v1 + + bn+ vn+ = c1 w 1 + + cm w m ,
(4.1.18)
then we may take the scalar products of both sides of (4.1.18) with themselves
to get
2
(w m , w m ).
b12 (v1 , v1 ) + + bn2+ (vn+ , vn+ ) = c12 (w 1 , w 1 ) + + cm
(4.1.19)
(4.1.20)
i = 1, . . . , n+ ,
(wi , wi ) = 1,
i = 1, . . . , n . (4.1.21)
a1
b1
.
, v = .. Rn ,
.
(u, v) = ut Av, u =
(4.1.22)
.
.
an
bn
(4.1.23)
4.1.5 In special relativity, one equips the space R4 with the Minkowski scalar
product or Minkowski metric given by
a1
b1
a2
b2
4
(u, v) = a1 b1 a2 b2 a3 b3 a4 b4 , u =
, v = b R .
a3
3
a4
b4
(4.1.24)
Find an orthogonal basis and determine the indices of nullity, positivity,
and negativity of R4 equipped with this scalar product.
4.1.6 With the notation of the previous exercise, consider the following modified scalar product
(u, v) = a1 b1 a2 b3 a3 b2 a4 b4 .
(4.1.25)
(4.1.26)
uU
(4.2.1)
i = 1, . . . , n,
(4.2.2)
we may take
vi =
1
ui ,
ci
i = 1, . . . , n,
(4.2.3)
i, j = 1, . . . , n.
(4.2.4)
to achieve
(ui , vj ) = ij ,
Thus, if we define fi U by setting
fi (u) = (u, vi ),
u U,
i = 1, . . . , n,
(4.2.5)
(4.2.6)
(4.2.8)
u, v U.
(4.2.9)
u U,
(4.2.10)
(4.2.11)
u, v U,
v U .
(4.2.12)
On the other hand, for T L(U ), recall that the dual of T , T L(U ),
satisfies
u, T (v ) = T (u), v ,
u U,
v U .
(4.2.13)
u, v U.
(4.2.14)
(4.2.15)
u, v U.
(4.2.16)
u, v U,
(4.2.17)
is called the dual or adjoint mapping of T , with respect to the scalar product
(, ).
If T = T , T is said to be a self-dual or self-adjoint mapping with respect to
the scalar product (, ).
Definition 4.5 Let T L(U ) where U is a vector space equipped with a scalar
product (, ). We say that T is an orthogonal mapping if (T (u), T (v)) = (u, v)
for any u, v U .
As an immediate consequence of the above definition, we have the following
basic results.
Theorem 4.6 That T L(U ) is an orthogonal mapping is equivalent to one
of the following statements.
(1) (T (u), T (u)) = (u, u) for any u U .
(2) For any orthogonal basis {u1 , . . . , un } of U the vectors T (u1 ), . . . , T (un )
are mutually orthogonal and (T (ui ), T (ui )) = (ui , ui ) for i = 1, . . . , n.
(3) T T = T T = I , the identity mapping over U .
Proof If T is orthogonal, it is clear that (1) holds.
Now assume (1) is valid. Using the properties of the scalar product, we have
the identity
2(u, v) = (u + v, u + v) (u, u) (v, v),
u, v U.
(4.2.18)
n
ai ui ,
v=
n
i=1
bi ui ,
ai , bi F,
i = 1, . . . , n.
(4.2.19)
i=1
Therefore we have
n
n
(T (u), T (v)) = T
ai ui , T
bj uj
j =1
i=1
=
=
=
n
ai bj (T (ui ), T (uj ))
i,j =1
n
ai bi (T (ui ), T (ui ))
i=1
n
ai bi (ui , ui )
i=1
n
n
=
ai ui ,
bj uj = (u, v),
i=1
j =1
(4.2.20)
=u
which implies
T (u) =
=
t
a, b, c, d R,
Av
1 1
1
1 1
1
ab+cd
1
1
v,
(4.2.28)
A
u
1 1
1 1
1 a+b+c+d a+bcd
2
(4.2.27)
(4.2.26)
abc+d
u,
u R2 .
(4.2.29)
Let
V = Span
1
0
.
(4.2.31)
d
: U U is anti-self-dual or
dt
(4.2.35)
4.2.7 Let U be a finite-dimensional vector space equipped with a scalar product, (, ), and let V be a subspace of U . Show that (, ) is a nondegenerate scalar product over V if and only if V V = {0}.
(4.2.36)
(4.2.37)
4.2.9 Let U be a finite-dimensional space with a non-degenerate scalar product and let V be a subspace of U . Use (4.2.37) to establish V = (V ) .
4.2.10 Let U be a finite-dimensional space with a non-degenerate scalar product and let V , W be two subspaces of U . Establish the relation
(V W ) = V + W .
(4.2.38)
b1
a1
.
.
n
u=
.. , v = .. R ,
an
(4.3.1)
bn
b1
a1
.
.
n
u=
.. , v = .. C .
an
bn
(4.3.2)
Since the real case is contained as a special situation of the complex case,
we shall focus our discussion on the complex case, unless otherwise stated.
Needless to say, additivity regarding the second argument in (, ) still holds
since
(u, v + w) = (v + w, u) = (v, u) + (w, u) = (u, v) + (u, w),
u, v, w U.
(4.3.3)
On the other hand, homogeneity regarding the first argument takes a modified
form,
(au, v) = (v, au) = a(v, u) = a(u, v),
a C,
u U.
(4.3.4)
We will extend our study carried out in the previous two sections for general
scalar products to the current situation of a positive definite scalar product that
is necessarily non-degenerate since (u, u) > 0 for any nonzero vector u in U .
First, we see that for u, v U we can still use the condition (u, v) = 0 to
define u, v to be mutually perpendicular vectors. Next, since for any u U we
have (u, u) 0, we can formally define the norm of u as in Cn by
(
(4.3.5)
u = (u, u).
It is clearly seen that the norm so defined enjoys the positivity and homogeneity
conditions required of a norm. That it also satisfies the triangle inequality will
be established shortly. Thus (4.3.5) indeed gives rise to a norm of the space U
that is said to be induced from the positive definite scalar product (, ).
Let u, v U be perpendicular. Then we have
u + v2 = u2 + v2 .
(4.3.6)
(4.3.10)
(4.3.11)
vi = ui
i1
(vj , ui )
vj ,
(vj , vj )
i = 2, . . . , n,
(4.3.12)
j =1
k
(vj , uk+1 )
j =1
(vj , vj )
vj .
(4.3.13)
(4.3.14)
Of course vk+1
= 0 otherwise uk+1 Span{v1 , . . . , vk } = Span{u1 , . . . , uk }.
Thus we have obtained k + 1 nonzero mutually orthogonal vectors
v1 , . . . , vk , vk+1 that make up a basis for Span{u1 , . . . , uk , uk+1 } as asserted.
Thus, we have seen that, in the positive definite scalar product situation,
from any basis {u1 , . . . , un } of U , the GramSchmidt procedure (4.3.12) provides a scheme of getting an orthogonal basis {v1 , . . . , vn } of U so that each
of its subsets {v1 , . . . , vk } is an orthogonal basis of Span{u1 , . . . , uk } for
k = 1, . . . , n.
Let {v1 , . . . , vn } be an orthogonal basis for U . The positivity property allows
us to modify the basis further by setting
wi =
1
vi ,
vi
i = 1, . . . , n,
(4.3.15)
v U,
(4.3.16)
v U.
(4.3.17)
(4.3.18)
Consequently, we have
f (v) = a1 f1 (v) + + an fn (v)
= a1 (u1 , v) + + an (un , v)
= (a 1 u1 + + a n un , v)
(u, v), u = a 1 u1 + + a n un .
(4.3.19)
i = 1, . . . , n;
f = a1 f1 + + an fn # a 1 u1 + + a n un .
(4.3.20)
u U,
(4.3.21)
(4.3.22)
u U.
(4.3.23)
u U.
(4.3.24)
(4.3.25)
(4.3.26)
1
1
(u + v2 u2 v2 ) + i(iu + v2 u2 v2 ),
2
2
u, v U.
(4.3.27)
(T (u), T (v)) =
Hence T is unitary.
The rest of the proof is similar to that of Theorem 4.6 and thus skipped.
If T L(U ) is orthogonal or unitary and C an eigenvalue of T , then it
is clear that || = 1 since T is norm-preserving.
Let A F(n, n) and define TA L(Fn ) in the usual way TA (u) = Au for
any column vector u Fn .
When F = R, let the positive define scalar product be the Euclidean one
given in (4.3.1). That is,
(u, v) = ut v,
u, v Rn .
(4.3.29)
Thus
(u, TA (v)) = ut Av = (At u)t v = (TA (u), v),
u, v Rn .
(4.3.30)
n
aij ui ,
j = 1, . . . , n.
(4.3.31)
i=1
i, j = 1, . . . , n.
(4.3.32)
u, v Cn .
(4.3.33)
Thus
# t $t
(u, TA (v)) = ut Av = A u v = (TA (u), v),
u, v Cn .
(4.3.34)
Definition 4.10 A real matrix A R(n, n) is said to be orthogonal if its transpose At is its inverse. That is, AAt = At A = In . It is easily checked that
A is orthogonal if and only if its sets of column and row vectors both form
orthonormal bases of Rn with the standard Euclidean scalar product.
A complex matrix A C(n, n) is said to be unitary if the complex conjugate
t
t
of its transpose A , also called its Hermitian conjugate denoted as A = A ,
v1 = u 1 ,
(v1 , u2 )
v1 ,
v2 = u2
(v1 , v1 )
(4.3.35)
(v , u )
(vn1 , un )
vn = un 1 n v1
vn1 .
(v1 , v1 )
(vn1 , vn1 )
Then {v1 , . . . , vn } is an orthogonal basis of Cn . Set wi = (1/vi )vi for
i = 1, . . . , n. We see that {w1 , . . . , wn } is an orthonormal basis of Cn .
Therefore, inverting (4.3.35) and rewriting the resulting relations in terms of
{w1 , . . . , wn }, we get
u1 = v1 w1 ,
(v1 , u2 )
(v1 , un )
(vn1 , un )
un =
v1 w1 + +
vn1 wn1 + vn wn .
(v1 , v1 )
(vn1 , vn1 )
(4.3.36)
For convenience, we may express (4.3.36) in the compressed form
$$ u_1 = r_{11} w_1, \quad u_2 = r_{12} w_1 + r_{22} w_2, \quad \ldots, \quad u_n = r_{1n} w_1 + \cdots + r_{n-1,n} w_{n-1} + r_{nn} w_n, \tag{4.3.37}$$
so that
$$ A = (u_1, \ldots, u_n) = (w_1, \ldots, w_n)\begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & r_{nn} \end{pmatrix} = QR, \tag{4.3.38}$$
where $Q$ is unitary and $R$ is upper triangular. This is the QR factorization of $A$.
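The construction above is exactly a QR factorization by Gram–Schmidt; the following Python sketch (an illustration in the real case, with an arbitrarily chosen matrix) carries it out column by column.

```python
import numpy as np

def qr_gram_schmidt(A):
    """QR factorization via the Gram-Schmidt scheme (4.3.35)-(4.3.38):
    the columns of Q are the orthonormalized columns of A, and R is upper triangular."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    Q = np.zeros_like(A)
    R = np.zeros((n, n))
    for j in range(n):
        w = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]
            w -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(w)
        Q[:, j] = w / R[j, j]
    return Q, R

A = np.array([[1.0, 1.0, 1.0], [1.0, -1.0, 2.0], [2.0, 0.0, 1.0]])
Q, R = qr_gram_schmidt(A)
print(np.round(Q @ R - A, 10))    # zero matrix
print(np.round(Q.T @ Q, 10))      # identity: the columns of Q are orthonormal
```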
We note also that the norm induced by the scalar product may be evaluated as
$$ \|u\| = \sup\{|(u, v)| \mid v \in U,\ \|v\| = 1\}. \tag{4.3.39}$$
Indeed, let $\eta$ denote the right-hand side of (4.3.39). From the Schwarz inequality (4.3.10), we get $|(u, v)| \leq \|u\|$ for any $v \in U$ satisfying $\|v\| = 1$. So $\eta \leq \|u\|$. To show that $\eta \geq \|u\|$, it suffices to consider the nontrivial situation $u \neq 0$. In this case, we have
$$ \left(\frac{1}{\|u\|}\, u,\ u\right) = \|u\|. \tag{4.3.40}$$
Hence, in conclusion, $\eta = \|u\|$ and (4.3.39) follows.
Exercises
4.3.1 Let U be a vector space with a positive definite scalar product (, ).
Show that u1 , . . . , uk U are linearly independent if and only if their
associated metric matrix, also called the Gram matrix,
(u1 , u1 ) (u1 , uk )
M =
(4.3.41)
(uk , u1 )
(uk , uk )
is nonsingular.
4.3.2 (Continued from Exercise 4.3.1) Show that if u U lies in
Span{u1 , . . . , uk } then the column vector ((u1 , u), . . . , (uk , u))t lies in
the column space of the metric matrix M. However, the converse is not
true when k < dim(U ).
4.3.3 Let U be a complex vector space with a positive definite scalar product
and B = {u1 , . . . , un } an orthonormal basis of U . For T L(U ), let
A, A C(n, n) be the matrices that represent T , T , respectively, with
respect to the basis B. Show that A = A .
4.3.4 Let U be a finite-dimensional complex vector space with a positive definite scalar product and S L(U ) be anti-self-adjoint. That is, S = S.
Show that I S must be invertible.
4.3.5 Consider the complex vector space C(m, n). Show that
(A, B) = Tr(A B),
A, B C(m, n)
(4.3.42)
defines a positive definite scalar product over C(m, n) that extends the
m
traditional Hermitian scalar product over
( C = C(m, 1). With such a
scalar product, the quantity A = (A, A) is sometimes called the
HilbertSchmidt norm of the matrix A C(m, n).
4.3.6 Let (, ) be the standard Hermitian scalar product on Cm and A
C(m, n). Establish the following statement known as the Fredholm
alternative for complex matrix equations: Given b Cm the nonhomogeneous equation Ax = b has a solution for some x Cn if and
only if (y, b) = 0 for any solution y Cm of the homogeneous equation
A y = 0.
4.3.7 For A C(n, n) show that if the column vectors of A form an orthonormal basis of Cn with the standard Hermitian scalar product so do the
row vectors of A.
4.3.8 For the matrix
1 1 1
(4.3.43)
A = 1 1
2 ,
2
obtain a QR factorization.
(4.4.1)
(4.4.2)
Theorem 4.12 Let $V$ be a subspace of $U$ with an orthogonal basis $\{v_1, \ldots, v_k\}$. Then any $u \in U$ admits a unique decomposition
$$ u = v + w, \quad v \in V, \quad w \in V^{\perp}, \tag{4.4.3}$$
where
$$ v = \sum_{i=1}^{k} \frac{(v_i, u)}{(v_i, v_i)}\, v_i. \tag{4.4.4}$$
Moreover, the vector v given in (4.4.4) is the unique solution of the minimization problem
inf{u x | x V }.
(4.4.5)
Proof The validity of the expression (4.4.3) for some unique v V and
w V is already ensured by Theorem 4.11.
We rewrite v as
v=
k
(4.4.6)
ai vi .
i=1
(vi , v)
,
(vi , vi )
i = 1, . . . , k,
(4.4.7)
k
k
bi vi = w +
(ai bi )vi ,
i=1
i=1
x=
k
i=1
bi vi V .
(4.4.8)
Consequently,
"
"2
k
"
"
"
"
u x2 = "u
bi vi "
"
"
i=1
"
"2
k
"
"
"
"
= "w +
(ai bi )vi "
"
"
i=1
= w2 +
k
|ai bi |2 vi 2
i=1
w2 ,
(4.4.9)
and the lower bound w2 is attained only when bi = ai for all i = 1, . . . , k,
or x = v.
So the proof is complete.
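The projection formula (4.4.4) and the best-approximation property are easy to verify numerically; here is a short Python sketch (an illustration with an arbitrarily chosen vector and orthogonal basis, using the Euclidean scalar product).

```python
import numpy as np

def project(u, V_basis):
    """Orthogonal projection of u onto Span(V_basis) by the sum (4.4.4);
    V_basis is assumed to consist of mutually orthogonal nonzero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.zeros_like(u)
    for vi in V_basis:
        vi = np.asarray(vi, dtype=float)
        v += (vi @ u) / (vi @ vi) * vi
    return v

u = np.array([1.0, 2.0, 3.0])
V = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 1.0])]   # orthogonal basis of a plane
v = project(u, V)
print(v)                                   # [1.  2.5 2.5]
print((u - v) @ V[0], (u - v) @ V[1])      # both 0: u - v is perpendicular to V
```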
Definition 4.13 Let {v1 , . . . , vk } be a set of orthogonal vectors in U . For u
U , the sum
k
ai vi ,
ai =
i=1
(vi , u)
,
(vi , vi )
i = 1, . . . , k,
(4.4.10)
(4.4.11)
(4.4.12)
i=1
ai ui ,
ai = (ui , u),
i = 1, . . . , n,
(4.4.13)
i=1
(4.4.14)
i=1
(4.4.15)
i=1
4.4.2 Let V be a nontrivial subspace of a vector space U with a positive definite scalar product. Show that, if {v1 , . . . , vk } is an orthogonal basis of
V , then the mapping P : U U given by its Fourier expansion,
P (u) =
k
(vi , u)
vi ,
(vi , vi )
u U,
(4.4.16)
i=1
u U,
(4.4.17)
[u] U/V ,
(4.4.18)
f (t) = 2,
f (t 2 ) = 6,
f (t 3 ) = 5,
(4.4.20)
u P3 .
(4.4.21)
u, v U.
(4.5.1)
In other words, the distance of the images of any two vectors in U under T
is the same as that between the two vectors. A mapping from U into itself
satisfying such a property is called an isometry or isometric. In this section, we
show that any zero-vector preserving mapping from a real vector space U with
a positive definite scalar product into itself satisfying the property (4.5.1) must
be linear. Therefore, in view of Theorem 4.6, it is orthogonal. In other words,
in the real setting, being isometric characterizes a mapping being orthogonal.
Theorem 4.16 Let $U$ be a real vector space with a positive definite scalar product. A mapping $T$ from $U$ into itself satisfies the isometric property (4.5.1) and $T(0) = 0$ if and only if it is orthogonal.
Proof Assume T satisfies T (0) = 0 and (4.5.1). We show that T must be
linear. To this end, from (4.5.1) and replacing v by 0, we get T (u) = u
for any u U . On the other hand, the symmetry of the scalar product (, )
gives us the identity
(u, v) =
1
(u + v2 u2 v2 ),
2
u, v U.
(4.5.2)
(T (u), T (v)) =
Hence (T (u), T (v)) = (u, v) for any u, v U . Using this result, we have
T (u + v) T (u) T (v)2
= T (u + v)2 + T (u)2 + T (v)2
2(T (u + v), T (u)) 2(T (u + v), T (v)) + 2(T (u), T (v))
= u + v2 + u2 + v2 2(u + v, u) 2(u + v, v) + 2(u, v)
= (u + v) u v2 = 0,
(4.5.4)
u, v U.
(4.5.5)
r Q,
u U.
(4.5.6)
u, v U,
(4.5.8)
u U.
(4.5.9)
(T (u), T (v)) =
That is, (T (u), T (v)) = (u, v) for u, v U . Thus a direct expansion gives us
the result
T (u + v) T (u) T (v)2
= T (u + v)2 + T (u)2 + T (v)2
2(T (u + v), T (u)) 2(T (u + v), T (v)) + 2(T (u), T (v))
= u + v2 + u2 + v2 2(u + v, u) 2(u + v, v) + 2(u, v)
= (u + v) u v2 = 0,
(4.5.11)
p, q Q,
u U.
(4.5.12)
u, v U,
(4.5.13)
u, v U,
(4.5.14)
x1
xn
.
.
n
(4.5.15)
T (x) =
.. , x = .. R ,
x1
xn
where Rn is equipped with the standard Euclidean scalar product.
(i) Show that T is an isometry.
(ii) Determine all the eigenvalues of T .
(iii) Show that eigenvectors associated to different eigenvalues are mutually perpendicular.
(iv) Determine the minimal polynomial of T .
5
Real quadratic forms and self-adjoint mappings
In this chapter we exclusively consider vector spaces over the field of reals
unless otherwise stated. We first present a general discussion on bilinear and
quadratic forms and their matrix representations. We also show how a symmetric bilinear form may be uniquely represented by a self-adjoint mapping.
We then establish the main spectrum theorem for self-adjoint mappings based
on a proof of the existence of an eigenvalue using calculus. We next focus on
characterizing the positive definiteness of self-adjoint mappings. After these
we study the commutativity of self-adjoint mappings. In the last section we
show the effectiveness of using self-adjoint mappings in computing the norm
of a mapping between different spaces and in the formalism of least squares
approximations.
$$ f(u, v) = f\!\left(\sum_{i=1}^{n} x_i u_i,\ \sum_{j=1}^{n} y_j u_j\right) = \sum_{i,j=1}^{n} x_i\, f(u_i, u_j)\, y_j = x^t A y, \tag{5.1.1}$$
where A = (aij ) = (f (ui , uj )) R(n, n) is referred to as the matrix representation of the bilinear form f with respect to the basis B.
Let $\tilde{\mathcal{B}} = \{\tilde{u}_1, \ldots, \tilde{u}_n\}$ be another basis of $U$ with $\tilde{A} = (\tilde{a}_{ij}) = (f(\tilde{u}_i, \tilde{u}_j))$. If $\tilde{x}, \tilde{y} \in \mathbb{R}^n$ are the coordinate vectors of $u, v \in U$ with respect to $\tilde{\mathcal{B}}$ and the basis transition matrix between $\mathcal{B}$ and $\tilde{\mathcal{B}}$ is $B = (b_{ij})$ so that
$$ \tilde{u}_j = \sum_{i=1}^{n} b_{ij} u_i, \quad j = 1, \ldots, n, \tag{5.1.2}$$
then $x = B\tilde{x}$, $y = B\tilde{y}$ (cf. Section 1.3). Hence we arrive at
$$ f(u, v) = \tilde{x}^t \tilde{A} \tilde{y} = x^t A y = \tilde{x}^t (B^t A B)\tilde{y}, \tag{5.1.3}$$
which leads to the relation $\tilde{A} = B^t A B$ and gives rise to the following concept.
Definition 5.2 For A, B F(n, n), we say that A and B are congruent if there
is an invertible element C F(n, n) such that
A = C t BC.
(5.1.4)
u U,
(5.1.5)
which is called the quadratic form associated with the bilinear form f . The
quadratic form q is homogeneous of degree 2 since q(tu) = t 2 q(u) for any
t R and u U .
Of course q is uniquely determined by f through (5.1.5). However, the converse is not true, which will become clear after the following discussion.
To proceed, let B = {u1 , . . . , un } be a basis of U and u U any given
vector whose coordinate vector with respect to B is x = (x1 , . . . , xn )t . Then,
from (5.1.1), we have
$$ q(u) = x^t A x = x^t\left(\frac{1}{2}(A + A^t) + \frac{1}{2}(A - A^t)\right)x = x^t\left(\frac{1}{2}(A + A^t)\right)x, \tag{5.1.6}$$
since $x^t(A - A^t)x = 0$ for every $x \in \mathbb{R}^n$.
u, v U.
(5.1.7)
u, v U.
(5.1.8)
Thus, if q is the quadratic form associated with f , we derive from (5.1.8) the
relation
$$ f(u, v) = \frac{1}{2}\left(q(u + v) - q(u) - q(v)\right), \quad u, v \in U, \tag{5.1.9}$$
as well as
$$ f(u, v) = \frac{1}{4}\left(q(u + v) - q(u - v)\right), \quad u, v \in U. \tag{5.1.10}$$
As in the situation of scalar products, the relations of the types (5.1.9) and
(5.1.10) are often referred to as polarization identities for symmetric bilinear
forms.
From now on we will concentrate on symmetric bilinear forms.
Let f : U U R be a symmetric bilinear form. If x, y Rn are the
coordinate vector of u, v U with respect to a basis B, then f (u, v) is given
by (5.1.1) so that matrix A R(n, n) is symmetric. Recall that (x, y) = x t y is
the Euclidean scalar product over Rn . Thus, if we view A as a linear mapping
Rn Rn given by x # Ax, then the right-hand side of (5.1.1) is simply
(x, Ay). Since A = At , the right-hand side of (5.1.1) is also (Ax, y). In other
words, A defines a self-adjoint mapping over Rn with respect to the standard
Euclidean scalar product over Rn .
Conversely, if U is a vector space equipped with a positive definite scalar
product (, ) and T L(U ) is a self-adjoint or symmetric mapping, then
f (u, v) = (u, T (v)),
u, v U,
(5.1.11)
is a symmetric bilinear form. Thus, in this way, we see that symmetric bilinear
forms are completely characterized.
In a more precise manner, we have the following theorem, which relates
symmetric bilinear forms and self-adjoint mappings over a vector space with a
positive definite scalar product.
Theorem 5.4 Let U be a finite-dimensional vector space with a positive definite scalar product (, ). For any symmetric bilinear form f : U U R,
there is a unique self-adjoint or symmetric linear mapping, say T L(U ),
such that the relation (5.1.11) holds.
Proof For each v U , the existence of a unique vector, say T (v), so that
(5.1.11) holds, is already shown in Section 4.2. Since f is bilinear, we have
T L(U ). The self-adjointness or symmetry of T follows from the symmetry
of f and the scalar product.
Note that, in view of Section 4.1, a symmetric bilinear form is exactly a kind
of scalar product as well, not necessarily positive definite, though.
Exercises
5.1.1 Let f : U U R be a bilinear form such that f (u, u) > 0 and
f (v, v) < 0 for some u, v U .
(i) Show that u, v are linearly independent.
(ii) Show that there is some w U, w
= 0 such that f (w, w) = 0.
5.1.2 Let A, B F(n, n). Show that if A, B are congruent then A, B must
have the same rank.
5.1.3 Let A, B F(n, n). Show that if A, B are congruent and A is symmetric
then so is B.
5.1.4 Are the matrices
A=
,
B=
(5.1.12)
x1
x = x2 R3 ,
x3
(5.1.13)
n
aij ui ,
i = 1, . . . , n.
(5.2.1)
i=1
Then set
n
n
xi ui ,
xj T (uj )
Q(x) = (u, T (u)) =
n
xi ui ,
n
n
j =1
i=1
j =1
i=1
xj
n
n
akj uk =
k=1
xi xj akj ik
i,j,k=1
xi xj aij = x t Ax.
(5.2.2)
i,j =1
(5.2.3)
i=1
which may also be identified as the unit sphere in Rn centered at the origin
and commonly denoted by S n1 . Since S n1 is compact, the function Q given
in (5.2.2) attains its minimum over S n1 at a certain point on S n1 , say x 0 =
(x10 , . . . , xn0 )t .
We may assume xn0
= 0. Without loss of generality, we may also assume
0
xn > 0 (the case xn0 < 0 can be treated similarly). Hence, near x 0 , we may
represent the points on S n1 by the formulas
0
2
xn = 1 x12 xn1
where (x1 , . . . , xn1 ) is near (x10 , . . . , xn1
).
(5.2.4)
Therefore, with
1 x12
2
xn1
(5.2.5)
0
) is a critical point of P . Thus we have
we see that (x10 , . . . , xn1
0
(P )(x10 , . . . , xn1
) = 0.
(5.2.6)
153
n1
aij xi xj
i,j =1
+2
n1
2
ain xi 1 x12 xn1
i=1
2
+ ann (1 x12 xn1
).
(5.2.7)
Thus
n1
P
2
=2
aij xj + 2ain 1 x12 xn1
xi
j =1
2
n1
xi
2
1 x12 xn1
aj n xj 2ann xi , i = 1, . . . , n 1.
j =1
(5.2.8)
Using (5.2.8) in (5.2.6), we arrive at
n
j =1
n
1
aij xj0 = 0
aj n xj0 xi0 ,
xn
i = 1, . . . , n 1.
(5.2.9)
j =1
n
1
anj xj0 = 0
aj n xj0 xn0
xn
(5.2.10)
j =1
n
1
0 = 0
aj n xj0 ,
xn
(5.2.11)
j =1
n
n
0
xj uj =
xj0 T (uj )
T (v1 ) = T
j =1
n
aij ui xj0 =
i,j =1
= 0
n
j =1
n
n
aij xj0 ui
j =1
i=1
xi0 ui = 0 v1 ,
(5.2.12)
i=1
(5.2.13)
i <0
n+ = dim(E+ ),
n = dim(E ),
(5.2.15)
are exactly what were previously called the indices of nullity, positivity, and
negativity, respectively, in the context of a real scalar product in Section 4.2,
which is simply a symmetric bilinear form here and may always be represented
by a symmetric matrix.
Theorem 5.6 A matrix $A \in \mathbb{R}(n, n)$ is symmetric if and only if there is an orthogonal matrix $P \in \mathbb{R}(n, n)$ and a diagonal matrix $D \in \mathbb{R}(n, n)$ such that
$$ A = P^t D P, \tag{5.2.17}$$
where the diagonal entries of $D$ are the eigenvalues of $A$.
Proof The right-hand side of (5.2.17) is of course symmetric.
Conversely, let A be symmetric. The linear mapping T L(Rn ) defined by
T (x) = Ax, with x Rn any column vector, is self-adjoint with respect to the
standard scalar product over Rn . Hence there are column vectors u1 , . . . , un
in Rn consisting of eigenvectors of T or A associated with real eigenvalues,
say 1 , . . . , n , which form an orthonormal basis of Rn . The relations Au1 =
1 u1 , . . . , Aun = n un , may collectively be rewritten as AQ = QD where
D = diag{1 , . . . , n } and Q R(n, n) is made of taking u1 , . . . , un as the
respective column vectors. Since uti uj = ij , i, j = 1, . . . , n, we see that Q is
an orthogonal matrix. Setting P = Qt , we arrive at (5.2.17).
Note that the proof of Theorem 5.6 gives us a practical way to construct the
matrices P and D in (5.2.17).
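In practice this construction is available through standard eigenvalue routines; the following Python sketch (an illustration with an arbitrary symmetric matrix, not part of the text) recovers the factorization (5.2.17) via numpy's symmetric eigensolver.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])            # a symmetric matrix
evals, Q = np.linalg.eigh(A)                # columns of Q: orthonormal eigenvectors
D = np.diag(evals)
P = Q.T                                     # as in the proof of Theorem 5.6, P = Q^t
print(np.round(P.T @ D @ P - A, 10))        # zero matrix: A = P^t D P, cf. (5.2.17)
print(np.round(P @ P.T, 10))                # identity: P is orthogonal
```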
Exercises
5.2.1 The fact that a symmetric matrix A R(n, n) has and can have only
real eigenvalues may also be proved algebraically more traditionally as
follows. Consider A as an element in C(n, n) and let C be any of
its eigenvalue whose existence is ensured by the Fundamental Theorem
(5.2.18)
x1
y1
(5.2.21)
is an orthogonal matrix.
5.2.14 Let A R(n, n) be an idempotent symmetric matrix such that r(A) =
r. Show that the characteristic polynomial of A is
pA () = det(In A) = ( 1)r nr .
(5.2.22)
5.2.15 Let A R(n, n) be a symmetric matrix whose eigenvalues are all nonnegative. Show that det(A + In ) > 1 if A
= 0.
5.2.16 Let u Rn be a nonzero column vector. Show that there is an orthogonal matrix Q R(n, n) such that
Qt (uut )Q = diag{ut u, 0, . . . , 0}.
(5.2.23)
u U.
(5.3.1)
In this section, we apply the results of the previous section to investigate the
situation when q stays positive, which is important in applications.
Definition 5.7 The positive definiteness of various subjects of concern is
defined as follows.
158
u U,
u
= 0.
(5.3.2)
u U,
u
= 0.
(5.3.3)
(5.3.4)
i, j = 1, . . . , n.
(5.3.5)
u U.
(5.3.6)
n
n
n
n
ai ui ,
j aj uj =
i ai2 0
ai2 = 0 u2 ,
(u, T (u)) =
i=1
j =1
i=1
i=1
(5.3.7)
which establishes (2). It is obvious that (2) implies the positive definiteness
of T .
We now show that the positive definiteness of T is equivalent to the statement (3). In fact, let {u1 , . . . , un } be an orthonormal basis of U consisting of
eigenvectors with {1 , . . . , n } the corresponding eigenvalues. In view of (1),
i > 0 for i = 1, . . . , n. Now define S L(U ) to satisfy
(
S(ui ) = i ui , i = 1, . . . , n.
(5.3.8)
It is clear that T = S 2 .
Conversely, if T = S 2 for some positive definite mapping S, then 0 is not
an eigenvalue of S. So S is invertible. Therefore
(u, T (u)) = (u, S 2 (u)) = (S(u), S(u)) = S(u)2 > 0, u U, u
= 0,
(5.3.9)
and the positive definiteness of T follows.
Let {u1 , . . . , un } be an arbitrary basis of U and x = (x1 , . . . , xn )t Rn a
n
nonzero vector. For u =
xi ui , we have
i=1
n
n
n
xi ui ,
xj T (uj ) =
xi (ui , T (uj ))xj = x t Ax,
(u, T (u)) =
i=1
j =1
i,j =1
(5.3.10)
which is positive if T is positive definite, and vice versa. So the equivalence of
the positive definiteness of T and the statement (4) follows.
Suppose that A is positive definite. Let be any eigenvalue of A and x an
associated eigenvector. Then x t Ax = x t x > 0 implies > 0 since x t x > 0.
So (5) follows.
Now assume (5). Using Theorem 5.6, we see that A = P t DP where D is a
diagonal matrix in R(n, n) whose diagonal entries are the positive eigenvalues
n
i yi2 > 0,
(5.3.11)
i=1
whenever x
= 0 since P is nonsingular. Hence (5) implies (4) as well.
Finally, assume (5) holds and set
(
(
D1 = diag{ 1 , . . . , n }.
(5.3.12)
Then D12 = D. Thus
A = P t DP = P t D12 P = (P t D1 P )(P t D1 P ) = B 2 ,
B = P t D1 P .
(5.3.13)
Using (5) we know that B is positive definite. So (6) follows. Conversely, if (6)
holds, we may use the symmetry of B to get x t Ax = x t B 2 x = (Bx)t (Bx) > 0
whenever x Rn with x
= 0 because B is nonsingular, which implies Bx
=
0. Thus (4) holds.
The proof is complete.
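Statement (5) of the theorem, that positive definiteness is equivalent to all eigenvalues being positive, gives a convenient computational test; the following Python sketch (an illustration only, with an arbitrary example and a small tolerance I have chosen) applies it.

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Check positive definiteness of a symmetric matrix via its eigenvalues,
    cf. statement (5) of Theorem 5.8 (all eigenvalues positive)."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("A is not symmetric")
    return bool(np.min(np.linalg.eigvalsh(A)) > tol)

A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
print(is_positive_definite(A))     # True
print(is_positive_definite(-A))    # False
```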
We note that, if in (5.3.5) the basis {u1 , . . . , un } is orthonormal, then
T (uj ) =
n
n
(ui , T (uj ))ui =
aij ui ,
i=1
j = 1, . . . , n,
(5.3.14)
i=1
Now assume A is positive definite. Then, combining Theorem 5.6 and (1),
we can rewrite A as A = P t DP where P R(n, n) is orthogonal and
D R(n, n) is diagonal whose diagonal entries, say 1 , . . . , n , are all positive. Define the diagonal matrix D1 by (5.3.12) and set D1 P = B. Then B is
nonsingular and A = B t B as asserted.
Similarly, we say that the quadratic form q, self-adjoint mapping T, or symmetric matrix A ∈ R(n, n) is positive semi-definite or non-negative if in the condition (5.3.2), (5.3.3), or (5.3.4), respectively, the greater-than sign (>) is replaced by the greater-than-or-equal-to sign (≥). For positive semi-definite or non-negative mappings and matrices, the corresponding versions of Theorems 5.8 and 5.9 can similarly be stated, simply with the word positive there being replaced by non-negative and In in Theorem 5.9 (3) by diag{1, . . . , 1, 0, . . . , 0}.
Besides, we can define a quadratic form q, self-adjoint mapping T, or symmetric matrix A to be negative definite or negative semi-definite (non-positive), if −q, −T, or −A is positive definite or positive semi-definite (non-negative).
Thus, a quadratic form, self-adjoint mapping, or symmetric matrix is called
indefinite or non-definite if it is neither positive semi-definite nor negative
semi-definite.
As an illustration, we consider the quadratic form
x1
.
4
a 1
1 a
A=
1 0
1 1
0 0
,
(5.3.16)
a 0
0 a
(5.3.17)
Consequently
the eigenvalues of
A are 1 = 2 = a, 3 = a + 3, 4 =
M = ( (u1, u1)  · · ·  (u1, uk)
      ⋮                  ⋮
      (uk, u1)  · · ·  (uk, uk) ),   (5.3.18)
(5.3.19)
(5.3.20)
5.3.12
5.3.13
5.3.14
5.3.15
x1
2
n
n
.
n
xi2
xi , x =
q(x) = (n + 1)
.. R .
i=1
i=1
xn
(5.3.21)
(5.3.22)
(5.3.23)
1
(u, T (u)) (u, b),
2
u U.
(5.3.24)
(5.3.25)
u, v U.
(5.3.26)
Proof
A_k = ( a11  · · ·  a1k
        ⋮            ⋮
        ak1  · · ·  akk ),   (5.4.2)

Σ_{i,j=1}^k aij yi yj = x^t A x > 0.   (5.4.3)
x^t A x = y^t ( I_{n−1}  b ; b^t  ann ) y,   y = (y1, . . . , y_{n−1}, yn)^t,   (5.4.7)

        = y1² + · · · + y_{n−1}² + 2(b1 y1 + · · · + b_{n−1} y_{n−1}) yn + ann yn²,   (5.4.8)
i1, . . . , ik = 1, . . . , n,   i1 < · · · < ik,   (5.4.9)

y^t A_{i1,...,ik} y = Σ_{l,m=1}^k a_{i_l i_m} yl ym = x^t A x,   (5.4.10)

where

A_{i1,...,ik} = ( a_{i1 i1}  · · ·  a_{i1 ik}
                  ⋮                    ⋮
                  a_{ik i1}  · · ·  a_{ik ik} )   (5.4.11)

is a submatrix of A obtained from deleting all the ith rows and jth columns of A for i, j = 1, . . . , n with i, j ≠ i1, . . . , ik. The quantity det(A_{i1,...,ik})
is referred to as a principal minor of A of order k. Such a principal minor
becomes a leading principal minor when i1 = 1, . . . , ik = k with A1,...,k = Ak .
For A = (aij ) the principal minors of order 1 are all its diagonal entries,
a11 , . . . , ann . It is clear that if A is positive definite then so is Ai1 ,...,ik . Hence
det(Ai1 ,...,ik ) > 0. Therefore we arrive at the following slightly strengthened
version of Theorem 5.10.
Theorem 5.11 Let A = (aij) ∈ R(n, n) be symmetric. The matrix A is positive definite if and only if all its principal minors are positive, that is, if and only if

aii > 0,   det( a_{i1 i1}  · · ·  a_{i1 ik}
                ⋮                    ⋮
                a_{ik i1}  · · ·  a_{ik ik} ) > 0,   (5.4.12)
(√a11). Assume that the assertion is valid at n − 1 (n ≥ 2). We establish the decomposition at n (n ≥ 2).
To proceed, we rewrite A in the form (5.4.4). In view of Theorem 5.10, the
matrix An1 is positive definite. Thus there is a unique lower triangular matrix
L1 R(n 1, n 1), with positive diagonal entries, so that An1 = L1 Lt1 .
Now take L ∈ R(n, n) with

L = ( L2  0 ; σ^t  a ),   (5.4.13)

where L2 ∈ R(n − 1, n − 1) is a lower triangular matrix, σ ∈ R^{n−1} is a column vector, and a ∈ R is a suitable number. Then, if we set A = L L^t, we obtain

A = ( A_{n−1}  α ; α^t  ann ) = ( L2  0 ; σ^t  a )( L2^t  σ ; 0  a ) = ( L2 L2^t  L2 σ ; (L2 σ)^t  σ^t σ + a² ).   (5.4.14)

Therefore we arrive at the relations

A_{n−1} = L2 L2^t,   α = L2 σ,   σ^t σ + a² = ann.   (5.4.15)

If we require that all the diagonal entries of L2 be positive, then the inductive assumption leads to L2 = L1. Hence the vector σ is also uniquely determined, σ = L1^{−1} α. So it remains to show that the number a may be uniquely determined as well. For this purpose, we need to show in (5.4.15) that

ann − σ^t σ > 0.   (5.4.16)
An1
ann
=
L1
In1
ann
Lt1
.
(5.4.17)
Hence (5.4.6), namely, (5.4.16), follows immediately. Therefore the last relation in (5.4.15) leads to the unique determination of the number a:
a = √(ann − σ^t σ) > 0.   (5.4.18)
The proof follows.
The above-described characterization that a positive definite n n matrix
can be decomposed as the product of a unique lower triangular matrix L, with
positive diagonal entries, and its transpose, is known as the Cholesky decomposition theorem, which has wide applications in numerous areas. Through
resolving the Cholesky relation A = LLt , it may easily be seen that the lower
triangular matrix L = (lij ) can actually be constructed from A = (aij ) explicitly following the formulas
l11 = √a11,
li1 = ai1 / l11,   i = 2, . . . , n,
lii = √( aii − Σ_{j=1}^{i−1} lij² ),   i = 2, . . . , n,
lij = (1 / ljj) ( aij − Σ_{k=1}^{j−1} lik ljk ),   i = j + 1, . . . , n,   j = 2, . . . , n.   (5.4.19)
Thus, if A R(n, n) is nonsingular, the product AAt may always be reduced
into the form LLt for a unique lower triangular matrix L with positive diagonal
entries.
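As a computational aside, the formulas in (5.4.19) translate directly into a short routine. The sketch below (illustrative only; the sample matrix is an arbitrary positive definite choice) computes the lower triangular factor column by column and compares it with NumPy's built-in routine.

```python
import numpy as np

def cholesky_lower(A):
    """Lower triangular L with positive diagonal and A = L L^t,
    following the column-by-column formulas of (5.4.19)."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        # Diagonal entry: l_jj = sqrt(a_jj - sum_{k<j} l_jk^2).
        L[j, j] = np.sqrt(A[j, j] - np.dot(L[j, :j], L[j, :j]))
        # Entries below the diagonal: l_ij = (a_ij - sum_{k<j} l_ik l_jk) / l_jj.
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.dot(L[i, :j], L[j, :j])) / L[j, j]
    return L

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])   # positive definite (illustrative example)

L = cholesky_lower(A)
print(np.allclose(L @ L.T, A))                 # True: A = L L^t
print(np.allclose(L, np.linalg.cholesky(A)))   # agrees with NumPy's factor
```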
Exercises
5.4.1 Let a1 , a2 , a3 be real numbers and set
A = ( a1  a2  a3
      a2  a3  a1
      a3  a1  a2 ).   (5.4.20)
x1
a11
a
11
0,
a21
a12
a22
a11
a
n1
0, . . . ,
a1n
0. (5.4.22)
ann
An1
ann
(5.4.23)
In1
=
t
An1
ann
An1
ann + a
In1
(5.4.24)
(5.4.25)
(5.4.26)
(5.4.27)
i, j = 1, . . . , n.
(5.4.28)
| det(A)| ≤ a^n n^{n/2}.
5.4.8 Consider the matrix
A= 1
1
3
1 1
(5.4.29)
1 .
(5.4.30)
Theorem 5.13 Let S, T L(U ) be self-adjoint. Then there is an orthonormal basis of U consisting of eigenvectors of both S, T if and only if S and T
commute, or S T = T S.
Proof Let 1 , . . . , k be all the distinct eigenvalues of T and E1 , . . . , Ek
the associated eigenspaces which are known to be mutually perpendicular. Assume that S, T L(U ) commute. Then, for any u Ei (i = 1, . . . , k), we
have
T (S(u)) = S(T (u)) = S(i u) = i S(u).
(5.5.1)
Thus S(u) Ei . In other words, each Ei is invariant under S. Since S is selfadjoint, each Ei has an orthonormal basis, say {ui,1 , . . . , ui,mi }, consisting of
the eigenvectors of S. Therefore, the set of vectors
{u1,1 , . . . , u1,m1 , . . . , uk,1 , . . . , uk,mk }
(5.5.2)
T (ui ) = i ui ,
i , i R,
i = 1, . . . , n.
(5.5.3)
B = P t DB P ,
(5.5.4)
where DA , DB are diagonal matrices in R(n, n) whose diagonal entries are the
eigenvalues of A, B, respectively.
Exercises
5.5.1 Let U be a finite-dimensional vector space over a field F. Show that for
T L(U ), if T commutes with any S L(U ), then there is some a F
such that T = aI .
5.5.2 If A, B R(n, n) are positive definite matrices and AB = BA, show
that AB is also positive definite.
5.5.3 Let U be an n-dimensional vector space over a field F, where F = R or
C and T , S L(U ). Show that, if T has n distinct eigenvalues in F and
S commutes with T , then there is a polynomial p(t), of degree at most
(n 1), of the form
p(t) = a0 + a1 t + + an1 t n1 ,
a0 , a1 , . . . , an1 F, (5.5.5)
(5.5.6)
u U,
(5.6.1)
u U, v V .
(5.6.2)
u U,
(5.6.3)
v V,
(5.6.4)
(5.6.5)
On the other hand, since T′ ◦ T ∈ L(U) is self-adjoint and positive semi-definite, there is an orthonormal basis {u1, . . . , un} of U consisting of eigenvectors of T′ ◦ T, associated with the corresponding non-negative eigenvalues λ1, . . . , λn. Therefore, for u = Σ_{i=1}^n ai ui ∈ U with ‖u‖²_U = Σ_{i=1}^n ai² = 1, we have

‖T(u)‖²_V = (T(u), T(u))_V = (u, (T′ ◦ T)(u))_U = Σ_{i=1}^n λi ai² ≤ λ0,   (5.6.6)

where

λ0 = max_{1≤i≤n} {λi},   (5.6.7)

which proves ‖T‖ ≤ √λ0. On the other hand, (5.6.5) leads to

‖T‖ ≥ √λ0,   (5.6.8)

so that ‖T‖ = √λ0, the square root of the largest eigenvalue of T′ ◦ T.
In particular, the above also gives us a practical method to compute the norm
of a linear mapping T from U into itself by using the generated self-adjoint
mapping T T .
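For instance, in matrix terms the norm of the mapping x ↦ Ax is the square root of the largest eigenvalue of A^t A. A minimal numerical sketch (the matrix A below is an arbitrary example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])        # an arbitrary mapping of R^2 into itself

# The largest eigenvalue of A^t A gives the square of the operator norm.
lam_max = np.max(np.linalg.eigvalsh(A.T @ A))
norm_via_eig = np.sqrt(lam_max)

# Cross-check against the spectral norm computed directly by NumPy.
print(norm_via_eig, np.linalg.norm(A, 2))   # the two values agree
```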
(5.6.10)
T L(U, V ).
(5.6.11)
‖T‖ = sup{ |(u, T(u))| | u ∈ U, ‖u‖ = 1 }.   (5.6.12)

In fact, let η denote the right-hand side of (5.6.12). Then the Schwarz inequality (4.3.10) implies that |(u, T(u))| ≤ ‖T(u)‖ ≤ ‖T‖ for u ∈ U satisfying ‖u‖ = 1. So η ≤ ‖T‖. On the other hand, let λ be the eigenvalue of T such that ‖T‖ = |λ| and u ∈ U an associated eigenvector with ‖u‖ = 1. Then ‖T‖ = |λ| = |(u, T(u))| ≤ η. Hence (5.6.12) is verified.
We now show how to extend (5.6.12) to evaluate the norm of a general linear
mapping between vector spaces with scalar products.
Theorem 5.14 Let U, V be finite-dimensional vector spaces equipped with
positive definite scalar products (, )U , (, )V , respectively. For T L(U, V ),
we have
T = sup{|(T (u), v)V | | u U, v V , uU = 1, vV = 1}.
Proof
(5.6.13)
(5.6.14)
(5.6.15)
(5.6.16)
(5.6.17)
(5.6.18)
(5.6.19)
For simplicity, we first assume that (5.6.19) has a solution, that is, there is
some x U such that T (x) v2V = , and we look for some appropriate
condition the solution x will fulfill. For this purpose, we set
f () = T (x + y) v2V ,
y U,
R.
(5.6.20)
Then f (0) = f (). Hence we may expand the right-hand side of (5.6.20)
to obtain
df
= (T (x) v, T (y))V + (T (y), T (x) v)V
0=
d =0
= 2((T T )(x) T (v), y)U ,
y U,
(5.6.21)
v V.
(5.6.22)
(5.6.23)
which shows that the quantity T (x) v2V is independent of the solution x
of (5.6.22). Besides, for any test element u U , we rewrite u as u = x + w
where x is a solution to (5.6.22) and w U . Then we have
‖T(u) − v‖²_V = ‖T(x) − v‖²_V + 2(T(x) − v, T(w))_V + ‖T(w)‖²_V
             = ‖T(x) − v‖²_V + 2((T′ ◦ T)(x) − T′(v), w)_U + ‖T(w)‖²_V
             = ‖T(x) − v‖²_V + ‖T(w)‖²_V
             ≥ ‖T(x) − v‖²_V.   (5.6.24)
obtain
k
ai i ui = T (v).
(5.6.25)
i=1
1
(ui , T (v))U ,
i
i = 1, . . . , k,
(5.6.26)
which leads to the following general solution formula for the equation (5.6.22):
x=
k
1
(ui , T (v))U ui + x0 ,
i
x0 N (T ),
(5.6.27)
i=1
since N(T T ) = N (T ).
Exercises
5.6.1 Show that the mapping T : V U defined in (5.6.2) is linear.
5.6.2 Let U, V be finite-dimensional vector spaces with positive definite scalar
products. For any T L(U, V ) show that the mapping T T L(U )
is positive definite if and only if n(T ) = 0.
A R(m, n).
(5.6.28)
x1
x2
= 1 1
2
x1
x2
,
x1
x2
R2 ,
(5.6.29)
where R2 and R3 are equipped with the standard Euclidean scalar products.
(i) Find the eigenvalues of T T and compute T .
(ii) Find the eigenvalues of T T and verify that T T is positive
semi-definite but not positive definite (cf. part (ii) of (2)).
(iii) Check to see that the largest eigenvalues of T T and T T are
the same.
(iv) Compute all eigenvalues of T T and T T and explain the results
in view of Theorem 5.16.
5.6.6 Let A F(m, n) and B F(n, m) where m < n and consider AB
F(m, m) and BA F(n, n). Show that BA has at most m + 1 distinct
eigenvalues in F.
5.6.7 (A specialization of the previous exercise) Let u, v Fn be column
vectors. Hence ut v F and vut F(n, n).
(i) Show that the matrix v u^t has a nonzero eigenvalue in F only if u^t v ≠ 0.
(ii) Show that, when u^t v ≠ 0, the only nonzero eigenvalue of v u^t in F is u^t v, so that v is an associated eigenvector.
(iii) Show that, when u^t v ≠ 0, the eigenspace of v u^t associated with the eigenvalue u^t v is one-dimensional.
5.6.8 Consider the Euclidean space R^k equipped with the standard inner product and let A ∈ R(m, n). Formulate a solution of the following optimization problem

inf{ ‖Ax − b‖² | x ∈ R^n },   b ∈ R^m,   (5.6.30)

by deriving a matrix version of the normal equation.
5.6.9 Consider a parametrized plane in R3 given by
x1 = y1 + y2 ,
x2 = y1 y2 ,
x3 = y1 + y2 ,
y1 , y2 R.
(5.6.31)
Use the least squares approximation to find a point in the plane that is the
closest to the point in R3 with the coordinates x1 = 2, x2 = 1, x3 = 3.
6
Complex quadratic forms
and self-adjoint mappings
In this chapter we extend our study on real quadratic forms and self-adjoint
mappings to the complex situation. We begin with a discussion on the complex version of bilinear forms and the Hermitian structures. We will relate the
Hermitian structure of a bilinear form with representing it by a unique selfadjoint mapping. Then we establish the main spectrum theorem for self-adjoint
mappings. We next focus again on the positive definiteness of self-adjoint mappings. We explore the commutativity of self-adjoint mappings and apply it to
obtain the main spectrum theorem for normal mappings. We also show how to
use self-adjoint mappings to study a mapping between two spaces.
f (u, v) = f
n
i=1
xi ui ,
n
yj uj =
j =1
n
i,j =1
(6.1.1)
where A = (aij ) = (f (ui , uj )) lies in C(n, n) which is the matrix representation of the sesquilinear form f with respect to B.
Let B = {u 1 , . . . , u n } be another basis of U so that the coordinate vectors of
Hence, using A = (a ij ) = (f (u i , u j ))
u, v are x,
y Cn with respect to B.
C(n, n) to denote the matrix representation of the sesquilinear form f with
respect to B and B = (bij ) C(n, n) the basis transition matrix from B into
B so that
u j =
n
bij ui ,
j = 1, . . . , n,
(6.1.2)
i=1
f (u, v) = x A y = x Ay = x (B AB)y.
(6.1.3)
(6.1.4)
u U.
(6.1.5)
As in the real situation, we may call q the quadratic form associated with f .
We have the following homogeneity property
q(zu) = |z|2 q(u),
u U,
z C.
(6.1.6)
(A + A ) + (A A ) x
q(u) =x Ax = x
2
2
1
1
= x (A + A )x + x (A A )x = {q(u)} + i{q(u)}. (6.1.10)
2
2
f (u, v) =
u, v U.
(6.1.11)
(6.1.13)
x, y Cn ,
(6.1.14)
is also a Hermitian sesquilinear form. Conversely, the relation (6.1.1) indicates that a Hermitian sesquilinear form over an n-dimensional complex vector
v U,
(6.1.15)
u, v U,
(6.1.16)
u, v U.
(6.1.17)
1
((u + v, T (u + v)) (u, T (u)) (v, T (v)))
2
i
((u + iv, T (u + iv)) + (u, T (u)) + (v, T (v))) ,
2
u, v U.
(6.1.18)
If f is Hermitian, then
f (u, v) = f (v, u) = (T (v), u) = (u, T (v)),
u, v U.
(6.1.19)
i, j = 1, . . . , n.
(6.1.20)
(6.2.1)
(6.2.2)
which gives us λ = λ̄ so that λ ∈ R. This establishes (1). Note that this proof does not assume that U is finite dimensional.
To establish (2), we use induction on dim(U ).
If dim(U ) = 1, there is nothing to show.
Assume that the statement (2) is valid if dim(U ) n 1 for some n 2.
Consider dim(U ) = n, n 2.
Let 1 be an eigenvalue of T in C. From (1), we know that actually 1 R.
Use E1 to denote the eigenspace of T associated with 1 :
E1 = {u U | T (u) = 1 u} = N (1 I T ).
(6.2.3)
(6.2.4)
(6.2.5)
where 2 , . . . , k are all the distinct eigenvalues of T over the invariant subspace (E1 ) , which are real.
Finally, we need to show that 1 , . . . , k obtained above are all possible
eigenvalues of T . For this purpose, let be an eigenvalue of T and u an
associated eigenvector. Then there are u1 E1 , . . . , uk Ek such that
u = u1 + + uk . Hence the relation T (u) = u gives us 1 u1 + + k uk =
(u1 + + uk ). That is,
(λ1 − λ)u1 + · · · + (λk − λ)uk = 0.   (6.2.6)

Since u ≠ 0, there exists some i = 1, . . . , k such that ui ≠ 0. Thus, taking the scalar product of the above equation with ui, we get (λi − λ)‖ui‖² = 0, which implies λ = λi. Therefore λ1, . . . , λk are all the possible eigenvalues of T.
To establish (3), we simply construct an orthonormal basis over each
eigenspace Ei , obtained in (2), denoted by Bi , i = 1, . . . , k. Then B =
B1 Bk is a desired orthonormal basis of U as stated in (3).
Let A = (aij ) C(n, n) and consider the mapping T L(Cn ) defined by
T (x) = Ax,
x Cn .
(6.2.7)
Using the mapping (6.2.7) and Theorem 6.4, we may obtain the following
characterization of a Hermitian matrix, which may also be regarded as a matrix
version of Theorem 6.4.
Theorem 6.5 A matrix A C(n, n) is Hermitian if and only if there is a
unitary matrix P C(n, n) such that
A = P DP ,
(6.2.8)
where D R(n, n) is a real diagonal matrix whose diagonal entries are the
eigenvalues of A.
Proof If (6.2.8) holds, it is clear that A is Hermitian, A = A .
Conversely, assume A is Hermitian. Using B0 = {e1 , . . . , en } to denote the
standard basis of Cn equipped with the usual Hermitian positive definite scalar
product (x, y) = x y for x, y Cn , we see that B0 is an orthonormal basis of
Cn . With the mapping T defined by (6.2.7), we have (T (x), y) = (Ax) y =
x Ay = (x, T (y)) for any x, y Cn , and
T (ej ) =
n
aij ei ,
j = 1, . . . , n.
(6.2.9)
i=1
u U.
(6.2.10)
(i) Show that T(u)(t) = t u(t) for t ∈ [0, 1] defines a self-adjoint mapping over U.
(ii) Show that T does not have an eigenvalue whatsoever.
6.2.8 Let T L(U ) be self-adjoint where U is a finite-dimensional complex
vector space with a positive definite scalar product.
u U,
u
= 0.
(6.3.1)
u U,
u
= 0.
(6.3.2)
x Cn ,
x
= 0.
(6.3.3)
u U.
(6.3.4)
Thus we see that the positive definiteness of q and that of T are equivalent.
n
xi ui
Besides, let {u1 , . . . , un } be any basis of U and write u U as u =
i=1
A = (f (ui , uj )),
(6.3.5)
u U.
(6.3.6)
i, j = 1, . . . , n,
(6.3.7)
i, j = 1, . . . , n.
(6.3.8)
Note also that if {u1 , . . . , un } is an orthonormal basis of U , then the quantities aij (i, j = 1, . . . , n) in (6.3.7) simply give rise to the matrix representation
n
of T with respect to this basis. That is, T (uj ) =
aij ui (j = 1, . . . , n).
i=1
ann
where = (a1n , . . . , an1,n )t Cn1 . By the inductive assumption at n 1,
we know that the Hermitian matrix An1 is positive definite. Hence, applying
ann
ann
0 1
(6.3.11)
where = (B )1 (b1 , . . . , bn1 )t Cn1 . Since det(A) > 0, we obtain
with some suitable row operations the result
In1
det(A)
= det
= ann |b1 |2 |bn1 |2 > 0,
det(An1 )
ann
(6.3.12)
with det(B B) = det(An1 ). Taking x Cn and setting
B 0
y=
x (y1 , . . . , yn1 , yn )t ,
0 1
we have
x Ax = y
In1
ann
(6.3.13)
= (y 1 + b1 y n , . . . , y n1 + bn1 y n ,
y1
.
b1 y 1 + + bn1 y n1 + ann y n )
..
yn
(6.3.14)
a
ann
0 a
L 2 L2
L2
=
.
(6.3.16)
(L2 ) + a 2
Therefore we arrive at the relations
An1 = L2 L2 ,
= L2 ,
+ a 2 = ann .
(6.3.17)
Thus, if we require that all the diagonal entries of L2 are positive, then the
inductive assumption gives us L2 = L1 . So the vector is also uniquely
determined, = L1
1 . Thus it remains only to show that the number a may
be uniquely determined as well. To this end, we need to show in (6.3.17) that
ann > 0.
In fact, inserting L1 = B in (6.3.11), we have = . That is,
An1
In1
L1 0
L1
A=
=
0 1
ann
ann
0
(6.3.18)
0
1
.
(6.3.19)
Hence (6.3.18) follows as a consequence of det(A) > 0. Thus the third equation in (6.3.17) leads to the unique determination of the positive number a as
in the real situation:
a = ann > 0.
(6.3.20)
The inductive proof is now complete.
Exercises
6.3.1 Prove Theorem 6.7.
6.3.2 Prove Theorem 6.8.
6.3.3 Let A C(n, n) be nonsingular. Prove that there is a unique lower
triangular matrix L C(n, n) with positive diagonal entries so that
AA = LL .
6.3.4 Show that if A1 , . . . , Ak C(n, n) are unitary matrices, so is their
product A = A1 Ak .
6.3.5 Show that if A ∈ C(n, n) is Hermitian and all its eigenvalues are ±1 then A is unitary.
6.3.6 Assume that u Cn is a nonzero column vector. Show that there is a
unitary matrix Q C(n, n) such that
Q (uu )Q = diag{u u, 0, . . . , 0}.
(6.3.21)
A = i
2+i
i
4
1+i
2i
1i
(6.3.23)
positive definite?
6.3.9 Let A = (aij ) C(n, n) be a positive semi-definite Hermitian matrix.
Show that aii 0 for i = 1, . . . , n and that
det(A) a11 ann .
(6.3.24)
(6.3.25)
(6.3.27)
(ii) Show that the equality in (6.3.27) occurs if and only if u, v are
linearly dependent.
(6.4.1)
1
1
(T + T ) + (T T ) = P + Q,
2
2
(6.4.2)
1
1
(T + T ) L(U ) is self-adjoint or Hermitian and Q = (T
2
2
T ) L(U ) is anti-self-adjoint or anti-Hermitian since it satisfies Q = Q.
From
where P =
u, v U, (6.4.3)
i = 1, . . . , n,
(6.4.4)
Therefore, we have
(T (uj ), ui ) =
n
bkj uk , ui
= bij ,
i, j = 1, . . . , n,
(6.4.6)
k=1
and
(T (uj ), ui ) = (uj , T (ui )) = (uj , i ui ) = i ij ,
i, j = 1, . . . , n. (6.4.7)
i = 1, . . . , n,
(6.4.8)
(6.4.9)
The matrix versions of Definition 6.12 and Theorems 6.13 and 6.14 may be
stated as follows.
Definition 6.15 A matrix A C(n, n) is said to be normal if it satisfies the
property AA = A A.
Theorem 6.16 Normal matrices have the following characteristic properties.
(1) A matrix A C(n, n) is diagonalizable through a unitary matrix, that is,
there is a unitary matrix P C(n, n) and a diagonal matrix D C(n, n)
such that A = P DP , if and only if A is normal.
(2) Two normal matrices A, B C(n, n) are simultaneously diagonalizable
through a unitary matrix, that is, there is a unitary matrix P C(n, n)
and two diagonal matrices D1 , D2 C(n, n) such that A = P D1 P ,
B = P D2 P , if and only if A, B are commutative: AB = BA.
To prove the theorem, we may simply follow the standard way to associate
a matrix A C(n, n) with the mapping it generates over Cn through x # Ax
for x Cn as before and apply Theorems 6.13 and 6.14.
Moreover, if A C(n, n) is unitary, AA = A A = In , then there is a
unitary matrix P C(n, n) and a diagonal matrix D = diag{1 , . . . , n } with
|i | = 1, i = 1, . . . , n, such that A = P DP .
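A quick numerical check of this last statement (an illustration only; the unitary matrix below is generated at random via a QR factorization, and the eigenvector matrix returned by NumPy is expected to be orthonormal up to rounding since a random unitary matrix has distinct eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a random unitary matrix A from the QR factorization of a complex matrix.
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A, _ = np.linalg.qr(Z)

print(np.allclose(A @ A.conj().T, np.eye(4)))     # A is unitary

# Its eigenvalues all have absolute value 1.
eigvals, eigvecs = np.linalg.eig(A)
print(np.allclose(np.abs(eigvals), 1.0))

# With orthonormal eigenvectors, A = P D P* with P unitary and D diagonal.
P, D = eigvecs, np.diag(eigvals)
print(np.allclose(A, P @ D @ P.conj().T))
```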
Exercises
6.4.1 Let U be a finite-dimensional complex vector space with a positive
definite scalar product (, ) and T L(U ). Show that T is normal if
and only if T satisfies the identity
T (u) = T (u),
u U.
(6.4.10)
6.4.4 Assume that A C(n, n) is triangular and normal. Show that A must
be diagonal.
6.4.5 Let U be a finite-dimensional complex vector space with a positive
definite scalar product and T L(U ) be normal.
(i) Show that T is self-adjoint if and only if all the eigenvalues of T
are real.
(ii) Show that T is anti-self-adjoint if and only if all the eigenvalues of
T are imaginary.
(iii) Show that T is unitary if and only if all the eigenvalues of T are of
absolute value 1.
6.4.6 Let T L(U ) be a normal mapping where U is a finite-dimensional
complex vector space with a positive definite scalar product.
(i) Show that if T k = 0 for some integer k 1 then T = 0.
(ii) (A sharpened version of (i)) Given u U show that if T k (u) = 0
for some integer k 1 then T (u) = 0.
(This is an extended version of Exercise 6.2.8.)
6.4.7 If R, S L(U ) are Hermitian and commutative, show that T = R iS
is normal.
6.4.8 Consider the matrix
A=
(6.4.11)
6.4.11 Let U be a complex vector space with a positive definite scalar product
(, ). A mapping T L(U ) is called hyponormal if it satisfies
(u, (T T T T )(u)) 0,
u U.
(6.4.12)
(i) Show that T is normal if any only if T and T are both hyponormal.
(ii) Show that T being hyponormal is equivalent to
T (u) T (u),
u U.
(6.4.13)
u U,
(6.5.1)
defines an element f in U which depends on v V . So there is a unique element in U depending on v, say T (v), such that f (u) = (T (v), u)U . That is,
(v, T (u))V = (T (v), u)U ,
u U, v V .
(6.5.2)
u U,
v V,
(6.5.3)
(6.5.4)
On the other hand, using the fact that T T L(U ) is self-adjoint and positive semi-definite, we know that there is an orthonormal basis {u1 , . . . , un } of
U consisting of eigenvectors of T T , with the corresponding non-negative
n
eigenvalues, 1 , . . . , n , respectively. Hence, for any u =
ai ui U with
u2U
n
i=1
|ai | = 1, we have
2
i=1
n
i |ai |2 0 ,
i=1
(6.5.5)
where
0 = max {i } 0,
1in
which shows T
Then (6.5.4) gives us
(6.5.6)
(6.5.7)
(6.5.9)
T L(U, V ).
(6.5.10)
(6.5.11)
(6.5.12)
and of dimensions n and m, respectively. Let 1 , . . . , n be all the eigenvalues of the positive semi-definite mapping T T L(U ), among which
1 , . . . , k are positive, say. Use {u1 , . . . , uk , . . . , un } to denote an orthonormal basis of U consisting of eigenvectors of T T associated with the eigenvalues 1 , . . . , k , . . . , n . Then we have
(T (ui ), T (uj ))V = (ui , (T T )(uj ))U
= j (ui , uj )U = j ij ,
i, j = 1, . . . , n.
(6.5.13)
This simple expression indicates that T (ui ) = 0 for i > k (if any) and that
{T (u1 ), . . . , T (uk )} forms an orthogonal basis of R(T ). In particular, k =
r(T ).
Now set

vi = T(ui) / ‖T(ui)‖_V,   i = 1, . . . , k.   (6.5.14)

Then {v1, . . . , vk} is an orthonormal basis for R(T). Taking i = j = 1, . . . , k in (6.5.13), we find

‖T(ui)‖_V = √λi,   i = 1, . . . , k,   T(uj) = 0,   j > k   (if any).   (6.5.16)

A P = Q Σ   or   A = Q Σ P†,   (6.5.17)

where

Σ = ( D  0 ; 0  0 ) ∈ R(m, n),   D = diag{√λ1, . . . , √λk}.
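The construction above is, in matrix form, the singular value decomposition: the numbers √λi are the singular values of A. A small numerical sketch (with an arbitrary sample matrix) comparing the two routes:

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])           # an arbitrary 2 x 3 example

# Route 1: eigenvalues of A* A; their square roots give the singular values.
lam = np.linalg.eigvalsh(A.conj().T @ A)   # non-negative, ascending order
singular_from_eigs = np.sqrt(np.clip(lam, 0.0, None))[::-1]

# Route 2: the singular values returned by the SVD A = Q Sigma P*.
Q, sigma, P_star = np.linalg.svd(A)

print(singular_from_eigs[:len(sigma)])   # matches sigma
print(sigma)
```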
Note that most of the exercises in Chapter 5 may be restated in the context
of the complex situation of this chapter and are omitted.
7
Jordan decomposition
(7.1.1)
(7.1.2)
(7.1.3)
(7.1.4)
is valid for arbitrary t. This fact will be our starting point in the subsequent
development.
A polynomial p P of degree at least 1 is called a prime polynomial or an
irreducible polynomial if p cannot be factored into the product of two polynomials in P of degrees at least 1. Two polynomials are said to be equivalent if
one is a scalar multiple of the other.
Exercises
7.1.1 For f, g P show that f |g if and only if I(f ) I(g).
7.1.2 Consider I P over a field F given by
I = {f P | f (a1 ) = = f (ak ) = 0},
(7.1.5)
(7.1.6)
7.1.6 Let U be a finite-dimensional vector space over a field F and P the vector space of all polynomials over F. Show that I = {p P | p(T ) = 0}
is an ideal of P. Let g P be such that I = I(g) and the coefficient of
the highest-degree term of g is 1. Is g the minimal polynomial of T that
has the minimum degree among all the polynomials that annihilate T ?
the
basis
(7.2.2)
(7.2.3)
Vi = N (pini (T )),
i = 1, . . . , k.
(7.2.5)
(7.2.6)
f1 (T )g1 (T ) + + fk (T )gk (T ) = I.
(7.2.7)
Therefore, we have
ui = fi (T )gi (T )u,
i = 1, . . . , k.
(7.2.8)
(7.2.9)
Vj = {0}.
Wi Vi
1j k,j
=i
In fact, since pini and gi are relatively prime, there are polynomials qi and ri
in P such that
qi pini + ri gi = 1.
(7.2.10)
qi (T )pini (T ) + ri (T )gi (T ) = I.
(7.2.11)
(7.2.12)
1j k,j
=i
(7.2.13)
Exercises
7.2.1 Let S, T L(U ), where U is a finite-dimensional vector space over a
field F. Show that if the characteristic polynomials pS () and pT () of
S and T are relatively prime then pS (T ) and pT (S) are both invertible.
7.2.2 Let S, T L(U ) where U is a finite-dimensional vector space over
a field F. Use the previous exercise to show that if the characteristic
polynomials of S and T are relatively prime and R L(U ) satisfies
R S = T R then R = 0.
7.2.3 Let U be a finite-dimensional vector space over a field F and T L(U ).
Show that T is idempotent, T 2 = T , if and only if
r(T ) + r(I T ) = dim(U ).
(7.2.14)
(7.2.15)
(7.2.16)
(7.3.3)
(7.3.4)
(7.3.5)
u1 , T (u1 ), . . . , uk , T (uk )
are linearly independent vectors in U . To see this, consider
a1 u1 + b1 T (u1 ) + + ak uk + bk T (uk ) = 0,
ai , bi F,
i = 1, . . . , k.
(7.3.7)
(7.3.8)
(7.3.10)
ml 1
(v1 ), . . . , vl , TV (vl ), . . . , TV
(vl )
(7.3.11)
(7.3.12)
mi
l0
l
(ai wi + bi vi0 ) +
bij T j (ui ) = 0,
(7.3.14)
i=1 j =0
i=1
where ai s, bi s, and bij s are scalars. Applying T to (7.3.14) and using (7.3.12)
we arrive at
l0
ai vi0 +
l m
i 1
bij TV (vi ) = 0,
(7.3.15)
i=1 j =0
i=1
i = 1, . . . , l0 ;
bij = 0,
j = 0, 1, . . . , mi 1,
i = 1, . . . , l.
(7.3.16)
(7.3.17)
ml 1
(v1 ) + + blml TV
(vl ) = 0. (7.3.18)
(7.3.19)
as well, which establishes that the vectors given in (7.3.13) are linearly independent.
It is clear that
(7.3.20)
(7.3.21)
l0
bi vi0
bij TV (vi )
i=1 j =0
i=1
l0
l m
i 1
bi T (wi ) +
mi
l
bij T j (ui )
(7.3.22)
i=1 j =1
i=1
for some unique bi s and bij s in F, which gives rise to the result
u
l0
bi wi
i=1
l m
i 1
bij T j (ui ) N (T ).
(7.3.23)
i=1 j =0
l0
i=1
bi wi
l m
i 1
bij T j (ui ) =
i=1 j =0
k0
i=1
ai u0i +
l0
ci vi0 +
i=1
l
di T mi (ui )
i=1
(7.3.24)
for some unique ai s, ci s, and di s in F. Thus we conclude that the vectors
u01 , . . . , u0k0 , w1 , T (w1 ), . . . , wl0 , T (wl0 ),
ui , T (ui ), . . . , T mi (ui ), i = 1, . . . , l,
(7.3.25)
n1 , . . . , nk N.
(7.3.26)
Vi = N ((T i I )ni ),
i = 1, . . . , k. (7.3.27)
(7.3.28)
(7.3.29)
u01 , . . . , u0m0 ,
(7.3.30)
(7.3.31)
0 0
0 S1
0 ...
0
0
,
0
Sl
(7.3.32)
where, in the diagonal of the above matrix, 0 is the zero matrix of size m0 m0
and S1 , . . . , Sl are shift matrices of sizes m1 m1 , . . . , ml ml , respectively.
Therefore the matrix of T = (T i I ) + i I with respect to the same basis is
simply
0 0 0
0 S1 0
Ai =
(7.3.33)
+ i Idi .
0 ... 0
0
0 Sl
Consequently, it follows immediately that the characteristic polynomial of T restricted to Vi may be computed by

p_{Vi}(λ) = det(λ I_{di} − Ai) = (λ − λi)^{di}.   (7.3.34)
(7.3.35)
(7.3.36)
(7.3.37)
In other words, the geometric multiplicity is less than or equal to the algebraic
multiplicity, of any eigenvalue, of a linear mapping.
Exercises
7.3.1 Let U be a finite-dimensional vector space over a field F and T L(U ).
Assume that 0 F is an eigenvalue of T and consider the eigenspace
E0 = N(T 0 I ) associated with 0 . Let {u1 , . . . , uk } be a basis of E0
and extend it to obtain a basis of U , say B = {u1 , . . . , uk , v1 , . . . , vl }.
Show that, using the matrix representation of T with respect to the
basis B, the characteristic polynomial of T may be shown to take the
form
p_T(λ) = (λ − λ0)^k q(λ),   (7.3.38)
where q() is a polynomial of degree l with coefficients in F. In particular, use (7.3.38) to infer again, without relying on Theorem 7.6, that the
geometric multiplicity does not exceed the algebraic multiplicity, of the
eigenvalue 0 .
7.3.2 Let U be a finite-dimensional vector space over a field F and T L(U ).
We say that U is a cyclic vector space with respect to T if there is a
vector u U such that the vectors
u, T (u), . . . , T n1 (u),
(7.3.39)
(7.3.40)
with specifying
T (T n1 (u)) = an1 T n1 (u) + + a1 T (u) + a0 u,
a0 , a1 , . . . , an1 F.
(7.3.41)
(7.3.42)
(7.4.1)
(7.4.2)
u0i,1 , . . . , u0i,l 0 ,
u , (T I )(u ), . . . , (T )mi,li 1 (u ).
i,li
i,li
i,li
(7.4.4)
i u0i,s , s = 1, . . . , li0 ,
T (u0i,s ) =
T (ui,1 ) =
(T i I )(ui,1 ) + i ui,1 ,
T ((T i I )(ui,1 )) =
(T i I )2 (ui,1 )
+i (T i I )(ui,1 ),
T ((T )mi,1 1 (u )) =
i (T i )mi,1 1 (ui,1 ),
i
i,1
(7.4.5)
(T i I )(ui,li ) + i ui,li ,
T (ui,li ) =
T ((T i I )(ui,li )) =
(T i I )2 (ui,li )
+i (T i I )(ui,li ),
mi,li 1
(ui,li )) =
i (T i )mi,li 1 (ui,li ).
T ((T i )
From (7.4.5), we see that, as an element in L(Vi ), the matrix representation of
T with respect to the basis (7.4.3) is
0
0
Ji,0
0 Ji,1
0
Ji
(7.4.6)
,
..
0
.
0
0
0 Ji,li
where Ji,0 = i Il 0 and
i
Ji,s
1
..
.
..
.
i
..
.
0
..
..
.
..
..
..
.
i
(7.4.7)
u0i,1 , . . . , u0i,l 0 ,
(T )mi,li 1 (u ), . . . , (T I )(u ), u .
i
i,li
i,li
i,li
With the choice of these reordered basis vectors, the submatrix Ji,s instead
takes the following updated form,
1
0 0
i
.
..
0
. ..
1
.. . .
.
.
.
.
Ji,s = .
C(mi,s , mi,s ), s = 1, . . . , li .
.
.
. 0
..
..
.
.
. 1
.
0
i
(7.4.9)
The submatrix Ji,s given in either (7.4.7) or (7.4.9) is called a Jordan block.
To simplify this statement, Ji,0 = i Il 0 is customarily said to consist of li0
i
1 1 (degenerate) Jordan blocks.
Consequently, if we choose
B = B1 Bk ,
(7.4.10)
J1 0 0
.
..
..
.
. ..
0
,
J =
(7.4.11)
. .
..
.. ... 0
0
0 Jk
which is called a Jordan canonical form or a Jordan matrix.
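For concrete computations, a computer algebra system can produce the Jordan matrix of a small matrix exactly. A brief sketch using SymPy (the example matrix is an arbitrary choice; SymPy's jordan_form returns a pair (P, J) with A = P J P^{-1}):

```python
import sympy as sp

# An example matrix with a repeated eigenvalue and a nontrivial Jordan block.
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 3]])

P, J = A.jordan_form()       # A = P * J * P**(-1)
sp.pprint(J)                 # block diagonal: a 2x2 Jordan block for 2, and (3)
print(A == P * J * P.inv())  # True
```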
We may summarize the above discussion into the following theorem, which
is the celebrated Jordan decomposition theorem.
Theorem 7.7 Let U be an n-dimensional vector space over C and T ∈ L(U) so that its distinct eigenvalues are λ1, . . . , λk with respective algebraic multiplicities n1, . . . , nk. Then the following hold.
(7.4.12)
u1 V1 , . . . , uk Vk .
(7.4.13)
Thus, we have
mT (T )(u) = (T 1 I )m1 (T k I )nk (u1 + + uk )
=
k #
i I )mi ]
(T 1 I )m1 [(T
i=1
(T k I )nk (T i I )mi (ui )
= 0,
(7.4.14)
(7.4.15)
(7.4.16)
x1
xn
.
.
n
(7.4.17)
T (x) =
.. , x = .. C .
x1
xn
A= 1
0
0 0
0 1 ,
(7.4.18)
1 0
where a R.
(i) For what value(s) of a can or cannot the matrix A be diagonalized?
(ii) Find the Jordan forms of A corresponding to various values of a.
7.4.9 Let A ∈ C(n, n) and k ≥ 2 be an integer such that A is similar to A^k.
(i) Show that, if λ is an eigenvalue of A, so is λ^k.
(ii) Show that, if in addition A is nonsingular, then each eigenvalue of A is a root of unity. In other words, if λ ∈ C is an eigenvalue of A, then there is an integer s ≥ 1 such that λ^s = 1.
7.4.10 Let A C(n, n) satisfy Am = aIn for some integer m 1 and nonzero
a C. Use the information about the minimal polynomial of A to
prove that A is diagonalizable.
1 1 1
n 0 0
1 1 1
b2 0 0
A= . . .
, (7.4.19)
, B = . . .
.. ..
.. ..
. . ...
. . ...
bn 0 0
1 1 1
where b2 , . . . , bn R, are similar and diagonalizable.
7.4.13 Show that the matrices
1 0 0
2 0 0
A = 0 0 1 , B = 0 1 0 ,
0
(7.4.20)
(7.4.21)
n ni for i = 1, . . . , k.
7.4.16 Show that the matrix
0
A=
0
a4
0
,
1
(7.4.22)
Avi = ui ,
i = 1, . . . , m.
(7.4.23)
8
Selected topics
In this chapter we present a few selected subjects that are important in applications as well, but are not usually included in a standard linear algebra course.
These subjects may serve as supplemental or extracurricular materials. The
first subject is the Schur decomposition theorem, the second is about the classification of skewsymmetric bilinear forms, the third is the PerronFrobenius
theorem for positive matrices, and the fourth concerns the Markov or stochastic
matrices.
j
bij ui ,
j = 1, . . . , n,
(8.1.1)
i=1
(8.1.2)
(8.1.3)
j
bij ui ,
j = 1, . . . , n 1.
(8.1.4)
i=1
j 1
(8.1.5)
bij ui .
i=1
j 1
= (T b11 I ) (T bj 1,j 1 I )
bij ui = 0,
(8.1.6)
i=1
(8.1.7)
we have
pT (T )(ui ) = (T b11 I ) (T bnn I )(ui ) = 0,
i = 1, . . . , n. (8.1.8)
Consequently, pT (T ) = 0, as anticipated.
It is clear that, in Theorem 8.1, if the upper triangular matrix B C(n, n)
is diagonal, then the orthonormal basis B is made of the eigenvectors of T .
In this situation T is normal. Likewise, if B is diagonal and real, then T is
self-adjoint.
We remark that Theorem 8.1 may also be proved without resorting to the
adjoint mapping.
In fact, use the notation of Theorem 8.1 and proceed to the nontrivial situation dim(U ) = n 2 directly. Let 1 C be an eigenvalue of T and u1 an
associated unit eigenvector. Then we have the orthogonal decomposition
U = Span{u1 } V ,
V = (Span{u1 }) .
(8.1.9)
j
bij ui ,
j = 2, . . . , n,
(8.1.10)
i=2
for some bij s in C. Of course the vectors u1 , u2 , . . . , un now form an orthonormal basis of U . Moreover, in view of (8.1.10) and R(I P ) = Span{u1 },
we have
T(u1) = λ1 u1 ≡ b11 u1,
T(uj) = ((I − P) ◦ T)(uj) + (P ◦ T)(uj) = b1j u1 + Σ_{i=2}^j bij ui,   b1j ∈ C,   j = 2, . . . , n,   (8.1.11)
(8.1.12)
The proof of Theorem 8.2 may be obtained by applying Theorem 8.1, where
we take U = Cn with the standard Hermitian scalar product and define T
L(Cn ) by setting T (u) = Au for u Cn . In fact, with B = {u1 , . . . , un } being
the orthonormal basis of Cn stated in Theorem 8.1, the unitary matrix P in
(8.1.12) is such that the ith column vector of P is simply ui , i = 1, . . . , n.
From (8.1.12) we see immediately that A is normal if and only if B is diagonal and that A is Hermitian if and only if B is diagonal and real.
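Numerically, the complex Schur decomposition is available in SciPy; the following sketch (with arbitrary example matrices) verifies A = P B P* with P unitary and B upper triangular, and illustrates the remark that B is diagonal precisely when A is normal.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[2.0, 1.0 + 1.0j],
              [1.0j, 3.0]])               # an arbitrary (non-normal) matrix

B, P = schur(A, output='complex')          # A = P B P*, B upper triangular
print(np.allclose(A, P @ B @ P.conj().T))
print(np.allclose(P @ P.conj().T, np.eye(2)))

# For a normal matrix the Schur form is diagonal.
N = np.array([[1.0, 1.0j],
              [-1.0j, 2.0]])               # Hermitian, hence normal
BN, _ = schur(N, output='complex')
print(np.allclose(BN, np.diag(np.diag(BN))))   # True: off-diagonal part vanishes
```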
Exercises
8.1.1 Show that the matrix B may be taken to be lower triangular in Theorems 8.1 and 8.2.
8.1.2 Let A = (aij ), B = (bij ) C(n, n) be stated in the relation (8.1.12).
Use the fact Tr(A A) = Tr(B B) to infer the identity
n
i,j =1
|aij |2 =
|bij |2 .
(8.1.13)
1ij n
(8.1.14)
8.1.4 For A R(n, n) assume that the roots of the characteristic polynomial
of A are all real. Establish the real version of the Schur decomposition
theorem, which asserts that there is an orthogonal matrix P R(n, n)
and an upper triangular matrix B R(n, n) such that A = P t BP and
that the diagonal entries of B are all the eigenvalues of A. Can you prove
a linear mapping version of the theorem when U is a real vector space
with a positive definite scalar product?
8.1.5 Show that if A R(n, n) is normal and all the roots of its characteristic
polynomial are real then A must be symmetric.
8.1.6 Let U be a finite-dimensional complex vector space with a positive definite scalar product and S, T L(U ). If S and T are commutative, show
that U has an orthonormal basis, say B, such that, with respect to B, the
matrix representations of S and T are both upper triangular.
8.1.7 Let U be an n-dimensional (n ≥ 2) complex vector space with a positive definite scalar product and T ∈ L(U). If λ ∈ C is an eigenvalue of T and u ∈ U an associated eigenvector, then the quotient space V = U/Span{u} is of dimension n − 1.
(i) Define a positive definite scalar product over V and show that T
induces an element in L(V ).
,
(8.1.15)
Q1 AQ1 =
0 An1
where An1 C(n 1, n 1) and Cn1 is a row vector.
(ii) Apply the inductive assumption to get a unitary element Q C(n1,
n 1) so that Q An1 Q is upper triangular. Show that
1 0
(8.1.16)
Q2 =
0 Q
is a unitary element in C(n, n) such that
(Q1 Q2 ) A(Q1 Q2 ) = Q2 Q1 AQ1 Q2
(8.1.17)
u, v U.
(8.2.1)
n
n
n
f (u, v) = f
xi ui ,
yj uj =
xi f (ui , uj )yj = x t Ay, (8.2.2)
i=1
j =1
i,j =1
(8.2.3)
n
j = 1, . . . , n,
bij ui ,
(8.2.4)
i=1
0 0
1
.
..
..
..
..
..
.
.
.
.
.
.
.
.
..
..
..
.
.
k
(8.2.5)
= CAC t ,
..
0
0 .
.
.
.
..
..
..
.
.
.
. .
.
.
.
.
0 0
where
i =
ai
ai
i = 1, . . . , k,
(8.2.6)
t
EAE =
, 1 =
,
(8.2.8)
a1 0
t An2
where F(2, n 2) and An2 F(n 2, n 2) is skewsymmetric.
Consider P F(n, n) of the form
0
I2
,
(8.2.9)
P =
t In2
where F(2, n 2) is to be determined. Then det(P ) = 1 and
I2
t
P =
.
0 In2
Consequently, we have
I2
t t
P EAE P =
t
=
In2
An2
1 +
1
t
I2
0
In2
1 t + t + An2
t
(8.2.10)
(8.2.11)
(8.2.12)
(8.2.14)
where
G = t 1 t + t + An2
(8.2.15)
QP EAE P Q =
t
DGD t
= ,
(8.2.17)
0
Dk 0
(8.2.18)
= Dk 0 0 F(n, n),
0
(8.2.19)
0
1 0
1
.
0
..
..
..
ai
.
.
. , =
, i = 1, . . . , k.
R= .
i
1
0
0
k
0
ai
0 0 In2k
(8.2.20)
Then
J2
.
.
RR t =
.
0
..
.
0
..
.
J2
where
J2 =
0
..
. ,
(8.2.21)
(8.2.22)
Ik
Ik
(8.2.24)
(8.2.25)
(8.2.26)
(8.2.27)
f (vi , wi ) = ai
= 0, i = 1, . . . , k,
(8.2.28)
f (vi , vj ) = f (wi , wj ) = 0, i, j = 1, . . . , k,
f (vi , wj ) = 0, i
= j, i, j = 1, . . . , k.
Of particular interest is when f is non-degenerate over U . In such a situation, dim(U ) must be an even number, 2k.
Definition 8.4 Let U be a vector space of 2k dimensions. A skewsymmetric
bilinear form f : U U F is called symplectic if f is non-degenerate. An
even dimensional vector space U equipped with a symplectic form is called a
symplectic vector space. A basis {v1 , . . . , vk , w1 , . . . , wk } of a 2k-dimensional
symplectic vector space U equipped with the symplectic form f is called symplectic if
f (vi , wi ) = 1, i = 1, . . . , k,
(8.2.29)
f (vi , vj ) = f (wi , wj ) = 0, i, j = 1, . . . , k,
f (vi , wj ) = 0, i
= j, i, j = 1, . . . , k.
A symplectic basis is also called a Darboux basis.
Therefore, we have seen that a real symplectic vector space always has a
symplectic basis.
Definition 8.5 Let U be a symplectic vector space equipped with a symplectic
form f . A subspace V of U is called isotropic if f (u, v) = 0 for any u, v V .
W = Span{w1 , . . . , wk }
(8.2.30)
u, v U,
(8.2.31)
u, v U,
(8.2.32)
is Hermitian. It is clear that the matrix representation, say A C(n, n), with
respect to any basis of U of a Hermitian skewsymmetric form is anti-Hermitian
or skew-Hermitian, A = A. Hence iA is Hermitian. Applying the knowledge about Hermitian forms and Hermitian matrices studied in Chapter 6, it is
not hard to come up with a complete understanding of skew-Hermitian forms
and matrices, in the same spirit of Theorem 8.3. We leave this as an exercise.
Exercises
8.2.1 Let f be a bilinear form over a vector space U . Show that f is skewsymmetric if and only if f (u, u) = 0 for any u U .
8.2.2 Let f be a skewsymmetric bilinear form over U and S U a non-empty
subset. Show that S defined in (8.2.25) is a subspace of U .
8.2.3 Let U be a finite-dimensional vector space and f a skewsymmetric bilinear form over U . Define U0 by (8.2.26) and use A to denote the matrix
representation of f with respect to any given basis of U . Show that
dim(U0 ) = dim(U ) r(A).
(8.2.33)
(8.3.1)
x1
.
n
x=
.. R ,
xn
(8.3.2)
(8.3.3)
(8.3.4)
of R . Then define
n
We can show that is an interval in R of the form [0, r] for some r > 0.
In fact, take a test vector, say y = (1, . . . , 1)t Rn . Since A is positive, it
is clear that Ay y if satisfies
n
aij i = 1, . . . , n .
0 < min
(8.3.5)
j =1
n
aij xj i = 1, . . . , n
= x = Ax = max
j =1
n
aij i = 1, . . . , n ,
(8.3.6)
max
j =1
(8.3.7)
Then we have seen that r satisfies 0 < r < . Hence there is a sequence
{k } such that
r = lim k .
k
(8.3.8)
k = 1, 2, . . . .
(8.3.9)
(8.3.10)
n
j =1
aij aj rai
(8.3.11)
j =1
s > 0.
(8.3.12)
j =1
(8.3.13)
j =1
which leads to
n
(8.3.14)
j =1
Thus Az > rz. Set v = z/‖z‖. Then v ∈ S and Av > rv. Hence we may choose ε > 0 small such that Av > (r + ε)v. So r + ε is again an admissible value, which contradicts the definition of r made in (8.3.7).
Therefore Au = ru and r is a positive eigenvalue of A.
The positivity of u = (ai ) follows easily since u S and A > 0 so that
ru = Au leads to
r ai = Σ_{j=1}^n aij aj > 0,   i = 1, . . . , n,   (8.3.15)
because u
= 0. That is, a non-negative eigenvector of A associated to a positive
eigenvalue must be positive.
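A standard way to compute the Perron–Frobenius eigenvalue r and its positive eigenvector numerically is the power iteration, sketched below on an arbitrary positive matrix (illustration only).

```python
import numpy as np

def perron(A, iterations=200):
    """Power iteration for a positive matrix A: returns (r, u) with A u = r u,
    where u is positive and normalized to unit 1-norm."""
    n = A.shape[0]
    u = np.ones(n) / n
    for _ in range(iterations):
        v = A @ u
        u = v / v.sum()          # keep the iterate positive and normalized
    r = (A @ u)[0] / u[0]        # component ratio gives the eigenvalue
    return r, u

A = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 3.0],
              [1.0, 1.0, 1.0]])

r, u = perron(A)
print(r)                                 # the positive dominant eigenvalue
print(u)                                 # a positive eigenvector
print(max(abs(np.linalg.eigvals(A))))    # agrees with r
```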
Let v be any non-negative eigenvector of A associated to r. We show that
there is a positive number a such that v = au. For this purpose, we construct
the vector
us = u sv,
s > 0.
(8.3.16)
(8.3.17)
v Cn ,
(8.3.18)
has no solution.
Suppose otherwise that the dimension of the eigenspace Er is greater than
one. Then there is some v Er that is not a scalar multiple of u. Write v as
v = v1 + iv2 with v1 , v2 Rn . Then Av1 = rv1 and Av2 = rv2 . We assert
that one of the sets of vectors {u, v1 } and {u, v2 } must be linearly independent
over R. Otherwise there are a1 , a2 R such that v1 = a1 u and v2 = a2 u which
imply v = (a1 + ia2 )u, a contradiction. Use w to denote either v1 or v2 which
is linearly independent from u over R. Then we know that w can never be
non-negative. Thus there are components of w that have different signs. Now
consider the vector
us = u + sw,
s > 0.
(8.3.19)
It is clear that us > 0 when s > 0 is sufficiently small since u > 0. So there
is some s0 > 0 such that us0 0 but a component of us0 is zero. However,
because us0
= 0 owing to the presence of a positive component in w, we arrive
at a contradiction because us0 is seen to be an eigenvector of A associated to r.
We next show that (8.3.18) has no solution. Since u Rn , we need only to
consider v Rn in (8.3.18). We proceed as follows.
Consider
ws = v + su,
s > 0.
(8.3.20)
Since Au = ru, we see that ws also satisfies (8.3.18). Take s > 0 sufficiently
large so that ws > 0. Hence (A rIn )ws = u > 0 or
Aws > rws .
(8.3.21)
n
aij |bj |,
i = 1, . . . , n.
(8.3.22)
j =1
|b1 |
.
w=
.. S.
(8.3.23)
|bn |
Since (8.3.22) implies Aw rw, we see in view of the definition of that
|| . Using (8.3.7), we have || r.
If || < r, there is nothing more to do. If || = r, the discussion just made
in the earlier part of the proof shows that w is a non-negative eigenvector of
A associated to r and thus equality holds in (8.3.22) for all i = 1, . . . , n.
Therefore, since A > 0, the complex numbers b1 , . . . , bn must share the same
phase angle, R, so that
bi = |bi |ei ,
i = 1, . . . , n.
(8.3.24)
(8.3.25)
(8.3.26)
Exercises
8.3.1 Let A = (aij ) R(n, n) be a positive matrix and r > 0 the Perron
Frobenius eigenvalue of A. Show that r satisfies the estimate
n
n
min
aij r max
aij .
(8.3.27)
1in
1in
j =1
A= 2
3
j =1
1 .
(8.3.28)
n
1
(8.3.29)
r = 0n
aj k ak .
i=1 ai
j,k=1
aij = 1,
i = 1, . . . , n,
(8.4.1)
that is, the components of each row vector in A sum up to 1, then A is called
a Markov or stochastic matrix. If there is an integer m 1 such that Am is a
positive matrix, then A is called a regular Markov or regular stochastic matrix.
A few immediate consequences follow directly from the definition of a
Markov matrix and are stated below.
Theorem 8.8 Let A = (aij ) R(n, n) be a Markov matrix. Then 1 is an
eigenvalue of A which enjoys the following properties.
(1) The vector u = (1, . . . , 1)t Rn is an eigenvector of A associated to the
eigenvalue 1.
(2) Any eigenvalue λ ∈ C of A satisfies

|λ| ≤ 1.   (8.4.2)
Proof Using (8.4.1), the fact that u = (1, . . . , 1)t satisfies Au = u may be
checked directly.
Now let C be any eigenvalue of A and v = (bi ) Cn an associated
eigenvector. Then there is some i0 = 1, . . . , n such that
|bi0 | = max{|bi | | i = 1, . . . , n}.
(8.4.3)
λ b_{i0} = Σ_{j=1}^n a_{i0 j} bj,   (8.4.4)

|λ| |b_{i0}| ≤ Σ_{j=1}^n a_{i0 j} |bj| ≤ |b_{i0}| Σ_{j=1}^n a_{i0 j} = |b_{i0}|.   (8.4.5)
(8.4.6)
(8.4.7)
J =
..
.
0
..
.
..
.
..
.
1
..
..
..
..
0
..
.
,
0
(8.4.8)
(8.4.9)
J m = (I + P )m
=
m
s=0
l1
s=0
m!
ms P s
s!(m s)!
m!
ms P s
s!(m s)!
= m I + m1 mP + + ml+1
m(m 1) (m [l 2]) l1
P ,
(l 1)!
(8.4.10)
which is an upper triangular matrix with the diagonal entries all equal to m .
From (8.4.6) and (8.4.7), we have
Am = Cdiag{J1m , . . . , Jkm }C 1 .
(8.4.11)
(8.4.12)
n
bi = 1.
(8.4.13)
i=1
(8.4.14)
(8.4.15)
We see that the first row vector of D, say w^t for some w ∈ C^n, satisfies w^t A = w^t. Hence A^t w = w. Since 1 is a simple root of the characteristic polynomial of A^t, we conclude that there is some b ∈ C, b ≠ 0, such that w = b v where v satisfies (8.4.13).
Since D = C 1 , we have w t u = 1, which leads to
t
n
bi = ab = 1.
(8.4.16)
i=1
(8.4.17)
Since each of the Jordan blocks J1 , . . . , Jk is of the form (8.4.9) for some
C satisfying || < 1, we have seen that
J1m , . . . , Jkm 0
as m .
(8.4.18)
(8.4.19)
Inserting the results that the first column vector of C is u = a(1, . . . , 1)t , the
first row vector of D is w t = bv t , where v satisfies (8.4.13), and ab = 1 into
(8.4.19), we obtain
b1 bn
.
..
..
(8.4.20)
lim Am =
. = K,
m
b1
bn
as asserted.
For a Markov matrix A ∈ R(n, n), the power of A, A^m, may or may not approach a limiting matrix as m → ∞. If the limit of A^m as m → ∞ exists and is some K ∈ R(n, n), then it is not hard to show that K is also a Markov matrix, which is called the stable matrix of A, and A is said to be a stable Markov matrix. Theorem 8.10 says that a regular Markov matrix A is stable and, at the same time, gives us a constructive method to find the stable matrix of A.
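The construction can be carried out numerically: for a regular Markov matrix, high powers of A converge to the stable matrix K, whose identical rows are the left eigenvector of A for the eigenvalue 1, normalized so that its entries sum to 1 (cf. (8.4.13)). A small sketch on an arbitrary regular Markov matrix:

```python
import numpy as np

A = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])           # rows sum to 1; A > 0, so A is regular

# Route 1: iterate the power A^m until it stops changing.
K = np.linalg.matrix_power(A, 100)

# Route 2: the normalized left eigenvector of A for the eigenvalue 1.
eigvals, left_vecs = np.linalg.eig(A.T)    # left eigenvectors of A
v = np.real(left_vecs[:, np.argmin(abs(eigvals - 1.0))])
v = v / v.sum()

print(K)
print(np.allclose(K, np.tile(v, (3, 1))))  # every row of K equals v
```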
Exercises
8.4.1 Let A R(n, n) be a stable Markov matrix and K R(n, n) the stable
matrix of A. Show that K is also a Markov matrix and satisfies AK =
KA = K.
8.4.2 Show that the matrix
0 1
A=
(8.4.21)
1 0
is a Markov matrix which is not regular. Is A stable?
8.4.3 Consider the Markov matrix
1 1 0
1 1
1
.
A=
1
2
2
2
0
(8.4.22)
1
3
m2
m1
m2
1)
2
1)
+
(2
+
(2
2
1
m
m2
m1
m2
,
A = m
2
2
2
1
3
m2
m1
m2
1) 2
1)
+ (2
+ (2
2
2
m = 1, 2, . . . .
(8.4.23)
Take m in (8.4.23) and verify your result obtained in (ii).
8.4.4 Let A R(n, n) be a Markov matrix. If At is also a Markov matrix, A
is said to be a doubly Markov matrix. Show that the stable matrix K of
a regular doubly Markov matrix is simply given by
1 1
1 .
.. ...
K=
(8.4.24)
.
n
1 1
8.4.5 Let A1 , . . . , Ak R(n, n) be k doubly Markov matrices. Show that their
product A = A1 Ak is also a doubly Markov matrix.
9
Excursion: Quantum mechanics in a nutshell
The content of this chapter may serve as yet another supplemental topic to meet
the needs and interests beyond those of a usual course curriculum. Here we
shall present an over-simplified, but hopefully totally transparent, description
of some of the fundamental ideas and concepts of quantum mechanics, using
a pure linear algebra formalism.
u = Σ_{i=1}^n ai ei = (a1, . . . , an)^t,   v = Σ_{i=1}^n bi ei = (b1, . . . , bn)^t,   (9.1.1)

(u, v) = Σ_{i=1}^n āi bi,   (9.1.2)

|u⟩ = Σ_{i=1}^n ai |ei⟩,   (9.1.3)

⟨u| = Σ_{i=1}^n āi ⟨ei|,   (9.1.4)

⟨ei|ej⟩ = δij,   i, j = 1, . . . , n,   (9.1.5)
and the Hermitian scalar product of the vectors |u⟩ and |v⟩ assumes the form

⟨u|v⟩ = Σ_{i=1}^n āi bi, which is the complex conjugate of ⟨v|u⟩.   (9.1.6)
i = 1, . . . , n.
(9.1.7)
n
|ei ai ,
(9.1.8)
i=1
(9.1.9)
i=1
Σ_{i=1}^n |ei⟩⟨ei| = I,   (9.1.10)
Thus (9.1.10) can be applied to both bra and ket vectors symmetrically and
what it expresses is simply the fact that |e1 , . . . , |en form an orthonormal
basis of Cn .
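In coordinates this resolution of the identity is simply the statement that the outer products |ei⟩⟨ei| of an orthonormal basis sum to the identity matrix, which is easy to check numerically (a minimal sketch; the orthonormal basis below is generated at random as the columns of a unitary matrix):

```python
import numpy as np

# Columns of a unitary matrix form an orthonormal basis {|e_1>, ..., |e_n>} of C^n.
rng = np.random.default_rng(2)
Z = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, _ = np.linalg.qr(Z)

# The sum of the projections |e_i><e_i| over the basis reproduces the identity.
resolution = sum(np.outer(U[:, i], U[:, i].conj()) for i in range(3))
print(np.allclose(resolution, np.eye(3)))   # True
```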
We may reexamine some familiar linear mappings under the new notation. Let |u⟩ be a unit vector in C^n. Use P_{|u⟩} to denote the mapping that projects C^n onto Span{|u⟩} along (Span{|u⟩})^⊥. Then we have

P_{|u⟩}|v⟩ = ⟨u|v⟩ |u⟩,   |v⟩ ∈ C^n.   (9.1.12)

Placing the scalar number ⟨u|v⟩ to the right-hand side of the above expression, we see that the mapping P_{|u⟩} can be rewritten as

P_{|u⟩} = |u⟩⟨u|.   (9.1.13)
(9.1.14)
It is clear that
I=
n
(9.1.15)
P|ui ,
i=1
n
i |ui ui |,
(9.1.16)
i=1
a1
b1
.
.
n
(9.1.17)
|u =
.. , |v = .. C ,
an
bn
(9.1.18)
Consequently,
u|v = (|u) |v =
n
i=1
a i bi ,
(9.1.19)
a1
.
|uv| = |u(|v) =
.. (b1 , . . . , bn )
an
a1 b1
an b1
a1 bn
= (ai bj ).
n
0
|ei ei | =
0
i=1
(9.1.20)
an bn
1
0
..
.
.
0
(9.1.21)
(9.1.22)
Exercises
9.1.1 Consider the orthonormal basis of C2 consisting of the ket vectors
1+i
1 1+i
1
|u1 =
, |u2 =
.
(9.1.23)
2 1i
2 1 + i
Verify the identity
|u1 u1 | + |u2 u2 | = I2 .
9.1.2 Consider the Hermitian matrix
A=
3+i
3i
(9.1.24)
(9.1.25)
in C(2, 2).
(i) Find an orthonormal basis of C2 consisting of eigenvectors, say
|u1 , |u2 , associated with the eigenvalues, say 1 , 2 , of A.
(ii) Find the matrices that represent the orthogonal projections
P|u1 = |u1 u1 |,
(9.1.26)
(9.1.27)
(9.2.1)
|i |2 ,
if is an eigenvalue,
i =
(9.2.2)
P ({XA = }) =
0,
if is not an eigenvalue,
n
i |ui .
(9.2.3)
i=1
n
i |i |2 ,
(9.2.4)
i=1
when i
= ,
(9.2.5)
d
| = H |,
dt
(9.2.7)
(9.2.8)
1 2
P .
2m
(9.2.9)
In quantum mechanics in our context here, both P and V are taken to be Hermitian matrices.
Let |ψ(0)⟩ = |ψ0⟩ be the initial state of the time-dependent state vector |ψ(t)⟩. Solving (9.2.7), we obtain

|ψ(t)⟩ = U(t)|ψ0⟩,   U(t) = e^{−(i/ℏ)tH}.   (9.2.10)

Since H is Hermitian, −(i/ℏ)H is anti-Hermitian. Hence U(t) is unitary,

U(t)U(t)† = I,   (9.2.11)

which ensures the conservation of the normality of the state vector |ψ(t)⟩. That is,

⟨ψ(t)|ψ(t)⟩ = ⟨ψ0|ψ0⟩ = 1.   (9.2.12)
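A numerical sketch of (9.2.10)–(9.2.12), using an arbitrary 2 × 2 Hermitian Hamiltonian and setting ℏ = 1 for simplicity (illustration only):

```python
import numpy as np
from scipy.linalg import expm

hbar = 1.0
H = np.array([[1.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 2.0]])         # Hermitian Hamiltonian (example)

t = 0.7
U = expm(-1j * t * H / hbar)              # U(t) = exp(-i t H / hbar)

print(np.allclose(U @ U.conj().T, np.eye(2)))    # U(t) is unitary

psi0 = np.array([1.0, 1.0j]) / np.sqrt(2)        # normalized initial state
psi_t = U @ psi0
print(np.vdot(psi_t, psi_t).real)                # the norm stays equal to 1
```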
n
0,i |ui ,
(9.2.13)
i=1
n
i=1
0,i e h tH |ui =
n
i=1
e h i t 0,i |ui =
n
i=1
(9.2.14)
where
i =
i
,
h
i = 1, . . . , n,
(9.2.15)
i = 1, . . . , n.
(9.2.16)
(9.2.17)
where
[H, A] = H A AH
(9.2.19)
(9.2.20)
A = i
0
i
3
1i
1+i
(9.2.21)
as an observable of a system.
(i) Find the eigenvalues of A as possible readings when measuring the
observable A.
(ii) Find the corresponding unit eigenvectors of A, say |u1 , |u2 , |u3 ,
which form an orthonormal basis of the state space C3 .
(iii) Assume that the state the system occupies is given by the vector
i
1
(9.2.22)
| = 2 + i
15
3
and resolve | in terms of |u1 , |u2 , |u3 .
(iv) If the system stays in the state |, determine the probability distribution function of the random variable XA , which is the random
value read each time when a measurement about A is made.
(v) If the system stays in the state |, evaluate the expected value, A,
of XA .
9.2.2 Consider the Schrdinger equation
d
ih | = H |,
dt
H =
(9.2.23)
(9.2.25)
where 1 is known as one of the Pauli matrices. Show that the commutator of H and H is
1 0
[H, H ] = 2i3 , 3 =
,
(9.2.26)
0 1
where 3 is another Pauli matrix, and use it and (9.2.18) to evaluate
the rate of change of the time-dependent expected value H (t) =
(t)|H |(t).
(iii) Establish the formula
(t)|H |(t) = 0 |H |0 + (t)|1 |(t)
(9.2.27)
d
and use it to verify the result regarding H (t) obtained in (ii)
dt
through the commutator identity (9.2.18).
n
(i A)2 |i |2
i=1
(9.3.1)
which is given in a form free of the choice of the basis {|ui }. Thus,
with (9.3.1), if A and B are two observables, then the Schwarz inequality
implies that
A2 B2 ||(A AI )|(B BI )||2 |c|2 ,
(9.3.2)
(9.3.3)
(9.3.4)
Therefore, we obtain
1
1
(c c) = [A, B].
2i
2i
Inserting (9.3.5) into (9.3.2), we arrive at the inequality
2
1
[A, B] ,
A2 B2
2i
(c) =
(9.3.5)
(9.3.6)
B| = B |,
A , B R.
(9.3.7)
If the system lies in the state |, then, with simultaneous full certainty, the
measured values of the observables A and B are A and B , respectively.
Let k 1 be an integer and define the kth moment of an observable A in the
state | by
Ak = |Ak |.
(9.3.8)
(9.3.9)
In probability theory, the square root of the variance, ΔA = √(ΔA²), is called the standard deviation. In quantum mechanics, ΔA is also called the uncertainty, which measures the randomness of the observed values of the observable A.
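Concretely, for a state vector |ψ⟩ and an observable A the moments and the uncertainty can be computed directly from the components; a short sketch with an arbitrary 2 × 2 Hermitian observable and an arbitrary normalized state:

```python
import numpy as np

A = np.array([[1.0, 1.0j],
              [-1.0j, 3.0]])                # Hermitian observable (example)
psi = np.array([2.0, 1.0j]) / np.sqrt(5)    # normalized state (example)

exp_A  = np.vdot(psi, A @ psi).real          # <A>   = <psi|A|psi>
exp_A2 = np.vdot(psi, A @ A @ psi).real      # <A^2> = <psi|A^2|psi>
variance = exp_A2 - exp_A**2
uncertainty = np.sqrt(variance)

print(exp_A, exp_A2, uncertainty)
```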
It will be instructive to identify those states which will render the maximum
uncertainty. To simplify our discussion, we shall assume that A C(n, n) has
n distinct eigenvalues 1 , . . . , n . As before, we use |u1 , . . . , |un to denote
the corresponding eigenstates of A which form an orthonormal basis of the
state space Cn . In order to emphasize the dependence of the uncertainty on the
2
to denote the associated variance. We are
underlying state |, we use A,|
to solve the problem
2
max{A,|
| | = 1}.
(9.3.10)
n
i |ui .
(9.3.11)
i=1
n
2i |i |2 ,
A =
i=1
n
i |i |2 .
(9.3.12)
i=1
Hence, we have
2
A,|
n
i=1
2i |i |2
n
2
i |i |
(9.3.13)
i=1
n
2
n
max
2i xi2
i xi2
,
i=1
i=1
(9.3.14)
n
xi = 1.
i=1
Thus, using calculus, the maximum points are to be sought among the solutions
of the equations
n
j xj2 = 0, i = 1, . . . , n,
xi 2i 2i
(9.3.15)
j =1
(9.3.16)
i = 1, . . . , n.
(9.3.17)
(
A2 A2 = A A
(9.3.18)
(9.3.19)
in the nontrivial situation A > 0, we see that there are at least n 2 values of
i = 1, . . . , n such that
2i 2Ai (A2 2A2 )
= 0,
(9.3.20)
#
$2
max 2 x 2 + 2 x 2 1 x 2 + 2 x 2
,
1 1
2 2
1
2
(9.3.21)
x 2 + x 2 = 1.
1
2
Using the constraint in (9.3.21), we may simplify the objective function of the
problem (9.3.21) into the form
(1 2 )2 (x12 x14 ),
(9.3.22)
(9.3.23)
In this case, it is straightforward to check that 1 and 2 are indeed the two
roots of equation (9.3.19). In particular,
2
=
A,|
1
(1 2 )2 .
4
(9.3.24)
Consequently, if we use λ_min and λ_max to denote the smallest and largest eigenvalues of A, then we see in view of (9.3.24) that the maximum uncertainty is given by

ΔA,max = (1/2)(λ_max − λ_min),   (9.3.25)

which is achieved when the system occupies the maximum uncertainty state

|ψ_max⟩ = a|u_max⟩ + b|u_min⟩,   a, b ∈ C,   |a| = |b| = 1/√2.   (9.3.26)
(9.3.27)
Exercises
9.3.1 Let A, B C(2, 2) be two observables given by
1 i
2 1 i
A=
, B=
.
i 2
1+i
3
Evaluate the quantities A2 , B2 , and [A, B] in the state
1
1
,
| =
2
5
(9.3.28)
(9.3.29)
(9.3.30)
i=1
1
2+i 0
A = 2 i 3 0 .
0
(9.3.31)
(9.4.2)
(9.4.3)
(9.4.4)
(9.4.5)
which indicates how an observable should evolve itself with respect to time.
Differentiating (9.4.5), we are led to the equation

dA(t)/dt = (i/ℏ)[H, A(t)],   (9.4.6)

which spells out the dynamical law that a time-dependent observable must follow and is known as the Heisenberg equation.
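In matrix form the Heisenberg-picture observable is A(t) = U(t)† A U(t) with U(t) = e^{−(i/ℏ)tH}, and (9.4.6) can be verified numerically by a finite-difference derivative; a minimal sketch with ℏ = 1 and arbitrary 2 × 2 Hermitian matrices:

```python
import numpy as np
from scipy.linalg import expm

hbar = 1.0
H = np.array([[2.0, 1.0j], [-1.0j, 2.0]])               # Hermitian Hamiltonian
A = np.array([[1.0, 1.0 - 1.0j], [1.0 + 1.0j, 1.0]])    # Hermitian observable

def A_t(t):
    """Heisenberg-picture observable A(t) = U(t)^dagger A U(t)."""
    U = expm(-1j * t * H / hbar)
    return U.conj().T @ A @ U

t, dt = 0.5, 1e-6
lhs = (A_t(t + dt) - A_t(t - dt)) / (2 * dt)          # dA(t)/dt, numerically
rhs = (1j / hbar) * (H @ A_t(t) - A_t(t) @ H)         # (i/hbar)[H, A(t)]
print(np.allclose(lhs, rhs, atol=1e-5))               # True
```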
We next show that (9.4.6) implies (9.2.7) as well.
As preparation, we establish the following Gronwall inequality which is useful in the study of differential equations: If f(t) and g(t) are continuous non-negative functions in t ≥ 0 and satisfy

f(t) ≤ a + ∫_0^t f(τ)g(τ) dτ,   t ≥ 0,   (9.4.7)

for some constant a ≥ 0, then

f(t) ≤ a exp( ∫_0^t g(τ) dτ ),   t ≥ 0.   (9.4.8)
t 0,
(9.4.9)
t 0,
(9.4.10)
and set
h(t) = a + +
f ( )g( ) d,
0
t 0.
(9.4.11)
g( ) d ,
t 0.
(9.4.12)
However, (9.4.10) and (9.4.11) indicate that f (t) < h(t) (t 0). Hence we
may use (9.4.12) to get
t
f (t) < (a + ) exp
g( ) d , t 0.
(9.4.13)
0
We now turn our attention back to the Heisenberg equation (9.4.6). Suppose
that B(t) is another solution of the equation. Then C(t) = A(t)B(t) satisfies
d
i
C(t) = [H, C(t)],
dt
h
Therefore, we have
2
H C(t).
h
On the other hand, we may use the triangle inequality to get
C (t)
h > 0,
(9.4.14)
(9.4.15)
(9.4.16)
C(t) C(0)e h H t ,
t 0.
(9.4.19)
The same argument may be carried out in the domain t 0 with the time
flipping t # t. Thus, in summary, we obtain the collective conclusion
2
t R.
(9.4.20)
(9.4.21)
holds for any Hermitian matrix A C(n, n) for some |u, |v Cn and we
investigate how |u and |v are related.
If |v = 0, then u|A|u = 0 for any Hermitian matrix A, which implies
|u = 0 as well. Thus, in the following, we only consider the nontrivial situation |u
= 0, |v
= 0.
If v
= 0, we set V = Span{|v} and W = V . Choose a Hermitian matrix
A so that |x # A|x (|x Cn ) defines the projection of Cn onto V along
W . Write |u = a|v + |w, where a C and |w W . Then A|u = a|v.
Thus
u|A|u = |a|2 v|v = |a|2 v|A|v,
(9.4.22)
which leads to
|a| = 1
or
a = ei ,
R.
(9.4.23)
(9.4.24)
which gives us the result w = 0. In other words, that (9.4.21) holds for any
Hermitian matrix A C(n, n) implies that |u and |v differ from each other
by a phase factor a satisfying (9.4.23). That is,
|u = ei |v,
R.
(9.4.25)
(9.4.26)
where
i
|(t) = e h tH |0 .
(9.4.27)
|(t) = ei(t) e h tH |0 ,
(9.4.28)
Exercises
9.4.1 Consider the Heisenberg equation (9.4.6) subject to the initial condition
A(0) = A0 . An integration of (9.4.6) gives
i t
A(t) = A0 +
[H, A( )] d.
(9.4.29)
h 0
(i) From (9.4.29) derive the result
[H, A(t)] = [H, A0 ] +
i
h
(9.4.30)
(ii) Use (9.4.30) and the Gronwall inequality to come up with an alternative proof that H and A(t) commute for all time if and only if
they do so initially.
9.4.2 Consider the Hamiltonian H and an observable A given by
2 i
1
1i
H =
, A=
.
i 2
1+i
1
(9.4.31)
(i) Solve the Schrdinger equation (9.2.7) to obtain the timedependent state |(t) evolving from the initial state
1
1
.
(9.4.32)
|0 =
i
2
(ii) Evaluate the expected value of the observable A assuming that the
system lies in the state |(t).
(iii) Solve the Heisenberg equation (9.4.6) with the initial condition
A(0) = A and use it to evaluate the expected value of the same
observable within the Heisenberg picture. That is, compute the
quantity 0 |A(t)|0 . Compare the result with that obtained in (ii)
and explain.
Section 1.1
1.1.2 Let n satisfy 1 n < p and write 2n = kp + l for some integers k and
l where 0 l < p. Then n + (n l) = kp. Thus [n l] = [n].
If [n]
= [0], then n is not a multiple of p. Since p is a prime, so the
greatest common divisor of n and p is 1. Thus there are integers k, l such
that
kp + ln = 1.
(S1)
Section 1.2
1.2.7 We have
uᵗ v = ( a1
         ⋮
         an ) (b1, . . . , bn) = ( a1b1  a1b2  . . .  a1bn
                                   a2b1  a2b2  . . .  a2bn
                                    ⋮     ⋮            ⋮
                                   anb1  anb2  . . .  anbn ),

so that the ith row vector of uᵗv is

aᵢ (b1, b2, . . . , bn),   i = 1, . . . , n.    (S3)
w2 = 2u + v,
...,
wm+1 = (m + 1)u + v.
Section 1.4
1.4.3 Without loss of generality assume f ≠ 0. Then there is some u ∈ U such that f(u) ≠ 0. For any v ∈ U consider

w = v − (f(v)/f(u)) u.

Then f(w) = 0, and hence g(w) = 0, which gives us

g(v) = (g(u)/f(u)) f(v) ≡ a f(v),   v ∈ U,

with a = g(u)/f(u). That is, g = af.
1.4.4 Let {v1, . . . , v_{n−1}} be a basis of V. Extend it to get a basis of Fⁿ, say {v1, . . . , v_{n−1}, vn}. Let {f1, . . . , f_{n−1}, fn} be a basis of (Fⁿ)′ dual to {v1, . . . , v_{n−1}, vn}. It is clear that V⁰ = Span{fn}. On the other hand, for any (x1, . . . , xn) ∈ Fⁿ we have

v = (x1, x2, . . . , xn) − (x1 + x2 + ⋯ + xn, 0, . . . , 0) ∈ V.

So

0 = fn(v) = fn(x1, . . . , xn) − (x1 + ⋯ + xn) fn(e1),   (x1, . . . , xn) ∈ Fⁿ.    (S4)

That is, fn(x1, . . . , xn) = fn(e1) Σ_{i=1}^{n} xᵢ for all (x1, . . . , xn) ∈ Fⁿ.
Section 1.7
1.7.2 Assume the nontrivial situation u ≠ 0. It is clear that ‖u‖_p ≤ n^{1/p} ‖u‖_∞ for any p ≥ 1. Thus lim sup_{p→∞} ‖u‖_p ≤ ‖u‖_∞.
1.7.3
1.7.4
(i) Positivity: Of course ‖u‖ ≥ 0. If ‖u‖ = 0, then |u′(u)| = 0 for all u′ ∈ U′ satisfying ‖u′‖ = 1. Then for any v′ ∈ U′, v′ ≠ 0, we have v′(u) = ‖v′‖ (1/‖v′‖) v′(u) = 0, which shows v′(u) = 0 for any v′ ∈ U′. Hence u = 0.
(ii) Homogeneity: Let a ∈ F and u ∈ U. We have
‖au‖ = sup{|u′(au)| | u′ ∈ U′, ‖u′‖ = 1} = |a| sup{|u′(u)| | u′ ∈ U′, ‖u′‖ = 1} = |a| ‖u‖.
(iii) Triangle inequality: For u, v ∈ U, we have
‖u + v‖ = sup{|u′(u + v)| | u′ ∈ U′, ‖u′‖ = 1}
        ≤ sup{|u′(u)| + |u′(v)| | u′ ∈ U′, ‖u′‖ = 1}
        ≤ sup{|u′(u)| | u′ ∈ U′, ‖u′‖ = 1} + sup{|u′(v)| | u′ ∈ U′, ‖u′‖ = 1}
        = ‖u‖ + ‖v‖.
Section 2.1
2.1.7 (i) Since f, g ≠ 0 we have dim(f⁰) = dim(g⁰) = n − 1. Let {u1, . . . , u_{n−1}} be a basis of f⁰ and take v ∈ g⁰ but v ∉ f⁰. Then u1, . . . , u_{n−1}, v are linearly independent, and hence, form a basis of U. In particular, U = f⁰ + g⁰.
Define T ∈ L(F^{k+l}, U) by setting

T(y1, . . . , yk, z1, . . . , zl) = Σ_{i=1}^{k} yᵢ vᵢ + Σ_{j=1}^{l} zⱼ wⱼ,   (y1, . . . , yk, z1, . . . , zl) ∈ F^{k+l}.

Then N(T) = S. From the rank equation we have n(T) + r(T) = k + l or dim(S) + r(T) = k + l. On the other hand, it is clear that R(T) = V + W. So r(T) = dim(V + W). From the dimensionality equation (1.5.6) we have dim(V + W) = dim(V) + dim(W) − dim(V ∩ W), or r(T) + dim(V ∩ W) = k + l. Therefore dim(S) = dim(V ∩ W).
Section 2.2
2.2.4 Define T ∈ L(R²) by setting

T(x) = ( a  b
         c  d ) x,   x = ( x1
                           x2 ) ∈ R².

Then we have

T(e1) = a e1 + c e2,   T(e2) = b e1 + d e2,

and

T(u1) = (1/2)(a + b + c + d) u1 + (1/2)(a + b − c − d) u2,
T(u2) = (1/2)(a − b + c − d) u1 + (1/2)(a − b − c + d) u2,

so that the problem follows.
2.2.5 Let A = (aᵢⱼ) and define T ∈ L(Fⁿ) by setting T(x) = Ax, where x ∈ Fⁿ is taken to be a column vector. Then

T(eⱼ) = Σ_{i=1}^{n} aᵢⱼ eᵢ,   j = 1, . . . , n.

With respect to the reordered basis

f1 = eₙ,   f2 = e_{n−1},   . . . ,   fn = e1,

we have

T(fⱼ) = Σ_{i=1}^{n} a_{n−i+1, n−j+1} fᵢ ≡ Σ_{i=1}^{n} bᵢⱼ fᵢ,   j = 1, . . . , n.
...,
T ([uk ]) = [vk ]Y .
Section 2.5
2.5.12 (i) Assume R(S) = R(T). Since S projects U onto R(S) = R(T) along N(S), we have for any u ∈ U the result S(T(u)) = T(u). Thus S ∘ T = T. Similarly, T ∘ S = S.
Conversely, from S ∘ T = T, the relation T(U) = (S ∘ T)(U) ⊆ S(U) implies R(T) ⊆ R(S). Likewise, T ∘ S = S implies R(S) ⊆ R(T). So R(S) = R(T).
(ii) Assume N(S) = N(T). For any u ∈ U, rewrite u as u = v + w with v ∈ R(T) and w ∈ N(T). Then T(v) = v, T(w) = 0, and S(w) = 0 give us
(S ∘ T)(u) = S(T(v) + T(w)) = S(v) = S(v + w) = S(u).
Hence S ∘ T = S. Similarly, T ∘ S = T.
Conversely, assume S ∘ T = S and T ∘ S = T. Let u ∈ N(T). Then S(u) = S(T(u)) = 0. So u ∈ N(S), and hence N(T) ⊆ N(S). Interchanging S and T, we get N(S) ⊆ N(T).
2.5.14 (i) We have
T²(u) = T(⟨u, u1⟩u1 + ⟨u, u2⟩u2)
      = ⟨u, u1⟩T(u1) + ⟨u, u2⟩T(u2)
      = ⟨u, u1⟩(⟨u1, u1⟩u1 + ⟨u1, u2⟩u2) + ⟨u, u2⟩(⟨u2, u1⟩u1 + ⟨u2, u2⟩u2)
      = (⟨u, u1⟩⟨u1, u1⟩ + ⟨u, u2⟩⟨u2, u1⟩)u1 + (⟨u, u1⟩⟨u1, u2⟩ + ⟨u, u2⟩⟨u2, u2⟩)u2
      = T(u) = ⟨u, u1⟩u1 + ⟨u, u2⟩u2.    (S6)

Since u1 and u2 are linearly independent, comparing coefficients in (S6) shows that T² = T holds if and only if

⟨u1, u1⟩ = 1,   ⟨u2, u1⟩ = 0,   ⟨u1, u2⟩ = 0,   ⟨u2, u2⟩ = 1.    (S7)
For u ∈ U, write

u = (1/a)(aI − T)(u) + (1/a) T(u) ≡ v + w.    (S8)

Then

T(v) = T( (1/a)(aI − T)(u) ) = 0,   (aI − T)(w) = (aI − T)( (1/a) T(u) ) = 0.
a0, a1, . . . , a_{k−1} ∈ F.    (S9)
Section 3.1
(i) Let C be a closed curve around but away from the origin of R² and consider

u : C → S¹,   u(x, y) = f(x, y) / ‖f(x, y)‖,   (x, y) ∈ C.

A direct computation gives

g(x, y) / ‖g(x, y)‖ = (1/R²) (a²x² − b²y², 2abxy) = (cos²θ − sin²θ, 2 cosθ sinθ).
A further computation of the degree integral then gives

(n/2) ∫_0^π sin f df = n,

which indicates that the map covers the 2-sphere, while preserving the orientation, n times.
Section 3.2
3.2.6 Consider the matrix C = aA + bB. The entries of C not in the j th
column are the corresponding entries of A multiplied by (a + b) but the
j th column of C is equal to the sum of the a-multiple of the j th column
of A and b-multiple of the j th column of B. So by the properties of
determinants, we have
det(aA + bB) = | (a + b)a11   . . .   a a1j + b b1j   . . .   (a + b)a1n |
               |     ⋮                      ⋮                     ⋮      |
               | (a + b)an1   . . .   a anj + b bnj   . . .   (a + b)ann |
             = (a + b)^{n−1} (a det(A) + b det(B)).
3.2.7 Let the column vectors in A(t) be denoted by A1 (t), . . . , An (t). Then
det(A(t + h)) − det(A(t))
= |A1(t + h), A2(t + h), . . . , An(t + h)| − |A1(t), A2(t), . . . , An(t)|
= |A1(t + h), A2(t + h), . . . , An(t + h)| − |A1(t), A2(t + h), . . . , An(t + h)|
  + |A1(t), A2(t + h), A3(t + h), . . . , An(t + h)| − |A1(t), A2(t), A3(t + h), . . . , An(t + h)|
  + ⋯ + |A1(t), A2(t), . . . , An(t + h)| − |A1(t), A2(t), . . . , An(t)|
= |A1(t + h) − A1(t), A2(t + h), . . . , An(t + h)|
  + |A1(t), A2(t + h) − A2(t), A3(t + h), . . . , An(t + h)|
  + ⋯ + |A1(t), A2(t), . . . , An(t + h) − An(t)|.

Dividing the above by h ≠ 0, we get

(1/h)(det(A(t + h)) − det(A(t)))
= |(1/h)(A1(t + h) − A1(t)), A2(t + h), . . . , An(t + h)|
  + |A1(t), (1/h)(A2(t + h) − A2(t)), . . . , An(t + h)|
  + ⋯ + |A1(t), A2(t), . . . , (1/h)(An(t + h) − An(t))|.

Now taking the h → 0 limits on both sides of the above, we arrive at

d/dt det(A(t)) = |A1′(t), A2(t), . . . , An(t)| + |A1(t), A2′(t), . . . , An(t)| + ⋯ + |A1(t), A2(t), . . . , An′(t)|.

Finally, expanding the jth determinant along the jth column on the right-hand side of the above by the cofactors, j = 1, 2, . . . , n, we have

d/dt det(A(t)) = Σ_{i=1}^{n} a′_{i1}(t) C_{i1}(t) + Σ_{i=1}^{n} a′_{i2}(t) C_{i2}(t) + ⋯ + Σ_{i=1}^{n} a′_{in}(t) C_{in}(t).
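The formula just derived can be checked numerically. The Python sketch below (ours; the sample matrix A(t) is an arbitrary choice) compares a central difference quotient of det(A(t)) with the column-by-column sum and with the equivalent trace form tr(adj(A(t))A′(t)).

import numpy as np

def A(t):
    return np.array([[np.cos(t), t, 1.0],
                     [t**2, np.exp(t), 0.0],
                     [1.0, np.sin(t), 2.0 + t]])

def A_prime(t):
    # entrywise derivative of A(t)
    return np.array([[-np.sin(t), 1.0, 0.0],
                     [2.0 * t, np.exp(t), 0.0],
                     [0.0, np.cos(t), 1.0]])

t, h = 0.7, 1e-6
numeric = (np.linalg.det(A(t + h)) - np.linalg.det(A(t - h))) / (2.0 * h)

At = A(t)
adj = np.linalg.det(At) * np.linalg.inv(At)   # adjugate of an invertible matrix
column_sum = sum(np.linalg.det(np.column_stack(
    [A_prime(t)[:, j] if j == k else At[:, j] for j in range(3)]))
    for k in range(3))

print(numeric, column_sum, np.trace(adj @ A_prime(t)))   # the three values agree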
3.2.8 Adding all columns to the first column of the matrix, we get

| x + Σ_{i=1}^{n} aᵢ   a1   a2   . . .   an |
| x + Σ_{i=1}^{n} aᵢ   x    a2   . . .   an |
| x + Σ_{i=1}^{n} aᵢ   a2   x    . . .   an |
|        ⋮             ⋮    ⋮     ⋱      ⋮  |
| x + Σ_{i=1}^{n} aᵢ   a2   a3   . . .   x  |

= ( x + Σ_{i=1}^{n} aᵢ ) ·
| 1   a1   a2   . . .   an |
| 1   x    a2   . . .   an |
| 1   a2   x    . . .   an |
| ⋮   ⋮    ⋮     ⋱      ⋮  |
| 1   a2   a3   . . .   x  |.

Consider the (n + 1) × (n + 1) determinant on the right-hand side of the above. We now subtract row n from row n + 1, row n − 1 from row n, . . . , row 2 from row 3, row 1 from row 2. Then that determinant becomes

| 1   a1        a2        . . .            an      |
| 0   x − a1    0         . . .            0       |
| 0   a2 − x    x − a2    . . .            0       |
| ⋮               ⋱          ⋱                      |
| 0   . . .            aₙ − x           x − aₙ     |.

Since the minor of the entry at the position (1, 1) is the determinant of a lower triangular matrix, we get the original determinant to be

( x + Σ_{i=1}^{n} aᵢ ) ∏_{i=1}^{n} (x − aᵢ).
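A numerical spot check of the value obtained in 3.2.8, assuming the matrix has the form reconstructed above (each row containing x together with a1, . . . , an); the Python code and names are ours.

import numpy as np

rng = np.random.default_rng(1)
n = 5
a = rng.normal(size=n)
x = rng.normal()

M = np.empty((n + 1, n + 1))
for k in range(n + 1):
    M[k] = list(a[:k]) + [x] + list(a[k:])   # row k: a1,...,ak, x, a(k+1),...,an

lhs = np.linalg.det(M)
rhs = (x + a.sum()) * np.prod(x - a)
print(lhs, rhs)     # the two values coincide up to rounding error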
3.2.9 If c1 , . . . , cn+2 are not all distinct then it is clear that the determinant is
zero since there are two identical columns. Assume now c1 , . . . , cn+2
are distinct and consider the function of x given by
p(x) = det ( p1(x)        p1(c2)        . . .   p1(c_{n+2})
             p2(x)        p2(c2)        . . .   p2(c_{n+2})
              ⋮            ⋮                       ⋮
             p_{n+2}(x)   p_{n+2}(c2)   . . .   p_{n+2}(c_{n+2}) ).
By the cofactor expansion along the first column, we see that p(x)
is a polynomial of degree at most n which vanishes at n + 1 points:
c2 , . . . , cn+2 . Hence p(x) = 0 for all x. In particular, p(c1 ) = 0 as well.
3.2.10 We use D_{n+1} to denote the determinant and implement induction to do the computation. When n = 1, we have D2 = a1 x + a0. At n − 1, we assume Dn = a_{n−1} x^{n−1} + ⋯ + a1 x + a0. At n, by the cofactor expansion according to the first column, we get

D_{n+1} = x Dn + (−1)^{n+2} a0 (−1)^n,

where Dn is the corresponding determinant formed from a1, . . . , an. Hence, by the inductive assumption,

D_{n+1} = x (aₙ x^{n−1} + ⋯ + a2 x + a1) + a0.
3.2.11 Use D(λ) to denote the determinant on the left-hand side of (3.2.46). It is clear that D(0) = 0. So (3.2.46) is true at λ = 0.
Now assume λ ≠ 0. Adding the (−a1) multiple of column 1 to column 2, the (−a2) multiple of column 1 to column 3, . . . , and the (−an) multiple of column 1 to column n + 1, we may extract the factor λⁿ from the resulting determinant. Evaluating the remaining determinant and simplifying, we arrive at the right-hand side of (3.2.46), which establishes (3.2.46) for λ ≠ 0 as well.
Suppose otherwise that Ax = 0 has a solution x ≠ 0 and let i be such that

|xᵢ| = max_{1≤j≤n} |xⱼ|.    (S10)

Then |xᵢ| > 0. On the other hand, the ith component of the equation Ax = 0 reads

aᵢ₁x₁ + ⋯ + aᵢᵢxᵢ + ⋯ + aᵢₙxₙ = 0.    (S11)

Hence |aᵢᵢ||xᵢ| ≤ Σ_{1≤j≤n, j≠i} |aᵢⱼ||xⱼ| ≤ |xᵢ| Σ_{1≤j≤n, j≠i} |aᵢⱼ|, whereas the diagonal dominance condition gives

|xᵢ| ( |aᵢᵢ| − Σ_{1≤j≤n, j≠i} |aᵢⱼ| ) > 0,
which is a contradiction.
3.2.15 Consider the modified matrix
A(t) = D + t(A − D),   0 ≤ t ≤ 1,
To evaluate the determinant

| 1 − a1b1    −a1b2      . . .   −a1bn    |
| −a2b1       1 − a2b2   . . .   −a2bn    |
|   ⋮            ⋮          ⋱       ⋮     |
| −anb1       −anb2      . . .   1 − anbn |,

we artificially enlarge it into an (n + 1) × (n + 1) determinant of the form

| 1   b1          b2          . . .   bn       |
| 0   1 − a1b1    −a1b2       . . .   −a1bn    |
| ⋮     ⋮            ⋮           ⋱       ⋮     |
| 0   −anb1       −anb2       . . .   1 − anbn |,

whose value is the same. Adding the aᵢ multiple of row 1 to row i + 1 for each i = 1, . . . , n, we obtain

| 1    b1   b2   . . .   bn |
| a1   1    0    . . .   0  |
| a2   0    1    . . .   0  |
| ⋮    ⋮    ⋮     ⋱      ⋮  |
| an   0    0    . . .   1  |.

Subtracting the bᵢ multiple of row i + 1 from row 1 for each i = 1, . . . , n, we conclude that the determinant under consideration equals

1 − Σ_{i=1}^{n} aᵢ bᵢ.
Similarly, for the determinant defining f(x1, . . . , xn) one finds

f(x1, . . . , xn) = 100 − Σ_{i=1}^{n} xᵢ².
Section 3.3

adj(A)A1 = 0,   . . . ,   adj(A)An = 0.    (S13)
3.3.7 For A = (aᵢⱼ), we use Mᵢⱼ^A and Cᵢⱼ^A to denote the minor and cofactor of the entry aᵢⱼ, respectively. Then we have

Mᵢⱼ^{Aᵗ} = Mⱼᵢ^{A},    (S14)

and hence Cᵢⱼ^{Aᵗ} = (−1)^{i+j} Mᵢⱼ^{Aᵗ} = Cⱼᵢ^{A}, so that adj(Aᵗ) = (adj(A))ᵗ.
3.4.2 It suffices to show that M = C(n, n) \ D is closed in C(n, n). Let {Ak} be a sequence in M which converges to some A ∈ C(n, n). We need to prove that A ∈ M. To this end, consider p_{A_k}(λ) and let λ_k be a multiple root of p_{A_k}(λ). Then we have

p_{A_k}(λ_k) = 0,   p′_{A_k}(λ_k) = 0,   k = 1, 2, . . . .    (S15)

On the other hand, we know that the coefficients of pA(λ) depend continuously on the entries of A. So the coefficients of p_{A_k}(λ) converge to those of pA(λ), respectively. In particular, the coefficients of p_{A_k}(λ), say {a_{n−1}^k}, . . . , {a_1^k}, {a_0^k}, are bounded sequences. Thus

|λ_k|ⁿ ≤ |λ_kⁿ − p_{A_k}(λ_k)| + |p_{A_k}(λ_k)| ≤ Σ_{i=0}^{n−1} |a_i^k| |λ_k|^i,

which indicates that {λ_k} is a bounded sequence in C. Passing to a subsequence if necessary, we may assume without loss of generality that λ_k → some λ_0 ∈ C as k → ∞. Letting k → ∞ in (S15), we obtain pA(λ_0) = 0, p′A(λ_0) = 0. Hence λ_0 is a multiple root of pA(λ), which proves A ∈ M.
3.4.4 Using (3.4.31), we have

adj(λIn − A) = A_{n−1} λ^{n−1} + ⋯ + A_1 λ + A_0,

where the matrices A_{n−1}, A_{n−2}, . . . , A_1, A_0 are determined through (3.4.32). Hence, setting λ = 0 in the above, we get

adj(−A) = A_0 = a_1 In + a_2 A + ⋯ + a_{n−1} A^{n−2} + A^{n−1}.

That is,

adj(A) = (−1)^{n−1} ( a_1 In + a_2 A + ⋯ + a_{n−1} A^{n−2} + A^{n−1} ).    (S16)
Note that (S16) may be used to prove some known facts more
easily. For example, the relation adj(At ) = (adj(A))t (see Exercise
3.3.7) follows immediately.
3.4.6 First assume that A is invertible. Then
pAB(λ) = det(λIn − AB) = det(A[λA^{−1} − B])
       = det(A) det(λA^{−1} − B) = det([λA^{−1} − B]A)
       = det(λIn − BA) = pBA(λ).
When A is not invertible, we may approximate A by a sequence of invertible matrices Ak, k = 1, 2, . . . , and obtain the same conclusion by the continuity of determinants. Alternatively, one may compare the determinants of two block matrix identities built from λIn, In, A, and B: the determinants of the left-hand sides of the two identities are the same, while the determinants of the right-hand sides are (−1)ⁿ λⁿ det(λIn − AB) and (−1)ⁿ λⁿ det(λIn − BA), respectively.
Consider the (n + 1) × (n + 1) determinant

| λ    b1   b2   . . .   bn |
| a1   λ    0    . . .   0  |
| a2   0    λ    . . .   0  |
| ⋮    ⋮    ⋮     ⋱      ⋮  |
| an   0    0    . . .   λ  |.

Assume λ ≠ 0. Subtracting the b1/λ multiple of row 2 from row 1, the b2/λ multiple of row 3 from row 1, . . . , and the bn/λ multiple of the last row from row 1, we obtain

| λ − (1/λ) Σ_{i=1}^{n} aᵢbᵢ   0    0    . . .   0 |
| a1                           λ    0    . . .   0 |
| a2                           0    λ    . . .   0 |
| ⋮                            ⋮    ⋮     ⋱      ⋮ |
| an                           0    0    . . .   λ |
= ( λ − (1/λ) Σ_{i=1}^{n} aᵢbᵢ ) λⁿ = λ^{n−1} ( λ² − Σ_{i=1}^{n} aᵢbᵢ ).

Since both sides of the above also agree at λ = 0, they are identical for all λ.
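The conclusion of 3.4.6, that AB and BA share the same characteristic polynomial even when A is singular, is easy to test numerically; the following Python sketch (ours) uses numpy.poly to compare the coefficient vectors.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))

p_AB = np.poly(A @ B)     # coefficients of det(lambda*I - AB)
p_BA = np.poly(B @ A)
print(np.allclose(p_AB, p_BA))   # True

# The singular case is covered as well: make A rank-deficient and compare again.
A[:, 0] = 0.0
print(np.allclose(np.poly(A @ B), np.poly(B @ A)))   # still True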
3.4.8 (i) We have pA(λ) = λ³ − 2λ² − λ + 2. From pA(A) = 0 we have A(A² − 2A − In) = −2In. So

A^{−1} = −(1/2)(A² − 2A − In).

(ii) We divide λ^{10} by pA(λ) to find

λ^{10} = pA(λ)(λ⁷ + 2λ⁶ + 5λ⁵ + 10λ⁴ + 21λ³ + 42λ² + 85λ + 170) + 341λ² − 340.

Consequently, inserting A in the above, we have

A^{10} = 341A² − 340In.
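The division used in 3.4.8(ii) can be verified with numpy.polydiv; the sketch below (ours) reproduces the quotient and the remainder 341λ² − 340.

import numpy as np

pA = np.array([1.0, -2.0, -1.0, 2.0])        # lambda^3 - 2 lambda^2 - lambda + 2
lam10 = np.zeros(11); lam10[0] = 1.0         # coefficients of lambda^10

q, r = np.polydiv(lam10, pA)
print(q)   # [1, 2, 5, 10, 21, 42, 85, 170]
print(r)   # [341, 0, -340], i.e. 341*lambda^2 - 340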
Section 4.1
4.1.8 Let u1 and u2 be linearly independent vectors in U and assume (u1, u1) ≠ 0 and (u2, u2) ≠ 0, otherwise there is nothing to show. Set u3 = a1u1 + u2 where a1 = −(u1, u2)/(u1, u1). Then u3 ≠ 0 and is perpendicular to u1. So u1, u3 are linearly independent as well. Consider u = cu1 + u3 (c ∈ C). Of course u ≠ 0 for any c. Since (u1, u3) = 0, we
Section 4.2
4.2.6 We first show that the definition has no ambiguity. For this purpose, assume [u1] = [u] and [v1] = [v]. We need to show that (u1, v1) = (u, v). In fact, since u1 ∈ [u] and v1 ∈ [v], we know that u1 − u ∈ U0 and v1 − v ∈ U0. Hence there are x, y ∈ U0 such that u1 − u = x, v1 − v = y. So (u1, v1) = (u + x, v + y) = (u, v) + (u, y) + (x, v + y) = (u, v).
It is obvious that ([u], [v]) is bilinear and symmetric.
To show that the scalar product is non-degenerate, we assume [u] ∈ U/U0 satisfies ([u], [v]) = 0 for any [v] ∈ U/U0. Thus (u, v) = 0 for all v ∈ U, which implies u ∈ U0. In other words, [u] = [0].
4.2.8 Let w V . Then (v, w) = 0 for any v V . Hence 0 = (v, w) =
v, (w), v V , which implies (w) V 0 . Hence (V ) V 0 .
On the other hand, take w V 0 . Then v, w = 0 for all v V .
Since : U U is an isomorphism, there is a unique w U such
that (w) = w . Thus (v, w) = v, (w) = v, w = 0, v V . That
is, w V . Thus w (V ), which proves V 0 (V ).
4.2.10 Recall (4.1.26). Replace V , W in (4.1.26) by V , W and use
(V ) = V , (W ) = W . We get (V + W ) = V W. Taking
2(u, w)2
= 0,
(w, w)
Section 4.3
4.3.1 We consider the real case for simplicity. The complex case is similar.
Suppose M is nonsingular and consider a1 u1 + + ak uk = 0, where
a1 , . . . , ak are scalars. Taking scalar products of this equation with the
vectors u1 , . . . , uk consecutively, we get
a1(u1, uᵢ) + ⋯ + ak(uk, uᵢ) = 0,   i = 1, . . . , k.

Since M is nonsingular, the only solution is a1 = ⋯ = ak = 0, so u1, . . . , uk are linearly independent.
Conversely, assume u1, . . . , uk are linearly independent and let a1, . . . , ak satisfy the above system of equations. Then

( Σ_{i=1}^{k} aᵢuᵢ , u1 ) = 0,   . . . ,   ( Σ_{i=1}^{k} aᵢuᵢ , uk ) = 0,

and hence

( Σ_{i=1}^{k} aᵢuᵢ , Σ_{j=1}^{k} aⱼuⱼ ) = 0.

By the positive definiteness of the scalar product, Σ_{i=1}^{k} aᵢuᵢ = 0, and the linear independence of u1, . . . , uk then gives a1 = ⋯ = ak = 0. Hence M is nonsingular.
i=1
(u1 , u1 )
(u1 , u)
.
..
Span
..
.
(uk , u)
(uk , u1 )
i = 1, . . . , k,
(u1 , uk )
..
,...,
.
.
(uk , uk )
0
(u1 , u)
.
..
= .
.
.
0
(uk , u)
(u1 , u1 )
..
Span
.
(uk , u1 )
(u1 , uk )
..
,...,
,
.
(uk , uk )
but u
Span{u1 , . . . , uk }.
4.3.4 For any u U , we have
‖(I − S)u‖² = ((I − S)u, (I − S)u)
            = ‖u‖² + ‖Su‖² − (u, Su) − (Su, u)
            = ‖u‖² + ‖Su‖² − (S′u, u) − (Su, u)
            = ‖u‖² + ‖Su‖² + (Su, u) − (Su, u)
            = ‖u‖² + ‖Su‖².
So (I − S)u ≠ 0 whenever u ≠ 0. Thus n(I − S) = 0 and I − S must be invertible.
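A quick numerical illustration of 4.3.4 for a real anti-symmetric S (our sample data): the norm identity holds and det(I − S) is nonzero.

import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(5, 5))
S = M - M.T                                   # anti-symmetric, i.e. anti-self-adjoint
u = rng.normal(size=5)

lhs = np.linalg.norm((np.eye(5) - S) @ u) ** 2
rhs = np.linalg.norm(u) ** 2 + np.linalg.norm(S @ u) ** 2
print(np.isclose(lhs, rhs), np.linalg.det(np.eye(5) - S))   # True, nonzero determinant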
4.3.6 Set R(A) = {u ∈ Cᵐ | u = Ax for some x ∈ Cⁿ}. Rewrite Cᵐ as Cᵐ = R(A) ⊕ (R(A))⊥ and assert (R(A))⊥ = {y ∈ Cᵐ | A†y = 0} = N(A†). In fact, if A†y = 0, then (y, Ax) = y†Ax = (A†y)†x = 0 for any x ∈ Cⁿ. That is, N(A†) ⊆ R(A)⊥. On the other hand, take z ∈ R(A)⊥. Then for any x ∈ Cⁿ we have 0 = (z, Ax) = z†Ax = (A†z)†x = (A†z, x), which indicates A†z = 0. So z ∈ N(A†). Thus R(A)⊥ ⊆ N(A†).
Therefore the equation Ax = b has a solution if and only if b ∈ R(A) or, if and only if b is perpendicular to R(A)⊥ = N(A†).
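The solvability criterion of 4.3.6 can be seen numerically as follows; the Python sketch (ours) builds a basis of N(A†) from the singular value decomposition and compares orthogonality of b against least-squares solvability.

import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 3)) + 1j * rng.normal(size=(5, 3))   # full rank, R(A) is 3-dimensional in C^5

U, s, Vh = np.linalg.svd(A)
null_Astar = U[:, 3:]                       # columns span N(A*) = R(A)^perp

b_in = A @ rng.normal(size=3)               # b in R(A): solvable
b_out = b_in + null_Astar[:, 0]             # add a component in N(A*): not solvable

for b in (b_in, b_out):
    perp = np.allclose(null_Astar.conj().T @ b, 0)     # b perpendicular to N(A*)?
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    solvable = np.allclose(A @ x, b)
    print(perp, solvable)                   # the two flags agree in both cases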
Section 4.4
4.4.2 It is clear that P ∈ L(U). Let {w1, . . . , wl} be an orthogonal basis of W = V^⊥. Then {v1, . . . , vk, w1, . . . , wl} is an orthogonal basis of U.
Now it is direct to check that P (vj ) = vj , j = 1, . . . , k, P (ws ) =
0, s = 1, . . . , l. So P 2 (vj ) = P (vj ), j = 1, . . . , k, P 2 (ws ) =
P (ws ), s = 1, . . . , l. Hence P 2 = P , R(P ) = V , and N (P ) = W
as claimed. In other words, P : U U is the projection of U onto V
along W .
4.4.4 (i) We know that we have the direct sum U = V ⊕ V^⊥. For any x, y ∈ [u] we have x = v1 + w1, y = v2 + w2, vᵢ ∈ V, wᵢ ∈ V^⊥, i = 1, 2. So x − y = (v1 − v2) + (w1 − w2). However, since x − y ∈ V, we have w1 = w2. In other words, for any x ∈ [u], there is a unique w ∈ V^⊥ so that x = v + w for some v ∈ V. Of course, [x] = [w] = [u] and ‖x‖² = ‖v‖² + ‖w‖², whose minimum is attained at v = 0. This proves ‖[u]‖ = ‖w‖.
(ii) So we need to find the unique w ∈ V^⊥ such that [w] = [u]. First find an orthogonal basis {v1, . . . , vk} of V. Then for u we know that the projection of u onto V along V^⊥ is given by the Fourier expansion (see Exercise 4.4.2)

v = Σ_{i=1}^{k} ( (vᵢ, u) / (vᵢ, vᵢ) ) vᵢ.

Hence

w = u − Σ_{i=1}^{k} ( (vᵢ, u) / (vᵢ, vᵢ) ) vᵢ.
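As an illustration of 4.4.4 (ours, with arbitrary sample data), the following Python sketch computes the perpendicular component w of u and confirms that no element of the coset u + V has smaller norm.

import numpy as np

rng = np.random.default_rng(4)
V = rng.normal(size=(6, 2))                 # columns span a subspace V of R^6
u = rng.normal(size=6)

q, _ = np.linalg.qr(V)                      # orthonormal basis of V
v = q @ (q.T @ u)                           # projection of u onto V
w = u - v                                   # perpendicular component, [w] = [u]

# every element u + V c of the coset has norm at least ||w||
samples = [np.linalg.norm(u + V @ rng.normal(size=2)) for _ in range(1000)]
print(np.linalg.norm(w), min(samples))      # ||w|| is a lower bound for all samples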
Section 4.5
4.5.5 (i) It is direct to check that ‖T(x)‖ = ‖x‖ for x ∈ Rⁿ.
(ii) Since T(T(x)) = x for x ∈ Rⁿ, we have T² = I. Consequently, T′ = T. So if λ is an eigenvalue of T then λ² = 1. Thus the eigenvalues of T may only be ±1. In fact, for u = (1, 0, . . . , 0, 1)ᵗ and v = (1, 0, . . . , 0, −1)ᵗ, we have T(u) = u and T(v) = −v. So ±1 are both eigenvalues of T.
(iii) Let x, y ∈ Rⁿ satisfy T(x) = x and T(y) = −y. Then (x, y) = (T(x), T(y)) = (x, −y) = −(x, y). So (x, y) = 0.
(iv) Since T² − I = 0 and T ≠ ±I, we see that the minimal polynomial of T is m_T(λ) = λ² − 1.
Section 5.2
5.2.7 We can write A as A = PᵗDP where P is orthogonal and

D = diag{d1, . . . , dn},   dᵢ = ±1,   i = 1, . . . , n.

uᵗQ = uᵗ(u1, u2, . . . , un) = (uᵗu1, uᵗu2, . . . , uᵗun) = (‖u‖, 0, . . . , 0).
Therefore
Qᵗ(u uᵗ)Q = (uᵗQ)ᵗ(uᵗQ) = diag{‖u‖², 0, . . . , 0}.
Section 5.3
5.3.8 For definiteness, assume m < n. So r(A) = m. Use to denote the
standard norm of Rl . Now check
q(x) = xᵗ(AᵗA)x = (Ax)ᵗ(Ax) = ‖Ax‖² ≥ 0,   x ∈ Rⁿ,
y ∈ Rᵐ.
x1
x2
R2 x12
x22
=0 ,
n
i yi2 0 y2 ,
i=1
n
i yi cy2 =
i=1
n
(i c)yi2 0.
i=1
det(A) ∏_{i=1}^{n} (1 + λᵢ) ≥ det(A)(1 + λ1 ⋯ λn) = det(A) + det(A) det(C).

However, recalling the relation between B and C, we get B = PᵗCP, which gives us det(B) = det(P)² det(C) = det(A) det(C), so that the desired inequality is established.
If the inequality becomes an equality, then all λ1, . . . , λn vanish. So C = 0, which indicates that B = 0.
5.3.17 Since A is positive definite, there is an invertible matrix P ∈ R(n, n) such that PᵗAP = In. Hence we have

(det(P))² det(λA − B) = det( Pᵗ( (λ − 1)A − (B − A) )P )
                      = det( (λ − 1)In − Pᵗ(B − A)P ).

Since Pᵗ(B − A)P is positive semi-definite, whose eigenvalues are all non-negative, we see that the roots of the equation det(λA − B) = 0 must satisfy λ − 1 ≥ 0.
5.3.18 (i) Let x ∈ U be a minimum point of f. Then for any y ∈ U we have g(ε) ≥ g(0), where g(ε) = f(x + εy) (ε ∈ R). Thus

0 = dg/dε |_{ε=0} = (1/2)(y, T(x)) + (1/2)(x, T(y)) − (y, b) = (y, T(x) − b).

Since y ∈ U is arbitrary, we get T(x) − b = 0 and (5.3.25) follows.
Conversely, if x ∈ U satisfies (5.3.25), then

f(u) − f(x) = (1/2)(u, T(u)) − (u, T(x)) − (1/2)(x, T(x)) + (x, T(x))
            = (1/2)(u − x, T(u − x)) ≥ λ0 ‖u − x‖²,   u ∈ U,    (S17)

for some constant λ0 > 0. That is, f(u) ≥ f(x) for all u ∈ U.
(ii) If x, y ∈ U solve (5.3.25), then (i) indicates that x, y are the minimum points of f. Hence f(x) = f(y). Replacing u by y in (S17) we arrive at x = y.
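A numerical illustration of 5.3.18 (ours): for a randomly generated positive definite T and vector b, the solution of Tx = b minimizes f(u) = (1/2)(u, T(u)) − (u, b).

import numpy as np

rng = np.random.default_rng(5)
M = rng.normal(size=(4, 4))
T = M @ M.T + 4.0 * np.eye(4)               # positive definite
b = rng.normal(size=4)

x = np.linalg.solve(T, b)                   # the unique solution of (5.3.25)
f = lambda u: 0.5 * u @ T @ u - u @ b

# f(x) never exceeds f at randomly perturbed points, as (S17) predicts
print(all(f(x) <= f(x + rng.normal(size=4)) for _ in range(1000)))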
5.3.19 Let S ∈ L(U) be positive semi-definite so that S² = T. Then q(u) = (S(u), S(u)) = ‖S(u)‖² for any u ∈ U. Hence the Schwarz inequality (4.3.10) gives us

q(αu + βv) = (S(αu + βv), S(αu + βv))
           = α² ‖S(u)‖² + 2αβ (S(u), S(v)) + β² ‖S(v)‖²
           ≤ α² ‖S(u)‖² + 2αβ ‖S(u)‖ ‖S(v)‖ + β² ‖S(v)‖²
           ≤ α² ‖S(u)‖² + αβ ( ‖S(u)‖² + ‖S(v)‖² ) + β² ‖S(v)‖²
           = α(α + β) ‖S(u)‖² + β(α + β) ‖S(v)‖²
           = α q(u) + β q(v),   u, v ∈ U,
Write

q(x) = Σ_{i=1}^{n−1} (xᵢ + aᵢ xᵢ₊₁)² + (xₙ + aₙ x₁)² = Σ_{i=1}^{n} yᵢ²,   x = (x1, . . . , xn)ᵗ ∈ Rⁿ,

with

yᵢ = xᵢ + aᵢ xᵢ₊₁,   i = 1, . . . , n − 1,   yₙ = xₙ + aₙ x₁.    (S18)

It is seen that q(x) is positive definite if and only if the change of variables given in (S18) is invertible, which is equivalent to the condition 1 + (−1)^{n+1} a1 a2 ⋯ aₙ ≠ 0.
5.4.3 If A is positive semi-definite, then A + εIn is positive definite for all ε > 0. Hence all the leading principal minors of A + εIn are positive
a0 , a1 , . . . , an1 F. (S19)
Then
)ui ,
R(ui ) = (a0 + a1 i + + an1 n1
i
i = 1, . . . , n.
i = 1, . . . , n,
(S20)
c0 , c1 , . . . , cn1 F.
i = 1, . . . , n,
Section 6.2
6.2.5 Since P² = P, we know that P projects U onto R(P) along N(P). If N(P) = R(P)^⊥, we show that P′ = P. To this end, for any u1, u2 ∈ U we rewrite them as u1 = v1 + w1, u2 = v2 + w2, where v1, v2 ∈ R(P), w1, w2 ∈ R(P)^⊥. Then we have P(v1) = v1, P(v2) = v2, P(w1) = P(w2) = 0. Hence (u1, P(u2)) = (v1 + w1, v2) = (v1, v2), (P(u1), u2) = (v1, v2 + w2) = (v1, v2). This proves (u1, P(u2)) = (P(u1), u2), which indicates P′ = P. Conversely, we note that for any T ∈ L(U) there holds R(T)^⊥ = N(T′) (cf. Exercise 5.6.3). If P′ = P, we have R(P)^⊥ = N(P).
6.2.8 (i) If Tᵏ = 0 for some k ≥ 1 then all eigenvalues of T vanish. Let {u1, . . . , un} be a basis of U consisting of eigenvectors of T. Then T(uᵢ) = 0 for all i = 1, . . . , n. Hence T = 0.
(ii) If k = 1, then there is nothing to show. Assume that the statement is true up to k = l ≥ 1. We show that the statement is true at k = l + 1. If l is odd, then l + 1 = 2m for some integer m ≥ 1. Hence T^{2m}(u) = 0 gives us (u, T^{2m}(u)) = (Tᵐ(u), Tᵐ(u)) = 0. So Tᵐ(u) = 0. Since m ≤ l, we deduce T(u) = 0. If l is even, then l = 2m for some integer m ≥ 1 and T^{2m+1}(u) = 0 gives us T^{2m}(v) = 0 where v = T(u). Hence T(v) = 0. That is, T²(u) = 0. So it follows again that T(u) = 0.
Section 6.3
6.3.11 Since A†A is positive definite, there is a positive definite Hermitian matrix B ∈ C(n, n) such that A†A = B². Thus we can rewrite A as A = PB with P = (A†)^{−1}B, which may be checked to satisfy

P P† = (A†)^{−1}B B† A^{−1} = (A†)^{−1}B²A^{−1} = (A†)^{−1}A†A A^{−1} = In.
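The construction in 6.3.11 is easy to test numerically. In the Python sketch below (ours), P is computed as A B^{−1}, which coincides with the (A†)^{−1}B used in the solution; scipy's sqrtm supplies B = (A†A)^{1/2}.

import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(6)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

B = sqrtm(A.conj().T @ A)                       # positive definite square root of A†A
P = A @ np.linalg.inv(B)

print(np.allclose(P.conj().T @ P, np.eye(4)))   # P is unitary
print(np.allclose(P @ B, A))                    # A = P B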
By the preceding exercise we may write A = CB, where C is unitary and B is positive definite with B² = A†A. So if λ1, . . . , λn > 0 are the eigenvalues of A†A then √λ1, . . . , √λn are those of B. Let Q ∈ C(n, n) be unitary such that B = Q†DQ, where D is given in (6.3.26). Then A = CQ†DQ. Let P = CQ†. We see that P is unitary and the problem follows.
6.3.13 Since T is positive definite, (u, v)_T ≡ (u, T(v)), u, v ∈ U, also defines a positive definite scalar product over U. So the problem follows from the Schwarz inequality stated in terms of this new scalar product.
Section 6.4
6.4.4 Assume that A = (aᵢⱼ) is upper triangular. Then we see that the diagonal entries of A†A and AA† are

|a11|²,   . . . ,   Σ_{j=1}^{i} |a_{ji}|²,   . . . ,   Σ_{j=1}^{n} |a_{jn}|²,

and

Σ_{j=1}^{n} |a_{1j}|²,   . . . ,   Σ_{j=i}^{n} |a_{ij}|²,   . . . ,   |a_{nn}|²,    (S21)

respectively.
Then T′ = R ∘ S′ = S′ ∘ R. So T′ ∘ T = T ∘ T′ and T is normal.
Now assume that T is normal. Then U has an orthonormal basis, say {u1, . . . , un}, consisting of eigenvectors of T. Set T(uᵢ) = λᵢuᵢ (i = 1, . . . , n). Make polar decompositions of these eigenvalues, λᵢ = |λᵢ| e^{iθᵢ}, i = 1, . . . , n, with the convention that θᵢ = 0 if λᵢ = 0. Define R, S ∈ L(U) by setting

R(uᵢ) = |λᵢ| uᵢ,   S(uᵢ) = e^{iθᵢ} uᵢ,   i = 1, . . . , n.
i = 1, . . . , k.
6.5.8 Note that some special forms of this problem have appeared as Exercises 6.3.11 and 6.3.12. By the singular value decomposition for A we may rewrite A as A = PDQ, where Q, P ∈ C(n, n) are unitary and D ∈ R(n, n) is diagonal with all its diagonal entries nonnegative. Alternatively, we also have A = (PQ)(Q†DQ) = (PDP†)(PQ), as products of a unitary matrix and a positive semi-definite matrix, expressed in the two different orders.
Section 7.1
7.1.4 Since g ≠ 0, we also have f ≠ 0. Assume fⁿ | gⁿ. We show that f | g. If n = 1, there is nothing to do. Assume n ≥ 2. Let h = gcd(f, g). Then we have f = hp, g = hq, p, q ∈ P, and gcd(p, q) = 1. If p is a scalar, then f | g. Suppose otherwise that p is not a scalar. We rewrite fⁿ and gⁿ as fⁿ = hⁿpⁿ and gⁿ = hⁿqⁿ. Since fⁿ | gⁿ, we have gⁿ = fⁿ r, where r ∈ P. Hence hⁿqⁿ = hⁿpⁿ r. Therefore qⁿ = pⁿ r. In particular, p | qⁿ. However, since gcd(p, q) = 1, we have p | q^{n−1}. Arguing repeatedly we arrive at p | q, which is a contradiction.
7.1.5 Let h = gcd(f, g). Then f = hp, g = hq, and gcd(p, q) = 1.
Thus f n = hn pn , g n = hn q n , and gcd(p n , q n ) = 1, which implies
gcd(f n , g n ) = hn .
Section 7.2
7.2.1 If pS(λ) and pT(λ) are relatively prime, then there are polynomials f, g such that f(λ)pS(λ) + g(λ)pT(λ) = 1. Thus, I = f(T)pS(T) + g(T)pT(T) = f(T)pS(T), which implies that pS(T) is invertible and pS(T)^{−1} = f(T). Similarly we see that pT(S) is also invertible.
7.2.2 Using the notation of the previous exercise, we have I = f(T)pS(T). Thus, applying R ∘ S = T ∘ R, we get
R = f(T)pS(T) ∘ R = R ∘ f(S)pS(S) = 0.
7.2.3 It is clear that N(T) ⊆ R(I − T). It is also clear that R(I − T) ⊆ N(T) when T² = T. So N(T) = R(I − T) if T² = T. Thus (7.2.14) follows from the rank equation r(T) + n(T) = dim(U). Conversely, assume (7.2.14) holds. From this and the rank equation again we get r(I − T) = n(T). So N(T) = R(I − T), which establishes T² = T.
7.2.4 We have f1 g1 + f2 g2 = 1 for some polynomials f1 , f2 . So
I = f1 (T )g1 (T ) + f2 (T )g2 (T ).
(S22)
W = N(T1) ∩ ⋯ ∩ N(T_{k−1}).    (S23)
n1
n1
(u) = a1 n1
1 u1 + + an n un .
a1 (x1 + 1 x2 + + n1
1 xn ) = 0,
a (x + x + + n1 x ) = 0,
2 1
2 2
n
2
n1
an (x1 + n x2 + + n xn ) = 0.
(S24)
(S25)
7.4.11
7.4.12
7.4.17
7.4.18
uᵢ, vᵢ ∈ Rⁿ,   i = 1, . . . , m,    (S26)

Σ_{i=1}^{m} aᵢ uᵢ + Σ_{i=1}^{m} bᵢ vᵢ = 0,   a1, . . . , am, b1, . . . , bm ∈ R,    (S27)

uᵢ = (1/2)(wᵢ + w̄ᵢ),   vᵢ = (1/(2i))(wᵢ − w̄ᵢ),   i = 1, . . . , m,    (S28)

u ∈ U,    (S29)

v ∈ V.    (S30)
n
aij aj ,
i = 1, . . . , n.
(S31)
j =1
al = max{ai | i = 1, . . . , n}.
n
akj ,
ral al
j =1
n
alj .
j =1
Since all the row vectors of K and K t are identical, we see that all
entries of K are identical. By the condition (8.4.13), we deduce (8.4.24).
8.4.5 It is clear that all the entries of A and Aᵗ are non-negative. It remains to show that 1 and u = (1, . . . , 1)ᵗ ∈ Rⁿ are a pair of eigenvalue and eigenvector of both A and Aᵗ. In fact, applying Aᵢu = u and Aᵢᵗu = u (i = 1, . . . , k) consecutively, we obtain Au = A1 ⋯ Ak u = u and Aᵗu = Akᵗ ⋯ A1ᵗ u = u, respectively.
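A small numerical check of 8.4.5 (ours): products of doubly Markov matrices remain doubly Markov. The helper that generates test matrices by Sinkhorn-style normalization is an assumption of this sketch, not part of the text.

import numpy as np

rng = np.random.default_rng(7)

def random_doubly_stochastic(n, iters=1000):
    # Sinkhorn-style normalization of a positive matrix, used only to create test data.
    M = rng.uniform(0.1, 1.0, size=(n, n))
    for _ in range(iters):
        M /= M.sum(axis=1, keepdims=True)
        M /= M.sum(axis=0, keepdims=True)
    return M

A1, A2, A3 = (random_doubly_stochastic(5) for _ in range(3))
A = A1 @ A2 @ A3
print(np.all(A >= 0), np.allclose(A.sum(axis=1), 1), np.allclose(A.sum(axis=0), 1))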
Section 9.3
9.3.2 (i) In the uniform state (9.3.30), we have

⟨A⟩ = (1/n) Σ_{i=1}^{n} λᵢ,     ⟨A²⟩ = (1/n) Σ_{i=1}^{n} λᵢ².
Bibliographic notes
We end the book by mentioning a few important but more specialized subjects
that are not touched on in this book. We point out only some relevant references
for the interested reader.
Convex sets. In Lang [23] basic properties and characterizations of convex
sets in Rn are presented. For a deeper study of convex sets using advanced
tools such as the Hahn-Banach theorem see Lax [25].
Tensor products and alternating forms. These topics are covered elegantly
by Halmos [18]. In particular, there, the determinant is seen to arise as
the unique scalar, associated with each linear mapping, defined by the one-dimensional space of top-degree alternating forms, over a finite-dimensional
vector space.
Min-max principle for computing the eigenvalues of self-adjoint mappings.
This is a classical variational resolution of the eigenvalue problem known
as the method of the Rayleigh-Ritz quotients. For a thorough treatment see
Bellman [6], Lancaster and Tismenetsky [22], and Lax [25].
Calculus of matrix-valued functions. These techniques are useful and powerful in applications. For an introduction see Lax [25].
Irreducible matrices. Such a notion is crucial for extending the Perron-Frobenius theorem and for exploring the Markov matrices further under more
relaxed conditions. See Berman and Plemmons [7], Horn and Johnson [21],
Lancaster and Tismenetsky [22], Meyer [29], and Xu [38] for related studies.
Transformation groups and bilinear forms. Given a non-degenerate bilinear form over a finite-dimensional vector space, the set of all linear mappings
on the space which preserve the bilinear form is a group under the operation
of composition. With a specific choice of the bilinear form, a particular such
transformation group may thus be constructed and investigated. For a concise
introduction to this subject in the context of linear algebra see Hoffman and
Kunze [19].
Index
1-form, 16
characteristic roots, 107
addition, 3
adjoint mapping, 50, 122
adjoint matrix, 103
adjugate matrix, 103
adjunct matrix, 103
algebraic multiplicity, 216
algebraic number, 13
angular frequencies, 255
annihilating polynomials, 221
annihilator, 19
anti-Hermitian mappings, 195
anti-Hermitian matrices, 7
anti-lower triangular matrix, 99
anti-self-adjoint, 126
anti-self-adjoint mappings, 195
anti-self-dual, 126
anti-symmetric, 4
anti-symmetric forms, 230
anti-upper triangular matrix, 99
basis, 13
basis change matrix, 15
basis orthogonalization, 117
basis transition matrix, 15, 45
Bessel inequality, 140
bilinear form, 147, 180
bilinear forms, 147
boxed diagonal matrix, 57
boxed lower triangular form, 99
boxed upper triangular form, 98
boxed upper triangular matrix, 56
bracket, 248
diagonal matrix, 6
diagonalizable, 222
diagonally dominant condition, 101
dimension, 13
direct product of vector spaces, 22
direct sum, 21
direct sum of mappings, 58
dominant, 237
dot product, 5
doubly Markov matrices, 247
dual basis, 17
dual mapping, 122
dual space, 16
eigenmodes, 255
eigenspace, 57
eigenvalue, 57
eigenvector, 57
Einstein formula, 255
entry, 4
equivalence of norms, 30
equivalence relation, 48
equivalent polynomials, 207
Euclidean scalar product, 124, 127
ideal, 205
idempotent linear mappings, 60
identity matrix, 6
image, 38
indefinite, 161
index, 83
index of negativity, 119
index of nilpotence, 62
index of nullity, 119
index of positivity, 119
infinite dimensional, 13
injective, 38
invariant subspace, 55
inverse mapping, 42
invertible linear mapping, 41
invertible matrix, 7
irreducible linear mapping, 56
irreducible polynomial, 207
isometric, 142
isometry, 142
isomorphic, 42
isomorphism, 42
isotropic subspace, 235
linear transformation, 55
linearly dependent, 8, 9
linearly independent, 10
linearly spanned, 8
locally nilpotent mappings, 62
lower triangular matrix, 6
mapping addition, 35
Markov matrices, 243
matrix, 4
matrix exponential, 76
matrix multiplication, 5
matrix-valued initial value problem, 76
maximum uncertainty, 261
maximum uncertainty state, 261
Measurement postulate, 252
metric matrix, 136
minimal polynomial, 67, 221
Minkowski metric, 120
Minkowski scalar product, 120
Minkowski theorem, 101
minor, 89
mutually complementary, 22
negative definite, 161
negative semi-definite, 161
nilpotent mappings, 62
non-definite, 161
non-degenerate, 120
non-negative, 161
non-negative matrices, 237
non-negative vectors, 237
non-positive, 161
nonsingular matrix, 7
norm, 28
norm that is stronger, 29
normal equation, 176
normal mappings, 172, 194, 195
normal matrices, 197
normed space, 28
null vector, 115
null-space, 38
nullity, 39
nullity-rank equation, 40
observable postulate, 252
observables, 252
one-parameter group, 74
one-to-one, 38
onto, 38
open, 73
orthogonal, 115
orthogonal mapping, 123, 132
orthogonal matrices, 7
orthogonal matrix, 134
orthonormal basis, 119, 130
Parseval identity, 140
Pauli matrices, 257
period, 62
permissible column operations, 95
permissible row operations, 92
perpendicular, 115
Perron-Frobenius theorem, 237
photoelectric effect, 255
Planck constant, 254
polar decomposition, 198
polarization identities, 149
polarization identity, 132
positive definite Hermitian mapping, 188
positive definite Hermitian matrix, 188
positive definite quadratic form, 188
positive definite quadratic forms, 158
positive definite scalar product over a complex
vector space, 128
positive definite scalar product over a real
vector space, 128
positive definite self-adjoint mapping, 188
positive definite self-adjoint mappings, 158
positive definite symmetric matrices, 158
positive diagonally dominant condition, 101
positive matrices, 237
positive semi-definite, 161
positive vectors, 237
preimages, 38
prime polynomial, 207
principal minors, 166
product of matrices, 5
projection, 60
proper subspace, 8
Pythagoras theorem, 129
QR factorization, 134
quadratic form, 148, 181
quotient space, 26
random variable, 252
range, 38
rank, 39
rank equation, 40
rank of a matrix, 52
reducibility, 56
reducible linear mapping, 56
reflective, 19
reflectivity, 19
regular Markov matrices, 243